Design of a SPECTRE-Resistant High-Performance CPU


  • #11
    Well, anyway, the CPU should be able to turn off the entire speculation-resolving mechanism. For best security, it should be able to turn it off separately for each speculation-sensitive unit and separately for each speculation-resolving unit, on all available CPU models. This way the CPU can be easily probed for unknown variations of SPECTRE.


    • #12
      Another note: the actual difference between the "speculation cache buffer" and the simpler speculation-resolving unit is that the latter doesn't handle ejected entries and reordering. So, simpler speculation-resolving is only partially resolving. In other words, it cannot undo completely, but hopefully, it can undo sufficiently well.


      • #13
        This "enhanced coloring scheme" doesn't stop SPECTRE v1, apparently.

        Whatever difference there is between SE and non-SE behavior of a CPU, the attacker can exploit it, no matter how complex that might be. Whether it is ejected entries or different counting doesn't matter; it is exploitable.

        So there are only two options remaining for each predictor:
        1) use one "speculation cache buffer" as an interface for each predictor (lots of additional transistors, but not impossible)
        2) there is some special property of a particular predictor that allows a simpler speculation-resolving scheme to be employed


        • #14
          Explanation of SPECTRE - Overview (E0)

          The design of a SPECTRE-resistant CPU cannot be understood without a good understanding of SPECTRE. In my view, SPECTRE is a simple problem which is easy to understand when it is properly explained. The next few posts attempt to explain SPECTRE in the simplest possible way.

          The explanation proceeds as follows:
          • E1: Highlight related and trivial issues

            A small number of trivial issues and unrelated problems might be complicating the understanding of SPECTRE. They must be pointed out so that they stop distracting from the main issue.
          • E2: What causes SPECTRE

            Explains the origin of SPECTRE with examples. Argues that SPECTRE is a single problem that manifests itself in several ways.
          • E3: Definition of SPECTRE

            Attempts to clearly define what SPECTRE is. A clear understanding of SPECTRE automatically suggests the proper solution.
          The post below contains my initial thoughts on how to define SPECTRE. Unfortunately, there are a few mistakes that I made, so a rewritten version of the chapter "Definition of Spectre (E3)" will be posted later as soon as I have the time to write it.
          Last edited by xfcemint; 26 September 2021, 05:13 PM.


          • #15
            E3: Definition of SPECTRE

            --- The following terms must be defined and understood:

            accepted execution path: an execution path that a CPU has taken in the past. At any particular moment in time, there is only a single accepted execution path. The initial point of the accepted execution path is the first instruction that the CPU executed at computer startup. The endpoint of the accepted execution path at a particular moment in time is the last instruction that the CPU has executed.

            speculated execution path: an execution path that a CPU might take in the future time, in the sense that instructions from this execution path might be executed and accepted by the CPU after a particular moment in time. There may be multiple different speculated execution paths at any particular moment in time.

            branching point: a vertex on the execution path tree; a point where two execution paths diverge.

            rejected execution path: an execution path that the CPU was considering in the past, but at the initial branching point of the rejected execution path the CPU took a different execution path. The initial branching point of a rejected execution path is always on the accepted execution path.
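
The path terminology above can be sketched as a small tree structure. This is only an illustration under my own assumptions: the names `PathNode`, `accept` and `reject_subtree` are hypothetical, and each node stands for one instruction on the execution path tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison only; the tree contains cycles
class PathNode:
    """One instruction on the execution path tree (hypothetical model)."""
    name: str
    parent: Optional["PathNode"] = None
    children: List["PathNode"] = field(default_factory=list)
    status: str = "speculated"  # "accepted", "speculated" or "rejected"

    def add(self, name: str) -> "PathNode":
        """Attach a speculated instruction below this node."""
        child = PathNode(name, parent=self)
        self.children.append(child)
        return child


def reject_subtree(node: PathNode) -> None:
    """Mark a node and everything below it as a rejected execution path."""
    node.status = "rejected"
    for child in node.children:
        reject_subtree(child)


def accept(node: PathNode) -> None:
    """Extend the accepted path to `node`; sibling paths become rejected."""
    assert node.parent is not None and node.parent.status == "accepted"
    node.status = "accepted"
    for sibling in node.parent.children:
        if sibling is not node:
            reject_subtree(sibling)


root = PathNode("boot", status="accepted")  # first instruction at startup
check = root.add("bounds_check")            # a branching point
accept(check)
taken = check.add("load_secret")            # speculated path 1
not_taken = check.add("return_error")       # speculated path 2
after = taken.add("use_secret")
accept(not_taken)                           # the branch direction is decided
```

After the last call, `taken` and `after` form a rejected execution path whose initial branching point (`check`) lies on the accepted path, as the definition requires.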

            measurable point in time: a point in time that can be given by an external reference timer that ticks, on average, once every 1000 CPU clock cycles, with a random jitter of ±10 clock cycles on each tick.

            external behavior in time: contents of CPU memory at any measurable moment in time. It is assumed that the CPU is connected only to memory (of a fixed capacity), to an external reference timer and to a keyboard.

            A SPECTRE-proof CPU is a CPU whose external behavior in time can be implemented by an in-order-nonspeculative CPU connected to the memory of the same capacity, same external reference timer and the same keyboard.

            A SPECTRE-vulnerable CPU is a CPU whose external behavior in time cannot be implemented by an in-order-nonspeculative CPU connected to the memory of the same capacity, same external reference timer and the same keyboard.

            SPECTRE definition 1: A microarchitectural CPU design error that converts a SPECTRE-proof CPU design into a SPECTRE-vulnerable CPU design.

            system state (in the model system including a timer and a keyboard): at a particular moment in time, corresponding to a particular "well-defined" point on the execution path tree (i.e. CPU clock tick, i.e. synchronization primitive for the hardware of entire model system):
            contents of memory attached to the CPU -and- any data stored in the CPU that can be transferred to the memory by any instruction sequence -and- location of accepted execution path endpoint.

            !!!!!!!!!!!! ########
            I'm trying to figure out why a SPECTRE-vulnerable CPU cannot be implemented by an in-order-nonspeculative CPU. Apparently, this might be because a SPECTRE-vulnerable CPU cannot have a meaningful "system state" corresponding to a particular moment in time and a "well defined" endpoint on the execution path tree. (That's my best current idea to explain this property of being "SPECTRE-vulnerable")

            !!!!!!!!!!!! ########
            I think I just got it (not sure, I'm just wildly guessing): it cannot have a meaningful "system state" because "accepted execution path endpoint" is just a single point (i.e. an end of a path). On the other hand, the definition of a "system state" of a "spectre-vulnerable CPU" requires multiple execution endpoints.

            SPECTRE definition 2: A microarchitectural feature of a CPU design with speculative execution that makes the CPU derive system state at an endpoint of accepted execution path from data derived from a system state at a rejected execution path.
            Last edited by xfcemint; 22 September 2021, 07:22 PM.


            • #16
              SPECTRE definition 1 seems relatively easy to understand: when a CPU is affected by SPECTRE, the system state of the affected system will not be the same as on a similar system that uses an in-order-nonspeculative CPU. In other words, SPECTRE causes an unexpected system state; or: SPECTRE causes an unexpected difference between the expected and observed system state.

              It might not be immediately clear why SPECTRE definition 2 is valid because it includes some unstated assumptions. The main assumption is that there are only two types of sensible CPUs: speculative and nonspeculative. Or, that other types of CPUs are not interesting (like, random error CPUs), or that other types of CPUs are not important; or that SPECTRE was discovered as an observable difference between speculative and nonspeculative CPUs.

              The point of SPECTRE definition 2 is to verify whether SPECTRE could actually be caused by data leaks from rejected execution paths. This same question can be stated like this: is there any difference between the two given definitions of SPECTRE?

              In a (flawless) in-order-nonspeculative CPU the data cannot leak from a rejected execution path because:

              (1a) the system state in a system with an in-order-nonspeculative CPU can depend only on instructions on the current and future accepted execution path. The system state is fully determined by:
              - initial memory contents,
              - the instructions on the current and future accepted execution path
              - the timer and keyboard events
              - the speed of execution
              - the execution semantics (in this model system the execution semantics are assumed to be deterministic, no random number generators)

              However, a SPECTRE-vulnerable system has only some very specific additional information available compared to the nonspeculative system: the architecture of a SPECTRE-vulnerable CPU and the instructions from rejected execution paths.

              Since there is an observed difference in states between the nonspeculative and speculative systems, the only possible conclusion is that those differences are based on the additional information that is available to the speculative system.

              (1b) By intention of design, the speculative CPU should be implementing the same execution semantics as a nonspeculative CPU.

              (1c) Therefore, the only possible data source for SPECTRE are the instructions on rejected execution paths.

              But, if (1c) is true and (1a) is true, then it immediately follows that (1b) is false; more precisely, the speculative CPU is not implementing the same execution semantics in the described model system.

              Therefore, there are two incompatible types of CPUs:
              - SPECTRE-proof: same as normal in-order-nonspeculative execution
              - SPECTRE-vulnerable: a flawed CPU that takes into account instructions from rejected execution paths.

              The necessary conclusion is: any SPECTRE-vulnerable CPU leaks data derived both from rejected execution paths and from its architectural design.

              What I'm actually trying to say is that execution semantics should always be understood as nonspeculative and in-order, because any other interpretations necessarily produce contradictions. I'm not sure whether I successfully proved this conjecture.

              Of course, there exists a speculative CPU that is SPECTRE-proof. That's easy to imagine. This thread is about designing a high speed speculative CPU that is apparently SPECTRE-proof.
              Last edited by xfcemint; 21 September 2021, 09:18 PM.


              • #17
                Related and Trivial Issues (E1) - Explanation of SPECTRE

                in-time data: data derivable from: instructions on accepted execution path until current endpoint, results of I/O instructions, and from initial memory state.

                ahead-of-time data: data derivable from instructions on speculated execution paths and from in-time data, excluding data derivable only from in-time data.

                Clean SE design

                One issue that appears when analyzing SPECTRE is: what particular CPU is being discussed? Different CPU models are going to be affected by different versions of SPECTRE. Interestingly, SPECTRE v1 affects all current CPUs that feature speculative execution and a cache.

                My essential observations are based on an abstract CPU that is designed using some set of principles that include "clean SE design", or "clean SE". A clean SE design attempts not to store ahead-of-time data in a unit that cannot easily reverse its state to in-time data only. A clean SE is a principle behind the design rule D10 of a SPECTRE-resistant CPU, while the rule D9 is superfluous in clean SE.

                In clean SE, the only places where ahead-of-time data can be stored are those that can be easily tracked by the OoO/SE engine. Unfortunately, making the OoO/SE engine track all of the ahead-of-time data appears very complicated (but not impossible); no OoO/SE design has ever attempted this, to my knowledge.

                Speculation-Resolving Unit

                Which ahead-of-time data cannot be easily tracked by an OoO/SE engine? It's the memory loads and stores. Therefore, a clean SE can leak ahead-of-time data only at a single place: at an interface between the load/store unit and caches/memory/TLBs. In unclean SE, SPECTRE can leak data through multiple interfaces (e.g. from interface of a predictor). Today's common CPU designs are all unclean SE.

                To stop the SPECTRE in a clean SE design, a unit that stops the propagation of all the ahead-of-time data must be added at the load/store to memory interface. This unit can be called "speculation cache buffer". What it does is simple to understand: it buffers all the store instructions containing ahead-of-time data, waits for the CPU to decide on a branch direction, and then forwards to caches and memory all the store instructions from the execution path that was just accepted. The stores have to be forwarded to memory/cache in a proper order.

                When it receives an ahead-of-time load instruction, it immediately executes a speculative load from the speculation-aware L1 cache. At the same time, it must search its own buffered data for matching store instructions with an appropriate temporal tag (corresponding to a time before the load instruction). This is essentially the same as memory address disambiguation, with the temporal aspect taken into account. If a result is found, it overrides the result from the L1 cache. Therefore, the latency increase is very small for most load instructions.

                The OoO/SE engine must be careful to first send to the load/store unit all the store instructions preceding a load instruction in the instruction stream. There are also many other details which I am going to skip.

                This "speculation cache buffer" turns ahead-of-time data from the load/store interface into in-time data on the cache/memory interface. Units that convert ahead-of-time data to in-time data can be called "speculation-resolving" units.
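
The buffering behavior described above can be sketched as follows. This is a toy model under stated assumptions, not a hardware design: the class name, the `path` identifiers and the `seq` sequence numbers (standing in for the temporal tag) are all hypothetical, only a single pending branch is modeled, and cache metadata is ignored.

```python
from typing import Dict, List, Optional, Tuple


class SpeculationCacheBuffer:
    """Toy model: holds speculative stores until their branch is decided."""

    def __init__(self, memory: Dict[int, int]) -> None:
        self.memory = memory  # stands in for caches + memory (in-time data only)
        # Each entry: (sequence number, path id, address, value)
        self.pending: List[Tuple[int, int, int, int]] = []

    def store(self, seq: int, path: int, addr: int, value: int) -> None:
        """Buffer a speculative store; memory is not touched."""
        self.pending.append((seq, path, addr, value))

    def load(self, seq: int, path: int, addr: int) -> Optional[int]:
        """Return the youngest buffered store older than the load, else memory."""
        older = [e for e in self.pending
                 if e[1] == path and e[2] == addr and e[0] < seq]
        if older:
            return max(older)[3]  # entry with the highest sequence number wins
        return self.memory.get(addr)

    def resolve(self, accepted_path: int) -> None:
        """Branch decided: commit accepted stores in order, drop the rest."""
        for seq, path, addr, value in sorted(self.pending):
            if path == accepted_path:
                self.memory[addr] = value
        self.pending.clear()


mem = {0x10: 1}
buf = SpeculationCacheBuffer(mem)
buf.store(seq=1, path=0, addr=0x10, value=7)   # store on speculated path 0
buf.store(seq=2, path=1, addr=0x10, value=9)   # store on speculated path 1
seen_on_path0 = buf.load(seq=3, path=0, addr=0x10)
untouched = mem[0x10]                          # memory still holds in-time data
buf.resolve(accepted_path=1)
```

In this sketch a load on path 0 sees its own earlier store, memory stays unchanged while the branch is undecided, and only the stores from the accepted path reach memory.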
                Last edited by xfcemint; 21 September 2021, 05:33 AM.


                • #18
                  Unclean SE design

                  Today's most common CPUs feature unclean SE. In such designs, ahead-of-time data is stored in many units that cannot easily reverse their state to in-time data only. The examples include at least these two predictors: return stack buffer and branch target buffer. The branch predictor for common conditional jumps is usually clean from ahead-of-time data.

                  The return stack buffer and branch target buffer do not really need to store any ahead-of-time data, and neither does any other predictor. Instead, the OoO/SE engine itself may be used as a speculation-resolving unit for those buffers/predictors; unfortunately, this might require substantial modifications to the current OoO/SE engines. A simpler solution at the current time might be to use a "speculation cache buffer" as a speculation-resolving unit for those buffers. Unfortunately, one "speculation cache buffer" is required for each predictor, and each additional unit requires a lot of transistors, which results in increasing power consumption of the CPU.

                  Unclean SE design may feature many more unclean units, as many current designs do. Whether a unit is clean or unclean was not very important before the discovery of SPECTRE. Each unclean unit requires its own speculation-resolving unit to stop the ahead-of-time data from being stored in the unclean unit, or a redesign of that unit and modifications to the OoO/SE engine.

                  Other Security Vulnerabilities

                  A CPU design may feature a security weakness which cannot be effectively exploited without speculative execution. As speculative execution increases the possible ways for data to leak from a CPU, it can make a previously unexploitable weakness exploitable.

                  The examples of such security weaknesses can be found in design rules D7 and D8; those two have nothing to do with speculative execution; the same weakness appears in common nonspeculative designs of the past.

                  Security problems might also be caused by errors in implementation of OoO/SE unrelated to SPECTRE, but SPECTRE allows them to effectively leak. Such errors should be corrected by usual engineering methods other than the ones described in this thread.
                  Last edited by xfcemint; 22 September 2021, 10:20 PM.


                  • #19
                    What causes SPECTRE (E2) - Explanation of SPECTRE

                    According to SPECTRE definition 1 and SPECTRE definition 2, the origin of SPECTRE is any possible leak of ahead-of-time data to the accepted execution path. The source of ahead-of-time data is always the instructions from code that will become a part of a rejected execution path.

                    Of course, it doesn't matter much whether the code from rejected execution paths is leaking, the problem is that the said code gets executed by the speculative execution engine. Maybe the results of such execution can simply be ignored?

                    The first problem is that each and every security check is a branch on the execution path. Speculative execution allows the CPU to defer a decision on a branch direction and to speculatively execute instructions past the security check. Therefore, speculative execution can easily access data that is supposed to be inaccessible; this inaccessible data has become a part of ahead-of-time data.
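
A toy model may make this concrete. It is a sketch under my own assumptions (the class `ToySpeculativeCPU` and its fields are hypothetical and do not describe any real CPU): the load is executed before the bounds check is decided, and a dependent access leaves a footprint that survives after the result is discarded.

```python
from typing import List, Optional, Set


class ToySpeculativeCPU:
    """Hypothetical model: loads run ahead of the check that guards them."""

    def __init__(self, memory: List[int]) -> None:
        self.memory = memory
        # Lines of a probe array touched so far (microarchitectural state).
        self.touched_probe_lines: Set[int] = set()

    def checked_read(self, index: int, limit: int) -> Optional[int]:
        # Speculative part: executed before the branch direction is decided.
        value = self.memory[index]
        self.touched_probe_lines.add(value)  # dependent load, line chosen by value
        # The security check is resolved only now.
        if index < limit:
            return value  # accepted path: the result is architecturally visible
        return None  # rejected path: result discarded, footprint remains


memory = [10, 11, 12, 13, 42]  # index 4 holds data outside the allowed range
cpu = ToySpeculativeCPU(memory)
result = cpu.checked_read(index=4, limit=4)
footprint = cpu.touched_probe_lines
```

The program never receives the out-of-range value (`result` is `None`), yet `footprint` now depends on it. That dependency is the ahead-of-time data in this model.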

                    Can this be stopped in this manner: whenever a security check is encountered, the speculative execution is stopped?

                    Unfortunately, too many common instructions must perform a security check; for example, each memory load and each memory store instruction is usually disallowed from accessing memory areas assigned to other programs and to the OS. If a store instruction were allowed to write anywhere in memory, a program could intentionally overwrite and expand its own privileges. If a load were allowed to read from anywhere, a program could read passwords or credit card numbers used by another running program.

                    Therefore, security checks are common; stopping speculative execution at each one is infeasible. This means that a high-performance CPU must be able to speculatively execute instructions past security checks.

                    As a consequence, ahead-of-time data can easily reveal security-sensitive data. Ahead-of-time data is supposed to never be accessible to the executing program. To stop the running program from accessing ahead-of-time data, the CPU has to be able to discern which data is ahead-of-time and which data is in-time.

                    Tracking ahead-of-time data

                    The next possible solution would be to attempt to track all ahead-of-time data and its spread through the CPU core. The first observation is that if this data doesn't ever get stored anywhere, it will automatically be lost. Ahead-of-time data can be stored either inside the CPU, or to the memory.

                    The second case seems easy enough: if any ahead-of-time data appears on the CPU-to-memory interface, the CPU stops and waits for the branch direction at the endpoint of the accepted execution path to be decided. But how can the spread of ahead-of-time data inside the CPU be tracked?

                    This task is partially performed by the OoO/SE unit. It can track the results of speculatively executed instructions and note all the places where results are stored. To stop the propagation of ahead-of-time data, the OoO/SE unit simply clears all the noted places.
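
A minimal sketch of such tracking, assuming a register file as the only storage. The class `SpeculationTracker` is hypothetical, and "clearing a noted place" is modeled here as restoring the in-time value it held before the speculative write.

```python
from typing import Dict, List, Tuple

_ABSENT = object()  # marks "register had no value before the speculative write"


class SpeculationTracker:
    """Toy model of an OoO/SE engine noting where speculative results land."""

    def __init__(self) -> None:
        self.registers: Dict[str, int] = {}
        self.noted: List[Tuple[str, object]] = []  # (place, in-time value)

    def write(self, reg: str, value: int, speculative: bool) -> None:
        """Store a result; speculative writes are noted for a later undo."""
        if speculative:
            self.noted.append((reg, self.registers.get(reg, _ABSENT)))
        self.registers[reg] = value

    def accept(self) -> None:
        """Speculation confirmed: the results become in-time data."""
        self.noted.clear()

    def reject(self) -> None:
        """Speculation rejected: clear every noted place, newest first."""
        while self.noted:
            reg, old = self.noted.pop()
            if old is _ABSENT:
                self.registers.pop(reg, None)
            else:
                self.registers[reg] = old


engine = SpeculationTracker()
engine.write("r1", 5, speculative=False)   # in-time data
engine.write("r1", 99, speculative=True)   # ahead-of-time data
engine.write("r2", 7, speculative=True)    # ahead-of-time data
engine.reject()
```

Anything written outside the places noted by the tracker is not restored by `reject`, and that is the kind of storage the following paragraphs are concerned with.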

                    SPECTRE can appear when ahead-of-time data is stored inside a CPU at a place that is apparently inaccessible to any CPU instructions. Hopefully, this data just gets overwritten in the future, so let's assume that there is no need to track it. However, if a CPU stored the data somewhere, anywhere, can this affect the execution in the future? If the buffer where the data is stored doesn't affect the execution at all, then what is the purpose of such a buffer? Why does it even exist? Of course, such purposeless buffers seldom exist, so practically any stored data affects the future execution in some way. If the execution is affected, the difference can usually be discerned in execution speed; therefore, a difference in speed can reveal what data is stored in the inaccessible buffer. Since our model CPU is connected to a timer, a difference in speed can be determined by measuring the elapsed time.
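
The timing argument can be illustrated with a small simulation. Everything here is an assumption made for illustration: the cycle counts, the function names, and the idea that the buffer state is re-primed before every access. The timer follows the model definition (one tick per roughly 1000 cycles, jitter of ±10 cycles per tick).

```python
import random

HIT_CYCLES = 4       # assumed cost when the line is in the hidden buffer
MISS_CYCLES = 200    # assumed cost when it is not
TICK_PERIOD = 1000   # external reference timer: one tick per ~1000 cycles
JITTER = 10          # random jitter of +-10 cycles on each tick


def ticks_elapsed(cycles: int, rng: random.Random) -> int:
    """Count ticks of the coarse external timer during `cycles` CPU cycles."""
    ticks, t = 0, 0
    while True:
        t += TICK_PERIOD + rng.randint(-JITTER, JITTER)
        if t > cycles:
            return ticks
        ticks += 1


def probe(line: int, hidden_line: int, repeats: int, rng: random.Random) -> int:
    """Time `repeats` accesses to `line`; only `hidden_line` is buffered."""
    per_access = HIT_CYCLES if line == hidden_line else MISS_CYCLES
    return ticks_elapsed(per_access * repeats, rng)


def recover(hidden_line: int, lines: int = 16, repeats: int = 1000,
            seed: int = 0) -> int:
    """Guess which line is buffered by finding the fastest probe."""
    rng = random.Random(seed)
    timings = {line: probe(line, hidden_line, repeats, rng)
               for line in range(lines)}
    return min(timings, key=timings.get)


recovered = recover(hidden_line=11)
```

Even though a single access is far shorter than one timer tick, repeating the measurement makes the speed difference visible to the coarse timer, so the content of the "inaccessible" buffer can be inferred.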

                    Therefore, any ahead-of-time data stored in a high-performance SPECTRE-proof CPU must be tracked (except for the data stored in units with no purpose).

                    The purpose of tracking is to be able to discard or ignore ahead-of-time data when desired. When ahead-of-time data is discarded or ignored, the CPU reverts to the behavior based on in-time data.

                    SPECTRE definition 3: A microarchitectural feature of a CPU design with speculative execution that allows the system state to be derived from ahead-of-time data.

                    (to be continued)
                    Last edited by xfcemint; 22 September 2021, 10:28 PM.


                    • #20
                      Clarification and Corrections for "Related and Trivial Issues" (E1) - Explanation of SPECTRE

                      After reading the section "Tracking ahead-of-time data" from the previous post, the dangers of storing ahead-of-time data anywhere inside a SPECTRE-resistant CPU should become obvious. The goal of previously described "clean SE design" is to store ahead-of-time data into as few places as possible. Therefore, "clean SE design" avoids storing ahead-of-time data into caches, predictors, and into any other units besides the main OoO/SE engine and one "speculation cache buffer" that acts as an interface to caches and memory.

                      "How can ahead-of-time data leak from a predictor that isn't directly connected to the memory, cache or I/O", one may ask. Answer is simple: It can leak if ahead-of-time data gets embedded into other data, not tracked by the CPU. This embedding gets performed at the interface between a predictor and the OoO/SE engine in this way: when OoO/SE engine does not track ahead-of-time data stored in the predictor, then there is a semantic mismatch between the ahead-of-time data stored in the predictor and the OoO/SE engine. It happens because, without tracking, the OoO/SE engine requires in-time-data. That's why inserting a speculation-resolving unit at the said interface prevents SPECTRE from appearing, since a speculation-resolving unit can by definition resolve a mismatch between ahead-of-time data in the predictor and in-time semantics of the interface to the OoO/SE engine.

                      A speculation-sensitive unit is any unit inside a CPU that stores ahead-of-time data. All speculation-sensitive units require data tracking, performed either by the core OoO/SE engine or by another speculation-resolving unit. Without data-tracking, a semantic mismatch occurs at the interface of the speculation-sensitive unit; this mismatch allows ahead-of-time data to leak.
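
As an illustration of a speculation-resolving unit placed at a predictor interface, here is a minimal sketch for a branch target buffer. The class `ResolvedBTB` is hypothetical and assumes a single pending speculation: predictions are served only from in-time entries, and speculative updates are held back until the speculation is decided.

```python
from typing import Dict, List, Optional, Tuple


class ResolvedBTB:
    """Toy branch target buffer behind a speculation-resolving interface."""

    def __init__(self) -> None:
        self.committed: Dict[int, int] = {}       # in-time data only
        self.pending: List[Tuple[int, int]] = []  # ahead-of-time updates

    def update(self, branch_pc: int, target: int, speculative: bool) -> None:
        """Record a branch target; speculative updates are only buffered."""
        if speculative:
            self.pending.append((branch_pc, target))
        else:
            self.committed[branch_pc] = target

    def predict(self, branch_pc: int) -> Optional[int]:
        """Predictions never depend on ahead-of-time data."""
        return self.committed.get(branch_pc)

    def resolve(self, accepted: bool) -> None:
        """Speculation decided: apply buffered updates in order, or drop them."""
        if accepted:
            self.committed.update(dict(self.pending))  # later entries win
        self.pending.clear()


btb = ResolvedBTB()
btb.update(0x400, 0x500, speculative=False)
btb.update(0x400, 0x666, speculative=True)
during = btb.predict(0x400)        # speculation still undecided
btb.resolve(accepted=False)
after_reject = btb.predict(0x400)
btb.update(0x400, 0x700, speculative=True)
btb.resolve(accepted=True)
after_accept = btb.predict(0x400)
```

The trade-off in this sketch is that the predictor learns later than an unclean one would, which is one possible cost of keeping its interface on in-time semantics.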

                      Therefore, the source of leaked ahead-of-time data is ahead-of-time data stored in a predictor (or some other speculation-sensitive unit). This data leaks through the predictor's interface, no matter how far this interface is from the I/O units. The semantic mismatch of the interface itself is the reason why data is able to leak, so the interface is the "security hole", not the predictor/speculation-sensitive unit. It can be said that the origin of the security leak is the erroneous interface, in other words, that the interface itself causes the leak, or that ahead-of-time data leaks from the interface.

                      Why does a clean SE design still require a speculation-resolving unit at the interface between the load/store unit and the cache/memory? Actually, this unit is not strictly necessary if the OoO/SE engine does not modify any data stored in the caches/memory before speculation is decided (the moment when the branch direction at the relevant branching point is decided). In the previous sentence, the meaning of the phrase "any data" is literal: "any data" includes any stored data, including stored metadata. To satisfy this requirement, the OoO/SE engine must not perform any speculative operation that modifies the cache or memory, and even cache loads modify the metadata stored in the cache. If all these requirements are to be satisfied, the OoO/SE engine would have to wait too often for speculation to be decided, which would severely impact CPU performance.

                      To allow for faster operation, a speculation-resolving "speculation cache buffer" can be added as an interface to cache/memory. This enables the OoO/SE engine to perform speculative loads/stores as long as the "speculation cache buffer" has sufficient internal memory and speed to track all performed speculative loads/stores until a point in time when OoO/SE engine is able to decide the speculation.

                      All current CPU designs feature unclean SE, meaning that ahead-of-time data is stored in many units inside a CPU. What assurances do CPU design companies offer about stopping the propagation of ahead-of-time data? They offer no assurances at the moment. Even worse, they openly claim that all their current and future designs will remain vulnerable to SPECTRE v1, which is an open admission that ahead-of-time data is leaking from the CPU.


                      About a dozen of the following posts contain my initial thoughts on how to define "speculative execution". Unfortunately, at one point I made an error. I have figured out how to solve this error, and I plan to write the explanation as a rewritten chapter "Definition of SPECTRE (E3)", which will be posted below.


                      To do:
                      - write and post rewritten chapter "Definition of SPECTRE (E3)"
                      - why ad-hoc tracking is likely to fail
                      - why it is desirable to use general methods of stopping the propagation of ahead-of-time data
                      - are such general methods fast enough, and are they feasible in terms of power and complexity
                      Last edited by xfcemint; 26 September 2021, 04:53 PM.