Design of a SPECTRE-Resistant High-Performance CPU

  • Design of a SPECTRE-Resistant High-Performance CPU

    The purpose of this forum thread is to demonstrate the feasibility of producing a SPECTRE-resistant CPU.

    Numerous posts on this forum indicate that even advanced computer users are generally misinformed on this topic. The prevalent opinion is that such a CPU design cannot be made.

    This misconception apparently stems from the intentional refusal of the current major CPU-design companies (Intel, Apple, IBM, AMD and ARM) to produce such a design. The reason for their refusal is most likely economic in nature.

    The current plan for this thread is:
    Last edited by xfcemint; 10 September 2021, 02:44 PM.

  • #2
    Notes





    If you have any questions, or if you are interested in this topic, feel free to post in this thread.

    - 2021/09/19 : bullet D8 added in "More details on design of SPECTRE-resistant CPU" to clarify an aspect of thread coloring relevant to the OS exploits.
    - 2021/09/20 : bullet D9 added in "More details on design of SPECTRE-resistant CPU" to prevent variants of SPECTRE v1 from storing protected data in a CPU unit other than caches.
    - 2021/09/20 : bullet D10 added in "More details on design of SPECTRE-resistant CPU" to clarify bullet D9.
    Last edited by xfcemint; 20 September 2021, 05:37 AM.



    • #3
      FAQ (Frequently Asked Questions)


      1) === Q: How fast would this SPECTRE-resistant CPU be?

      A: Estimating the performance of this CPU compared to current designs is not easy. If all other CPU properties are unchanged (die size, technology, power budget, consumer pricing), then my best estimate is that the performance loss would be between 4% and 8%.

      2) === Q: Is this SPECTRE-resistant CPU design also resistant to Meltdown-type exploits?

      A: Yes.

      3) === Q: Is this CPU design also resistant to SPECTRE v1 and v4? Those two variants can also defeat purely software-based privilege boundaries.

      A: Yes, the proposed CPU design is completely resistant to all versions of SPECTRE, and completely annuls all consequences of SPECTRE.

      4) === Q: Why are CPU producers currently not making such secure CPUs?

      A: The primary reason seems to be the 4-8% loss of performance. In the current market conditions, a 6% faster CPU sells at 20% higher price. Their estimate is that most consumers would not buy a CPU with performance degradation of 6%. In other words, their estimate is that security is worth less than 20% of the CPU price.

      5) === Q: Is this decision of CPU producers not to produce a SPECTRE-resistant CPU ethical?

      A: No, their decisions and actions so far have not been ethical.

      6) === Q: How can you tell that CPU makers do not intend to fix security issues caused by SPECTRE in the near future?

      A: All the biggest CPU makers on their informational web pages openly claim that SPECTRE v1 and SPECTRE v4 are a software issue and that the solution to those two problems should be only software-based in the future.
      Last edited by xfcemint; 10 September 2021, 12:25 AM.



      • #4
        Details



        This thread currently contains the following chapters:

        1. More details on design of SPECTRE-resistant CPU (immediately below)

        - Contains 10 CPU design guidelines marked D1 - D10.

        2. Design of a speculation-aware cache (can be skipped, not essential for understanding of the problem)

        - A dozen posts below that contain some of my thoughts on new variations of SPECTRE. Those posts can be skipped if they are not interesting or if they are too complex to read.

        3. Explanation of SPECTRE
        • E3 - Draft of "Definition of SPECTRE" (can be skipped)
        • E1 - Trivial issues and unrelated problems
        • E2 - Where does SPECTRE come from?
        • E1/E2 - Clarifications of "Trivial issues and unrelated problems (E1)"
        • - followed by some of my initial thoughts on definition of "speculative execution" (can be skipped)
        • E3 - Definition of SPECTRE (rewritten)


        The original discussion was started in this forum. Be advised that the discussion is quite harsh:
        https://www.phoronix.com/forums/foru...47#post1276347

        I'll post more details in the following days. Currently I have time for only a few posts per day.
        Last edited by xfcemint; 26 September 2021, 05:49 PM.



        • #5
          More details on design of SPECTRE-resistant CPU



          For better understanding of this post, read:

          Explanation of SPECTRE - Overview:
          https://www.phoronix.com/forums/foru...66#post1280166

          A high-performance design which is "very secure":
          • D1) uses OoO (out-of-order execution)
            -
          • D2) uses SE (speculative execution)
            -
          • D3) the CPU must be able to thoroughly undo all the effects of speculative execution; this is a major requirement which is not satisfied by any of the current high-performance CPUs
            -
          • D4) has speculation-aware caches (at least L1 cache, possibly L2 cache)
            Implementation details here: Speculation-Aware Cache Design
            -
          • D5) has a speculation-resolving unit which can be called "speculative cache buffer"; this unit sits between the CPU core load/store unit and the L1 cache
            -
          • D6) no effects of speculative execution are allowed to spill to memory and I/O; the function of the "speculative cache buffer" is to ensure this; some effects of SE are allowed to be visible to the speculation-aware caches, but not to ordinary caches
            -
          • D7) all the sensitive buffers and predictors in the CPU core must implement coloring; the purpose of coloring is to stop the different threads from influencing one another; the threads that execute on a particular core must have one of several different colors assigned to them by the OS
            -
          • D8) whenever a thread enters the privileged mode by calling the OS API, or whenever a thread runs in a privileged mode, the OS assigns the color BLACK to the thread until it exits the privileged mode. When a thread is not running in a privileged mode, it cannot have the BLACK color. This ensures that CPU predictors and buffers cannot be influenced by an unprivileged thread to modify speculative execution in privileged mode, or to leak microarchitectural data from the privileged mode.
            -
          • D9) each speculation-sensitive unit must use a speculation-resolving unit as the interface. In most cases, this requires each speculation-sensitive unit to have its own speculation-resolving unit. A general implementation of a speculation-resolving unit is the "speculative cache buffer". A speculation-sensitive unit is any CPU unit that accepts ahead-of-time modifications, cannot completely undo the effects of speculative execution, and can potentially leak data. For additional performance, a speculation-sensitive unit should be converted into a speculation-aware unit, and this is in most cases required for adequate performance.
            -
          • D10) an easier approach is to attempt to redesign a speculation-sensitive unit in such a way that it accepts updates in-time instead of ahead-of-time, to remove speculation-sensitivity.
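The coloring rules in D7 and D8 can be illustrated with a minimal Python sketch. It is only a toy model, not a hardware design: the `Thread` and `ColoredPredictor` classes, the numeric color encoding and the use of a dictionary as a predictor table are all assumptions made for illustration. The point it shows is that a predictor entry is visible only to a thread of the same color, and that a thread in privileged mode always carries BLACK.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

BLACK = 0  # hypothetical encoding: color reserved for privileged mode (D8)


@dataclass
class Thread:
    user_color: int          # color assigned by the OS; must not be BLACK
    privileged: bool = False

    def __post_init__(self) -> None:
        if self.user_color == BLACK:
            raise ValueError("BLACK is reserved for privileged mode")

    @property
    def color(self) -> int:
        # D8: a thread is BLACK exactly while it runs in privileged mode.
        return BLACK if self.privileged else self.user_color


@dataclass
class ColoredPredictor:
    """Toy branch-target predictor whose entries are tagged with a color (D7)."""
    entries: Dict[Tuple[int, int], int] = field(default_factory=dict)

    def update(self, thread: Thread, branch_pc: int, target: int) -> None:
        self.entries[(thread.color, branch_pc)] = target

    def predict(self, thread: Thread, branch_pc: int) -> Optional[int]:
        # Only entries written under the same color are visible.
        return self.entries.get((thread.color, branch_pc))


attacker = Thread(user_color=1)
victim = Thread(user_color=2)
predictor = ColoredPredictor()

predictor.update(attacker, branch_pc=0x400, target=0xBAD)
same_color = predictor.predict(attacker, 0x400)   # attacker sees its own entry
other_color = predictor.predict(victim, 0x400)    # different color: miss

victim.privileged = True                          # system call: thread is now BLACK
in_kernel = predictor.predict(victim, 0x400)      # BLACK: still a miss
```

In this model the attacker can train the predictor only for its own color, so neither another user thread nor a thread running in privileged mode ever consumes the attacker's entries.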
          Last edited by xfcemint; 20 September 2021, 10:24 PM.



          • #6
            Design of a Speculation-Aware Cache



            The number of modifications needed to the caches is lower than expected. This is because most of the required additional functionality belongs to the "speculative cache buffer". The caches do need to provide additional bandwidth so that additional operations can be performed on time. This extra bandwidth requires the cache area to increase by 5-20%.

            The speculative caches remain mostly identical to the current design. The only change needed is that the cache has to support two additional operations:
            • CSR) speculative read from cache to ALU
              (more precisely: issued by the load/store unit, but coming through the "speculative cache buffer")
              -
            • CSU) update after speculative read
              (more precisely: issued by the load/store unit, but coming through the "speculative cache buffer")
            The operation CSR is performed when the CPU wants to speculatively read data from memory. No "normal" cache read is to be performed due to a speculative memory read request. Instead, the speculative cache read must be performed. The basic property of a speculative cache read is that it must not modify the cache.

            The operation CSU is performed when the CPU closes a portion of the speculation window (for example, when a branch is resolved) by accepting speculative results (i.e. on the accepted execution path). All the CSR cache reads that were previously performed in the closing portion of the speculation window need to be followed by a CSU operation on the cache. The cache performs a CSU operation by updating the last-access metadata of a particular cache line (in the case of a multi-way associative cache, as is common on modern CPUs).

            If the CPU closes a portion of the speculation window belonging to the rejected execution path, then no CSU operations are to be performed. So, in this case, CSR is not followed by CSU.
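The CSR/CSU protocol described above can be sketched in Python as follows. This is a toy model under stated assumptions: a single cache set with LRU replacement stands in for the whole cache, the class and method names (`SpeculationAwareSet`, `SpeculativeCacheBuffer`, `resolve`) are hypothetical, and only the cache-hit path is modelled; what happens on a speculative miss is outside the scope of the sketch.

```python
from collections import OrderedDict
from typing import List, Optional


class SpeculationAwareSet:
    """One cache set with LRU replacement; the OrderedDict order is the LRU order."""

    def __init__(self, ways: int) -> None:
        self.ways = ways
        self.lines: "OrderedDict[int, int]" = OrderedDict()  # tag -> data

    def fill(self, tag: int, data: int) -> None:
        # Non-speculative fill: evict the least recently used line if the set is full.
        if tag not in self.lines and len(self.lines) >= self.ways:
            self.lines.popitem(last=False)
        self.lines[tag] = data
        self.lines.move_to_end(tag)

    def csr(self, tag: int) -> Optional[int]:
        # Speculative read: returns data on a hit and never touches LRU state.
        return self.lines.get(tag)

    def csu(self, tag: int) -> None:
        # Update after speculative read: refresh last-access metadata only.
        if tag in self.lines:
            self.lines.move_to_end(tag)

    def lru_order(self) -> List[int]:
        return list(self.lines)  # least recently used first


class SpeculativeCacheBuffer:
    """Remembers the CSR reads of one speculation window until it is resolved."""

    def __init__(self, cache: SpeculationAwareSet) -> None:
        self.cache = cache
        self.pending: List[int] = []

    def speculative_read(self, tag: int) -> Optional[int]:
        self.pending.append(tag)
        return self.cache.csr(tag)

    def resolve(self, accepted: bool) -> None:
        # Accepted path: every CSR is followed by a CSU. Rejected path: none.
        if accepted:
            for tag in self.pending:
                self.cache.csu(tag)
        self.pending.clear()


cache = SpeculationAwareSet(ways=2)
cache.fill(0xA, 1)
cache.fill(0xB, 2)
buffer = SpeculativeCacheBuffer(cache)

buffer.speculative_read(0xA)
order_during = cache.lru_order()      # unchanged by the CSR
buffer.resolve(accepted=False)
order_rejected = cache.lru_order()    # still unchanged: no CSU was issued

buffer.speculative_read(0xA)
buffer.resolve(accepted=True)
order_accepted = cache.lru_order()    # 0xA is now the most recently used line
```

In the rejected case the replacement state is exactly what it was before the speculative read, so a later timing probe of the set cannot tell whether the read happened.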

            There are other ways to solve this speculative-cache problem, but this one is probably the best given the current state of affairs.
            Last edited by xfcemint; 13 September 2021, 09:10 PM.



            • #7
              Question!

              Originally posted by xfcemint
              A: The primary reason seems to be the 4-8% loss of performance. In the current market conditions, a 6% faster CPU sells at 20% higher price. Their estimate is that most consumers would not buy a CPU with performance degradation of 6%. In other words, their estimate is that security is worth less than 20% of the CPU price.
              That's in the current market conditions, but what about the future? We are moving into 7nm and 5nm which should allow for lower prices (assuming enough factories)...



              • #8
                Originally posted by tildearrow
                Question!
                That's in the current market conditions, but what about the future? We are moving into 7nm and 5nm which should allow for lower prices (assuming enough factories)...
                Um, as far as I know, 7nm and 5nm are quite expensive. You have to constantly maintain the delicate EUV equipment, and that is expensive. 7nm and 5nm and less could lower prices, but only by decreasing the die sizes significantly (like: smaller caches, or at least unchanged cache size compared to current-gen). Most manufacturers seem to be chasing the performance crown (while the performance increases are the smallest in the history of computing) so that they can raise the CPU price by 20%.

                I don't know why the market conditions would change for 7nm and 5nm processes. The issue is that CPUs of the same model over the last 10 years all have similar clock rates (for example, the spread is from 4GHz to 5GHz, because of some property of current technology). So if that won't change, then 4.7 GHz vs 4.3 GHz must carry a significant price premium.
                Last edited by xfcemint; 11 September 2021, 04:15 AM.



                • #9
                  I just thought of another variation of SPECTRE v1.

                  What if a program uses SPECTRE v1 (for example, in JavaScript) to read some privileged data, but in a speculation window it doesn't save the privileged data to the cache? Instead, it saves the privileged data to some CPU buffer/predictor. Later, it attempts to read this data by timing.

                  The coloring is unable to stop this attack since the modified entries in the destination buffer/predictor have the same color as the attacker's thread. In this case, let's call the destination buffer/predictor "speculation-sensitive".

                  Apparently, the solution is to make all the speculation-sensitive CPU units speculation-aware. This can be done by using one color, let's say WHITE, to mark all the entries of the speculation-sensitive unit that have been modified before the speculation window is closed (more precisely: modified until the relevant checkpoint in the speculation window is reached). At the moment when a relevant part of the speculation window is closed, the CPU changes the color of all the WHITE entries in all speculation-sensitive units. The color change must be from WHITE to the (attacker's) thread color on the accepted execution path. On a rejected execution path, the color change is from WHITE to CLEANED.

                  EDIT: but, this doesn't work because there can be multiple checkpoints in the speculation window, so a single WHITE color is insufficient. To solve this problem completely, every speculation-sensitive unit must have its own "speculative cache buffer".

                  Another way is to add some simpler speculation-resolving mechanism and use more bits for colors, let's say about 6-8 bits.

                  EDIT-2: perhaps a solution like this: there is just a single other speculation-resolving mechanism in the entire CPU core (besides the "speculative cache buffer"). It assigns one variation of a WHITE color to each checkpoint in a tree-like sliding speculation window. When a checkpoint is reached, it issues a color change from relevant variation(s) of WHITE color to either CLEANED or to the (attacker's) thread color, to all the speculation-sensitive units. All the speculation-sensitive units must use consistent coloring scheme.
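The idea in EDIT-2 can be made concrete with a small Python sketch. Everything in it is an illustrative assumption rather than a worked-out design: the class names `ColoredUnit` and `SpeculationResolver`, the representation of a WHITE variant as the tuple `("WHITE", checkpoint_id)`, the unbounded checkpoint ids (real hardware would use a fixed-width tag of a few bits), and the rule that a checkpoint is accepted only after all older checkpoints above it have been resolved.

```python
from typing import Dict, List, Optional, Set, Tuple, Union

CLEANED = "CLEANED"  # hypothetical marker for invalidated entries
Color = Union[str, Tuple[str, int]]  # thread color, CLEANED, or ("WHITE", checkpoint)


class ColoredUnit:
    """Toy speculation-sensitive unit: every entry carries a color tag."""

    def __init__(self) -> None:
        self.entries: Dict[int, Tuple[Optional[str], Color]] = {}

    def write_speculative(self, index: int, value: str, checkpoint: int) -> None:
        self.entries[index] = (value, ("WHITE", checkpoint))

    def recolor(self, checkpoints: Set[int], new_color: str) -> None:
        for index, (value, color) in list(self.entries.items()):
            if isinstance(color, tuple) and color[1] in checkpoints:
                kept = None if new_color == CLEANED else value
                self.entries[index] = (kept, new_color)


class SpeculationResolver:
    """Single per-core unit: one WHITE variant per checkpoint, arranged as a tree."""

    def __init__(self, units: List[ColoredUnit]) -> None:
        self.units = units
        self.parent: Dict[int, Optional[int]] = {}
        self.next_id = 0

    def open_checkpoint(self, parent: Optional[int] = None) -> int:
        cp = self.next_id
        self.next_id += 1
        self.parent[cp] = parent
        return cp

    def _subtree(self, root: int) -> Set[int]:
        # Children always have larger ids than their parents, so one sorted pass suffices.
        members = {root}
        for cp in sorted(self.parent):
            if self.parent[cp] in members:
                members.add(cp)
        return members

    def accept(self, cp: int, thread_color: str) -> None:
        if self.parent[cp] is not None:
            raise ValueError("resolve the older checkpoint first")
        for unit in self.units:
            unit.recolor({cp}, thread_color)
        del self.parent[cp]
        for child, par in list(self.parent.items()):
            if par == cp:
                self.parent[child] = None  # children are now the oldest checkpoints

    def reject(self, cp: int) -> None:
        doomed = self._subtree(cp)  # the checkpoint and everything nested under it
        for unit in self.units:
            unit.recolor(doomed, CLEANED)
        for d in doomed:
            del self.parent[d]


predictor = ColoredUnit()
resolver = SpeculationResolver([predictor])
outer = resolver.open_checkpoint()
inner = resolver.open_checkpoint(parent=outer)
deepest = resolver.open_checkpoint(parent=inner)

predictor.write_speculative(0, "a", outer)
predictor.write_speculative(1, "secret", inner)
predictor.write_speculative(2, "derived", deepest)

resolver.accept(outer, thread_color="T1")  # accepted path: WHITE -> thread color
resolver.reject(inner)                     # rejected path: WHITE -> CLEANED, subtree too
```

Rejecting `inner` also cleans the entry written under `deepest`, which is the reason the checkpoints are kept in a tree instead of a flat list.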
                  Last edited by xfcemint; 19 September 2021, 06:24 PM.



                  • #10
                    ...
                    And, to make this really efficient and fast, the variations of WHITE color must have a tree-like structure. So, about 8 bits for color tags is a likely requirement.

                    EDIT: and perhaps, this mechanism can be turned off when a thread is in a privileged mode to gain some extra speed. Or, maybe this mechanism actually makes execution faster, since it allows more accurate predictions.
                    Last edited by xfcemint; 19 September 2021, 06:32 PM.
