Intel Posts Linux Patches For Linear Address Space Separation (LASS)


  • #11
    Originally posted by brucethemoose View Post

    Neat! Is it standard in ARMv9?

    I keep finding all sorts of very useful things that are now being standardized in v9. It seems like one heck of a step, with perhaps the biggest omission being the matrix stuff that is getting added in later.
    Indeed it is! FEAT_E0PD (like x86 LASS but more flexible) is actually standard and required as of v8.5, which is of course a required subset of v9.0. The original FEAT_PAN (the direct counterpart to x86 SMAP) has been required since v8.1, and FEAT_PAN2 (integrating PAN with address translation system instructions) since v8.2. The only part not already required in v9.0 (but permitted optionally) is FEAT_PAN3 (control over whether PAN affects instruction fetches too), and that's required in v8.7, which is a required subset of v9.2—which is actually the same version where SME (that "matrix stuff") becomes available!

    EDIT:
    v9.2 (mostly via v8.7) is an awesome "release" because of how much other stuff is also in there, albeit some of them optional. Not only do you have SME including streaming SVE, you've also got:
    • (optional) FEAT_LS64, FEAT_LS64_V, and FEAT_LS64_ACCDATA: the building blocks for CXL and communication with direct accelerators of the type Intel is so proud of having in Sapphire Rapids
    • (required) FEAT_XS: more control over the scope of memory barrier and TLB maintenance instructions, with respect to device memory that might take a while to communicate with—super useful for, say, CXL.mem, or certain PCIe devices
    • (required) FEAT_WFxT: enabling efficient timeouts on wait-for-interrupt and wait-for-event instructions
    • (optional) FEAT_EBF16: allowing more control and flexibility in BFloat16 calculations
    • (required) FEAT_AFP and FEAT_RPRES: more efficient emulation of x86 floating point and vector operations for both software porting and direct translation a la Rosetta 2
    Last edited by pthariensflame; 14 January 2023, 10:08 PM.



    • #12
      Originally posted by kylew77 View Post

      Noob question here: would going to 128-bit computing give us the same speed benefits (and memory growth) that 32-bit to 64-bit brought?
      Increasing the bit depth does not usually give any speed benefit by itself. The speed increase from 32-bit to 64-bit on x86 is mostly due to the number of registers available in the CPU: the 64-bit architecture has more general-purpose registers.

      More registers in a CPU is like having more hands when doing some work. With only two hands, you need to put things away while you process something else. Same with a CPU: the more registers it has, the less it needs to push and pop values on the stack. Of course, the program needs to be compiled to take advantage of the extra registers.

      Also, the claim that programs use more memory on x64 is usually not right either. A program that is written for 32-bit and recompiled for 64-bit will (usually) only use a marginally larger amount of space for its pointers, not necessarily its data.

      So 32-bit to 64-bit does not necessarily mean double the memory usage...
      And 32-bit to 64-bit does not necessarily mean faster execution, but it usually does, thanks to the extra registers.
      Likewise, whether going from 64-bit to 128-bit would help mostly depends on the architecture the CPU will have.
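      The memory-usage point above can be checked with back-of-the-envelope arithmetic. A sketch, using a hypothetical record of two 32-bit ints plus one pointer (padding ignored):

```python
# Recompiling from 32-bit to 64-bit only widens pointers (4 -> 8 bytes);
# plain data such as 32-bit ints keeps its size.

def record_size(pointer_bytes):
    """Size of a hypothetical record: int + int + one pointer."""
    return 4 + 4 + pointer_bytes

size32 = record_size(4)   # 12 bytes on a 32-bit build
size64 = record_size(8)   # 16 bytes on a 64-bit build
growth = (size64 - size32) / size32
print(size32, size64, f"{growth:.0%}")  # prints: 12 16 33%
```

      So a pointer-containing record grows about a third here, far from the 2x worst case. (In practice, 64-bit ABIs can also add alignment padding and, on LP64 systems, a wider `long`, so real growth varies by data layout.)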

      x86 has 8 general-purpose registers
      x86-64 (amd64) has 16 general-purpose registers...
      32-bit arm has 16 (roughly 13 of them usable as general purpose)
      arm64 has 31

      (Interestingly, hardware multi-threading essentially switches to another set of registers at the hardware level, which is part of the trick.)

      Now both 16 and 31 registers are probably enough for most pieces of code, but it certainly is interesting how important these are. Why did CPU manufacturers build CPUs with such a low number of registers, you might ask... the answer is simple: money and bandwidth. Fast memory costs money, and pushing/popping data on the stack was costly as well. The more registers, the longer it took to save and restore them over the slower-bandwidth bus (cache(s) and main memory).
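      The spill/reload cost described above can be illustrated with a toy simulation (not a real register allocator): treat the register file as an LRU cache of variables, and count how often a variable has to be fetched back from the stack. The trace and register counts are made up for illustration.

```python
from collections import OrderedDict

def memory_accesses(trace, num_regs):
    """Simulate an LRU register file.

    Each access to a variable not currently held in a register costs one
    load (a reload from the stack); write-back cost is ignored for
    simplicity. Returns the total number of loads.
    """
    regs = OrderedDict()
    loads = 0
    for var in trace:
        if var in regs:
            regs.move_to_end(var)        # register hit: no memory traffic
        else:
            loads += 1                   # spill/reload: fetch from the stack
            if len(regs) == num_regs:
                regs.popitem(last=False) # evict the least recently used value
            regs[var] = True
    return loads

# Code region cycling through 12 live variables, 4 passes:
trace = [f"v{i}" for i in range(12)] * 4
print("8 regs :", memory_accesses(trace, 8))   # every access misses: 48 loads
print("16 regs:", memory_accesses(trace, 16))  # only the first pass loads: 12
```

      With 8 registers the cyclic trace thrashes (every access is a reload), while 16 registers hold the whole working set after the first pass, which is the effect the extra amd64 registers have on register-hungry code.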

      http://www.dirtcellar.net



      • #13
        Originally posted by waxhead View Post

        Increasing the bit depth does not usually give any speed benefit by itself. The speed increase from 32-bit to 64-bit on x86 is mostly due to the number of registers available in the CPU: the 64-bit architecture has more general-purpose registers.

        More registers in a CPU is like having more hands when doing some work. With only two hands, you need to put things away while you process something else. Same with a CPU: the more registers it has, the less it needs to push and pop values on the stack. Of course, the program needs to be compiled to take advantage of the extra registers.

        Also, the claim that programs use more memory on x64 is usually not right either. A program that is written for 32-bit and recompiled for 64-bit will (usually) only use a marginally larger amount of space for its pointers, not necessarily its data.

        So 32-bit to 64-bit does not necessarily mean double the memory usage...
        And 32-bit to 64-bit does not necessarily mean faster execution, but it usually does, thanks to the extra registers.
        Likewise, whether going from 64-bit to 128-bit would help mostly depends on the architecture the CPU will have.

        x86 has 8 general-purpose registers
        x86-64 (amd64) has 16 general-purpose registers...
        32-bit arm has 16 (roughly 13 of them usable as general purpose)
        arm64 has 31

        (Interestingly, hardware multi-threading essentially switches to another set of registers at the hardware level, which is part of the trick.)

        Now both 16 and 31 registers are probably enough for most pieces of code, but it certainly is interesting how important these are. Why did CPU manufacturers build CPUs with such a low number of registers, you might ask... the answer is simple: money and bandwidth. Fast memory costs money, and pushing/popping data on the stack was costly as well. The more registers, the longer it took to save and restore them over the slower-bandwidth bus (cache(s) and main memory).
        Thanks so much, so it was the register growth that led to speed improvements not the doubling of memory address space.



        • #14
          Originally posted by kylew77 View Post

          Thanks so much, so it was the register growth that led to speed improvements not the doubling of memory address space.
          In practice yes, but the proper answer is as always: it depends... And it's complicated... Very complicated...


          http://www.dirtcellar.net



          • #15
            Originally posted by waxhead View Post

            In practice yes, but the proper answer is as always: it depends... And it's complicated... Very complicated...

            Thanks. I failed my first class on architecture in computer engineering school and went into IT instead, so I understand the basics, but not very well. I understand fetch-decode-execute, and that if you have more registers you can execute more stuff before having to fetch from memory, which is orders of magnitude slower than registers.
