Intel Posts Linux Patches For Linear Address Space Separation (LASS)

  • kylew77
    replied
    Originally posted by waxhead View Post

    In practice yes, but the proper answer is as always: it depends... And it's complicated... Very complicated...

    Thanks. I failed my first class in computer engineering school, on architecture, and went into IT instead, so I understand the basics but not very well. I understand fetch-decode-execute, and that if you have more registers you can execute more work before having to fetch from memory, which is orders of magnitude slower than registers.

  • waxhead
    replied
    Originally posted by kylew77 View Post

    Thanks so much. So it was the register growth that led to the speed improvements, not the doubling of the memory address space.
    In practice yes, but the proper answer is as always: it depends... And it's complicated... Very complicated...

  • kylew77
    replied
    Originally posted by waxhead View Post

    Increasing the bit width does not usually give any speed benefit on its own. The speed increase from 32-bit to 64-bit on x86 is mostly due to the number of registers available in the CPU; the 64-bit architecture has more general-purpose registers.

    More registers in a CPU is like having more hands when you are doing some work. With only two hands you need to put things away while you process something else. Same with a CPU: the more registers it has, the less it needs to push and pull from the stack. Of course, the program needs to be compiled to make use of the extra registers.

    The claim that programs use more memory on x64 is usually not right either. A program written for 32-bit and recompiled for 64-bit will (usually) only use a marginally larger amount of space for pointers, not necessarily for its data.

    So 32-bit to 64-bit does not necessarily mean double the memory usage...
    And 32-bit to 64-bit does not necessarily mean faster execution, but it usually does thanks to the extra registers.
    Likewise, 64-bit to 128-bit mostly depends on the architecture the CPU will have.

    x86 has 8 general-purpose registers
    x86-64 (amd64) has 16 general-purpose registers...
    arm has 15
    arm64 has 31

    (Interestingly, hardware multi-threading works by switching to another set of registers, which is part of the trick.)

    Now both 16 and 31 registers are probably enough for most pieces of code, but it certainly is interesting how important they are. Why did CPU manufacturers build CPUs with such a low number of registers, you might ask... the answer is simple: money and bandwidth. Fast memory costs money, and pushing/pulling data to the stack was costly as well. The more registers, the longer it took to push/pull data over the slower-bandwidth bus (cache(s) and main memory).
    Thanks so much. So it was the register growth that led to the speed improvements, not the doubling of the memory address space.

  • waxhead
    replied
    Originally posted by kylew77 View Post

    Noob question here: would going to 128-bit computing give us the same speed benefits, but also the memory growth, that 32-bit to 64-bit brought?
    Increasing the bit width does not usually give any speed benefit on its own. The speed increase from 32-bit to 64-bit on x86 is mostly due to the number of registers available in the CPU; the 64-bit architecture has more general-purpose registers.

    More registers in a CPU is like having more hands when you are doing some work. With only two hands you need to put things away while you process something else. Same with a CPU: the more registers it has, the less it needs to push and pull from the stack. Of course, the program needs to be compiled to make use of the extra registers.

    The claim that programs use more memory on x64 is usually not right either. A program written for 32-bit and recompiled for 64-bit will (usually) only use a marginally larger amount of space for pointers, not necessarily for its data.

    So 32-bit to 64-bit does not necessarily mean double the memory usage...
    And 32-bit to 64-bit does not necessarily mean faster execution, but it usually does thanks to the extra registers.
    Likewise, 64-bit to 128-bit mostly depends on the architecture the CPU will have.

    x86 has 8 general-purpose registers
    x86-64 (amd64) has 16 general-purpose registers...
    arm has 15
    arm64 has 31

    (Interestingly, hardware multi-threading works by switching to another set of registers, which is part of the trick.)

    Now both 16 and 31 registers are probably enough for most pieces of code, but it certainly is interesting how important they are. Why did CPU manufacturers build CPUs with such a low number of registers, you might ask... the answer is simple: money and bandwidth. Fast memory costs money, and pushing/pulling data to the stack was costly as well. The more registers, the longer it took to push/pull data over the slower-bandwidth bus (cache(s) and main memory).
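
    If anyone wants to see the register-pressure effect for themselves, here is a minimal sketch (the file name and exact compiler flags are my own illustration, assuming gcc on an x86-64 machine): compile the same function once with -m32 and once with -m64 and compare the generated assembly.

    /* regpressure.c - a toy function with many simultaneously live values.
     * Compare:  gcc -O2 -S -m32 regpressure.c   versus   gcc -O2 -S -m64 regpressure.c
     * The -m32 build receives all arguments on the stack and usually does
     * noticeably more stack traffic, while the -m64 build gets the first six
     * arguments in registers and can keep more temporaries register-resident. */
    long mix(long a, long b, long c, long d,
             long e, long f, long g, long h)
    {
        long t1 = a * b + c;
        long t2 = c * d + e;
        long t3 = e * f + g;
        long t4 = g * h + a;

        /* All four temporaries stay live across the final expression, so the
         * compiler needs enough spare registers to hold them or must spill. */
        return (t1 ^ t2) + (t3 ^ t4) + (t1 * t3) - (t2 * t4);
    }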

  • pthariensflame
    replied
    Originally posted by brucethemoose View Post

    Neat! Is it standard in ARMv9?

    I keep finding all sorts of very useful things that are now being standardized in v9. It seems like one heck of a step, with perhaps the biggest omission being the matrix stuff that is getting added in later.
    Indeed it is! FEAT_E0PD (like x86 LASS but more flexible) is actually standard and required as of v8.5, which is of course a required subset of v9.0. The original FEAT_PAN (the direct counterpart to x86 SMAP) has been required since v8.1, and FEAT_PAN2 (integrating PAN with address translation system instructions) since v8.2. The only part not already required in v9.0 (but permitted optionally) is FEAT_PAN3 (control over whether PAN affects instruction fetches too), and that's required in v8.7, which is a required subset of v9.2—which is actually the same version where SME (that "matrix stuff") becomes available!

    EDIT:
    v9.2 (mostly via v8.7) is an awesome "release" because of how much other stuff is also in there, albeit some of them optional. Not only do you have SME including streaming SVE, you've also got:
    • (optional) FEAT_LS64, FEAT_LS64_V, and FEAT_LS64_ACCDATA: the building blocks for CXL and communication with direct accelerators of the type Intel is so proud of having in Sapphire Rapids
    • (required) FEAT_XS: more control over the scope of memory barrier and TLB maintenance instructions, with respect to device memory that might take a while to communicate with—super useful for, say, CXL.mem, or certain PCIe devices
    • (required) FEAT_WFxT: enabling efficient timeouts on wait-for-interrupt and wait-for-event instructions
    • (optional) FEAT_EBF16: allowing more control and flexibility in BFloat16 calculations
    • (required) FEAT_AFP and FEAT_RPRES: more efficient emulation of x86 floating point and vector operations for both software porting and direct translation a la Rosetta 2
    Last edited by pthariensflame; 14 January 2023, 10:08 PM.

  • tildearrow
    replied
    Originally posted by phoronix View Post
    Phoronix: Intel Posts Linux Patches For Linear Address Space Separation (LASS)

    An interesting patch series posted by Intel this week for the Linux kernel is working on implementing Linear Address Space Separation (LASS) as a feature coming with future processors to help fend off speculative address accesses across..

    https://www.phoronix.com/news/Linear...ace-Separation
    It appears that an accidental new line cut off the short description of the article.

  • brucethemoose
    replied
    Originally posted by pthariensflame View Post
    Possibly worth noting that 64-bit Arm processors have already had this feature for at least a year in silicon and for multiple years in documentation. They call it EL0 Permission Deny (FEAT_E0PD), building on Privileged Access Never (FEAT_PAN, FEAT_PAN2, FEAT_PAN3), and it's actually more flexible than LASS and SMAP: you can selectively lock off only userspace from kernelspace, only the reverse, or both; choose whether instruction fetches are included in the ban; and freely decide which side of the address space is which.
    Neat! Is it standard in ARMv9?

    I keep finding all sorts of very useful things that are now being standardized in v9. It seems like one heck of a step, with perhaps the biggest omission being the matrix stuff that is getting added in later.

  • uxmkt
    replied
    Originally posted by Spacefish View Post
    Physical addresses aren't 1:1 mapped to address lines on hardware... There aren't even 64 individual address lines to the memory / from the CPU
    There should be 64 lines on sparc64, and there may have been other 64-bit platforms of the '90s that did. Yeah, not quite relevant for a change to x86_64, but you spoke generically, so this is the response.

  • erniv2
    replied
    Originally posted by kylew77 View Post

    Noob question here: would going to 128-bit computing give us the same speed benefits, but also the memory growth, that 32-bit to 64-bit brought?
    The increased bit width only increases the theoretical RAM limit, not the speed; that is still determined by the actual physical memory and how it is organized.

    32-bit was limited to 4096 MB, but you can't use it all: you need PCI address space, which limited the real usable RAM on a 32-bit system to about 3500 MB, and then you need kernel space where the mm operates, which limited the actual process size for userspace apps to 2048 MB. A 64-bit system can address 16 exabytes; there is no speed benefit, but a lot more address space.

    And 128-bit mm code would mean:

    "340,282,366,920,938,463,463,374,607,431,768,211,456 (corrected)

    vs 64 bits:

    18,446,744,073,709,551,616

    So, yeah, that would be like killing a fly with a tactical nuke—overkill doesn’t even begin to describe it!" copy pasted it ~~
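
    To make the arithmetic reproducible, here is a small C sketch (entirely my own illustration) that prints the same powers of two; build it with something like gcc -O2 addrspace.c -lm.

    /* addrspace.c - sizes of 32-, 64- and 128-bit address spaces in bytes.
     * 2^128 does not fit in any standard C integer type, so long double is
     * used for approximate values; the exact 2^128 figure is the
     * 340,282,366,920,938,463,463,374,607,431,768,211,456 quoted above. */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        long double space32  = powl(2.0L, 32.0L);   /* 4,294,967,296 bytes = 4 GiB */
        long double space64  = powl(2.0L, 64.0L);   /* ~1.8e19 bytes = 16 EiB      */
        long double space128 = powl(2.0L, 128.0L);  /* ~3.4e38 bytes               */

        printf("32-bit address space:  %.0Lf bytes\n", space32);
        printf("64-bit address space:  %.0Lf bytes\n", space64);
        printf("128-bit address space: %.4Le bytes\n", space128);
        return 0;
    }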

    The speed comes from the ordering of the chips on your RAM and the width the CPU can access. Tricks are already used nowadays, like dual-channel DDR4 or quad-channel 64x2, but it has nothing to do with the internal OS order of things (hardware -> software). So no, there will be no 128-bit CPU soon, and no 128-bit address space OS; it's not needed and it does not provide speed benefits.

    We all know that 64-bit actually causes more RAM usage, which is handled by more caches and transistors thrown that way.

  • pthariensflame
    replied
    Possibly worth noting that 64-bit Arm processors have already had this feature for at least a year in silicon and for multiple years in documentation. They call it EL0 Permission Deny (FEAT_E0PD), building on Privileged Access Never (FEAT_PAN, FEAT_PAN2, FEAT_PAN3), and it's actually more flexible than LASS and SMAP: you can selectively lock off only userspace from kernelspace, only the reverse, or both; choose whether instruction fetches are included in the ban; and freely decide which side of the address space is which.
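
    For anyone curious how this works at the address level, here is a rough conceptual sketch (mine, not Intel's or Arm's architectural definition): both LASS and FEAT_E0PD key off which half of the canonical address space a linear address falls in, so a forbidden access can be refused from the address alone, before any page-table walk happens.

    /* Conceptual illustration only - not the actual hardware mechanism.
     * On x86-64 Linux the kernel occupies the upper half of the canonical
     * address space (bit 63 set) and userspace the lower half, so the top
     * bit of the linear address is enough to tell the two halves apart. */
    #include <stdbool.h>
    #include <stdint.h>

    static inline bool in_kernel_half(uint64_t linear_addr)
    {
        return (linear_addr >> 63) & 1;  /* upper half of the address space */
    }

    /* What a LASS/E0PD-style check conceptually enforces: a user-mode access
     * aimed at the kernel half is rejected on the address alone, before any
     * paging structures are consulted, which removes the timing signal that
     * speculative probing of kernel addresses otherwise relies on. */
    static inline bool access_would_fault(bool user_mode, uint64_t linear_addr)
    {
        return user_mode && in_kernel_half(linear_addr);
    }

    The Arm variant described above additionally lets the OS choose which direction(s) are blocked and which half of the address space counts as the privileged one.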
