Intel Sends Out 11th Revision Of Linux Kernel Patches For AMX

  • Intel Sends Out 11th Revision Of Linux Kernel Patches For AMX

    Phoronix: Intel Sends Out 11th Revision Of Linux Kernel Patches For AMX

    While Intel Xeon "Sapphire Rapids" processors with Advanced Matrix Extensions are set for a Q2'22 ramp in production, one of the key new features that has yet to be properly plumbed in the mainline Linux kernel is for supporting AMX...

    https://www.phoronix.com/scan.php?pa...nux-Kernel-v11

  • #2
    Why would we want the Linux kernel, or presumably the Windows kernel, to be the gatekeeper for this?

    Wouldn't it be simpler to just have the capability indicated in a syscall, so the code can then execute AMX instructions or not?

    I've never coded for AVX2/AVX512 or the like. Are they controlled in this manner also?

    • #3
      Originally posted by hoohoo:
      Why would we want the Linux kernel, or presumably the Windows kernel, to be the gatekeeper for this?

      Wouldn't it be simpler to just have the capability indicated in a syscall, so the code can then execute AMX instructions or not?

      I've never coded for AVX2/AVX512 or the like. Are they controlled in this manner also?

      AMX registers are so large that they require special handling by the kernel on task switches. An optimization had to be made so as not to waste memory and performance for the vast majority of processes that don't use AMX. This is described in detail in the patch message.
      AVX2/AVX512 are not handled like this, but the message states that this mechanism will be used by future instruction set additions as well.
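
The opt-in idea described in this reply can be illustrated with a toy model. This is only a sketch, not kernel code: the `Task` class, `request_amx()` and the base state size are hypothetical names and numbers chosen for illustration. The only hardware figure used is the 8 KiB of AMX tile data (8 tiles of 1 KiB each).

```python
from dataclasses import dataclass

BASE_STATE_BYTES = 2 * 1024   # illustrative size of the always-saved state
AMX_TILE_BYTES = 8 * 1024     # 8 tile registers x 1 KiB each


@dataclass
class Task:
    """Hypothetical stand-in for a scheduled task and its save area."""
    name: str
    amx_permitted: bool = False

    def request_amx(self) -> None:
        # Hypothetical opt-in step: the task asks before touching tiles,
        # so the larger save area is only set up for tasks that need it.
        self.amx_permitted = True

    def save_area_bytes(self) -> int:
        # Size of the buffer saved and restored on each task switch.
        extra = AMX_TILE_BYTES if self.amx_permitted else 0
        return BASE_STATE_BYTES + extra


def total_save_area(tasks) -> int:
    return sum(t.save_area_bytes() for t in tasks)


tasks = [Task(f"task{i}") for i in range(1000)]
tasks[0].request_amx()        # assume only one task in a thousand uses AMX

opt_in_total = total_save_area(tasks)
always_total = len(tasks) * (BASE_STATE_BYTES + AMX_TILE_BYTES)
print(f"opt-in: {opt_in_total} bytes, always-on: {always_total} bytes")
```

In mainline Linux the corresponding request is made through `arch_prctl()` with `ARCH_REQ_XCOMP_PERM`; the model above only mirrors the policy, not that interface.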

      • #4
        Originally posted by hoohoo:
        Why would we want the Linux kernel, or presumably the Windows kernel, to be the gatekeeper for this?

        Wouldn't it be simpler to just have the capability indicated in a syscall, so the code can then execute AMX instructions or not?

        I've never coded for AVX2/AVX512 or the like. Are they controlled in this manner also?

        The kernel needs to know about extensions because it needs to know how much memory to set aside per task when swapping registers in and out of a CPU during task switches. Up through AVX2/AVX512, this just involved saving every register, every time, because there wasn't an easy way of telling whether a given task was using a particular set of registers. Intel wants to change that with this new instruction set extension because the added register state is so large and won't be used by many processes. Having to save 8-64 KB of register space per task would be a lot to ask of an OS kernel, which is why Intel is making use of the new registers a two-step process. We didn't need it before, because even AVX512 only had 2 KB of register space, which could be handled just fine within the stack and other space that already needed to be saved per task switch. Were matrix math used more often in normal applications, I imagine it would be handled through the normal methods, but it's a niche enough feature that Intel doesn't want the kernel to have to do it for every single task.
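
The register sizes mentioned in this reply can be checked with some quick arithmetic. The sketch below uses the architectural register counts and widths in 64-bit mode. It ignores smaller pieces such as the AVX-512 mask registers and the AMX tile configuration, and it only reproduces the lower end of the "8-64k" range, since the upper figure is the poster's estimate rather than something derived here.

```python
# Architectural register-file sizes in bytes (64-bit mode).
YMM_BYTES = 16 * 32        # AVX2: 16 registers x 256 bits (32 bytes)
ZMM_BYTES = 32 * 64        # AVX-512: 32 registers x 512 bits (64 bytes)
TILE_BYTES = 8 * 16 * 64   # AMX: 8 tiles x 16 rows x 64 bytes per row

sizes = {
    "AVX2": YMM_BYTES,
    "AVX-512": ZMM_BYTES,
    "AMX tiles": TILE_BYTES,
}

for name, size in sizes.items():
    print(f"{name:10s} {size:5d} bytes ({size / 1024:g} KiB)")

# How many times larger the AMX tile data is than the AVX-512 vector registers.
ratio = TILE_BYTES // ZMM_BYTES
print(f"AMX tile data is {ratio}x the AVX-512 vector register file")
```

The 2 KiB result for AVX-512 matches the "2k" quoted in the post, and the tile data alone is several times that before any future growth is considered.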

        • #5
          Originally posted by numacross:
          AMX registers are so large that they require special handling by the kernel on task switches. An optimization had to be made so as not to waste memory and performance for the vast majority of processes that don't use AMX. This is described in detail in the patch message.
          AVX2/AVX512 are not handled like this, but the message states that this mechanism will be used by future instruction set additions as well.

          Thanks very much, numacross!

          • #6
            Originally posted by KesZerda:
            The kernel needs to know about extensions because it needs to know how much memory to set aside per task when swapping registers in and out of a CPU during task switches. Up through AVX2/AVX512, this just involved saving every register, every time, because there wasn't an easy way of telling whether a given task was using a particular set of registers. Intel wants to change that with this new instruction set extension because the added register state is so large and won't be used by many processes. Having to save 8-64 KB of register space per task would be a lot to ask of an OS kernel, which is why Intel is making use of the new registers a two-step process. We didn't need it before, because even AVX512 only had 2 KB of register space, which could be handled just fine within the stack and other space that already needed to be saved per task switch. Were matrix math used more often in normal applications, I imagine it would be handled through the normal methods, but it's a niche enough feature that Intel doesn't want the kernel to have to do it for every single task.

            Thanks very much, KesZerda!
