Intel AMX Support Appears Ready For Linux 5.16

  • Intel AMX Support Appears Ready For Linux 5.16

    Phoronix: Intel AMX Support Appears Ready For Linux 5.16

    It's been over one year since Intel disclosed Advanced Matrix Extensions (AMX) and began posting patches to bring up AMX support under Linux in anticipation of Xeon Scalable "Sapphire Rapids" processors. While the compiler-side work for GCC and LLVM/Clang has been landing, the kernel-side AMX support finally appears ready to land with the forthcoming Linux 5.16 cycle...

    https://www.phoronix.com/scan.php?pa...For-Linux-5.16

  • #2
    I know that Intel developed Advanced Matrix Extensions for AI workloads, but I wonder if it will also be useful for other scientific/engineering applications such as CFD. Would code need to be rewritten to take advantage of it, or would it be done by compiler switches?

    Comment


    • #3
      Unlike AVX-512 and earlier, user-space applications actually need to request the support from the kernel to be able to use Advanced Matrix Extensions functionality.
      Can someone explain this to me? AVX-512, like all SIMD instruction sets (SSE/SSE2/SSSE3/AVX/AVX2/3DNow!/AltiVec), is used via compiler intrinsics or hand-crafted assembler, and the kernel, be it the Windows, Linux, Unix, or macOS kernel, has to support it.

      How is this any different from AMX?

      Comment


      • #4
        Originally posted by sophisticles View Post
        Can someone explain this to me? AVX-512, like all SIMD instruction sets (SSE/SSE2/SSSE3/AVX/AVX2/3DNow!/AltiVec), is used via compiler intrinsics or hand-crafted assembler, and the kernel, be it the Windows, Linux, Unix, or macOS kernel, has to support it.

        How is this any different from AMX?
        AFAIK the difference is not "how you use it" but "what you need to do before using it".

        With AVX et al. you need to make sure HW support is present before using it, but with AMX you also have to make an OS call to say "I'm going to be using <feature>, so enable some additional state save/restore functionality" or "I want to use <feature>, so make sure nobody else is already using it" (I forget which). Something like that, anyway.
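
        The extra step bridgman describes can be sketched in C. A minimal sketch, assuming the arch_prctl() dynamic-feature interface merged for Linux 5.16 (ARCH_REQ_XCOMP_PERM = 0x1023, tile-data state bit 18); on an older kernel or a non-AMX CPU the request simply fails:

        ```c
        /* Sketch: before touching AMX tile registers, a Linux process must
         * ask the kernel to enable the large tile state in its XSAVE area.
         * Constants match the Linux 5.16 arch_prctl() interface. */
        #include <stdio.h>
        #include <sys/syscall.h>
        #include <unistd.h>

        #define ARCH_REQ_XCOMP_PERM 0x1023  /* request a dynamic xstate feature */
        #define XFEATURE_XTILEDATA  18      /* AMX tile data state component */

        int main(void)
        {
        #ifdef SYS_arch_prctl
            long rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM,
                              (unsigned long)XFEATURE_XTILEDATA);
            if (rc == 0)
                puts("AMX tile data permission granted");
            else
                puts("AMX not available (kernel or CPU lacks support)");
        #else
            puts("arch_prctl is x86-specific; not available on this architecture");
        #endif
            return 0;
        }
        ```

        Only after this call succeeds may the process execute LDTILECFG/TDP* instructions without taking a fault.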

        Comment


        • #5
          Thanks Bridgman. With AVX-512 there's also a (relatively) minor matter of memory alignment, i.e. that the target vectors/arrays must start on 64-byte boundaries (a 512-bit register spans 64 bytes). This isn't a huge deal, but it does impact one's source code. I worked on a robotics project a few years ago where our workspace evaluation was a limiting step and our Principal Engineer asked me to see if AVX-512 might help. After some discussion (I didn't think the project was ready yet for such optimization) I agreed, and saw to it.

          We were using the Eigen3 C++ template libraries at the time for our linear algebra, and all the computation was done on 4x4 rotation/translation matrices. So we both recognized the chances for substantial speedup were not large, and I was happy to eke out five or ten percent.

          The Eigen3 documentation clearly explains what needs to be done, and it isn't hard. But it wasn't worth it. That portion of the code was still under active development by at least one other engineer, who couldn't efficiently implement her own work without unduly stumbling over the new alignment issues. Speed is speed, and the speed of developing good robust working code took priority over a few percent runtime improvement. So we quickly backed out the AVX-512.

          Doesn't mean AVX-512 isn't worthwhile for larger matrices in more stable code. Michael has shown benchmark examples where it very clearly is. Perhaps future compilers can alleviate this, but three years ago at least, AVX-512 was a little bit more than just a new compiler switch and you're done.
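
          The alignment requirement itself is easy to demonstrate in plain C, independent of Eigen. A minimal sketch: a 512-bit AVX-512 vector spans 64 bytes, so aligned loads want buffers on 64-byte boundaries (AVX2's 256-bit vectors want 32), which C11 can request with _Alignas or aligned_alloc:

          ```c
          /* Demonstrates requesting 64-byte alignment for AVX-512-friendly
           * buffers, on the stack and on the heap, with no intrinsics. */
          #include <assert.h>
          #include <stdint.h>
          #include <stdio.h>
          #include <stdlib.h>

          int main(void)
          {
              _Alignas(64) float stack_vec[16];  /* 16 floats = one 64-byte ZMM lane */
              float *heap_vec = aligned_alloc(64, 64 * sizeof(float));  /* size must be a multiple of 64 */

              assert(((uintptr_t)stack_vec % 64) == 0);
              assert(((uintptr_t)heap_vec % 64) == 0);
              puts("both buffers are 64-byte aligned");

              free(heap_vec);
              return 0;
          }
          ```

          The friction the post describes comes from threading this requirement through every allocation a shared codebase makes, not from any one line of it.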

          Comment


          • #6
            Originally posted by trueblue View Post
            I know that Intel developed Advanced Matrix Extensions for AI work loads, but I wonder if it will also be useful for other scientific/engineering applications such as CFD. Would code need to be re-written to take advantage of it, or would it be done by compiler switches?
            Intel AMX is for low-precision computing (BF16/INT8), while scientific/engineering applications need at least FP64/INT32, and some of them need INT64 or INT128. Recently IBM added hardware FP128, I think.
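
            The precision gap is easy to see numerically. A minimal sketch: BF16 keeps only the top 16 bits of an IEEE float32 (8 exponent bits, 7 explicit mantissa bits, versus float64's 52 mantissa bits). Truncating the low bits, as below, is a simplification of the hardware's round-to-nearest conversion, but the precision ceiling it illustrates is the same:

            ```c
            /* Shows how much a float32 value loses when narrowed to BF16
             * (here by truncation; real hardware rounds to nearest). */
            #include <math.h>
            #include <stdint.h>
            #include <stdio.h>
            #include <string.h>

            static float to_bf16(float x)
            {
                uint32_t bits;
                memcpy(&bits, &x, sizeof bits);
                bits &= 0xFFFF0000u;  /* keep sign, exponent, top 7 mantissa bits */
                memcpy(&x, &bits, sizeof bits);
                return x;
            }

            int main(void)
            {
                float pi = 3.14159265f;
                printf("float32 pi ~ %.7f\n", pi);
                printf("bf16    pi ~ %.7f (error %.5f)\n",
                       to_bf16(pi), fabsf(pi - to_bf16(pi)));
                return 0;
            }
            ```

            With roughly 2-3 significant decimal digits, BF16 is fine for neural-network weights but nowhere near the 15-16 digits FP64 gives CFD or finance codes.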

            Comment


            • #7
              Intel is only implementing AMX on Xeon; it would be interesting if AMD implemented it on consumer processors like Ryzen for laptops and desktops.

              Comment


              • #8
                Originally posted by uid313 View Post
                Intel is only implementing AMX on Xeon; it would be interesting if AMD implemented it on consumer processors like Ryzen for laptops and desktops.
                While Intel and AMD have a cross licensing agreement that allows one company to use technology the other company developed, there is a waiting period before either company can implement said competitor's technology.

                Look at AVX-512: Intel released it on Xeons and HEDT CPUs years ago and now has it on laptop and desktop CPUs, while AMD still hasn't released a single CPU with that instruction set. Hell, look how long it took AMD to release a CPU with AVX2, and the first ones weren't that good anyway.

                Intel will have a laptop CPU with AMX long before AMD even thinks about implementing it in its high end offerings.

                Comment


                • #9
                  Originally posted by sophisticles View Post

                  While Intel and AMD have a cross licensing agreement that allows one company to use technology the other company developed, there is a waiting period before either company can implement said competitor's technology.
                  I don't believe there are any such restrictions. If you have a source, I'd love to see it.

                  The waiting period is just due to the fact that a CPU is designed roughly two years before it's released as a product you can buy, so once Intel announces a new instruction, it will likely be at least two years before you see it in an AMD product even if they pick it up right away. And in the case of AVX-512, there were factors (cost, yields, and power usage) to weigh before AMD really wanted to commit to including it in their products. Sometimes you don't want to add a new technology the moment you can, and it makes sense to wait for it to mature a bit first. Especially when you're an underdog with limited financial resources.

                  Comment


                  • #10
                    Originally posted by trueblue View Post
                    I know that Intel developed Advanced Matrix Extensions for AI work loads, but I wonder if it will also be useful for other scientific/engineering applications such as CFD.
                    No. The functionality being included in the upcoming Sapphire Rapids CPU supports operations only on 8-bit and BFloat16 data. Most scientific, engineering, and financial applications require double-precision arithmetic. At this precision it's not even capable of half-decent audio processing!

                    https://fuse.wikichip.org/news/3600/...pphire-rapids/

                    Comment
