Intel AMX Programming Model Lands In LLVM Compiler

  • Intel AMX Programming Model Lands In LLVM Compiler

    Phoronix: Intel AMX Programming Model Lands In LLVM Compiler

    One of the big features to look forward to with Intel's Xeon "Sapphire Rapids" is the introduction of AMX as the Advanced Matrix Extensions. While Sapphire Rapids looks to be at least one year out still, the company's open-source compiler engineers have already been hard at work on the software infrastructure support...

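    To make the new programming model a little more concrete, below is a minimal sketch of an int8 tile multiply written against the AMX intrinsics exposed through <immintrin.h> and built with -mamx-tile -mamx-int8. It is an illustrative example, not code from the LLVM patches; the tile shapes and the pre-packed B layout are assumptions made for brevity.

        #include <immintrin.h>
        #include <stdint.h>
        #include <string.h>

        /* Tile configuration layout defined by the AMX spec: palette id,
         * start row, then per-tile column widths (in bytes) and row counts.
         * Note: on Linux the process must request AMX permission first via
         * arch_prctl(ARCH_REQ_XCOMP_PERM, ...); omitted here for brevity. */
        struct tile_config {
            uint8_t  palette_id;
            uint8_t  start_row;
            uint8_t  reserved[14];
            uint16_t colsb[16];
            uint8_t  rows[16];
        };

        /* One 16x16 int32 output tile: A is 16x64 int8, B is already packed
         * into the 16-row by 64-byte layout that TDPBSSD expects. */
        void amx_int8_tile_matmul(const int8_t *a, const int8_t *b, int32_t *c)
        {
            struct tile_config cfg;
            memset(&cfg, 0, sizeof(cfg));
            cfg.palette_id = 1;
            cfg.rows[0] = 16; cfg.colsb[0] = 64;   /* tmm0: C, 16x16 int32   */
            cfg.rows[1] = 16; cfg.colsb[1] = 64;   /* tmm1: A, 16x64 int8    */
            cfg.rows[2] = 16; cfg.colsb[2] = 64;   /* tmm2: B, packed int8   */
            _tile_loadconfig(&cfg);

            _tile_zero(0);                         /* clear the accumulator  */
            _tile_loadd(1, a, 64);                 /* load A, 64-byte stride */
            _tile_loadd(2, b, 64);                 /* load B, 64-byte stride */
            _tile_dpbssd(0, 1, 2);                 /* tmm0 += tmm1 * tmm2    */
            _tile_stored(0, c, 64);                /* write int32 results    */

            _tile_release();                       /* release the tile state */
        }

    The LLVM work is largely about teaching the compiler to handle code like this: configuring tiles, allocating the tile registers, and lowering the AMX pseudo instructions the article mentions into real AMX instructions.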

  • #2
    Typos:

    Originally posted by phoronix View Post
    and support for morphing AMX psuedo instructions into real AMX instructions.



    • #3
      Cool. So Intel has finally caught up to last year's Apple A13 SoC much less this year's A14X derived M1. And by caught up I mean...next year...on a 2,000 dollar server CPU...that probably will be delayed until early 2022. Even generic ARM has this in their extension of NEON called SVE which goes out to 512 bit without recompiling.

      And by 2022 Apple will probably be on nm with their A16X-derived 32-128 core M3 or M4 on the desktop, with 512-bit AMX vector extensions.

      Good job Intel.



      • #4
        Oh... BTW... here are some of the AMX extensions on Apple Silicon's A13... once again... that came out last year.

        Apple Math/Matrix (?) Extensions


        By: Maynard Handley ([email protected]), September 7, 2019 3:30 pm
        Room: Moderated Discussions

        anon ([email protected]) on September 7, 2019 11:01 am wrote:
        > @never_released ([email protected]) on September 7, 2019 5:38 am wrote:
        > > Hello,
        > >
        > > In the Apple Lightning CPU that Apple is releasing in a few days, there's
        > > a quite odd SIMD/DSP extension which is nowhere near standard.
        > >
        > > The list of instructions for the extension:
        > >
        > > AMXFMA32
        > > AMXFMS32
        > > AMXFMA64
        > > AMXFMS64
        > > AMXFMA16
        > > AMXMAC16
        > > AMXFMS16
        > > AMXLDZI
        > > AMXSTZI
        > > AMXVECFP
        > > AMXMATFP
        > > AMXCLR
        > > AMXSET
        > > AMXVECINT
        > > AMXMATINT
        > > AMXGENLUT
        > > AMXLDX
        > > AMXEXTRX
        > > AMXSTX
        > > AMXLDY
        > > AMXEXTRY
        > > AMXSTY
        > > AMXLDZ
        > > AMXSTZ




        • #5
          Originally posted by Jumbotron View Post
          Cool. So Intel has finally caught up to last year's Apple A13 SoC
          Really? Thanks for posting that instruction list, but it's not clear what those instructions actually do. I clicked through a couple dozen messages in that thread and did a few quick web searches, but found no more details. Please post 'em if you got 'em.

          Judging by one of Michael's previous articles about Intel's AMX, it's very narrowly focused on accelerating convolutional neural networks, at least in this iteration. And to that end, it's much more about data shuffling and reorganization than raw computation.

          Originally posted by Jumbotron View Post
          Even generic ARM has this in their extension of NEON called SVE which goes out to 512 bit without recompiling.
          SVE is a separate set of instructions, added as an extension to ARMv8.2-A. It is complementary to NEON and allows for vectors of up to 2048 bits:

          https://en.wikipedia.org/wiki/AArch6...Extension_(SVE)

          The part about not recompiling is that an implementation need not implement them as 2048-bit natively. It can use a much narrower vector pipeline and split the operations into multiple passes (kind of like how AMD's GCN implements 2048-bit vectors with 512-bit pipelines). However, they presumably must all have 2048-bit registers, unless it's memory-to-memory.
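          To make the "without recompiling" part concrete, here is the standard vector-length-agnostic saxpy sketch using the ACLE SVE intrinsics from <arm_sve.h> (a generic textbook example, not anything from Apple's or Arm's libraries). The same binary runs whether the hardware implements 128-bit or 2048-bit vectors, because the loop asks the CPU how wide its vectors are:

              #include <arm_sve.h>
              #include <stdint.h>

              /* y[i] += a * x[i]; build with e.g. -march=armv8.2-a+sve */
              void saxpy(float a, const float *x, float *y, int64_t n)
              {
                  for (int64_t i = 0; i < n; i += (int64_t)svcntw()) {
                      /* svcntw() = number of 32-bit lanes per vector on this CPU */
                      svbool_t pg = svwhilelt_b32(i, n);  /* predicate masks the tail */
                      svfloat32_t vx = svld1(pg, &x[i]);
                      svfloat32_t vy = svld1(pg, &y[i]);
                      svst1(pg, &y[i], svmla_x(pg, vy, vx, a));
                  }
              }

          The architecture lets each implementation pick its own register width (any multiple of 128 bits up to 2048), and the svwhilelt predicate handles the tail elements, so there is no scalar cleanup loop either way.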

          It's not necessarily great for deep learning, because convolutional neural networks rely on smallish 2D, non-separable convolutions. Implementing those involves a lot of data shuffling, which Intel's AMX does in hardware.
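          As a rough illustration of that data shuffling (the function name and layout below are hypothetical, not Intel's code): the common im2col trick copies each KxK input patch into a row of a scratch matrix so the convolution can be handed to a matrix-multiply engine, and it is this kind of reorganization that a hardware tile unit can take off the software's hands.

              #include <stddef.h>

              /* Single-channel input of size h*w, kernel k*k, stride 1, no padding.
               * "col" receives (h-k+1)*(w-k+1) rows of k*k patch elements each. */
              void im2col(const float *img, int h, int w, int k, float *col)
              {
                  size_t r = 0;
                  for (int y = 0; y + k <= h; ++y)
                      for (int x = 0; x + k <= w; ++x)
                          for (int dy = 0; dy < k; ++dy)
                              for (int dx = 0; dx < k; ++dx)
                                  col[r++] = img[(size_t)(y + dy) * w + (x + dx)];
              }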

          Originally posted by Jumbotron View Post
          And by 2022 Apple will probably be on nm with their A16X-derived 32-128 core M3 or M4 on the desktop, with 512-bit AMX vector extensions.
          Intel's AMX tile registers are 8192 bits each.

          Originally posted by Jumbotron View Post
          Good job Intel.
          Thanks!

          (just kidding, I have nothing to do with them.)
