Intel 5th Gen Xeon "Emerald Rapids" AVX-512 Performance

  • #1

    Phoronix: Intel 5th Gen Xeon "Emerald Rapids" AVX-512 Performance

    With Intel's 5th Gen Xeon Scalable "Emerald Rapids" processors, released last month, Intel talked up improved AVX-512 support as another notable enhancement, in addition to the power-efficiency improvements, faster DDR5 memory support, and other changes. Here are some benchmarks using the flagship Intel Xeon Platinum 8592+, looking at the performance and the thermal, clock, and power metrics when toggling AVX-512 support.

  • #2
    Impressive.

    With AVX-512 these Xeons were in some cases 10x faster than without and were able to use less power and stay cooler.

    Just amazing the improvement Intel has made to AVX-512 since it was first introduced.

    Of course, when you have all the money in the world to hire the best engineers, I guess this is to be expected.

    • #3
      Originally posted by sophisticles
      With AVX-512 these Xeons were in some cases 10x faster than without and were able to use less power and stay cooler.
      Only in OpenVINO, which is an Intel-developed deep learning framework. What quite likely happened there is that AMX support was somehow tied to AVX-512.

      It pays to have some understanding of what you're actually measuring. If plain AVX-512 were producing order-of-magnitude gains over AVX2, that would be a huge red flag for most people.
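      A back-of-envelope Amdahl's-law bound shows why width alone can't explain 10x. Going from 256-bit AVX2 vectors to 512-bit AVX-512 vectors at best doubles the work done per instruction, so even a perfectly vectorized hot loop tops out at 2x from width. This is an idealized Python sketch: `amdahl_speedup` is a hypothetical helper, and the model deliberately ignores the new instructions, extra registers, AMX, and clock/power effects discussed in this thread.

```python
def amdahl_speedup(vector_fraction: float, vector_speedup: float) -> float:
    """Overall speedup when only `vector_fraction` of the runtime is sped up by `vector_speedup`."""
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / vector_speedup)


# Assumption: doubling vector width (256-bit -> 512-bit) can at best halve the
# time of the vectorized part, i.e. vector_speedup <= 2 from width alone.
for f in (0.5, 0.9, 1.0):
    print(f"{f:.0%} vectorized -> at most {amdahl_speedup(f, 2.0):.2f}x")
```

      Under that cap the loop prints bounds of 1.33x, 1.82x, and 2.00x; a 10x result points at something other than wider vectors.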

      Of course, since we know you have an estranged relationship with humility, I'm not surprised you thought it normal that Intel could achieve such a seemingly miraculous accomplishment.

      Originally posted by sophisticles
      Just amazing the improvement Intel has made to AVX-512 since it was first introduced.

      Of course, when you have all the money in the world to hire the best engineers, I guess this is to be expected.
      Well, here's AVX-512 on/off on AMD's Genoa:

      Here's the same test case, on Emerald Rapids, from this article:


      Note the difference in scales. Not only does EPYC achieve higher absolute performance, it also gets a bigger boost from AVX-512.

      Granted, it's not the norm that the EPYC 9654 outperforms the Xeon 8592+ on AVX-512, but I think it tells us this isn't only a story about Intel.
      Last edited by coder; 05 January 2024, 12:27 PM.

      • #4
        Keep in mind that the largest improvements have nothing to do with widening the AVX registers to 512 bits; they are due to specific instructions not present in AVX2 at all.

        • #5
          Originally posted by ddriver
          Keep in mind the largest improvements have nothing to do with increasing the AVX registers to 512 bits,
          Yeah, the largest increases are quite likely due to AMX.

          Originally posted by ddriver
          but are due to specific instructions not present in AVX2 at all.
          They also doubled the number of ISA-visible vector registers, from 16 to 32, which reduces spilling or enables greater loop unrolling.

          As for the claim that 512-bit processing doesn't, by itself, increase throughput, that doesn't sound entirely true to me. The passage below makes it sound like half of port 5's capacity is wasted if the operands are only 256-bit.

          "One 512-bit FMA unit is created by fusing two 256-bit ones on port 0 and port 1. The other is added to port 5, as a server-specific core extension. The FMA units on port 0 and 1 are configured into 2×256-bit or 1×512-bit mode depending on whether 512-bit FMA instructions are present in the scheduler. That means a mix of 256-bit and 512-bit FMA instructions will not achieve higher IPC than executing 512-bit instructions alone."

          Source: https://chipsandcheese.com/2023/03/1...pphire-rapids/


          Note that the author says you can't beat 512-bit throughput with a mix of 256 + 512, but does not say that 512-bit isn't faster than 256-bit.
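          The quoted port layout can be turned into a rough peak-throughput estimate. This Python sketch is an idealized model under stated assumptions, not a measurement: it assumes at most two FMA instructions issue per cycle in either mode (consistent with the quote's point that a 256/512-bit mix won't beat 512-bit IPC) and counts an FMA as two FLOPs per FP32 lane. `fp32_flops_per_cycle` is a hypothetical name.

```python
# Idealized model of the FMA ports described in the quote (assumptions, not measurements):
#   * 256-bit-only code: port 0 and port 1 each execute one 256-bit FMA per cycle.
#   * Any 512-bit FMA present: ports 0+1 fuse into one 512-bit unit and port 5 adds another.
#   * Either way, at most two FMA instructions issue per cycle.
FMA_ISSUES_PER_CYCLE = 2
FP32_BITS = 32
FLOPS_PER_FMA_LANE = 2  # one multiply plus one add


def fp32_flops_per_cycle(n512: int, n256: int) -> float:
    """Peak FP32 FLOPs/cycle for a stream of n512 512-bit and n256 256-bit FMAs."""
    total = n512 + n256
    if total == 0:
        return 0.0
    avg_bits = (512 * n512 + 256 * n256) / total  # average operand width per FMA
    lanes = avg_bits / FP32_BITS  # average FP32 lanes per FMA
    return FMA_ISSUES_PER_CYCLE * lanes * FLOPS_PER_FMA_LANE


for label, n512, n256 in [("256-bit only", 0, 100), ("50/50 mix", 50, 50), ("512-bit only", 100, 0)]:
    print(f"{label}: {fp32_flops_per_cycle(n512, n256):.0f} FP32 FLOPs/cycle")
```

          Under these assumptions it prints 32, 48, and 64 FP32 FLOPs/cycle: the mix lands between the pure cases, and 512-bit alone is twice the 256-bit figure.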

          BTW, AVX-512 includes support for 128-bit and 256-bit operands (the AVX-512VL extension). So, the question of register count and instruction support can be separated from operand width.
          Last edited by coder; 05 January 2024, 01:58 PM.

          • #6
            Originally posted by coder
            Of course, since we know you have an estranged relationship with humility
            Yeah, I have been estranged from humility for a while now. I am not very happy about it; we tried couples therapy, but that didn't work out.

            Last I heard, humility had found herself someone new and was playing hide the Genoa salami with him on a regular basis, so I guess she's doing Well (that's his name, by the way).

            As for me, I think I need to find someone who understands me, and Humility definitely was not it.

            • #7
              Originally posted by sophisticles
              Yeah, I have been estranged from humility for a while now, I am not very happy about it,
              Humility will motivate you to check your facts and help you avoid sticking your foot in your mouth. And when you inevitably do (we all do, at one time or another), it makes it easier to climb down from an unsupportable position.

              What I've seen is that people with more experience and expertise tend to have more humility, not less. This makes me even more mistrustful of anyone who seems overconfident.

              • #8
                Originally posted by sophisticles
                Impressive.
                With AVX-512 these Xeons were in some cases 10x faster than without and were able to use less power and stay cooler.
                Just amazing the improvement Intel has made to AVX-512 since it was first introduced.
                Of course, when you have all the money in the world to hire the best engineers, I guess this is to be expected.
                How can you draw any meaning from this if there is no comparison with AMD CPUs?

                On Intel CPUs you will have a split in ISAs, with the E-cores only getting 256-bit AVX10.

                On the AMD side, with Zen 4 + Zen 4c you have only a single ISA, with full AVX-512 support.

                Zen 5 + Zen 5c will be even more interesting, again with a single ISA for both, but with a full-width AVX-512 implementation on Zen 5
                and a double-pumped 256-bit implementation of AVX-512 on the Zen 5c cores, to save transistors and make the cores as small as possible.
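                A toy Python model of what "double-pumped" means: a 512-bit FMA on a 256-bit datapath is executed as two 256-bit halves. The architectural result is identical; the cost is an extra pass through the unit. The function names are hypothetical, and real hardware pipelines the halves rather than looping like this.

```python
LANES_256 = 8   # FP32 lanes that fit in a 256-bit datapath
LANES_512 = 16  # FP32 lanes in a 512-bit (zmm) register


def fma_full_width(a: list[float], b: list[float], c: list[float]) -> tuple[list[float], int]:
    """Compute a*b + c over all 16 lanes in a single pass (native 512-bit unit)."""
    return [x * y + z for x, y, z in zip(a, b, c)], 1


def fma_double_pumped(a: list[float], b: list[float], c: list[float]) -> tuple[list[float], int]:
    """Compute the same 512-bit FMA as two 256-bit halves, counting the passes."""
    out: list[float] = []
    passes = 0
    for lo in range(0, LANES_512, LANES_256):
        hi = lo + LANES_256
        out += [x * y + z for x, y, z in zip(a[lo:hi], b[lo:hi], c[lo:hi])]
        passes += 1
    return out, passes


a = [float(i) for i in range(LANES_512)]
b = [2.0] * LANES_512
c = [1.0] * LANES_512
full, full_passes = fma_full_width(a, b, c)
pumped, pumped_passes = fma_double_pumped(a, b, c)
print(full == pumped, full_passes, pumped_passes)
```

                It prints `True 1 2`: the same lanes come out, but in this model the double-pumped unit needs twice the passes, which is why it has roughly half the per-instruction throughput of a native 512-bit unit while still exposing the full AVX-512 ISA.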

                AMD won the ISA war, that's a fact, and AMD has also already won the asymmetric CPU scheduler war, because it is easy to just measure cache misses and then move the application to the cores with more cache.

                Intel's E-cores, by contrast, are hell on earth for the asymmetric CPU scheduler: without manual profiling, you never know where to move the application threads.

                • #9
                  Originally posted by coder
                  Humility will motivate you to check your facts and help you avoid sticking your foot in your mouth. And when you inevitably do (we all do, at one time or another), it makes it easier to climb down from an unsupportable position.

                  What I've seen is that people with more experience and expertise tend to have more humility, not less. This makes me even more mistrustful of anyone who seems overconfident.
                  I take it you have never read any of Sir Isaac Newton's or Albert Einstein's writings.

                  BTW, I was not wrong; in fact, you eventually conceded that my theory may have merit.

                  But I will throw you a bone(r) and admit that you did arrive at the correct conclusion as to why it was happening.

                  Now, if you want to show me how much of a badass programmer you are, maybe you can crack an egg of knowledge on this noob's head and show me how an "expert" would go about reconstructing the code so that I can analyze FLT (Fermat's Last Theorem) for values of A, B, C, and N up to 1000.

                  Feel free to do it in whatever language you are most proficient in.
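                  A minimal Python sketch of such a search, assuming exact results are what matter: Python integers have arbitrary precision, so a**n + b**n == c**n is tested exactly, with none of the rounding that floating-point pow() introduces at large exponents. `fermat_solutions` and `search` are hypothetical names; the pruning relies only on n-th powers growing with the base.

```python
def fermat_solutions(n: int, limit: int) -> list[tuple[int, int, int]]:
    """Return all (a, b, c) with 1 <= a <= b <= c <= limit and a**n + b**n == c**n.

    Uses exact big-integer arithmetic, so there is no rounding error even when
    c**n has thousands of digits.
    """
    powers = [i ** n for i in range(limit + 1)]
    # Map each n-th power back to its base for O(1) membership tests.
    root_of = {p: c for c, p in enumerate(powers) if c > 0}
    largest = powers[limit]
    found = []
    for a in range(1, limit + 1):
        if 2 * powers[a] > largest:
            break  # even b == a overshoots limit**n, and larger a only makes it worse
        for b in range(a, limit + 1):
            s = powers[a] + powers[b]
            if s > largest:
                break  # s only grows with b, so no larger b can work
            c = root_of.get(s)
            if c is not None:
                found.append((a, b, c))
    return found


def search(max_n: int = 1000, limit: int = 1000) -> dict[int, list[tuple[int, int, int]]]:
    """Run the search for every exponent 3..max_n; FLT says every list is empty."""
    return {n: fermat_solutions(n, limit) for n in range(3, max_n + 1)}
```

                  `fermat_solutions(2, 20)` returns the six Pythagorean triples with c ≤ 20, which makes a quick sanity check; for n ≥ 3 every list should come back empty. Expect the full 1000 × 1000 run to be slow in pure Python, since the largest powers have about 3,000 digits.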

                  • #10
                    Originally posted by qarium
                    How can you draw any meaning from this if there is no comparison with AMD CPUs?

                    On Intel CPUs you will have a split in ISAs, with the E-cores only getting 256-bit AVX10.

                    On the AMD side, with Zen 4 + Zen 4c you have only a single ISA, with full AVX-512 support.

                    Zen 5 + Zen 5c will be even more interesting, again with a single ISA for both, but with a full-width AVX-512 implementation on Zen 5
                    and a double-pumped 256-bit implementation of AVX-512 on the Zen 5c cores, to save transistors and make the cores as small as possible.

                    AMD won the ISA war, that's a fact, and AMD has also already won the asymmetric CPU scheduler war, because it is easy to just measure cache misses and then move the application to the cores with more cache.

                    Intel's E-cores, by contrast, are hell on earth for the asymmetric CPU scheduler: without manual profiling, you never know where to move the application threads.
                    I had promised myself that I wouldn't engage you in conversation because you are obviously mentally unbalanced, but damn you, your idiocy reeled me in.

                    The Intel Xeon Platinum 8592+ doesn't have any E-cores, which makes everything you just said really stupid:



                    Now go throw a temper tantrum and claim that I am using Intel's website to infect your Fedora box with a trojan.
