AVX / AVX2 / AVX-512 Performance + Power On Intel Rocket Lake

  • AVX / AVX2 / AVX-512 Performance + Power On Intel Rocket Lake

    Phoronix: AVX / AVX2 / AVX-512 Performance + Power On Intel Rocket Lake

    Here is a look at AVX / AVX2 / AVX-512 performance on the Intel Core i9 11900K "Rocket Lake". A set of relevant open-source benchmarks was built with the instruction set capped at AVX, AVX2, and AVX-512 in turn, while CPU package power consumption was monitored during the tests to look at performance-per-Watt. The results provide some fresh reference metrics for AVX-512 on Linux with the latest Intel "Rocket Lake" processors.

    https://www.phoronix.com/vr.php?view=30091

  • #2
    I'm beginning to understand more clearly why Linus really doesn't like AVX512.
    Last edited by schmidtbag; 07 April 2021, 11:24 AM.

    • #3
      Originally posted by schmidtbag
      I'm beginning to understand more clearly why Linus really doesn't like AVX512.
      Yeah. Seems like after AVX2 the results are: maybe faster with guaranteed higher power usage and heat generation.

      I also kind of get, from those results, why the step from x86-64-v2 to x86-64-v3 skips the AVX-only generation. That kind of screws over people on those CPUs, because Bulldozer and Sandy Bridge are still perfectly adequate if you don't play 2019+ games at 4K ultra. A 3080 probably wants a better CPU than those.

      • #4
        Originally posted by schmidtbag
        I'm beginning to understand more clearly why Linus really doesn't like AVX512.
        And that is not even the worst reason Linus hates it. Look at the API differences between AVX2 and AVX512 and it will become painfully obvious.

        Whoever at Intel decided to make AVX512 this way should be flagged publicly. That monstrosity should never have seen the light of day, but I guess in their desperation QA went out of the window.

        • #5
          Well, Cryptominer-Garlicoin shows great benefits. If only more software were as well optimized for AVX-512... Any idea why that particular benchmark shows huge benefits while others do not? I can think of autovectorization issues in the compiler, or workloads that aren't well suited to vectorization at all.
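
To make the autovectorization point concrete, here is a minimal sketch of two loops: one whose iterations are independent and one with a loop-carried dependency. The function names are hypothetical and are not taken from any of the benchmarked projects, and whether a given compiler actually vectorizes the first loop depends on its version and flags (for GCC or Clang, typically -O3 plus a suitable -march).

```c
#include <stddef.h>

/* Independent iterations: out[i] depends only on a[i] and b[i].
   The restrict qualifiers promise the arrays do not overlap, which is
   what usually allows a compiler to auto-vectorize this loop. */
void scale_add(float *restrict out, const float *restrict a,
               const float *restrict b, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = k * a[i] + b[i];
}

/* Loop-carried dependency: x[i] needs the x[i - 1] computed by the
   previous iteration, so the iterations cannot simply be placed side
   by side in SIMD lanes, whatever the vector width. */
void running_recurrence(float *x, float k, size_t n)
{
    for (size_t i = 1; i < n; i++)
        x[i] = k * x[i - 1] + x[i];
}
```

A workload dominated by loops of the second kind would be expected to gain little from wider vectors, whichever -march cap it is built with.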

          • #6
            At least for dav1d, I'm suspicious whether the benchmark measured anything meaningful.

            If you look into the source code, you'll notice that there's a ton of handwritten assembly code (including AVX512):
            https://code.videolan.org/videolan/d...master/src/x86
            And there is code that reads cpuid directly to determine the available vector instructions. I guess setting the usual -march/.. compiler flags is pointless; the detected instructions are used anyway.

            • #7
              Originally posted by schmidtbag
              I'm beginning to understand more clearly why Linus really doesn't like AVX512.
              I don't think it is that. The thing is, so far SIMD units have been fairly general purpose, and Intel is cramming a bunch of highly purpose-specific stuff into AVX-512. His problem is not with the width of execution or power efficiency, but with the support hell of continually introducing new niche-use instructions and having no uniformity of instruction sets and features between platforms.

              Note that it is just coincidental that Intel is adding all those custom instructions to AVX-512; the same could be introduced on 256-bit systems, which I think is how AMD will initially support some of the more useful and general-purpose AVX instructions in its upcoming platforms, similar to how they initially used 2x128 for 256-bit AVX.

              Intel appears to have given up on improving general-purpose performance and is doing a lot of work to boost purpose-specific tasks, both in hardware and on the software front. This also explains why their CPUs show disproportionately big gains in some corner cases, even if they are more or less stuck in general.
              Last edited by ddriver; 07 April 2021, 11:58 AM.

              • #8
                Originally posted by mle86pho
                At least for dav1d, I'm suspicious whether the benchmark measured anything meaningful.

                If you look into the source code, you'll notice that there's a ton of handwritten assembly code (including AVX512):
                https://code.videolan.org/videolan/d...master/src/x86
                And there is code that reads cpuid directly to determine the available vector instructions. I guess setting the usual -march/.. compiler flags is pointless; the detected instructions are used anyway.
                Yeah, this probably didn't disable runtime CPU detection. The -march flags aren't ignored, though; they are overridden for the files with the special code. You can't use the intrinsics without the right arch flags.
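
As a rough illustration of why a build-time -march cap does not necessarily cap what runs, here is a minimal sketch of runtime dispatch. It is not dav1d's actual code: the kernel names are hypothetical, and it assumes GCC or Clang, whose `__builtin_cpu_supports` builtin and `target` function attribute stand in for a hand-written cpuid check and per-file flags.

```c
#include <stddef.h>

/* Baseline kernel, compiled with whatever -march the build uses. */
static float sum_scalar(const float *v, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

#if defined(__x86_64__) || defined(__i386__)
/* The target attribute overrides the command-line -march for this one
   function, much like building a single file with different flags. */
__attribute__((target("avx2")))
static float sum_avx2(const float *v, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

__attribute__((target("avx512f")))
static float sum_avx512(const float *v, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
#endif

typedef float (*sum_fn)(const float *, size_t);

/* Pick the widest kernel the running CPU reports, ignoring the -march
   level the rest of the program was built with. */
sum_fn pick_sum(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))
        return sum_avx512;
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2;
#endif
    return sum_scalar;
}
```

With a dispatcher like this, a binary built with an AVX2 cap would still take the AVX-512 path on a CPU that supports it, unless the runtime detection is explicitly disabled.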

                • #9
                  I wonder how much silicon AVX-512 uses. Is it comparable to an extra core or two with AVX2?

                  • #10
                    AVX-512 is primarily aimed at software that has to perform a LOT of similar mathematical operations on large amounts of data. These kinds of programs mostly fall into two categories:
                    - Math libraries like OpenBLAS or MKL, which use intrinsics
                    - Custom math kernels, mostly written in CUDA or for other SPMD compilers like the Intel® Implicit SPMD Program Compiler (ispc.github.io)

                    These benchmarks only show that the software being compiled does not fall into either of these categories (except for some of the mining stuff, probably hacky custom kernels...)

                    It is unfortunate that AVX-512 is getting a bad reputation because of these kinds of benchmarks, because if you make use of AVX-512 in the intended way, the performance benefits can be HUGE.
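
For readers who have not seen what "using intrinsics" looks like, here is a minimal sketch of a math kernel of that kind: a single-precision y = a*x + y loop that handles 16 floats per iteration. It is an illustrative example only, not code from OpenBLAS or MKL; the function names are hypothetical, and it assumes GCC or Clang on x86, falling back to a scalar loop when AVX-512F is not available at runtime.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* AVX-512F kernel: 16 floats per fused multiply-add. */
__attribute__((target("avx512f")))
static void saxpy_avx512(float a, const float *x, float *y, size_t n)
{
    const __m512 va = _mm512_set1_ps(a);
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512 vx = _mm512_loadu_ps(x + i);
        __m512 vy = _mm512_loadu_ps(y + i);
        _mm512_storeu_ps(y + i, _mm512_fmadd_ps(va, vx, vy));
    }
    if (i < n) {
        /* Tail of 1..15 elements: a mask register enables only the
           lanes that are still in bounds. */
        __mmask16 m = (__mmask16)((1u << (n - i)) - 1u);
        __m512 vx = _mm512_maskz_loadu_ps(m, x + i);
        __m512 vy = _mm512_maskz_loadu_ps(m, y + i);
        _mm512_mask_storeu_ps(y + i, m, _mm512_fmadd_ps(va, vx, vy));
    }
}
#endif

/* Portable fallback with the same result for each element. */
static void saxpy_scalar(float a, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Entry point: use the AVX-512 kernel only if the CPU reports it. */
void saxpy(float a, const float *x, float *y, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) {
        saxpy_avx512(a, x, y, n);
        return;
    }
#endif
    saxpy_scalar(a, x, y, n);
}
```

The masked tail is one of the general-purpose conveniences of AVX-512: no separate scalar cleanup loop is needed inside the vector kernel.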
