Linus Torvalds: "I Hope AVX512 Dies A Painful Death"

  • Phoronix: Linus Torvalds: "I Hope AVX512 Dies A Painful Death"

    Linux creator Linus Torvalds had some choice words today on Advanced Vector Extensions 512 (AVX-512) found on select Intel processors...

  • #2
    Even for FP workloads, there are two outstanding problems that Intel is unable (or unwilling) to address:
    1. According to the specification, the AVX-512 unit may run below the base frequency;
    2. A fragmented product line: many chips on sale still lack AVX2, let alone AVX-512, so software has to detect support at runtime (see the sketch below).
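
    A minimal sketch of how point 2 is usually handled, assuming GCC or Clang: runtime dispatch via __builtin_cpu_supports, so one binary can use AVX-512 where it exists and fall back elsewhere. The process_* functions are hypothetical stand-ins, not from any real codebase.

    Code:
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical scalar fallback. */
    static void process_scalar(const unsigned char *buf, size_t len)
    {
        (void)buf; (void)len;
        puts("scalar path");
    }

    /* Hypothetical AVX-512 version (in a real project this would live in a
       separate translation unit compiled with -mavx512f -mavx512vbmi). */
    static void process_avx512(const unsigned char *buf, size_t len)
    {
        (void)buf; (void)len;
        puts("AVX-512 path");
    }

    static void process(const unsigned char *buf, size_t len)
    {
        /* __builtin_cpu_supports reads CPUID once and caches the result. */
        if (__builtin_cpu_supports("avx512f") &&
            __builtin_cpu_supports("avx512vbmi"))
            process_avx512(buf, len);
        else
            process_scalar(buf, len);
    }

    int main(void)
    {
        unsigned char buf[16] = {0};
        process(buf, sizeof buf);
        return 0;
    }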


    • #3
      Torvalds ought to look into GFNI and VBMI as at least two major examples of where AVX-512 can massively improve performance in critical areas of kernel code that have nothing to do with floating point.

      Here's an article that should be of great interest: https://branchfree.org/2019/05/29/wh...s-perspective/
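
      As a concrete taste of the non-FP work GFNI does, here's a minimal sketch (my own toy example, not from the linked article; assumes a GFNI-capable CPU and something like gcc -O2 -mavx512f -mgfni): a single vgf2p8affineqb bit-reverses all 64 bytes in a register, the kind of primitive that bitmap- and CRC-style code wants.

      Code:
      #include <immintrin.h>
      #include <stdio.h>

      int main(void)
      {
          unsigned char in[64], out[64];
          for (int i = 0; i < 64; i++)
              in[i] = (unsigned char)i;

          /* 8x8 bit matrix that reverses the bit order within each byte. */
          const __m512i rev =
              _mm512_set1_epi64((long long)0x8040201008040201ULL);

          __m512i v = _mm512_loadu_si512(in);
          /* One instruction bit-reverses all 64 bytes at once. */
          v = _mm512_gf2p8affine_epi64_epi8(v, rev, 0);
          _mm512_storeu_si512(out, v);

          printf("0x01 -> 0x%02x\n", out[1]);   /* prints 0x01 -> 0x80 */
          return 0;
      }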

      Here's a paper showing AVX-512 making base64 encoding basically as fast as memcpy can move the data: https://arxiv.org/pdf/1910.05109.pdf Of course, while base64 isn't a kernel-specific algorithm, it's the same type of non-floating-point byte processing that happens all the time in the kernel and that could be massively improved with the correct use of the architecture.
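
      The core trick from that paper is easy to sketch: the 64-character base64 alphabet fits exactly in one zmm register, so a single vpermb translates 64 six-bit indices to ASCII per instruction. A minimal illustration (my own toy example, assuming AVX512VBMI and gcc -O2 -mavx512f -mavx512vbmi; the paper gathers the 6-bit indices from input bytes with vpmultishiftqb, which is omitted here):

      Code:
      #include <immintrin.h>
      #include <stdio.h>

      int main(void)
      {
          /* The full base64 alphabet, exactly 64 bytes. */
          static const unsigned char alphabet[64] =
              "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
              "abcdefghijklmnopqrstuvwxyz"
              "0123456789+/";

          /* Stand-in 6-bit indices; a real encoder derives these from
             the input bytes. */
          unsigned char idx[64], out[65];
          for (int i = 0; i < 64; i++)
              idx[i] = (unsigned char)i;

          __m512i lut = _mm512_loadu_si512(alphabet);
          __m512i v   = _mm512_loadu_si512(idx);

          /* vpermb: out[i] = alphabet[idx[i] & 63], 64 lookups at once. */
          _mm512_storeu_si512(out, _mm512_permutexvar_epi8(v, lut));
          out[64] = '\0';
          puts((const char *)out);
          return 0;
      }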

      Maybe he's just venting after having to politically-correct the kernel source all day.
      Last edited by chuckula; 11 July 2020, 09:28 PM. Reason: Added reference to base64 encoding paper.


      • #4
        Originally posted by chuckula View Post
        Maybe he's just venting after having to politically-correct the kernel source all day.
        Very likely so.

        If he had said this 10 years ago, it would have been a rant full of swear words.


        • #5
          AVX512 is a nickname for snowflakes? I'm in.


          • #6
            Originally posted by chuckula View Post
            Here's a paper showing AVX-512 making base64 encoding basically as fast as memcpy can move the data: https://arxiv.org/pdf/1910.05109.pdf Of course, while base64 isn't a kernel-specific algorithm, it's the same type of non-floating-point byte processing that happens all the time in the kernel and that could be massively improved with the correct use of the architecture.
            base64 never shows up in large lengths in consumer workloads. So if you use AVX-512 to process that chunk of data, the processor is going to run at a reduced clock for at least 32 ms afterward, and I guarantee that will slow things down more than the AVX-512 sped up the decode.


            • #7
              Games do care about AVX-512, though, and 3D tools like Blender will care. Big-budget movie studios that buy Intel by the truckload will care, which is why Intel is going to do it. Cloud companies care too, as a feature for their competitive advantage.


              • #8
                A noob question: how is AVX-512 implemented in the cores? Is it per core, or one unit shared by all?
                Another noob question: is AVX-512 something that could be "emulated" by some kind of tensor coprocessor?


                • #9
                  Originally posted by dragorth View Post
                  Games do care about AVX-512, though, and 3D tools like Blender will care. Big-budget movie studios that buy Intel by the truckload will care, which is why Intel is going to do it. Cloud companies care too, as a feature for their competitive advantage.
                  Games are definitely a bad use case for AVX-512, probably the worst thing to use it with. The reduced clock speed would have a huge impact: the time it takes to ramp back up is more than a few frames (at 60 fps a frame is only ~16.7 ms), so with AVX-512 work landing every frame the processor would never run at its normal clock speed.

                  Blender is arguable. The only benefit would be during rendering on the CPU, but it’s already better to use a GPU there.


                  • #10
                    Same goes for Intel GPUs. Why waste transistors on something nobody cares about?
