Linus Torvalds: "I Hope AVX512 Dies A Painful Death"

  • Phoronix: Linus Torvalds: "I Hope AVX512 Dies A Painful Death"

    Linux creator Linus Torvalds had some choice words today on Advanced Vector Extensions 512 (AVX-512) found on select Intel processors...

  • #2
    Even for FP workloads, there are two outstanding problems that Intel is unable (or unwilling) to address:
    1. According to the specification, the AVX-512 unit may run below the base frequency;
    2. A fragmented product line: many chips on sale still lack AVX2, let alone AVX-512, so software has to detect support at runtime (see the sketch below).
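
    A minimal sketch of how point 2 is usually handled, assuming GCC or Clang: runtime dispatch via __builtin_cpu_supports, so one binary can use AVX-512 where it exists and fall back elsewhere. The process_* functions are hypothetical stand-ins, not from any real codebase.

    Code:
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical scalar fallback. */
    static void process_scalar(const unsigned char *buf, size_t len)
    {
        (void)buf; (void)len;
        puts("scalar path");
    }

    /* Hypothetical AVX-512 version (in a real project this would live in a
       separate translation unit compiled with -mavx512f -mavx512vbmi). */
    static void process_avx512(const unsigned char *buf, size_t len)
    {
        (void)buf; (void)len;
        puts("AVX-512 path");
    }

    static void process(const unsigned char *buf, size_t len)
    {
        /* __builtin_cpu_supports reads CPUID once and caches the result. */
        if (__builtin_cpu_supports("avx512f") &&
            __builtin_cpu_supports("avx512vbmi"))
            process_avx512(buf, len);
        else
            process_scalar(buf, len);
    }

    int main(void)
    {
        unsigned char buf[16] = {0};
        process(buf, sizeof buf);
        return 0;
    }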


    • #3
      Torvalds ought to look into GFNI and VBMI as at least two major examples of where AVX-512 can massively improve performance in critical areas of kernel code that have nothing to do with floating point.

      Here's an article that should be of great interest: https://branchfree.org/2019/05/29/wh...s-perspective/
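
      As a concrete taste of the non-FP work GFNI does, here's a minimal sketch (my own toy example, not from the linked article; assumes a GFNI-capable CPU and something like gcc -O2 -mavx512f -mgfni): a single vgf2p8affineqb bit-reverses all 64 bytes in a register, the kind of primitive that bitmap- and CRC-style code wants.

      Code:
      #include <immintrin.h>
      #include <stdio.h>

      int main(void)
      {
          unsigned char in[64], out[64];
          for (int i = 0; i < 64; i++)
              in[i] = (unsigned char)i;

          /* 8x8 bit matrix that reverses the bit order within each byte. */
          const __m512i rev =
              _mm512_set1_epi64((long long)0x8040201008040201ULL);

          __m512i v = _mm512_loadu_si512(in);
          /* One instruction bit-reverses all 64 bytes at once. */
          v = _mm512_gf2p8affine_epi64_epi8(v, rev, 0);
          _mm512_storeu_si512(out, v);

          printf("0x01 -> 0x%02x\n", out[1]);   /* prints 0x01 -> 0x80 */
          return 0;
      }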

      Here's a paper showing AVX-512 making base64 encoding basically as fast as memcpy can move the data: https://arxiv.org/pdf/1910.05109.pdf Of course, while base64 isn't a kernel-specific algorithm, it's the same type of non-floating-point byte processing that happens all the time in the kernel and that could be massively improved with the correct use of the architecture.
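
      The core trick from that paper is easy to sketch: the 64-character base64 alphabet fits exactly in one zmm register, so a single vpermb translates 64 six-bit indices to ASCII per instruction. A minimal illustration (my own toy example, assuming AVX512VBMI and gcc -O2 -mavx512f -mavx512vbmi; the paper gathers the 6-bit indices from input bytes with vpmultishiftqb, which is omitted here):

      Code:
      #include <immintrin.h>
      #include <stdio.h>

      int main(void)
      {
          /* The full base64 alphabet, exactly 64 bytes. */
          static const unsigned char alphabet[64] =
              "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
              "abcdefghijklmnopqrstuvwxyz"
              "0123456789+/";

          /* Stand-in 6-bit indices; a real encoder derives these from
             the input bytes. */
          unsigned char idx[64], out[65];
          for (int i = 0; i < 64; i++)
              idx[i] = (unsigned char)i;

          __m512i lut = _mm512_loadu_si512(alphabet);
          __m512i v   = _mm512_loadu_si512(idx);

          /* vpermb: out[i] = alphabet[idx[i] & 63], 64 lookups at once. */
          _mm512_storeu_si512(out, _mm512_permutexvar_epi8(v, lut));
          out[64] = '\0';
          puts((const char *)out);
          return 0;
      }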

      Maybe he's just venting after having to politically-correct the kernel source all day.
      Last edited by chuckula; 11 July 2020, 09:28 PM. Reason: Added reference to base64 encoding paper.


      • #4
        Originally posted by chuckula View Post
        Maybe he's just venting after having to politically-correct the kernel source all day.
        Very likely so.

        If he had said this 10 years ago, it would have been a rant full of swear words.


        • #5
          AVX512 is a nickname for snowflakes? I'm in.


          • #6
            Originally posted by chuckula View Post
            Here's a paper showing AVX-512 making base64 encoding basically as fast as memcpy can move the data: https://arxiv.org/pdf/1910.05109.pdf Of course, while base64 isn't a kernel-specific algorithm, it's the same type of non-floating-point byte processing that happens all the time in the kernel and that could be massively improved with the correct use of the architecture.
            base64 never shows up in large lengths in consumer workloads. So if you use AVX-512 to process that chunk of data, the processor is going to run at a reduced clock for at least 32 ms afterward, and I guarantee that will slow things down more than the AVX-512 sped up the decode.


            • #7
              Games do care about AVX-512, though, and 3D tools like Blender will care. Big-budget movie studios that buy Intel by the truckload will care, which is why Intel is going to do it. Cloud companies care too, as a feature for their competitive advantage.


              • #8
                A noob question: how is AVX-512 implemented in the cores? Is it per core, or one unit shared by all?
                Another noob question: is AVX-512 something that could be "emulated" by some kind of tensor coprocessor?


                • #9
                  Originally posted by dragorth View Post
                  Games do care about AVX-512, though, and 3D tools like Blender will care. Big-budget movie studios that buy Intel by the truckload will care, which is why Intel is going to do it. Cloud companies care too, as a feature for their competitive advantage.
                  Games are definitely a bad use case for AVX-512, probably the worst thing to use it with. The reduced clock speed would have a huge impact: the time it takes to ramp back up is more than a few frames (at 60 fps a frame is only ~16.7 ms), so with AVX-512 work landing every frame the processor would never run at its normal clock speed.

                  Blender is arguable. The only benefit would be during rendering on the CPU, but it’s already better to use a GPU there.


                  • #10
                    Same goes for Intel GPUs. Why waste transistors on something nobody cares about?
