
Intel Releases x86-simd-sort 5.0 With 4~5x Faster C++ Object Sorting Using AVX-512


  • Intel Releases x86-simd-sort 5.0 With 4~5x Faster C++ Object Sorting Using AVX-512

    Phoronix: Intel Releases x86-simd-sort 5.0 With 4~5x Faster C++ Object Sorting Using AVX-512

    It's been nearly one year to the day since Intel first outlined its AVX-512-powered sorting library and its blazing-fast sort speeds. The past year brought the 1.0 release, new algorithms in v2.0, and AVX2 support plus more AVX-512 optimizations in v4.0. Now Intel is out with x86-simd-sort 5.0 with yet more performance improvements...


  • #2
    What can this be useful for? What can benefit from it?



    • #3
      Originally posted by timofonic View Post
      What can this be useful for? What can benefit from it?
      TFA mentions NumPy, and any NumPy speedup is a pretty big deal. I personally don't deal with it much, since I work at the data storage level and mainly maintain raw data availability via databases, with only minor massaging.

      But if you deal with a lot of numerical data, say data analysis, then this might be huge.



      • #4
        I have the impression that since its introduction years ago, people have been trying to find real-life applications for AVX512 that aren't already covered by GPUs.
        With earlier SIMD extensions it was relatively simple - multimedia and games benefited a lot - but this is no longer the case with AVX512.
        So we have JSON parsing, PS3 emulation, AI inferencing, molecular dynamics simulation, and ray tracing.
        AVX512 is also a must nowadays for winning processor comparisons on Phoronix.

        And now we see Intel working on another specialized use case (sorting), with another specialized library.
        I won't argue that it's better to have AVX512 than not, but it still doesn't convince me that Linus was wrong.
        This is nowhere near as useful as SSE/SSE2 was years ago.

        Also, it would be nice to know what AVX512 is being compared to, because from what I can see, AVX2 code was added to this library much later than the AVX512 code (in 4.0, a few months ago), and it isn't getting anywhere near as much love as the AVX512 path, which is being optimized again and again. So it's 5x faster than what? MMX?

        I'm waiting for benchmarks comparing x86-64-v3 and v4 optimized distro builds. At least AVX512 has some clever instructions that AVX2 doesn't have, and more registers.
        Last edited by sobrus; 13 February 2024, 04:08 AM.



        • #5
          Originally posted by sobrus View Post
          With earlier SIMD extensions it was relatively simple - multimedia and games benefited a lot - but this is no longer the case with AVX512.
          So we have JSON parsing, PS3 emulation, AI inferencing, molecular dynamics simulation, and ray tracing.
          Everything that benefited from SSE most likely also benefits from AVX.

          This is nowhere near as useful as SSE/SSE2 was years ago.
          Give it time; as soon as it is as widespread as SSE, it will see more adoption. Up until now, only a small percentage of users have had capable hardware.

          Also, reread what Linus said about AVX; it was much more specific than "it's useless".

          Originally posted by timofonic View Post
          What can benefit from it?
          AMD CPUs.



          • #6
            I wonder how good compiler code generation is at taking advantage of AVX512. Not so much because I care directly, but because if, say, LLVM can detect relevant use cases and optimize the generated code to use those features when it sees IR they apply to, then one would think you could build an analyzer tool on top of that compiler-level analysis: something that looks for relevant spots and simply emits a lint warning like "hey, this code area could possibly be sped up", so people could profile and optimize those areas specifically if something can be done beyond what the compiler's optimization already achieves.

            I guess the worst case would be "hot" / performance-critical code that just isn't phrased in a way the compiler can recognize as a candidate for AVX512 in that function or block. If a person doesn't catch it in a manual optimization pass, they'd never know.


            Originally posted by sobrus View Post
            I have the impression that since its introduction years ago, people have been trying to find real-life applications for AVX512 that aren't already covered by GPUs.
            With earlier SIMD extensions it was relatively simple - multimedia and games benefited a lot - but this is no longer the case with AVX512.
            So we have JSON parsing, PS3 emulation, AI inferencing, molecular dynamics simulation, and ray tracing.
            AVX512 is also a must nowadays for winning processor comparisons on Phoronix.

            And now we see Intel working on another specialized use case (sorting), with another specialized library.
            I won't argue that it's better to have AVX512 than not, but it still doesn't convince me that Linus was wrong.
            This is nowhere near as useful as SSE/SSE2 was years ago.

            Also, it would be nice to know what AVX512 is being compared to, because from what I can see, AVX2 code was added to this library much later than the AVX512 code (in 4.0, a few months ago), and it isn't getting anywhere near as much love as the AVX512 path, which is being optimized again and again. So it's 5x faster than what? MMX?

            I'm waiting for benchmarks comparing x86-64-v3 and v4 optimized distro builds. At least AVX512 has some clever instructions that AVX2 doesn't have, and more registers.



            • #7
              Originally posted by pong View Post
              I wonder how good the code optimization is for taking advantage of AVX512
              From what I see, most software isn't even optimized for AVX2 yet, despite widespread hardware support for years. Even the exceptions tend to be already highly tuned libraries like libvpx, which got a lot of AVX2 work in the recent 1.13 release (it runs noticeably better now).
              I bought an AVX2-capable machine three years ago; it was already a bit "outdated" back then, and not much has changed since. It's still a bit "outdated", while still waiting for x86-64-v3 software adoption at the same time.



              • #8
                I think most people here have a warped perception of how long SSE2 took to become widely supported. It was introduced in 2000, but it wasn't until 2004 that all new CPUs had the instructions, and then at least another five years until SSE2 CPUs saturated the user base.
                Microsoft Visual C++ only made SSE2 code generation the default with Visual Studio 2012.

                AVX512 was first introduced in 2016, and most new CPUs today still come without it. So there is no way we are at the same saturation that we had with SSE2 in 2010.



                • #9
                  Sounds great in theory, having the potential to speed up basically every program ever written, but in my initial test on a Ryzen 7950X the SIMD sort took 111 ms, while std::sort took 104 ms.

                  I was using the "sort points by distance to origin" example from the GitHub page with 1M elements.

                  I'll have to look into why that is later.
                  Last edited by david-nk; 14 February 2024, 11:03 AM.



                  • #10
                    Originally posted by david-nk View Post
                    Sounds great in theory, having the potential to speed up basically every program ever written, but in my initial test on a Ryzen 7950X the SIMD sort took 111 ms, while std::sort took 104 ms.
                    This will vary heavily between CPUs.
                    It might also depend on the compiler you're using and the options you pass it (generic -O2 vs. native -O3, for example).
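                    To make the "generic vs. native" contrast concrete, the two builds would look roughly like this (bench.cpp is a hypothetical file name; the flags are standard GCC/Clang options):

```shell
# Baseline: generic x86-64 code, moderate optimization
g++ -O2 bench.cpp -o bench_generic

# Tuned: full optimization plus every instruction set the host CPU
# supports (may emit AVX2 or AVX-512 code depending on the machine)
g++ -O3 -march=native bench.cpp -o bench_native
```

                    A binary built with -march=native will generally not run on older CPUs, which is why distros ship the generic build.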

