Announcement

**qarium** · 08 April 2024, 03:14 PM

Originally posted by Joe2021

Sorry, but I dislike the headline. It gives the impression this is an AMD-exclusive gain, but in fact Intel not only has a very similar gain (+155% vs +151%, who cares?) but is even faster in absolute numbers.

It shows that AMD's double-pumped 256-bit AVX-512 approach is 4% faster than the full-width 10nm Xeon implementation.

It's the first time in decades that everyone can see AMD won the Intel ISA war LOL...

What a shame for Intel.
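The two headline figures are gains measured against each vendor's own previous code path, so they convert to speedup factors but say nothing by themselves about absolute throughput (which is Joe2021's point). A small Python sketch of the arithmetic, assuming AMD = +155% and Intel = +151% as quoted above:

```python
def multiplier(gain_percent: float) -> float:
    """Convert a reported '+N%' gain into a speedup factor (new / old)."""
    return 1.0 + gain_percent / 100.0

amd_gain = 155.0    # headline figure for Zen 4, as quoted in the thread
intel_gain = 151.0  # figure quoted by Joe2021 for Intel

amd_x = multiplier(amd_gain)      # speedup over AMD's own old code path
intel_x = multiplier(intel_gain)  # speedup over Intel's own old code path

# Gap between the two gains, in percentage points
point_gap = amd_gain - intel_gain
# Gap between the two speedup factors, in percent
relative_gap = (amd_x / intel_x - 1.0) * 100.0

print(f"AMD {amd_x:.2f}x, Intel {intel_x:.2f}x")
print(f"gap: {point_gap:.0f} percentage points, {relative_gap:.1f}% between speedups")
```

Under these assumptions a 4 percentage-point difference in gains is only about a 1.6% difference between the speedup factors, and neither number compares an AMD chip against an Intel chip directly.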

**qarium** · 08 April 2024, 03:15 PM

Originally posted by Chugworth

Ice Lake and Sapphire Rapids are Xeon chips, so if you're just looking at desktops then it is largely an AMD gain. Intel abandoned AVX-512 after the 11th gen on the desktop, but AMD picked it up. AVX-512 was criticized at first, and even Torvalds said he hoped AVX-512 'dies a painful death'. But as it turns out, AVX-512 is actually really good.

AMD won the ISA war this time: it's AMD's double-pumped 256-bit AVX-512 approach that is 4% faster than the full-width 10nm Xeon implementation.

Intel has literally nothing to compete.

**piotrj3** · 08 April 2024, 06:12 PM

Originally posted by qarium

It shows that AMD's double-pumped 256-bit AVX-512 approach is 4% faster than the full-width 10nm Xeon implementation.

It's the first time in decades that everyone can see AMD won the Intel ISA war LOL...

What a shame for Intel.

Intel's AVX-512 isn't a full-width implementation either; it's essentially two 256-bit units glued together. The only major difference between Intel and AMD is that, technically, Intel's AVX-512 is atomic when the memory alignment is correct.

**Paradigm Shifter** · 08 April 2024, 06:38 PM

Oh, that has to hurt. Intel develop AVX-512, screw it up, effectively abandon it (implementing it and then removing it from consumer CPUs, and without those it will never see ubiquity) and AMD do it better...

**MarkG** · 08 April 2024, 06:56 PM

Originally posted by Anux

... LUKS FDE all the way!

A few years ago, my dentist found a cavity in an upper-back molar and did a root canal. It was not until I was driving home, after the novocaine wore off, that I realized "OMG, I've been putting up with this discomfort for a long time." I've been doing LUKS FDE ~everywhere ~forever, but don't notice any pain -- it's just the cost of doing business. But, I suspect/hope that when this rolls out generally, I'll have a similar "Ahhh" moment when I notice the speed-up.

**bradh352** · 08 April 2024, 07:37 PM

Originally posted by MarkG

I've been doing LUKS FDE ~everywhere ~forever, but don't notice any pain -- it's just the cost of doing business. But, I suspect/hope that when this rolls out generally, I'll have a similar "Ahhh" moment when I notice the speed-up.

It's fairly unlikely you're bottlenecked on encryption performance; more likely you're not using the right flags for modern systems. See:

Speeding up Linux disk encryption

https://blog.cloudflare.com/speeding-up-linux-disk-encryption

In this post, we will investigate the performance of disk encryption on Linux and explain how we made it at least two times faster for ourselves and our customers!

You need to use no_read_workqueue and no_write_workqueue on modern systems. I really have no idea why those aren't the default these days.
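On distributions that unlock volumes with systemd-cryptsetup, these dm-crypt flags can be requested per volume in `/etc/crypttab` with the options `no-read-workqueue` and `no-write-workqueue` (supported in recent systemd versions; check `crypttab(5)` on your system). A minimal Python sketch of a hypothetical helper, `add_workqueue_opts`, that appends them to one crypttab line, assuming the usual four-field layout (name, device, key file, options):

```python
WORKQUEUE_OPTS = ("no-read-workqueue", "no-write-workqueue")

def add_workqueue_opts(line: str) -> str:
    """Hypothetical helper: return a crypttab line with the dm-crypt
    workqueue-bypass options appended to its options field.

    Assumes the four-field layout: name, device, key file, options.
    Blank lines and comments are returned unchanged.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line
    fields = stripped.split()
    # Pad a missing key-file field with "none" so options land in field 4
    while len(fields) < 3:
        fields.append("none")
    opts = fields[3].split(",") if len(fields) > 3 else []
    # Append each option only once, so running this twice is harmless
    for opt in WORKQUEUE_OPTS:
        if opt not in opts:
            opts.append(opt)
    return " ".join(fields[:3] + [",".join(opts)] + fields[4:])
```

For example, `add_workqueue_opts("cryptroot UUID=1234 none luks,discard")` (a made-up entry) yields `cryptroot UUID=1234 none luks,discard,no-read-workqueue,no-write-workqueue`. The helper only rewrites text; whether the options take effect depends on your systemd and kernel versions.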

**MarkG** · 08 April 2024, 09:11 PM

Originally posted by bradh352

You need to use no_read_workqueue and no_write_workqueue on modern systems. I really have no idea why those aren't the default these days.

Thanks for the article. I plan to try this over my morning coffee...

**numacross** · 09 April 2024, 02:56 AM

Originally posted by piotrj3

Intel's AVX-512 isn't a full-width implementation either; it's essentially two 256-bit units glued together. The only major difference between Intel and AMD is that, technically, Intel's AVX-512 is atomic when the memory alignment is correct.

Which Intel implementation are you describing?

Skylake-X had two AVX-512 execution pipelines: a fused port 0+1 which indeed was 2x256-bit (but in a different way than Zen 4) with a second dedicated 1x512-bit unit on port 5.
Sunny Cove in Ice Lake server (and Ice Lake mobile, and desktop Rocket Lake) used one 512-bit FMA and ALU on port 0 with a second 512-bit ALU on port 5.
Golden Cove in its client version (early Alder Lake that allowed AVX-512 with E-cores disabled) indeed has only fused ports 0+1, so 2x256-bit. The server version of Golden Cove used in Sapphire Rapids has a second full 512-bit unit.
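To make the "2x256-bit" vs "full 512-bit" distinction concrete: both designs execute the same AVX-512 instruction with the same architectural result; a double-pumped datapath just processes the 512-bit register as two 256-bit halves. A functional Python model, assuming a 16-lane float32 add (it does not model timing, port assignment, or the differences between Intel's fused ports 0+1 and Zen 4's split):

```python
from typing import List

LANES_512 = 16  # a 512-bit register holds 16 float32 lanes
LANES_256 = 8   # a 256-bit register holds 8 float32 lanes

def add_512_native(a: List[float], b: List[float]) -> List[float]:
    """Model of a full-width 512-bit unit: all 16 lanes in one pass."""
    assert len(a) == len(b) == LANES_512
    return [x + y for x, y in zip(a, b)]

def add_512_double_pumped(a: List[float], b: List[float]) -> List[float]:
    """Model of a 2x256-bit datapath: the same 512-bit op done in two halves."""
    assert len(a) == len(b) == LANES_512
    result: List[float] = []
    for half in range(2):
        lo, hi = half * LANES_256, (half + 1) * LANES_256
        # Each pass touches only 8 lanes, like a single 256-bit unit
        result.extend(x + y for x, y in zip(a[lo:hi], b[lo:hi]))
    return result
```

The two functions always return identical lists; in hardware the difference shows up only in throughput and latency, which is why benchmarks rather than the ISA decide which approach wins.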

**piotrj3** · 09 April 2024, 04:48 AM

Originally posted by numacross

Which Intel implementation are you describing?

Skylake-X had two AVX-512 execution pipelines: a fused port 0+1 which indeed was 2x256-bit (but in a different way than Zen 4) with a second dedicated 1x512-bit unit on port 5.
Sunny Cove in Ice Lake server (and Ice Lake mobile, and desktop Rocket Lake) used one 512-bit FMA and ALU on port 0 with a second 512-bit ALU on port 5.
Golden Cove in its client version (early Alder Lake that allowed AVX-512 with E-cores disabled) indeed has only fused ports 0+1, so 2x256-bit. The server version of Golden Cove used in Sapphire Rapids has a second full 512-bit unit.

I didn't know that detail (I only remembered the few cases with fused ports 0+1). Anyway, after reading deeper: Skylake-X always has two ports, but not always two FMA units for 512-bit operations. The 2x256-bit path is always there, but only the parts with 10+ cores had the 512-bit one. But yeah, I didn't read deep enough to be aware of that second, full-width 512-bit unit.

**Anux** · 09 April 2024, 05:20 AM

Originally posted by MarkG

A few years ago, my dentist found a cavity in an upper-back molar and did a root canal. It was not until I was driving home, after the novocaine wore off, that I realized "OMG, I've been putting up with this discomfort for a long time." I've been doing LUKS FDE ~everywhere ~forever, but don't notice any pain -- it's just the cost of doing business. But, I suspect/hope that when this rolls out generally, I'll have a similar "Ahhh" moment when I notice the speed-up.

I never noticed any slowdown from FDE. My laptop has a Core i 3000 whose AES throughput is faster than its SATA 3 SSD, and on my desktop the NVMe does around 3.5 GB/s while AES runs at around 3 GB/s. Since those speeds are never the bottleneck for me, there is no noticeable performance hit.
I guess with heavy database usage or other disk-intensive workloads you might notice something.

I'm not even sure if I activated the workqueue hack on my current system but I used it in the past and there was no noticeable change.

With the improvement from the article I don't expect to feel a difference but the CPU load should be less.
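The reasoning above is a bottleneck argument: if decryption is pipelined with disk I/O, sustained throughput is roughly capped by the slowest stage, and the workload's own demand counts as a stage. A simplified Python model, assuming perfect overlap and ignoring latency and CPU contention; the 3.5 and 3.0 GB/s values are the desktop figures from the post, while the 10 and 0.5 GB/s demands are made-up examples. `cryptsetup benchmark` reports the cipher throughput on your own machine.

```python
def sustained_gbps(demand: float, disk: float, cipher: float) -> float:
    """Throughput in GB/s under a simple 'slowest stage wins' model."""
    return min(demand, disk, cipher)

def limiting_stage(demand: float, disk: float, cipher: float) -> str:
    """Name the stage that caps throughput in this simplified model."""
    stages = {"workload": demand, "disk": disk, "cipher": cipher}
    return min(stages, key=stages.get)

# Desktop figures from the post: NVMe ~3.5 GB/s, AES ~3 GB/s
heavy = limiting_stage(10.0, 3.5, 3.0)   # hypothetical large sequential copy
light = limiting_stage(0.5, 3.5, 3.0)    # hypothetical everyday desktop load
print(heavy, light)
```

Only a workload that can actually push more than ~3 GB/s would be cipher-bound here, which matches the observation that everyday use shows no hit while the faster AES-XTS code mainly lowers CPU load.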

Linux 6.10 AES-XTS For Disk/File Encryption As Much As ~155% Faster For AMD Zen 4 CPUs