Linux 6.10 AES-XTS For Disk/File Encryption As Much As ~155% Faster For AMD Zen 4 CPUs


  • #11
    Originally posted by Joe2021
    Sorry, but I dislike the headline. It gives the impression this is an AMD-exclusive gain, but in fact Intel not only has a very similar gain (+155% vs +151%, who cares?), but is even faster in absolute numbers.
    It shows that AMD's double-pumped 256-bit AVX-512 approach is 4% faster than the full-sized 10nm Xeon implementation.

    It's the first time in decades that everyone can see that AMD won the Intel ISA war, LOL...

    What a shame for Intel.
    Phantom circuit Sequence Reducer Dyslexia



    • #12
      Originally posted by Chugworth
      Ice Lake and Sapphire Rapids are Xeon chips, so if you're just looking at desktops then it is largely an AMD gain. Intel abandoned AVX-512 after the 11th gen on the desktop, but AMD picked it up. AVX-512 was criticized at first, and even Torvalds said he hoped AVX-512 'dies a painful death'. But as it turns out, AVX-512 is actually really good.
      AMD won the ISA war this time: it's AMD's double-pumped 256-bit AVX-512 approach that is 4% faster than the full-sized 10nm Xeon implementation.

      Intel has literally nothing to compete with.
      Phantom circuit Sequence Reducer Dyslexia



      • #13
        Originally posted by qarium

        It shows that AMD's double-pumped 256-bit AVX-512 approach is 4% faster than the full-sized 10nm Xeon implementation.

        It's the first time in decades that everyone can see that AMD won the Intel ISA war, LOL...

        What a shame for Intel.
        Intel's AVX-512 isn't a full-sized implementation. It is essentially 2x256 glued together. The only major difference between Intel and AMD is that, technically, Intel's AVX-512 is atomic when memory alignment is correct.



        • #14
          Oh, that has to hurt. Intel develop AVX-512, screw it up, effectively abandon it (implementing then removing from consumer CPUs, as without those it will never see ubiquity) and AMD do it better...



          • #15
            Originally posted by Anux
            ... LUKS FDE all the way!
            A few years ago, my dentist found a cavity in an upper-back molar and did a root canal. It was not until I was driving home, and after the novocaine wore off, that I realized "OMG, I've been putting up with this discomfort for a long time." I've been doing LUKS FDE ~everywhere ~forever, but don't notice any pain -- it's just the cost of doing business. But, I suspect/hope that when this rolls out generally, I'll have a similar "Ahhh" moment when I notice the speed-up.



            • #16
              Originally posted by MarkG
              I've been doing LUKS FDE ~everywhere ~forever, but don't notice any pain -- it's just the cost of doing business. But, I suspect/hope that when this rolls out generally, I'll have a similar "Ahhh" moment when I notice the speed-up.
              It's fairly unlikely you're being bottlenecked on your encryption performance; more likely you're not using the right flags on modern systems. See this article:
              "In this post, we will investigate the performance of disk encryption on Linux and explain how we made it at least two times faster for ourselves and our customers!"


              You need to use no_read_workqueue and no_write_workqueue on modern systems. I really have no idea why those aren't the default these days.
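As a rough illustration of the flags mentioned above, here is a minimal Python sketch that only builds (and prints) a `cryptsetup refresh` command line. It assumes a reasonably recent cryptsetup where, to the best of my knowledge, `--perf-no_read_workqueue`, `--perf-no_write_workqueue` and `--persistent` (LUKS2 only) are available; the mapping name `cryptroot` and the helper name `build_refresh_command` are hypothetical. Check `man cryptsetup` on your own system before running anything.

```python
import shlex


def build_refresh_command(mapping_name, persistent=True):
    """Return the argv for a `cryptsetup refresh` call that asks dm-crypt
    to bypass its read and write workqueues for an already-open mapping."""
    argv = [
        "cryptsetup", "refresh", mapping_name,
        "--perf-no_read_workqueue",   # maps to the no_read_workqueue dm-crypt flag
        "--perf-no_write_workqueue",  # maps to the no_write_workqueue dm-crypt flag
    ]
    if persistent:
        # Assumption: the volume is LUKS2, so the flags can be stored in the header.
        argv.append("--persistent")
    return argv


if __name__ == "__main__":
    # Print the command instead of executing it; run it as root yourself
    # once you have confirmed the options exist in your cryptsetup version.
    print(shlex.join(build_refresh_command("cryptroot")))
```

Printing rather than executing is deliberate: the sketch never touches a live volume, so it is safe to try on any machine.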



              • #17
                Originally posted by bradh352
                You need to use no_read_workqueue and no_write_workqueue on modern systems. I really have no idea why those aren't the default these days.
                Thanks for the article. I plan to try this over my morning coffee...



                • #18
                  Originally posted by piotrj3
                  Intel's AVX-512 isn't a full-sized implementation. It is essentially 2x256 glued together. The only major difference between Intel and AMD is that, technically, Intel's AVX-512 is atomic when memory alignment is correct.
                  Which Intel implementation are you describing?

                  Skylake-X had two AVX-512 execution pipelines: a fused port 0+1 which indeed was 2x256-bit (but in a different way than Zen 4) with a second dedicated 1x512-bit unit on port 5.
                  Sunny Cove in Ice Lake server (and Ice Lake mobile, and desktop Rocket Lake) used one 512-bit FMA and ALU on port 0 with a second 512-bit ALU on port 5.
                  Golden Cove in its client version (early Alder Lake that allowed AVX-512 with E-cores disabled) indeed has only fused ports 0+1, so 2x256-bit. The server version of Golden Cove used in Sapphire Rapids has a second full 512-bit unit.



                  • #19
                    Originally posted by numacross

                    Which Intel implementation are you describing?

                    Skylake-X had two AVX-512 execution pipelines: a fused port 0+1 which indeed was 2x256-bit (but in a different way than Zen 4) with a second dedicated 1x512-bit unit on port 5.
                    Sunny Cove in Ice Lake server (and Ice Lake mobile, and desktop Rocket Lake) used one 512-bit FMA and ALU on port 0 with a second 512-bit ALU on port 5.
                    Golden Cove in its client version (early Alder Lake that allowed AVX-512 with E-cores disabled) indeed has only fused ports 0+1, so 2x256-bit. The server version of Golden Cove used in Sapphire Rapids has a second full 512-bit unit.
                    I didn't know about that detail (I remembered those few cases with fused 0+1). Anyway, after reading deeper, Skylake-X actually always has 2 ports, but not always 2 FMA units that support 512-bit operations. The 2x256-bit unit is always there, but only with 10+ cores did you get the 512-bit one. But yeah, I didn't read deeply enough to be aware of that second unit that is full-sized 512-bit.



                    • #20
                      Originally posted by MarkG
                      A few years ago, my dentist found a cavity in an upper-back molar and did a root canal. It was not until I was driving home, and after the novocaine wore off, that I realized "OMG, I've been putting up with this discomfort for a long time." I've been doing LUKS FDE ~everywhere ~forever, but don't notice any pain -- it's just the cost of doing business. But, I suspect/hope that when this rolls out generally, I'll have a similar "Ahhh" moment when I notice the speed-up.
                      I never noticed any slowdowns from FDE. My laptop has a Core i 3000 and its AES is faster than the SATA 3 SSD, and on my desktop the NVMe has around 3.5 GB/s throughput and AES is around 3 GB/s. Since those speeds are never the bottleneck for me, there is no noticeable performance hit.
                      I guess with heavy database usage or other disk intensive workloads you might be able to notice something.

                      I'm not even sure if I activated the workqueue hack on my current system but I used it in the past and there was no noticeable change.
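For anyone equally unsure whether the workqueue flags are active, one way to check is to look at the device-mapper table of the open mapping. The Python sketch below parses a single dm-crypt table line as printed by `dmsetup table <name>` (run as root) and reports whether both flags are present. It assumes the table layout described in the kernel's dm-crypt documentation (start, length, `crypt`, cipher, key, IV offset, device, offset, then an optional parameter count followed by the parameters); the function names and the sample line are made up for illustration.

```python
def crypt_optional_params(table_line):
    """Return the set of optional parameters from one dm-crypt table line.

    Assumed layout: <start> <length> crypt <cipher> <key> <iv_offset>
    <device> <offset> [<#opt_params> <opt_param>...]
    """
    fields = table_line.split()
    if len(fields) < 8 or fields[2] != "crypt":
        raise ValueError("not a dm-crypt table line")
    if len(fields) == 8:
        return set()  # no optional parameters at all
    count = int(fields[8])
    return set(fields[9:9 + count])


def workqueues_bypassed(table_line):
    """True only if both workqueue-bypass flags are set on the mapping."""
    wanted = {"no_read_workqueue", "no_write_workqueue"}
    return wanted <= crypt_optional_params(table_line)


if __name__ == "__main__":
    # Hypothetical sample line; paste the real output of `dmsetup table <name>`.
    sample = ("0 976771120 crypt aes-xts-plain64 :64:logon:cryptsetup:example-d0 "
              "0 8:3 32768 2 no_read_workqueue no_write_workqueue")
    print(workqueues_bypassed(sample))
```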

                      With the improvement from the article I don't expect to feel a difference but the CPU load should be less.
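The reasoning above can be put into numbers with a small Python sketch. It uses the figures quoted in this post (about 3.5 GB/s for the NVMe drive, about 3 GB/s for AES) and, purely as an assumption, applies the headline's best-case ~155% gain to that AES figure, even though the headline number is for AMD Zen 4 and may not apply to this machine at all. The model is deliberately crude: sequential throughput is taken as the minimum of the two rates, and the helper names are hypothetical.

```python
def sequential_ceiling(disk_gbps, aes_gbps):
    """Crude upper bound: the slower of storage and cipher sets the pace."""
    return min(disk_gbps, aes_gbps)


def apply_gain(throughput_gbps, gain_percent):
    """A gain of +155% means the new rate is 2.55x the old one."""
    return throughput_gbps * (1 + gain_percent / 100)


if __name__ == "__main__":
    disk = 3.5        # GB/s, NVMe figure from the post
    aes_before = 3.0  # GB/s, AES figure from the post
    aes_after = apply_gain(aes_before, 155)  # assumed best case from the headline

    print("ceiling before:", sequential_ceiling(disk, aes_before), "GB/s")
    print("ceiling after: ", sequential_ceiling(disk, aes_after), "GB/s")
```

Under these assumptions the ceiling moves only from the AES rate to the disk rate, which fits the expectation that the visible change is lower CPU load rather than a big jump in throughput.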
