Intel Dramatically Speeds Up NSS With AVX2

  • Intel Dramatically Speeds Up NSS With AVX2

    Phoronix: Intel Dramatically Speeds Up NSS With AVX2

    Intel has managed to dramatically speed up Network Security Services (NSS) on the new Haswell (and forthcoming Broadwell) processors that boast AVX2 instruction set support...

    http://www.phoronix.com/vr.php?view=MTM5NDY

  • #2
    That's certainly one way to do it: wait on coding until a new CPU product is released, then proclaim a big improvement if and only if you buy our new, massively overpriced CPU.

    Nothing NSS does cannot be massively improved by using OpenCL. Presto! You don't have to buy this new CPU!


    • #3
      This is very nice. My first thought was: generic 64-bit binaries will not benefit from the AVX2 extension, will they? But then I realized this is actually an implementation of an algorithm. Did anyone look at the patch? It's over my head, but most of it is assembler. How does this work? The platform is detected at runtime, and the optimizations get picked up on the Haswell / Broadwell processors?


      • #4
        Originally posted by Marc Driftmeyer:
        Nothing NSS does cannot be massively improved by using OpenCL. Presto! You don't have to buy this new CPU!
        Easy to say, but where's the code for that? Intel's contribution might be self-serving, but it's still a contribution - real code that can be used today. And that's more useful than a wishlist that says OpenCL would be better...


        • #5
          Originally posted by mendieta:
          How does this work? The platform is detected at runtime, and the optimizations get picked up on the Haswell / Broadwell processors?
          Looks like it, yeah. I can't read the assembler, but the C parts of the patch (assuming they've not been disabled at compile time) seem to be doing runtime checking of whether the current CPU supports the AVX2 extensions.
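
For readers wondering what such a runtime check can look like, here is a minimal sketch. It is not taken from the NSS patch: the function names (`xor_bytes_scalar`, `xor_bytes_avx2`, `select_xor_bytes`) are made up for illustration, and it assumes GCC or Clang on x86, where `__builtin_cpu_supports` is available. On any other compiler or architecture it simply falls back to the portable loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Portable fallback: XOR two byte buffers one byte at a time. */
static void xor_bytes_scalar(uint8_t *dst, const uint8_t *a,
                             const uint8_t *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] ^ b[i];
}

#if HAVE_X86_DISPATCH
/* AVX2 variant: 32 bytes per iteration, compiled for AVX2 regardless of
 * the flags used for the rest of the file. Only call it after the CPU
 * check below has succeeded. */
static __attribute__((target("avx2")))
void xor_bytes_avx2(uint8_t *dst, const uint8_t *a,
                    const uint8_t *b, size_t n)
{
    size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_xor_si256(va, vb));
    }
    for (; i < n; i++)          /* leftover bytes that do not fill a vector */
        dst[i] = a[i] ^ b[i];
}
#endif

typedef void (*xor_bytes_fn)(uint8_t *, const uint8_t *,
                             const uint8_t *, size_t);

/* Pick the best implementation for the CPU we are running on. */
static xor_bytes_fn select_xor_bytes(void)
{
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return xor_bytes_avx2;
#endif
    return xor_bytes_scalar;
}
```

A caller would typically run `select_xor_bytes()` once at startup and keep the returned pointer, so the same generic binary works everywhere and only takes the fast path where the CPU reports AVX2.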


          • #6
            Originally posted by Marc Driftmeyer:
            Nothing NSS does cannot be massively improved by using OpenCL. Presto! You don't have to buy this new CPU!
            I'm not sure that's true. If you have a discrete card, it means you have to ship the data all the way from the CPU over to the GPU. That's widely known to be extremely slow, and it's why you don't invoke OpenCL kernels unless you've got a decent amount of data to crunch through at once, to overcome the latency slowdown.

            I have no idea if the work Firefox is doing would qualify for that or not, but I suspect in many cases it probably wouldn't.

            Now, when AMD releases their HSA CPUs, things might be different. There the GPU is able to address the same memory, from on the same CPU chip, and that should make OpenCL a lot more viable for these types of applications. It also might mean that the AVX2 code is just running on the same hardware that the OpenCL code is, though, making an extra OpenCL version pointless.
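
To put the latency argument above into numbers, here is a rough back-of-the-envelope model. It assumes CPU time is `n / cpu_rate` and offloaded time is `fixed_latency + n / bus_rate + n / gpu_rate`, so offloading only pays off once `n` exceeds `fixed_latency / (1/cpu_rate - 1/bus_rate - 1/gpu_rate)`. The function name and every parameter are hypothetical, and real transfers are more complicated than this.

```c
#include <assert.h>

/* Smallest payload, in bytes, at which offloading beats staying on the CPU
 * under the simple model above, or -1.0 if offloading never wins. */
static double offload_break_even_bytes(double fixed_latency_s,
                                       double bus_bytes_per_s,
                                       double cpu_bytes_per_s,
                                       double gpu_bytes_per_s)
{
    assert(fixed_latency_s >= 0.0);
    assert(bus_bytes_per_s > 0.0);
    assert(cpu_bytes_per_s > 0.0);
    assert(gpu_bytes_per_s > 0.0);

    /* Per-byte cost on the CPU versus per-byte cost of copy plus GPU work. */
    double cpu_cost_per_byte = 1.0 / cpu_bytes_per_s;
    double gpu_cost_per_byte = 1.0 / gpu_bytes_per_s + 1.0 / bus_bytes_per_s;
    double saving_per_byte = cpu_cost_per_byte - gpu_cost_per_byte;

    if (saving_per_byte <= 0.0)
        return -1.0;    /* the copy eats the whole speed-up */

    /* The fixed launch/transfer latency must be repaid by per-byte savings. */
    return fixed_latency_s / saving_per_byte;
}
```

With invented figures of 7 microseconds fixed latency, a 5 GB/s bus, a CPU doing 1 GB/s and a GPU doing 10 GB/s, the break-even works out to roughly 10 kB per call. Anything smaller is cheaper to keep on the CPU in this model.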


            • #7
              Although I agree it is a good thing that Intel is participating in this project, I'm also a little skeptical. If you have a modern Haswell CPU, is NSS processing really going to be a noticeable bottleneck for your average browsing session? I don't think so. Perhaps we are indeed looking at a technology demo with little impact.


              • #8
                Originally posted by bastiaan:
                Although I agree it is a good thing that Intel is participating in this project, I'm also a little skeptical. If you have a modern Haswell CPU, is NSS processing really going to be a noticeable bottleneck for your average browsing session? I don't think so. Perhaps we are indeed looking at a technology demo with little impact.
                If you have an Atom CPU, is NSS a bottleneck?


                • #9
                  Originally posted by Ferdinand:
                  If you have an Atom CPU, is NSS a bottleneck?
                  I doubt it, except under special circumstances. NSS is fast enough to run smoothly on low-power ARM chips (as part of Firefox mobile, for example), and a Haswell Atom will surely be at least on par with the performance of the fastest ARM chip.


                  • #10
                    What happened to AVX1? No support?


                    • #11
                      An anecdote of when NSS is a bottleneck: trying to start FF in a fresh VM. There is not enough entropy, NSS insists on /dev/random instead of /dev/urandom, and so FF startup blocks until NSS has enough random data.
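
For context on the difference described above: on Linux kernels of that period, a read from /dev/random could block until the kernel judged it had gathered enough entropy, while a read from /dev/urandom returned without waiting. The sketch below shows the non-blocking variant only. `read_urandom` is a hypothetical helper, not an NSS function, and I have not checked which device NSS actually opens.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: fill buf with len bytes from /dev/urandom.
 * Returns 0 on success, -1 if the device cannot be opened or read. */
static int read_urandom(unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return -1;

    /* Disable stdio buffering so we do not pull more bytes than requested. */
    setvbuf(f, NULL, _IONBF, 0);

    size_t got = fread(buf, 1, len, f);
    fclose(f);
    return got == len ? 0 : -1;
}
```

Swapping the path for /dev/random in a freshly booted VM is an easy way to reproduce the kind of stall the anecdote describes, assuming a kernel old enough to still block there.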


                      • #12
                        Originally posted by bastiaan:
                        Although I agree it is a good thing that Intel is participating in this project, I'm also a little skeptical. If you have a modern Haswell CPU, is NSS processing really going to be a noticeable bottleneck for your average browsing session? I don't think so. Perhaps we are indeed looking at a technology demo with little impact.
                        Probably useful server side.


                        • #13
                          Originally posted by RealNC:
                          What happened to AVX1? No support?
                          Well, the first release of AVX covers FP operations using 256-bit-wide instructions, and AVX2 handles the integer operations using 256-bit-wide instructions. So, assuming the algorithm is integer-only, it would be trivial for AVX2 but not so much for AVX, since you need an additional step: cast the operands to floats/doubles before moving the vector to the closest cache (L1/L2), and later recast the output vector back to integer.

                          Now, if the algorithm can operate with FP, then it can use AVX just fine.

                          About FMA4/FMA3, I'm not entirely sure either can handle integer vectors.

                          So, as a rule of thumb: AVX1 is floating point only and AVX2 is integer only.
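
As a small illustration of the split described above, the sketch below adds eight floats with a 256-bit AVX instruction and eight 32-bit integers with a 256-bit AVX2 instruction, checking each feature separately at runtime. It assumes GCC or Clang on x86. The function names are invented for this example, and both wrappers fall back to a plain loop when the feature is missing.

```c
#include <assert.h>
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define X86_SIMD_SKETCH 1
#else
#define X86_SIMD_SKETCH 0
#endif

#if X86_SIMD_SKETCH
/* Original AVX: eight 32-bit floats added by one 256-bit instruction. */
static __attribute__((target("avx")))
void add8_f32_avx(float *dst, const float *a, const float *b)
{
    _mm256_storeu_ps(dst, _mm256_add_ps(_mm256_loadu_ps(a),
                                        _mm256_loadu_ps(b)));
}

/* AVX2: eight 32-bit integers added by one 256-bit instruction. */
static __attribute__((target("avx2")))
void add8_u32_avx2(uint32_t *dst, const uint32_t *a, const uint32_t *b)
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    _mm256_storeu_si256((__m256i *)dst, _mm256_add_epi32(va, vb));
}
#endif

/* Returns 1 if the 256-bit AVX path was used, 0 for the scalar fallback. */
static int add8_f32(float *dst, const float *a, const float *b)
{
    assert(dst != 0 && a != 0 && b != 0);
#if X86_SIMD_SKETCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")) {
        add8_f32_avx(dst, a, b);
        return 1;
    }
#endif
    for (int i = 0; i < 8; i++)
        dst[i] = a[i] + b[i];
    return 0;
}

/* Returns 1 if the 256-bit AVX2 path was used, 0 for the scalar fallback. */
static int add8_u32(uint32_t *dst, const uint32_t *a, const uint32_t *b)
{
    assert(dst != 0 && a != 0 && b != 0);
#if X86_SIMD_SKETCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) {
        add8_u32_avx2(dst, a, b);
        return 1;
    }
#endif
    for (int i = 0; i < 8; i++)
        dst[i] = a[i] + b[i];
    return 0;
}
```

On a CPU with AVX but not AVX2 (Sandy Bridge, for instance), `add8_f32` would report the vector path while `add8_u32` would report the fallback.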


                          • #14
                            Originally posted by erendorn:
                            Probably useful server side.
                            That's what my first thought was, since that's an environment where you would be doing a lot of signing and verifying.


                            • #15
                              Originally posted by bastiaan:
                              I doubt it, except under special circumstances. NSS is fast enough to run smoothly on low-power ARM chips (as part of Firefox mobile, for example), and a Haswell Atom will surely be at least on par with the performance of the fastest ARM chip.
                              Apparently the new Atoms won't be based on Haswell, but on 'Silvermont', which won't have AVX2. So it is a moot point.
