Improving The Linux Kernel's Memory Performance


  • #11
    memcpy? As in... memcpy from string.h? What does this have to do with the kernel? Is there some system call that's also called memcpy?



    • #12
      Originally posted by phoronix View Post
      Phoronix: Improving The Linux Kernel's Memory Performance

      Over the past few days there's been an active discussion on the Linux kernel mailing list surrounding the memory copy (the memcpy function to copy blocks of memory) performance within the kernel. In particular, an application vendor claims to have boosted their application (a video recorder) performance by 12% when implementing an "optimized" memory copy function that takes advantage of SSE3...

      http://www.phoronix.com/vr.php?view=OTgwMQ
      Do we really want to add more x86 specific code to the kernel?
      Other than that, sounds cool. I had no idea hitting the SSE was so costly. I suppose it makes sense that they were intended for rather larger data sets, but still, it hadn't occurred to me.



      • #13
        Originally posted by Smorg View Post
        memcpy? As in... memcpy from string.h? What does this have to do with the kernel? Is there some system call that's also called memcpy?
        Copying memory around is pretty common. When I was writing video drivers, there were a number of places where we used MMX for memory copies because it was faster at copying aligned data than rep movs. Basically, we read a bunch of memory from the source into MMX registers and then wrote it all out to the destination, so we were processing 128 bytes or something at a time.
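
A minimal sketch of the block-copy idea described above: load several registers' worth of aligned data, then store them all. It uses SSE2 intrinsics rather than MMX (an assumption made for brevity, not what the driver code did), and `block_copy_aligned` is a hypothetical name. Both pointers are assumed to be 16-byte aligned.

```c
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: copies n bytes; dst and src must be 16-byte aligned. */
void *block_copy_aligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

#if defined(__SSE2__)
    /* Read 64 bytes into four registers, then write all four out. */
    while (n >= 64) {
        __m128i r0 = _mm_load_si128((const __m128i *)(s + 0));
        __m128i r1 = _mm_load_si128((const __m128i *)(s + 16));
        __m128i r2 = _mm_load_si128((const __m128i *)(s + 32));
        __m128i r3 = _mm_load_si128((const __m128i *)(s + 48));
        _mm_store_si128((__m128i *)(d + 0), r0);
        _mm_store_si128((__m128i *)(d + 16), r1);
        _mm_store_si128((__m128i *)(d + 32), r2);
        _mm_store_si128((__m128i *)(d + 48), r3);
        s += 64;
        d += 64;
        n -= 64;
    }
#endif
    /* Tail bytes (or everything, when SSE2 is not enabled at build time). */
    memcpy(d, s, n);
    return dst;
}
```

Inside a kernel, code like this also has to save and restore the vector register state around the copy, which is part of why it only pays off for larger blocks.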



        • #14
          BTW, wasn't SSE3 a compulsory part of AMD64? If so, then this would presumably go into any AMD64 kernel with no need to check processor flags.



          • #15
            Originally posted by Drago View Post
            You did exactly what on your kernel? Replaced x86 memcopy() implementation with SSE one?
            Sorry, I didn't explain that well. My kernel isn't Linux; it's a kernel I've written from scratch.

            However, I wrote four different versions of memcpy (normal, MMX, SSE, SSE2) and used the best one according to what the CPU supports.
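
A rough sketch of that kind of selection, under some loud assumptions: the four variants here are placeholders that share one byte loop (real ones would use their own instructions), all function names are hypothetical, and the feature check uses the GCC/Clang builtin `__builtin_cpu_supports`, which is a userland convenience. A from-scratch kernel would more likely issue CPUID itself.

```c
#include <stddef.h>

typedef void *(*memcpy_fn)(void *dst, const void *src, size_t n);

/* Shared byte loop; stands in for the real per-extension copy code. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Placeholder variants (hypothetical names). */
void *memcpy_plain(void *dst, const void *src, size_t n) { return copy_bytes(dst, src, n); }
void *memcpy_mmx(void *dst, const void *src, size_t n)   { return copy_bytes(dst, src, n); }
void *memcpy_sse(void *dst, const void *src, size_t n)   { return copy_bytes(dst, src, n); }
void *memcpy_sse2(void *dst, const void *src, size_t n)  { return copy_bytes(dst, src, n); }

/* Pick the most capable variant the running CPU reports support for. */
memcpy_fn select_memcpy(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        return memcpy_sse2;
    if (__builtin_cpu_supports("sse"))
        return memcpy_sse;
    if (__builtin_cpu_supports("mmx"))
        return memcpy_mmx;
#endif
    return memcpy_plain;
}
```

The selection runs once at startup and the result is stored in a function pointer, so the per-call cost is a single indirect call.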



            • #16
              Originally posted by movieman View Post
              BTW, wasn't SSE3 a compulsory part of AMD64? If so, then this would presumably go into any AMD64 kernel with no need to check processor flags.
              No. The first AMD64 CPUs didn't have it (130nm and the initial 90nm chips).



              • #17
                Originally posted by Dylar View Post
                It's indeed pretty interesting.
                But if I use a prebuilt generic x86_64 kernel provided by my distro, is there a way the kernel could autodetect if my CPU has support for SSE3 at runtime, or do I have to recompile the kernel ?
                You seem to be asking two separate questions at the same time.

                1. Yes, there is a way for the kernel to check for SSE3 support. Running cat /proc/cpuinfo demonstrates this capability.

                2. There is a distinction between -march style optimizations and -mtune style optimizations. With -mtune the compiler will generate multiple versions of different code paths so that every CPU gets its own "optimized" version. With -march, the compiler assumes that these instructions are always available. To the best of my knowledge, the kernel uses -march. I have not checked the code to verify this, but I am fairly certain that if you configure your kernel compilation to build a kernel for hardware newer than what you have, it breaks when you try to run it on the older hardware, which is consistent with -march. Therefore, the kernel must be explicitly compiled for it.

                Originally posted by dimko View Post
                cat /proc/cpuinfo |grep sse3

                It's sort of weird, but I don't seem to have SSE3 on my AMD quad core. However, I think the extension was there; it was just called something else for licensing reasons. I wonder what SSE4A is and whether it absorbs SSE3 into itself?
                SSE4A is AMD's variant of Intel's SSE4 extensions.

                Originally posted by dimko View Post
                pni - checked!

                Is it still called PNI on Intel CPU?
                They never renamed it. It would cause newer processors to use code paths meant for much older processors when executing older binaries if they did that.

                They call it pni on AMD CPUs for the same reason.
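
For anyone scripting the check discussed above: a small sketch that looks for a flag in /proc/cpuinfo as a whole word. The function names are hypothetical. Matching whole tokens matters because a plain substring search for sse3 also matches ssse3, which is a different extension, while SSE3 itself is listed as pni.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `flag` appears as a whole whitespace-separated token in `flags`. */
bool flags_contain(const char *flags, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = flags;

    if (len == 0)
        return false;
    while ((p = strstr(p, flag)) != NULL) {
        bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        bool ends = p[len] == '\0' || p[len] == ' ' ||
                    p[len] == '\t' || p[len] == '\n';
        if (starts && ends)
            return true;
        p += 1; /* keep searching past this partial match */
    }
    return false;
}

/* Linux only: checks the first "flags" line of /proc/cpuinfo.
 * Returns false if the file cannot be read or the flag is absent. */
bool cpu_has_flag(const char *flag)
{
    char line[8192]; /* assumed large enough for the flags line */
    bool found = false;
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (!f)
        return false;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "flags", 5) == 0) {
            const char *colon = strchr(line, ':');
            if (colon)
                found = flags_contain(colon + 1, flag);
            break;
        }
    }
    fclose(f);
    return found;
}
```

With this, `cpu_has_flag("pni")` is the SSE3 check on both Intel and AMD machines.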

                Originally posted by Smorg View Post
                memcpy? As in... memcpy from string.h? What does this have to do with the kernel? Is there some system call that's also called memcpy?
                If you write a kernel, you need a way of copying data back and forth between real memory and virtual memory. You will have a problem if you hit a page boundary and the rest of what you are copying does not continue on the next page.

                Also, when writing a kernel, you need to write your own library routines, because libraries specified in the ANSI C specification are meant for userland, not kernels.
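
To make the page-boundary point concrete, here is a sketch of a copy that never crosses a page in one step. Everything in it is an assumption for illustration: a 4096-byte page size, the hypothetical `translate_fn` hook that maps a virtual address to something the kernel can read, and an identity translation so the code can run in userland.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE ((size_t)4096) /* assumed page size */

/* Bytes from addr to the end of the page that contains it. */
size_t bytes_to_page_end(uintptr_t addr)
{
    return SKETCH_PAGE_SIZE - (size_t)(addr & (SKETCH_PAGE_SIZE - 1));
}

/* Hypothetical hook: returns a readable pointer for vaddr, or NULL if unmapped. */
typedef void *(*translate_fn)(uintptr_t vaddr);

/* Identity mapping, only so the sketch can be exercised outside a kernel. */
void *identity_translate(uintptr_t vaddr)
{
    return (void *)vaddr;
}

/* Copy n bytes one page-sized piece at a time, translating each page
 * separately because consecutive virtual pages need not be contiguous
 * in physical memory. Returns 0 on success, -1 if a page is unmapped. */
int copy_from_virtual(void *dst, uintptr_t src_vaddr, size_t n,
                      translate_fn translate)
{
    unsigned char *d = dst;

    while (n > 0) {
        size_t chunk = bytes_to_page_end(src_vaddr);
        void *src;

        if (chunk > n)
            chunk = n;
        src = translate(src_vaddr);
        if (!src)
            return -1;
        memcpy(d, src, chunk);
        d += chunk;
        src_vaddr += chunk;
        n -= chunk;
    }
    return 0;
}
```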

                Originally posted by liam View Post
                Do we really want to add more x86 specific code to the kernel?
                Other than that, sounds cool. I had no idea hitting the SSE was so costly. I suppose it makes sense that they were intended for rather larger data sets, but still, hadn't occured to me.
                It is not necessarily x86 specific. It is a technique that applies to any CPU that has SSE3-like vector instructions and if they implement it properly, every CPU with such instructions should see a boost.


                Originally posted by movieman View Post
                BTW, wasn't SSE3 a compulsory part of AMD64? If so, then this would presumably go into any AMD64 kernel with no need to check processor flags.
                AMD produced the K8 architecture first and then Intel produced Prescott in response. SSE3 did not exist when the K8 architecture was made, so it was not part of the original x86_64 instruction set.
                Last edited by Shining Arcanine; 16 August 2011, 10:57 PM.



                • #18
                  Originally posted by Shining Arcanine View Post
                  There is a distinction between -march style optimizations and -mtune style optimizations. With -mtune the compiler will generate multiple versions of different code paths so that every CPU gets its own "optimized" version. With -march, the compiler assumes that these instructions are always available.
                  No, that's not what -mtune is doing. It does not generate any instructions that would not run on other CPUs. It just applies changes that will work everywhere, but are known to result in faster execution. From the docs:

                  "-mtune=cpu-type - Tune to cpu-type everything applicable about the generated code, except for the ABI and the set of available instructions."

                  So no multiple code paths or anything. That's what the Intel compiler does. GCC doesn't provide that functionality.
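
One way to see the difference for yourself is GCC's predefined macros. In this small sketch (the function name is hypothetical), `__SSE3__` is defined only when the build options permit SSE3 instructions, which -march can do and -mtune alone does not.

```c
/* Returns 1 if this translation unit was built with SSE3 instructions
 * enabled (for example via -msse3 or a -march value that includes SSE3),
 * and 0 otherwise. -mtune by itself does not change the result. */
int built_with_sse3(void)
{
#if defined(__SSE3__)
    return 1;
#else
    return 0;
#endif
}
```

On a generic x86-64 target, building with `-mtune=core2` should leave this returning 0, while `-march=core2` should make it return 1.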



                  • #19
                    You mean the if (!intel) slow(); path?



                    • #20
                      Originally posted by movieman View Post
                      BTW, wasn't SSE3 a compulsory part of AMD64? If so, then this would presumably go into any AMD64 kernel with no need to check processor flags.
                      No, AMD64 implies SSE2.

