Improving The Linux Kernel's Memory Performance

  • Improving The Linux Kernel's Memory Performance

    Phoronix: Improving The Linux Kernel's Memory Performance

    Over the past few days there's been an active discussion on the Linux kernel mailing list surrounding the memory copy (the memcpy function to copy blocks of memory) performance within the kernel. In particular, an application vendor claims to have boosted their application (a video recorder) performance by 12% when implementing an "optimized" memory copy function that takes advantage of SSE3...

    http://www.phoronix.com/vr.php?view=OTgwMQ

  • #2
    Very interesting

    Very interesting!
    Thank you!



    • #3
      It's indeed pretty interesting.
      But if I use a prebuilt generic x86_64 kernel provided by my distro, is there a way the kernel could autodetect if my CPU has support for SSE3 at runtime, or do I have to recompile the kernel?



      • #4
        Originally posted by Dylar
        But if I use a prebuilt generic x86_64 kernel provided by my distro, is there a way the kernel could autodetect if my CPU has support for SSE3 at runtime, or do I have to recompile the kernel?
        Most likely it's already doing that: your kernel probably already supports SSE3 and similar extensions, and programs that are designed to take advantage of SSE3 will do so.
        Until now, the memcpy() function just did its magic of copying stuff; however, if I understood the article correctly, they want to use SSE3 for copying large blocks, which should give a rather nice boost.

        But I am no programmer, unfortunately.



        • #5
          To check it:

          Originally posted by Dylar
          But if I use a prebuilt generic x86_64 kernel provided by my distro, is there a way the kernel could autodetect if my CPU has support for SSE3 at runtime, or do I have to recompile the kernel?
          cat /proc/cpuinfo | grep sse3

          It's sort of weird, but I don't seem to have SSE3 on my AMD quad core. However, I think the extension is there and was just called something else for licensing reasons. I wonder what SSE4A is and whether it absorbs SSE3 into itself?



          • #6
            Originally posted by dimko
            cat /proc/cpuinfo | grep sse3

            It's sort of weird, but I don't seem to have SSE3 on my AMD quad core. However, I think the extension is there and was just called something else for licensing reasons. I wonder what SSE4A is and whether it absorbs SSE3 into itself?
            It's not called "sse3" in /proc/cpuinfo. I believe the kernel calls it "pni" for "Prescott New Instructions" which was the Intel code name.
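
A minimal sketch of the same check done in C rather than with grep, assuming you already have the "flags" line from /proc/cpuinfo as a string. The function name has_cpu_flag is hypothetical. It matches whole tokens only, because a plain substring search for "sse3" would also match "ssse3", which is a different extension.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if `flag` appears as a whole
 * whitespace-separated token in `line`, 0 otherwise. */
static int has_cpu_flag(const char *line, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = line;

    if (len == 0)
        return 0;

    while ((p = strstr(p, flag)) != NULL) {
        /* The match must start at the line start or after whitespace... */
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        /* ...and must end at the line end or before whitespace. */
        int ends = p[len] == '\0' || p[len] == ' ' ||
                   p[len] == '\t' || p[len] == '\n';
        if (starts && ends)
            return 1;
        p += 1; /* keep searching after this partial match */
    }
    return 0;
}
```

With a flags line that lists "pni" and "ssse3", asking for "pni" succeeds while asking for "sse3" does not, which is consistent with the naming described above.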



            • #7
              Originally posted by signals
              It's not called "sse3" in /proc/cpuinfo. I believe the kernel calls it "pni" for "Prescott New Instructions" which was the Intel code name.
              pni - checked!

              Is it still called PNI on Intel CPUs?



              • #8
                Originally posted by dimko
                pni - checked!

                Is it still called PNI on Intel CPUs?
                It is on my Core i7.



                • #9
                  I did something like this in my kernel and the performance gain was really impressive.
                  At the time, I used CPUID to find out whether the CPU supported MMX, SSE, or SSE2, and set memcpy to use what was supported.

                  I didn't have a graphics driver, so drawing operations were really slow.
                  Without SSE2 I couldn't do any acceptable graphics operation, whereas with SSE2 I could draw windows and move them flawlessly.

                  This was probably because there were only a few threads and the amount of data to move was substantial (1280 x 1024 x 4 bytes).



                  • #10
                    Originally posted by abral
                    I did something like this in my kernel and the performance gain was really impressive.
                    At the time, I used CPUID to find out whether the CPU supported MMX, SSE, or SSE2, and set memcpy to use what was supported.

                    I didn't have a graphics driver, so drawing operations were really slow.
                    Without SSE2 I couldn't do any acceptable graphics operation, whereas with SSE2 I could draw windows and move them flawlessly.

                    This was probably because there were only a few threads and the amount of data to move was substantial (1280 x 1024 x 4 bytes).
                    What exactly did you do in your kernel? Replace the x86 memcpy() implementation with an SSE one?



                    • #11
                      memcpy? As in... memcpy from string.h? What does this have to do with the kernel? Is there some system call that's also called memcpy?



                      • #12
                        Originally posted by phoronix
                        Phoronix: Improving The Linux Kernel's Memory Performance

                        Over the past few days there's been an active discussion on the Linux kernel mailing list surrounding the memory copy (the memcpy function to copy blocks of memory) performance within the kernel. In particular, an application vendor claims to have boosted their application (a video recorder) performance by 12% when implementing an "optimized" memory copy function that takes advantage of SSE3...

                        http://www.phoronix.com/vr.php?view=OTgwMQ
                        Do we really want to add more x86 specific code to the kernel?
                        Other than that, sounds cool. I had no idea hitting the SSE units was so costly. I suppose it makes sense that they were intended for rather larger data sets, but still, it hadn't occurred to me.



                        • #13
                          Originally posted by Smorg
                          memcpy? As in... memcpy from string.h? What does this have to do with the kernel? Is there some system call that's also called memcpy?
                          Copying memory around is pretty common. When I was writing video drivers there were a number of places where we used MMX for memory copies because it was faster at copying aligned data than rep movs (basically, we read a bunch of memory from the source into MMX registers and then wrote them all out to the destination, so we were processing 128 bytes or something at a time).
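
The "read a bunch, then write it all out" pattern can be sketched in portable C. This is only an illustration, not the driver code described above: the name copy_blocks is hypothetical, plain 64-bit integers stand in for MMX registers, and the 64-byte block size (eight 64-bit values, like the eight MMX registers) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BLOCK_WORDS = 8, BLOCK_BYTES = 8 * sizeof(uint64_t) };

/* Hypothetical block copy: load a whole block into temporaries,
 * then store the whole block, then handle the tail byte by byte. */
static void *copy_blocks(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uint64_t regs[BLOCK_WORDS]; /* stand-ins for wide registers */
    size_t i;

    while (n >= BLOCK_BYTES) {
        /* Load phase. A fixed-size memcpy is the portable way to
         * express an 8-byte load without alignment assumptions. */
        for (i = 0; i < BLOCK_WORDS; i++)
            memcpy(&regs[i], s + i * sizeof(uint64_t), sizeof(uint64_t));
        /* Store phase. */
        for (i = 0; i < BLOCK_WORDS; i++)
            memcpy(d + i * sizeof(uint64_t), &regs[i], sizeof(uint64_t));
        s += BLOCK_BYTES;
        d += BLOCK_BYTES;
        n -= BLOCK_BYTES;
    }

    /* Tail: fewer than BLOCK_BYTES bytes remain. */
    while (n--)
        *d++ = *s++;

    return dst;
}
```

Whether this beats rep movs or the compiler's own memcpy depends on the CPU and the buffer sizes, so it would need measuring rather than assuming.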



                          • #14
                            BTW, wasn't SSE3 a compulsory part of AMD64? If so, then this would presumably go into any AMD64 kernel with no need to check processor flags.



                            • #15
                              Originally posted by Drago
                              What exactly did you do in your kernel? Replace the x86 memcpy() implementation with an SSE one?
                              Sorry, I didn't explain that well. My kernel isn't Linux; it's a kernel I've written from scratch.

                              Anyway, I wrote four different versions of memcpy (normal, MMX, SSE, SSE2) and used the best one according to what the CPU supports.
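
Picking the best copy routine for the CPU can be sketched as a function pointer chosen once at startup. This is a user-space sketch under stated assumptions, not the code from the post: the names copy_fn, copy_bytes, copy_sse2, and pick_copy are hypothetical, only two of the four variants are shown, and the SSE2 path is compiled only with GCC or Clang on a target where SSE2 is enabled. Kernel code has extra constraints on using SIMD registers that are not covered here.

```c
#include <stddef.h>

#if defined(__GNUC__) && defined(__SSE2__)
#include <emmintrin.h>
#define HAVE_SSE2_PATH 1
#endif

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Baseline variant: plain byte loop, works everywhere. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

#ifdef HAVE_SSE2_PATH
/* SSE2 variant: 16 bytes per iteration using unaligned loads and stores. */
static void *copy_sse2(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_storeu_si128((__m128i *)d, v);
        s += 16;
        d += 16;
        n -= 16;
    }
    while (n--)
        *d++ = *s++;
    return dst;
}
#endif

/* Choose a variant once, based on what the running CPU reports. */
static copy_fn pick_copy(void)
{
#ifdef HAVE_SSE2_PATH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        return copy_sse2;
#endif
    return copy_bytes;
}
```

A caller would store the result of pick_copy() in a variable at startup and call through it afterwards, so the feature check is not repeated on every copy.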
