Improving The Linux Kernel's Memory Performance

  • Improving The Linux Kernel's Memory Performance

    Phoronix: Improving The Linux Kernel's Memory Performance

    Over the past few days there's been an active discussion on the Linux kernel mailing list about the performance of memory copying (the memcpy function, which copies blocks of memory) within the kernel. In particular, an application vendor claims to have boosted the performance of their application (a video recorder) by 12% by implementing an "optimized" memory copy function that takes advantage of SSE3...

    http://www.phoronix.com/vr.php?view=OTgwMQ

  • #2
    Very interesting

    Very interesting!
    Thank you!


    • #3
      It's indeed pretty interesting.
      But if I use a prebuilt generic x86_64 kernel provided by my distro, can the kernel detect at runtime whether my CPU supports SSE3, or do I have to recompile the kernel?


      • #4
        Originally posted by Dylar
        But if I use a prebuilt generic x86_64 kernel provided by my distro, can the kernel detect at runtime whether my CPU supports SSE3, or do I have to recompile the kernel?
        Most likely it's already doing that. Your kernel most likely already supports SSE3 and similar extensions, and programs designed to take advantage of SSE3 will do so.
        Until now, the memcpy() function did its copying magic without it. However, if I understood the article correctly, they want to use SSE3 when copying something big, which should give a rather nice boost.

        But I am no programmer, unfortunately.
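
For anyone who wants to see what a runtime check can look like, here is a minimal user-space sketch. It assumes GCC or Clang on x86, where the `__builtin_cpu_supports` built-in is available; it is not how the kernel itself does it, and the names `cpu_has_sse3`, `choose_copy_path`, and the `copy_path` enum are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical labels for the copy routine a program might select. */
enum copy_path { COPY_GENERIC = 0, COPY_SSE3 = 1 };

/* Returns 1 if the running CPU reports SSE3, 0 if it does not, and -1
 * when this compiler/target offers no way to ask (assumption: only
 * GCC/Clang on x86 provide the built-ins used here). */
int cpu_has_sse3(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                 /* populate the feature cache */
    return __builtin_cpu_supports("sse3") ? 1 : 0;
#else
    return -1;
#endif
}

/* Decide once, at startup, which routine to use. Anything other than a
 * positive answer falls back to the generic path. */
enum copy_path choose_copy_path(int has_sse3)
{
    return has_sse3 > 0 ? COPY_SSE3 : COPY_GENERIC;
}

/* Print the decision; useful when checking a prebuilt binary. */
void report_copy_path(void)
{
    enum copy_path p = choose_copy_path(cpu_has_sse3());
    printf("copy path: %s\n", p == COPY_SSE3 ? "sse3" : "generic");
}
```

Calling `report_copy_path()` from a small program shows which path the same binary would pick on a given machine, which is the point of detecting at runtime instead of at compile time.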


        • #5
          to check it:

          Originally posted by Dylar
          But if I use a prebuilt generic x86_64 kernel provided by my distro, can the kernel detect at runtime whether my CPU supports SSE3, or do I have to recompile the kernel?
          cat /proc/cpuinfo |grep sse3

          It's sort of weird, but I don't seem to have SSE3 on my AMD quad core. However, I think the extension is there and was just called something else for licensing reasons. I wonder what SSE4A is and whether it absorbs SSE3 into itself?


          • #6
            Originally posted by dimko
            cat /proc/cpuinfo |grep sse3

            It's sort of weird, but I don't seem to have SSE3 on my AMD quad core. However, I think the extension is there and was just called something else for licensing reasons. I wonder what SSE4A is and whether it absorbs SSE3 into itself?
            It's not called "sse3" in /proc/cpuinfo. I believe the kernel calls it "pni" for "Prescott New Instructions" which was the Intel code name.
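
A small sketch of checking for the flag programmatically, assuming a Linux x86 system where /proc/cpuinfo has a "flags" line. The function names `line_has_flag` and `cpuinfo_has_flag` are hypothetical. It matches whole words only, since a plain substring search for "sse3" would also hit "ssse3" (a different extension) on CPUs that list it.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` appears in `line` as a whole whitespace-separated
 * word, 0 otherwise. */
int line_has_flag(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = line;

    if (n == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int start_ok = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        int end_ok = p[n] == '\0' || p[n] == ' ' ||
                     p[n] == '\t' || p[n] == '\n';
        if (start_ok && end_ok)
            return 1;
        p += 1;                 /* keep searching past this partial hit */
    }
    return 0;
}

/* Look for `flag` on the first "flags" line of /proc/cpuinfo.
 * Returns 1 if present, 0 if absent (or no flags line exists), and
 * -1 if the file cannot be opened. Assumes the line fits the buffer. */
int cpuinfo_has_flag(const char *flag)
{
    char buf[8192];
    int found = 0;
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (!f)
        return -1;
    while (fgets(buf, sizeof buf, f)) {
        if (strncmp(buf, "flags", 5) == 0) {
            found = line_has_flag(buf, flag);
            break;
        }
    }
    fclose(f);
    return found;
}
```

With this, `cpuinfo_has_flag("pni")` is the check that corresponds to SSE3, while `cpuinfo_has_flag("sse3")` would be expected to return 0 even on CPUs that support it.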


            • #7
              Originally posted by signals
              It's not called "sse3" in /proc/cpuinfo. I believe the kernel calls it "pni" for "Prescott New Instructions" which was the Intel code name.
              pni - checked!

              Is it still called PNI on Intel CPUs?


              • #8
                Originally posted by dimko
                pni - checked!

                Is it still called PNI on Intel CPUs?
                It is on my Core i7.


                • #9
                  I did something like this in my kernel and the performance was really impressive.
                  At the time, I used CPUID to find out whether the CPU supported MMX, SSE or SSE2 and set malloc to use what was supported.

                  I didn't have any graphics driver, so drawing operations were really slow.
                  Without SSE2 I couldn't do any graphics operation at an acceptable speed; with SSE2 I could draw windows and move them flawlessly.

                  This was probably because there were only a few threads and the amount of data to move was substantial (1280 x 1024 x 4 bytes, about 5 MB per frame).
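
The general idea (query CPUID once, then route copies through the widest routine the CPU supports) can be sketched in user space roughly as follows. This is an illustration under stated assumptions, not the code described in the post: it assumes GCC or Clang, whose `<cpuid.h>` provides `__get_cpuid`, and every function name here (`cpuid_leaf1_edx`, `copy_sse2`, `copy_generic`, `select_copy`) is hypothetical. Inside the Linux kernel itself, SIMD registers cannot be used this freely; such code has to be bracketed with `kernel_fpu_begin()` and `kernel_fpu_end()`.

```c
#include <stddef.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>          /* GCC/Clang wrapper around the CPUID instruction */
#define HAVE_CPUID_H 1
#endif
#ifdef __SSE2__
#include <emmintrin.h>      /* SSE2 intrinsics */
#endif

/* Feature bits in the EDX register returned by CPUID leaf 1. */
#define FEAT_MMX  (1u << 23)
#define FEAT_SSE  (1u << 25)
#define FEAT_SSE2 (1u << 26)

/* Return the EDX feature word of CPUID leaf 1, or 0 if unavailable. */
unsigned int cpuid_leaf1_edx(void)
{
#ifdef HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return edx;
#endif
    return 0;
}

/* Copy n bytes, 16 at a time, using unaligned SSE2 loads and stores;
 * any tail (or everything, if built without SSE2) goes through memcpy.
 * The buffers must not overlap. */
void copy_sse2(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
#ifdef __SSE2__
    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_storeu_si128((__m128i *)d, v);
        s += 16;
        d += 16;
        n -= 16;
    }
#endif
    memcpy(d, s, n);
}

/* Fallback that simply defers to the C library. */
void copy_generic(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

typedef void (*copy_fn)(void *, const void *, size_t);

/* Choose a copy routine from the CPUID feature word. */
copy_fn select_copy(unsigned int edx_features)
{
    return (edx_features & FEAT_SSE2) ? copy_sse2 : copy_generic;
}
```

A caller would do `copy_fn copy = select_copy(cpuid_leaf1_edx());` once at startup and then use `copy(dst, src, n)` for large transfers such as a 1280 x 1024 x 4 byte frame. Note that a modern C library memcpy is usually already vectorized, so any gain from a hand-written loop like this should be measured, not assumed.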


                  • #10
                    Originally posted by abral
                    I did something like this in my kernel and the performance was really impressive.
                    At the time, I used CPUID to find out whether the CPU supported MMX, SSE or SSE2 and set malloc to use what was supported.

                    I didn't have any graphics driver, so drawing operations were really slow.
                    Without SSE2 I couldn't do any graphics operation at an acceptable speed; with SSE2 I could draw windows and move them flawlessly.

                    This was probably because there were only a few threads and the amount of data to move was substantial (1280 x 1024 x 4 bytes, about 5 MB per frame).
                    What exactly did you do in your kernel? Did you replace the x86 memcpy() implementation with an SSE one?
