
Thread: Improving The Linux Kernel's Memory Performance

  1. #1
    Join Date
    Jan 2007
    Posts
    15,611

    Phoronix: Improving The Linux Kernel's Memory Performance

    Over the past few days there's been an active discussion on the Linux kernel mailing list surrounding the performance of memory copies (the memcpy function, which copies blocks of memory) within the kernel. In particular, an application vendor claims to have boosted the performance of their application (a video recorder) by 12% by implementing an "optimized" memory copy function that takes advantage of SSE3...

    http://www.phoronix.com/vr.php?view=OTgwMQ
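For readers wondering what a SIMD-"optimized" memory copy looks like, here is a minimal user-space sketch. It is not the vendor's code, which the article does not show. It assumes an x86-64 compiler with the SSE2 intrinsics from <emmintrin.h>, moves 64 bytes per iteration through unaligned 16-byte loads and stores, and hands the remaining bytes to the ordinary memcpy(). The name sse2_copy is hypothetical. It uses SSE2 rather than SSE3 because SSE3 adds little for plain copies beyond its lddqu unaligned load. Inside the kernel, any SSE use must also be wrapped in kernel_fpu_begin()/kernel_fpu_end(), which adds overhead that this sketch ignores.

```c
#include <emmintrin.h> /* SSE2 intrinsics: _mm_loadu_si128, _mm_storeu_si128 */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes from src to dst (buffers must not overlap).
 * The main loop moves 64 bytes per iteration using four unaligned
 * 16-byte SSE2 loads followed by four unaligned 16-byte stores. */
void *sse2_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= 64) {
        __m128i a = _mm_loadu_si128((const __m128i *)(s + 0));
        __m128i b = _mm_loadu_si128((const __m128i *)(s + 16));
        __m128i c = _mm_loadu_si128((const __m128i *)(s + 32));
        __m128i e = _mm_loadu_si128((const __m128i *)(s + 48));
        _mm_storeu_si128((__m128i *)(d + 0), a);
        _mm_storeu_si128((__m128i *)(d + 16), b);
        _mm_storeu_si128((__m128i *)(d + 32), c);
        _mm_storeu_si128((__m128i *)(d + 48), e);
        s += 64;
        d += 64;
        n -= 64;
    }

    /* Remaining 0..63 bytes: fall back to the ordinary memcpy(). */
    memcpy(d, s, n);
    return dst;
}
```

Whether this beats your libc's or the kernel's memcpy() depends on the CPU, alignment and buffer size. Measure it before drawing conclusions.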

  2. #2
    Join Date
    Dec 2009
    Posts
    281

    Very interesting!
    Thank you!

  3. #3
    Join Date
    Aug 2011
    Posts
    2


    It's indeed pretty interesting.
    But if I use a prebuilt generic x86_64 kernel provided by my distro, is there a way the kernel could autodetect at runtime whether my CPU supports SSE3, or do I have to recompile the kernel?

  4. #4
    Join Date
    Dec 2009
    Posts
    281


    Quote Originally Posted by Dylar
    But if I use a prebuilt generic x86_64 kernel provided by my distro, is there a way the kernel could autodetect at runtime whether my CPU supports SSE3, or do I have to recompile the kernel?
    Most likely it already does. Your kernel probably already supports SSE3 and similar extensions, and programs designed to take advantage of SSE3 will do so.
    Previously, memcpy() just did its usual copying magic. If I understood the article correctly, they want to use SSE3 when copying large blocks, which should give a rather nice boost.

    But I am no programmer, unfortunately.
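As a rough illustration of the runtime detection Dylar asks about: code can query the CPU directly with the CPUID instruction, so a generic distro build does not need to be recompiled to find out whether SSE3 is present. This is a minimal sketch that assumes GCC or Clang on x86, since their <cpuid.h> header supplies __get_cpuid() and the bit_SSE3 mask. cpu_has_sse3 is a hypothetical name. The kernel performs its own equivalent check at boot.

```c
#include <cpuid.h> /* GCC/Clang wrapper around the x86 CPUID instruction */

/* Hypothetical helper: returns 1 if the CPU reports SSE3
 * (CPUID leaf 1, ECX bit 0), 0 otherwise. */
int cpu_has_sse3(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 if the requested leaf is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    return (ecx & bit_SSE3) != 0;
}
```

Calling cpu_has_sse3() once at start-up and caching the result is enough, because the answer cannot change while the program runs.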

  5. #5
    Join Date
    Dec 2009
    Posts
    281

    To check it:

    Quote Originally Posted by Dylar
    But if I use a prebuilt generic x86_64 kernel provided by my distro, is there a way the kernel could autodetect at runtime whether my CPU supports SSE3, or do I have to recompile the kernel?
    cat /proc/cpuinfo | grep sse3

    It's sort of weird, but I don't seem to have SSE3 on my AMD quad core. I think the extension is there, but for licensing reasons it's called something else. I wonder what SSE4A is, and whether it absorbs SSE3?

  6. #6
    Join Date
    May 2010
    Posts
    20


    Quote Originally Posted by dimko
    cat /proc/cpuinfo | grep sse3

    It's sort of weird, but I don't seem to have SSE3 on my AMD quad core. I think the extension is there, but for licensing reasons it's called something else. I wonder what SSE4A is, and whether it absorbs SSE3?
    It's not called "sse3" in /proc/cpuinfo. I believe the kernel calls it "pni" for "Prescott New Instructions" which was the Intel code name.
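Checking the flag from a program works much like the grep, with one caveat. A naive substring test for "sse3" also matches "ssse3" (Supplemental SSE3, a different extension), so whole-word matching is safer. This is a minimal sketch with hypothetical names has_cpu_flag and cpuinfo_has_flag. It assumes the x86 layout of /proc/cpuinfo, where the feature list sits on a line starting with "flags", and it assumes that line fits in a 4 KiB buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `flag` appears as a whole
 * whitespace-separated word in `flags`, 0 otherwise. A bare strstr()
 * would let "sse3" match inside "ssse3". */
int has_cpu_flag(const char *flags, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = flags;

    if (len == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\t' ||
                   p[len] == '\n';
        if (starts && ends)
            return 1;
        p += len; /* keep searching after this partial match */
    }
    return 0;
}

/* Hypothetical helper: scans /proc/cpuinfo (Linux, x86 layout) for the
 * first "flags" line and checks it. Returns -1 if the file cannot be
 * opened or contains no such line. */
int cpuinfo_has_flag(const char *flag)
{
    char line[4096];
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0) {
            fclose(f);
            return has_cpu_flag(line, flag);
        }
    }
    fclose(f);
    return -1;
}
```

On an SSE3-capable x86 machine, cpuinfo_has_flag("pni") should return 1, while cpuinfo_has_flag("sse3") returns 0 because the kernel does not use that name.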

  7. #7
    Join Date
    Dec 2009
    Posts
    281


    Quote Originally Posted by signals
    It's not called "sse3" in /proc/cpuinfo. I believe the kernel calls it "pni" for "Prescott New Instructions" which was the Intel code name.
    pni - checked!

    Is it still called PNI on Intel CPUs?

  8. #8
    Join Date
    May 2010
    Posts
    20


    Quote Originally Posted by dimko
    pni - checked!

    Is it still called PNI on Intel CPUs?
    It is on my Core i7.

  9. #9
    Join Date
    Aug 2011
    Posts
    30


    I did something like this in my kernel and the performance was really impressive.
    At the time, I used CPUID to find out whether the CPU supported MMX, SSE or SSE2, and set malloc to use whichever was supported.

    I didn't have a graphics driver, so drawing operations were really slow.
    Without SSE2 I couldn't do any graphics operation at an acceptable speed, whereas with SSE2 I could draw windows and move them flawlessly.

    This was probably because there were only a few threads and the amount of data to move was large (1280 x 1024 x 4 bytes, i.e. 5 MiB).
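Here is a minimal sketch of the scheme abral describes: probe the CPU once, then route every call to the best routine it supports. This version applies the idea to a copy function rather than malloc. It assumes GCC or Clang on x86-64 and uses the compilers' __builtin_cpu_supports() in place of a hand-written CPUID check. It only distinguishes SSE2 from a plain byte loop, and all function names are hypothetical.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Baseline: byte-at-a-time copy that works on any CPU. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* SSE2 path: 16 bytes per iteration, then the byte loop for the tail. */
static void *copy_sse2(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (; n >= 16; n -= 16, s += 16, d += 16)
        _mm_storeu_si128((__m128i *)d, _mm_loadu_si128((const __m128i *)s));
    copy_bytes(d, s, n);
    return dst;
}

/* Hypothetical dispatcher: resolved once at start-up, then every copy
 * goes through this pointer. */
static copy_fn fast_copy;

void init_fast_copy(void)
{
    __builtin_cpu_init(); /* harmless here; required only before constructors */
    fast_copy = __builtin_cpu_supports("sse2") ? copy_sse2 : copy_bytes;
}
```

Call init_fast_copy() once during start-up and then use fast_copy(dst, src, n) everywhere. The indirect call is cheap compared with copying megabytes of pixel data. Every x86-64 CPU has SSE2, so on x86-64 this kind of dispatch only pays off when choosing between newer extensions.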

  10. #10
    Join Date
    Aug 2009
    Location
    Russe, Bulgaria
    Posts
    542


    Quote Originally Posted by abral
    I did something like this in my kernel and the performance was really impressive.
    At the time, I used CPUID to find out whether the CPU supported MMX, SSE or SSE2, and set malloc to use whichever was supported.

    I didn't have a graphics driver, so drawing operations were really slow.
    Without SSE2 I couldn't do any graphics operation at an acceptable speed, whereas with SSE2 I could draw windows and move them flawlessly.

    This was probably because there were only a few threads and the amount of data to move was large (1280 x 1024 x 4 bytes, i.e. 5 MiB).
    What exactly did you do in your kernel? Replace the x86 memcpy() implementation with an SSE one?
