Improving The Linux Kernel's Memory Performance

  • movieman
    replied
    BTW, wasn't SSE3 a compulsory part of AMD64? If so, then this would presumably go into any AMD64 kernel with no need to check processor flags.


  • movieman
    replied
    Originally posted by Smorg View Post
    memcpy? As in... memcpy from string.h? What does this have to do with the kernel? Is there some system call that's also called memcpy?
    Copying memory around is pretty common. When I was writing video drivers, there were a number of places where we used MMX for memory copies because it was faster at copying aligned data than rep movs. Basically, we read a bunch of memory from the source into MMX registers and then wrote it all out to the destination, so we were processing 128 bytes or something at a time.
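A rough user-space sketch of the "load a batch of registers, then store them all" pattern described above. Note that it swaps MMX for SSE2 intrinsics (16-byte registers, 64 bytes per pass); `copy_aligned_blocks` is a hypothetical name, and the 16-byte alignment requirement is an assumption of this sketch. Inside the kernel, SIMD registers cannot simply be used like this, since the FPU state has to be saved and restored around them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: copy n bytes between non-overlapping buffers.
 * Assumption of this sketch: dst and src are both 16-byte aligned. */
static void copy_aligned_blocks(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(((uintptr_t)d % 16) == 0 && ((uintptr_t)s % 16) == 0);

#if defined(__SSE2__)
    /* Read four 16-byte registers, then write all four: 64 bytes per pass. */
    while (n >= 64) {
        __m128i r0 = _mm_load_si128((const __m128i *)(s + 0));
        __m128i r1 = _mm_load_si128((const __m128i *)(s + 16));
        __m128i r2 = _mm_load_si128((const __m128i *)(s + 32));
        __m128i r3 = _mm_load_si128((const __m128i *)(s + 48));
        _mm_store_si128((__m128i *)(d + 0), r0);
        _mm_store_si128((__m128i *)(d + 16), r1);
        _mm_store_si128((__m128i *)(d + 32), r2);
        _mm_store_si128((__m128i *)(d + 48), r3);
        s += 64;
        d += 64;
        n -= 64;
    }
#endif
    /* Tail bytes (or the whole buffer on builds without SSE2). */
    memcpy(d, s, n);
}
```

Whether this beats rep movs or the library memcpy depends heavily on the CPU and the buffer size, so it would need measuring on the actual hardware.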


  • liam
    replied
    Originally posted by phoronix View Post
    Phoronix: Improving The Linux Kernel's Memory Performance

    Over the past few days there's been an active discussion on the Linux kernel mailing list surrounding the memory copy (the memcpy function to copy blocks of memory) performance within the kernel. In particular, an application vendor claims to have boosted their application (a video recorder) performance by 12% when implementing an "optimized" memory copy function that takes advantage of SSE3...

    http://www.phoronix.com/vr.php?view=OTgwMQ
    Do we really want to add more x86 specific code to the kernel?
    Other than that, sounds cool. I had no idea hitting SSE was so costly. I suppose it makes sense, since the instructions were intended for rather larger data sets, but still, it hadn't occurred to me.


  • Smorg
    replied
    memcpy? As in... memcpy from string.h? What does this have to do with the kernel? Is there some system call that's also called memcpy?


  • Drago
    replied
    Originally posted by abral View Post
    I did something like this in my kernel and the performance was really impressive.
    At the time, I used CPUID to know if the CPU supported MMX or SSE or SSE2 and set malloc to use what was supported.

    I didn't have any graphics driver, so drawing operations were really slow.
    Without SSE2 I couldn't do any acceptable graphics operations, whereas with SSE2 I could draw windows and move them flawlessly.

    This was probably because there were only a few threads and the data to move was consistent (1280 x 1024 x 4 bytes).
    What exactly did you do in your kernel? Did you replace the x86 memcpy() implementation with an SSE one?


  • abral
    replied
    I did something like this in my kernel and the performance was really impressive.
    At the time, I used CPUID to know if the CPU supported MMX or SSE or SSE2 and set malloc to use what was supported.

    I didn't have any graphics driver, so drawing operations were really slow.
    Without SSE2 I couldn't do any acceptable graphics operations, whereas with SSE2 I could draw windows and move them flawlessly.

    This was probably because there were only a few threads and the data to move was consistent (1280 x 1024 x 4 bytes).
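For readers wondering what "detect once, then use what is supported" might look like, here is a minimal sketch, not the poster's actual code. It assumes GCC or Clang on x86 and uses the compiler builtin `__builtin_cpu_supports` instead of raw CPUID; `detect_simd_level`, `select_copy` and the `copy_*` routines are hypothetical names, and the per-level routines are stand-ins that all fall back to `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum simd_level { SIMD_NONE = 0, SIMD_MMX, SIMD_SSE, SIMD_SSE2, SIMD_LEVELS };

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Stand-ins: a real implementation would use MMX/SSE/SSE2 loops here. */
static void *copy_plain(void *d, const void *s, size_t n) { return memcpy(d, s, n); }
static void *copy_mmx(void *d, const void *s, size_t n)   { return memcpy(d, s, n); }
static void *copy_sse(void *d, const void *s, size_t n)   { return memcpy(d, s, n); }
static void *copy_sse2(void *d, const void *s, size_t n)  { return memcpy(d, s, n); }

static const copy_fn copy_table[SIMD_LEVELS] = {
    copy_plain, copy_mmx, copy_sse, copy_sse2
};

/* Ask the CPU once which instruction set extensions it reports. */
static enum simd_level detect_simd_level(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2")) return SIMD_SSE2;
    if (__builtin_cpu_supports("sse"))  return SIMD_SSE;
    if (__builtin_cpu_supports("mmx"))  return SIMD_MMX;
#endif
    return SIMD_NONE; /* non-x86 or unknown compiler: plain copy */
}

/* Pick the copy routine for a detected level. */
static copy_fn select_copy(enum simd_level level)
{
    assert(level < SIMD_LEVELS);
    return copy_table[level];
}
```

The idea is to call `select_copy(detect_simd_level())` once at startup and keep the returned pointer, so the detection cost is not paid on every copy.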


  • signals
    replied
    Originally posted by dimko View Post
    pni - checked!

    Is it still called PNI on Intel CPUs?
    It is on my Core i7.


  • dimko
    replied
    Originally posted by signals View Post
    It's not called "sse3" in /proc/cpuinfo. I believe the kernel calls it "pni" for "Prescott New Instructions" which was the Intel code name.
    pni - checked!

    Is it still called PNI on Intel CPUs?


  • signals
    replied
    Originally posted by dimko View Post
    cat /proc/cpuinfo |grep sse3

    It's sort of weird, but I don't seem to have SSE3 on my AMD quad core. However, I think the extension is there; it's just called something else for licensing reasons. I wonder what SSE4A is and whether it absorbs SSE3?
    It's not called "sse3" in /proc/cpuinfo. I believe the kernel calls it "pni" for "Prescott New Instructions" which was the Intel code name.
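A small sketch of why the flag name matters when searching: a plain substring search for "sse3" can also hit the separate "ssse3" flag, while SSE3 itself is listed as "pni". The helper below matches whole, whitespace-separated flag names only; `has_cpu_flag` is a hypothetical name and the sample line in the usage note is made up for illustration, not taken from a real machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if `flag` appears as a whole whitespace-separated word
 * in `flags` (e.g. one "flags" line read from /proc/cpuinfo). */
static bool has_cpu_flag(const char *flags, const char *flag)
{
    assert(flags != NULL && flag != NULL);

    size_t len = strlen(flag);
    if (len == 0)
        return false;

    for (const char *p = flags; (p = strstr(p, flag)) != NULL; p++) {
        /* The match must start at the line start or right after whitespace... */
        bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        /* ...and end at the line end or right before whitespace. */
        char next = p[len];
        bool ends = next == '\0' || next == ' ' || next == '\t' || next == '\n';
        if (starts && ends)
            return true;
    }
    return false;
}
```

With an illustrative line such as "flags : fpu sse sse2 pni ssse3", looking up "pni" succeeds and looking up "sse3" does not, even though the text "sse3" occurs inside "ssse3".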


  • dimko
    replied
    to check it:

    Originally posted by Dylar View Post
    But if I use a prebuilt generic x86_64 kernel provided by my distro, is there a way the kernel could autodetect at runtime whether my CPU supports SSE3, or do I have to recompile the kernel?
    cat /proc/cpuinfo |grep sse3

    It's sort of weird, but I don't seem to have SSE3 on my AMD quad core. However, I think the extension is there; it's just called something else for licensing reasons. I wonder what SSE4A is and whether it absorbs SSE3?
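On the runtime detection question, here is a user-space sketch of asking the CPU directly, with no recompiling involved. It assumes GCC or Clang on x86 and their `<cpuid.h>` header; `cpu_has_sse3` is a hypothetical name. SSE3 support is reported in bit 0 of ECX for CPUID leaf 1, which is the same feature bit that shows up as "pni" in /proc/cpuinfo.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#endif

/* CPUID returns its results in 32-bit registers. */
static_assert(sizeof(unsigned int) >= 4, "need at least 32-bit unsigned int");

/* Return true if the running CPU reports SSE3 support. */
static bool cpu_has_sse3(void)
{
#ifdef HAVE_CPUID_H
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* __get_cpuid returns 0 if the requested leaf is not available. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return (ecx & 1u) != 0; /* CPUID leaf 1, ECX bit 0 = SSE3 */
#endif
    return false; /* not x86, or no usable CPUID: report no SSE3 */
}
```

A generic prebuilt kernel can make the same kind of check once at boot and pick a code path accordingly, which is the general idea behind runtime feature detection.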
