Improving The Linux Kernel's Memory Performance
Originally posted by phoronix:
Phoronix: Improving The Linux Kernel's Memory Performance
Over the past few days there's been an active discussion on the Linux kernel mailing list surrounding the memory copy (the memcpy function to copy blocks of memory) performance within the kernel. In particular, an application vendor claims to have boosted their application (a video recorder) performance by 12% when implementing an "optimized" memory copy function that takes advantage of SSE3...
http://www.phoronix.com/vr.php?view=OTgwMQ
Other than that, sounds cool. I had no idea hitting the SSE was so costly. I suppose it makes sense that they were intended for rather larger data sets, but still, it hadn't occurred to me.
-
Originally posted by Smorg:
memcpy? As in... memcpy from string.h? What does this have to do with the kernel? Is there some system call that's also called memcpy?
-
Originally posted by Drago:
You did exactly what on your kernel? Replaced x86 memcopy() implementation with SSE one?
However, I wrote four different versions of memcpy (normal, MMX, SSE, SSE2) and used the best one according to what the CPU supports.
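As a rough sketch of what "use the best version the CPU supports" can look like, here is a function-pointer dispatch in C. Everything in it is hypothetical: the names copy_generic, copy_wide_placeholder and select_copy are made up for illustration, the "wide" variant is only a stand-in that calls the C library where a real build would use MMX/SSE/SSE2 loads and stores, and the feature bit is simply passed in by the caller rather than detected here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Common signature shared by every copy variant. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Baseline variant: plain byte-by-byte copy that runs on any CPU. */
static void *copy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);
    while (n--)
        *d++ = *s++;
    return dst;
}

/*
 * Placeholder for a SIMD variant (assumption: a real one would use
 * SSE2 loads and stores). It defers to the C library so that the
 * sketch stays portable.
 */
static void *copy_wide_placeholder(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/*
 * Pick a variant once, based on a feature bit the caller has already
 * detected (for example from CPUID or /proc/cpuinfo).
 */
copy_fn select_copy(int has_simd)
{
    return has_simd ? copy_wide_placeholder : copy_generic;
}
```

The idea is to call select_copy() once at startup, keep the returned pointer, and route every later copy through it, so the feature check is not paid on each call.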
-
Originally posted by Dylar:
It's indeed pretty interesting.
But if I use a prebuilt generic x86_64 kernel provided by my distro, is there a way the kernel could autodetect if my CPU has support for SSE3 at runtime, or do I have to recompile the kernel?
1. Yes, there is a way for the kernel to check for SSE3 support at runtime. The flags line printed by cat /proc/cpuinfo demonstrates this capability.
2. There is a distinction between -march style optimizations and -mtune style optimizations. With -mtune the compiler will generate multiple versions of different code paths so that every CPU gets its own "optimized" version. With -march, the compiler assumes that these instructions are always available. To the best of my knowledge, the kernel uses -march. I have not checked the code to verify this, but I am fairly certain that if you configure your kernel build for hardware newer than what you have, it breaks when you try to run it on the older hardware, which is consistent with -march. Therefore, the kernel must be explicitly compiled for the target CPU.
Originally posted by dimko:
cat /proc/cpuinfo |grep sse3
It's sort of weird, but I don't seem to have SSE3 on my AMD quad core. However, I think the extension was there, just called something else for licensing reasons. I wonder what SSE4A is and whether it absorbs SSE3 into itself?
Originally posted by dimko:
pni - checked!
Is it still called PNI on Intel CPUs?
They call it pni on AMD CPUs for the same reason.
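Since /proc/cpuinfo lists SSE3 as pni, and a plain substring search for sse3 would also hit the unrelated ssse3 flag, a whole-token match is the safer check. Below is a minimal sketch in C; flags_contain and is_boundary are hypothetical helper names, and the flags strings shown in the usage note are made-up samples, not output from any real machine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A flag name ends at whitespace or at the end of the string. */
static int is_boundary(char c)
{
    return c == '\0' || c == ' ' || c == '\t' || c == '\n';
}

/*
 * Return 1 if `name` appears as a whole whitespace-delimited token
 * in `flags`, and 0 otherwise. Substring hits such as "sse3" inside
 * "ssse3" are rejected.
 */
int flags_contain(const char *flags, const char *name)
{
    size_t len;
    const char *p;

    assert(flags != NULL && name != NULL);
    len = strlen(name);
    if (len == 0)
        return 0;

    for (p = strstr(flags, name); p != NULL; p = strstr(p + 1, name)) {
        int starts_token = (p == flags) || is_boundary(p[-1]);

        if (starts_token && is_boundary(p[len]))
            return 1;
    }
    return 0;
}
```

With a sample line such as "flags : fpu sse sse2 pni ssse3", flags_contain(line, "pni") reports the SSE3 flag, while flags_contain(line, "sse3") does not match, because the only occurrence of that text is inside ssse3.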
Originally posted by Smorg:
memcpy? As in... memcpy from string.h? What does this have to do with the kernel? Is there some system call that's also called memcpy?
Also, when writing a kernel, you need to write your own library routines, because libraries specified in the ANSI C specification are meant for userland, not kernels.
Originally posted by liam:
Do we really want to add more x86 specific code to the kernel?
Other than that, sounds cool. I had no idea hitting the SSE was so costly. I suppose it makes sense that they were intended for rather larger data sets, but still, it hadn't occurred to me.
Originally posted by movieman:
BTW, wasn't SSE3 a compulsory part of AMD64? If so, then this would presumably go into any AMD64 kernel with no need to check processor flags.
Last edited by Shining Arcanine; 16 August 2011, 10:57 PM.
-
Originally posted by Shining Arcanine:
There is a distinction between -march style optimizations and -mtune style optimizations. With -mtune the compiler will generate multiple versions of different code paths so that every CPU gets its own "optimized" version. With -march, the compiler assumes that these instructions are always available.
"-mtune=cpu-type - Tune to cpu-type everything applicable about the generated code, except for the ABI and the set of available instructions."
So no multiple code paths or anything. That's what the Intel compiler does. GCC doesn't provide that functionality.