Improving The Linux Kernel's Memory Performance

  • #31
    Originally posted by Shining Arcanine View Post
    The compiler won't generate its own SSE3 assembly unless it is told to do it by the build system, so strictly speaking, he would need to recompile his kernel to get SSE3 instructions into areas where the kernel developers did not do this manually.
    Yes, of course, but recompiling your kernel with "sse3" enabled has little chance of producing an optimized SSE3 path. The best way is to code the SSE3 path "manually", which is the subject of this article. And once that is done in the kernel ("manual" SSE3 paths), you don't have to compile the kernel with "sse3" options to use this optimization.
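To illustrate the idea in user space (this is not kernel code, and the kernel has its own rules for using SIMD registers), here is a minimal sketch assuming GCC or Clang on x86. The names `sum_generic`, `sum_sse3` and `pick_sum` are made up for this example. Only the hand-written function is built for SSE3, via the compiler's `target` attribute, so the rest of the file needs no "sse3" build option, and the path is chosen at runtime.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <pmmintrin.h>          /* SSE3 intrinsics such as _mm_hadd_ps */
#define EXAMPLE_HAVE_X86 1      /* hypothetical macro, local to this sketch */
#endif

/* Portable fallback: plain scalar loop, works on any CPU. */
static float sum_generic(const float *v, size_t n)
{
    assert(v != NULL || n == 0);
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

#ifdef EXAMPLE_HAVE_X86
/* Hand-written SSE3 path. The target attribute enables SSE3 for this one
 * function only; the rest of the file is built with default flags. */
__attribute__((target("sse3")))
static float sum_sse3(const float *v, size_t n)
{
    assert(v != NULL || n == 0);
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;
    for (; i + 4 <= n; i += 4)              /* four floats per step */
        acc = _mm_add_ps(acc, _mm_loadu_ps(v + i));
    acc = _mm_hadd_ps(acc, acc);            /* SSE3 horizontal add */
    acc = _mm_hadd_ps(acc, acc);            /* lane 0 now holds the total */
    float s = _mm_cvtss_f32(acc);
    for (; i < n; i++)                      /* leftover elements */
        s += v[i];
    return s;
}
#endif

typedef float (*sum_fn)(const float *, size_t);

/* Choose the implementation at runtime based on what the CPU reports. */
static sum_fn pick_sum(void)
{
#ifdef EXAMPLE_HAVE_X86
    if (__builtin_cpu_supports("sse3"))
        return sum_sse3;
#endif
    return sum_generic;
}
```

A caller would do `sum_fn f = pick_sum();` once and then call `f(data, count)`. Note that the SSE3 path adds the floats in a different order than the scalar loop, so results can differ in the last bits for general floating-point input.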
    Last edited by whitecat; 08-18-2011, 09:38 AM.


    • #32
      Originally posted by Shining Arcanine View Post
      How is SSE4a not a SSE4 derivative if half of its instructions match SSE4 instructions in opcode, name and functionality?

      SSE was made after 3DNow, while SSE4a was made after Intel published its SSE4 extensions. The instructions provided by SSE and 3DNow do not intersect.

      I feel like these points on SSE4a not being a SSE4 derivative are derived from the following rather than any actual technical reason:
      4 instructions are common out of 54. That's 7%.

      If you're going to claim it's a derivative, that # should be AT LEAST 50%, if not higher. IMHO

      And this has nothing to do with me liking Intel - I actually root for AMD which is why SSE4a is so disappointing.


      • #33
        Originally posted by whitecat View Post
        No, AMD64 implies SSE2.
        Thanks, that must be what I was thinking of.


        • #34
          Yes, it will be able to detect it at runtime. The CPU provides the CPUID instruction for exactly this purpose. You can also see my previous post in this thread on how the kernel determines the best implementation for raid6.
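For anyone who wants to see what a CPUID query looks like, here is a small user-space sketch, assuming GCC or Clang and their `<cpuid.h>` helper `__get_cpuid`. The struct and function names (`simd_features`, `detect_simd`) are invented for this example, and this is not how the kernel itself is written. The bit positions are the documented ones: leaf 1 reports SSE2 in EDX bit 26 and SSE3, SSE4.1, SSE4.2 in ECX bits 0, 19, 20, while AMD's SSE4a is ECX bit 6 of extended leaf 0x80000001.

```c
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>              /* __get_cpuid(), shipped with GCC and Clang */
#define EXAMPLE_ON_X86 1        /* hypothetical macro, local to this sketch */
#endif

/* Each field is 1 if the CPU reports the extension, 0 otherwise. */
struct simd_features {
    int sse2, sse3, sse4_1, sse4_2, sse4a;
};

static struct simd_features detect_simd(void)
{
    struct simd_features f = {0, 0, 0, 0, 0};
#ifdef EXAMPLE_ON_X86
    unsigned int eax, ebx, ecx, edx;

    /* Standard leaf 1: basic feature flags. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.sse2   = (edx >> 26) & 1;
        f.sse3   = (ecx >> 0)  & 1;
        f.sse4_1 = (ecx >> 19) & 1;
        f.sse4_2 = (ecx >> 20) & 1;
    }

    /* Extended leaf 0x80000001: AMD-specific flags, including SSE4a. */
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        f.sse4a = (ecx >> 6) & 1;
#endif
    return f;                   /* all zeros on non-x86 builds */
}

static void print_simd(void)
{
    struct simd_features f = detect_simd();
    printf("sse2=%d sse3=%d sse4.1=%d sse4.2=%d sse4a=%d\n",
           f.sse2, f.sse3, f.sse4_1, f.sse4_2, f.sse4a);
}
```

On any 64-bit x86 machine the `sse2` field should come back as 1, since AMD64 implies SSE2 as mentioned earlier in the thread. The other fields depend on the particular CPU.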