Linux - CONFIG_CC_OPTIMIZE_FOR_SIZE


  • Linux - CONFIG_CC_OPTIMIZE_FOR_SIZE

    With the Gentoo benchmark, I was wondering if a full test could be done comparing the performance between a kernel with CONFIG_CC_OPTIMIZE_FOR_SIZE=y and one with CONFIG_CC_OPTIMIZE_FOR_SIZE=n.

    There are competing claims about whether building the kernel for size increases throughput: CPUs with small L2 caches would benefit from the smaller executable code, but the code itself should be less optimized in general. Linus himself suggests that optimizing for size should be used unless your application uses MMX or floating point -> http://gcc.gnu.org/ml/gcc/2001-07/msg01543.html

    If L2 thrashing can be reduced by building with -Os, wouldn't user space applications in general execute faster?
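
    Before comparing the two builds, it helps to confirm which setting a given kernel was actually built with. Below is a minimal sketch in C that scans an uncompressed kernel config file for the option. The function names are hypothetical, and the file location is an assumption: many distributions install the config as /boot/config-<release>, but that is not guaranteed, and a compressed /proc/config.gz is not handled here.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if this config line is exactly "CONFIG_CC_OPTIMIZE_FOR_SIZE=y",
 * 0 otherwise. A disabled option appears in a kernel config as
 * "# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set", which does not match. */
int line_enables_os(const char *line)
{
    static const char key[] = "CONFIG_CC_OPTIMIZE_FOR_SIZE=y";
    size_t n = sizeof(key) - 1;

    if (strncmp(line, key, n) != 0)
        return 0;
    /* Reject longer values such as "=yes"; accept end of line only. */
    return line[n] == '\0' || line[n] == '\n' || line[n] == '\r';
}

/* Scans the config file at `path` (caller-supplied, hypothetical location).
 * Returns 1 if the kernel was configured for size, 0 if not,
 * and -1 if the file could not be opened. */
int kernel_config_optimizes_for_size(const char *path)
{
    char buf[512];
    int found = 0;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    while (fgets(buf, sizeof buf, f)) {
        if (line_enables_os(buf)) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

    Calling kernel_config_optimizes_for_size() on the config of each kernel under test gives a quick sanity check that the =y and =n builds really differ in this option.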

  • #2
    Originally posted by damentz
    With the Gentoo benchmark, I was wondering if a full test could be done comparing the performance between a kernel with CONFIG_CC_OPTIMIZE_FOR_SIZE=y and one with CONFIG_CC_OPTIMIZE_FOR_SIZE=n.

    There are competing claims about whether building the kernel for size increases throughput: CPUs with small L2 caches would benefit from the smaller executable code, but the code itself should be less optimized in general. Linus himself suggests that optimizing for size should be used unless your application uses MMX or floating point -> http://gcc.gnu.org/ml/gcc/2001-07/msg01543.html

    If L2 thrashing can be reduced by building with -Os, wouldn't user space applications in general execute faster?
    Actually, Novell's partners did some testing on this and found that optimizing for size hurt overall performance. As a result, it was disabled in Novell's corporate solutions as well as in openSUSE 11.2.


    • #3
      This is the relevant part of the discussion when I proposed a separate -desktop flavor of the kernel in openSUSE 11.2 instead of a "one-size fits all" kernel.


      #33: Nick Piggin (npiggin) (2009-07-20 18:05)
      I'm actually of the opinion that we should disable optimize for size in our server kernel as well. I will try to recall the particular SLES bug report I have with some numbers, but we have an ISV customer doing some virtual memory intensive workloads (basically mmap/page fault/munmap), and they found their real-world performance is improved very significantly by using -O2 in SLES11. I can't remember exactly, but it is several tens of percent, IIRC.
      The reasoning for -Os in the kernel has seemed a bit flawed to me (as I have written other times before). icache issues are almost no different in userspace applications or libraries. There will always be various combinations of uncommon, common, large, small code being run -- the gcc guys are presumably always trying to make good tradeoffs based on that, and "performance" for them is including icache misses. Specifying -Os would seem to tell gcc that we care more about just binary size rather than actual performance.
      If the kernel has commonly used code, we absolutely want it to be optimized as highly as possible. Uncommonly used code sure would be nice to reduce in size, but if it is uncommonly executed then by definition it should have smaller (temporal) icache footprint.
      Now I don't have any numbers or reason to believe -O2 should lead in the desktop flavour -- unless like a staging step to enable it in the server flavour. I don't know of desktop workload where the kernel is going to be very costly, but actually I don't really profile 3d rendering which is one thing that might benefit from -O2. If anyone is gathering these kinds of framerate numbers, then it would be very interesting to test the difference between -Os and -O2.

      #35: Nick Piggin (npiggin) (2009-07-21 12:06)
      OK, it is SLES bug 482887. The ISV reports that a VM-intensive microbenchmark slows down by about 45%, and real-world (for them) performance by 10-20%, when using -Os rather than -O2 in the SLES11 kernel.

      #36: Nick Piggin (npiggin) (2009-07-22 08:36)
      We have another result from a hardware vendor showing that an important database workload improves by 1% (system-wide throughput) when the kernel is compiled with -O2 rather than -Os. Their result is a little sparse on details, and I can't share some details publicly, but I think it is a meaningful result.

      #37: Takashi Iwai (tiwai) (2009-07-22 16:15)
      IIRC, the decision for -Os was due to a significant performance difference on PowerPC a few years ago. There was little difference between -Os and -O2 on x86, thus we chose -Os.
      But your current number is much more convincing. We should indeed go for -O2.
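
      For anyone who wants to try a comparable test: the ISV workload in the comments above is described only as "basically mmap/page fault/munmap", and the actual benchmark is not public. The following is a rough sketch of that pattern in C, not the ISV's code. The function name, region size and iteration count are my own assumptions. It maps an anonymous region, writes one byte per page so that every page is faulted in, unmaps the region, and reports the elapsed wall-clock time.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: runs `iterations` cycles of mmap, touch every page,
 * munmap on an anonymous private mapping of `region_bytes` bytes.
 * Returns elapsed seconds, or -1.0 on invalid arguments or any failure. */
double time_mmap_cycles(size_t region_bytes, int iterations)
{
    struct timespec start, end;
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || region_bytes == 0 || iterations <= 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;

    for (int i = 0; i < iterations; i++) {
        /* volatile keeps the compiler from dropping the page-touching stores */
        volatile unsigned char *p = mmap(NULL, region_bytes,
                                         PROT_READ | PROT_WRITE,
                                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return -1.0;

        /* The first write to each page triggers a page fault in the kernel. */
        for (size_t off = 0; off < region_bytes; off += (size_t)page)
            p[off] = 1;

        if (munmap((void *)p, region_bytes) != 0)
            return -1.0;
    }

    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

      Nearly all of the time measured here is spent inside the kernel, so running the same binary under a -Os kernel and a -O2 kernel on the same machine should isolate the effect of the kernel's optimization level. Several runs per kernel are advisable, since a single timing is noisy.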


      • #4
        It would be nice to see some modern hard numbers, though; GCC 4.4.2 would be an excellent start.
