GCC 13 "-O2" Performance Being Sped Up With Enabling Small Loop Unrolling

  • GCC 13 "-O2" Performance Being Sped Up With Enabling Small Loop Unrolling

    Phoronix: GCC 13 "-O2" Performance Being Sped Up With Enabling Small Loop Unrolling

    For those compiling their programs using the common "-O2" optimization level as is used for the production builds by many Linux distributions and other software vendors, small loop unrolling is being enabled at this level for GCC 13. Enabling small loop unrolling with -O2 should help the performance in some areas of modern Intel and AMD CPUs...

  • #2
    This also led to a 0.9% code size improvement too.
    Larger code is not an improvement. Maybe this is an English issue, but improvement means 'better than' and larger code is not 'better than', it is *larger*, but larger isn't always an improvement.


    • #3
      Michael, the original text says a 0.9% increment, which is much more logical. How could loop unrolling reduce the code size?


      • #4
        I believe larger code also means more cache usage, but on newer CPUs with larger caches the 0.9% increment is probably negligible.


        • #5
          Originally posted by willmore

          Larger code is not an improvement. Maybe this is an English issue, but improvement means 'better than' and larger code is not 'better than', it is *larger*, but larger isn't always an improvement.
          It is when there's more room to do stuff. If a modern processor can do 5 operations at a time but the compiler only produces code that issues 3 at a time, then the code can potentially grow by 2 operations with no performance penalty, since it still fits within what the hardware can execute at once. Hence, larger is better in this instance.
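
To make the trade-off in this post concrete, here is a minimal hand-written sketch in C. It is only an illustration, not what GCC 13 actually emits: the function names `sum_rolled` and `sum_unrolled2` are hypothetical, and the unroll factor of 2 is an assumption chosen to keep the example short. The unrolled version has more instructions in the loop body, but its two additions do not depend on each other, so a core that can execute several operations at once has more independent work available.

```c
#include <stddef.h>
#include <stdint.h>

/* Rolled form: each iteration does one add and pays the loop
 * overhead (increment, compare, branch) for a single element. */
uint64_t sum_rolled(const uint32_t *v, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Hand-unrolled by 2 (illustrative only): the body is larger, but the
 * loop overhead is paid once per two elements and the two adds use
 * separate accumulators, so neither has to wait for the other. */
uint64_t sum_unrolled2(const uint32_t *v, size_t n)
{
    uint64_t even = 0, odd = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        even += v[i];
        odd  += v[i + 1];
    }
    if (i < n)          /* leftover element when n is odd */
        even += v[i];
    return even + odd;
}
```

Both functions return the same sum for any input. The second one simply takes more bytes of machine code, which is the kind of size increase the thread is discussing.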


          • #6
            So is it the same as using
            Code:
            -O2 -funroll-loops --param max-unrolled-insns=4 --param max-unroll-times=2
            on earlier versions of GCC? Looks like the default parameters for GCC 10 and 11 are 200 and 8.
            Code:
            gcc --help=param -Q | grep unroll
            Last edited by trubicoid2; 14 November 2022, 10:03 AM.
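
The flags in the post above apply to every loop in a compilation unit. As a narrower experiment on older compilers, GCC 8 and later also document a per-loop `#pragma GCC unroll N`. The sketch below assumes that pragma is available. The function name `scale_in_place` is hypothetical, and nothing here is claimed to reproduce what GCC 13 does by default at -O2.

```c
#include <stddef.h>

/* Multiply every element by k. The pragma asks GCC to unroll only
 * this loop, by a factor of 2; compilers that do not recognise it
 * ignore it (possibly with a warning), so the result is unchanged. */
void scale_in_place(float *v, size_t n, float k)
{
#pragma GCC unroll 2
    for (size_t i = 0; i < n; i++)
        v[i] *= k;
}
```

Comparing the object size with and without the pragma (for example with the `size` utility) gives a rough feel for how much a factor-2 unroll adds for one small loop.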


            • #7
              do they mean the size of the resulting binary? or what does "code size" mean? surely, the programming code does not increase by compiling it


              • #8
                Originally posted by SigHunter
                do they mean the size of the resulting binary? or what does "code size" mean? surely, the programming code does not increase by compiling it
                The size of the generated instructions.
