A Linux Compiler Deathmatch


  • #21
    Originally posted by Jimbo View Post
    One of the main features of a compiler is to produce assembly code for a specific CPU, using the available instruction set in an optimal way. Another main feature is to apply generic optimizations that produce faster executables, such as -O3 and LTO on GCC, or -ip on the Intel compiler. Those features are where much of the development effort goes.

    Benchmarking plain x86 code does not make much sense in a compiler benchmarking article.
    We will have to agree to disagree. Michael and I have had discussions with a commercial compiler vendor about the defaults and compiler structure of their product. They accepted that the default is a critical entry bar for people; tuning for peak performance comes second.

    Most people do not go and tune all 5 compilers for maximum performance. They choose the one whose out-of-the-box results are in the range of what they want to see and then tune from there.

    Again, the offer is open for someone, anyone, to take one test and one benchmark from the article and provide a tuning guide that maximizes performance for that one test and one benchmark, to show the benefit of the extra tuning effort.

    Anyone?

    Comment


    • #22
      Originally posted by mtippett View Post
      Originally posted by Jimbo View Post
      One of the main features of a compiler is to produce assembly code for a specific CPU, using the available instruction set in an optimal way. Another main feature is to apply generic optimizations that produce faster executables, such as -O3 and LTO on GCC, or -ip on the Intel compiler. Those features are where much of the development effort goes.

      We will have to agree to disagree. Michael and I have had discussions with a commercial compiler vendor about the defaults and compiler structure of their product. They accepted that the default is a critical entry bar for people; tuning for peak performance comes second.

      Most people do not go and tune all 5 compilers for maximum performance. They choose the one whose out-of-the-box results are in the range of what they want to see and then tune from there.

      Again, the offer is open for someone, anyone, to take one test and one benchmark from the article and provide a tuning guide that maximizes performance for that one test and one benchmark, to show the benefit of the extra tuning effort.

      Anyone?
      As seen in my latest merge (-Os coming soon... I need to run the tests, which I do overnight on the nights when I am not sleeping at my GF's place), different compilers respond differently to optimization flags.

      To choose based on the default performance as a starting point and then move onward from there may lead to a sub-optimal end result. Someone mentioned engineering and optimization, and I would like to lend my vote to multivariate approaches as the best ones. This was nicely demonstrated to me during my chemometrics classes back in 2001 or so: you might find a local optimum while moving along one variable, then find another local optimum when you start moving along the next variable, and so on. The point is that the optimum you end up with using this one-variable-at-a-time approach is often not very optimal at all. When the end result depends on the order in which you optimize your variables, you can be pretty sure that there is a real optimum you are missing...
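
      A toy sketch of that trap, with made-up numbers rather than real compiler data: two hypothetical "flags" interact, so tuning them one at a time gets stuck at a local optimum that a joint search escapes.

        /* local_optimum.c - invented scores, purely illustrative.
         * score[a][b] is a pretend benchmark result for two
         * interacting flag settings; the global best requires
         * changing both flags at once. */
        #include <stdio.h>

        #define N 4

        static const int score[N][N] = {
            {5, 4, 1, 1},
            {6, 5, 2, 1},
            {1, 2, 3, 2},
            {1, 1, 2, 9},   /* global optimum hides at (3,3) */
        };

        int main(void) {
            /* One-variable-at-a-time: best a with b fixed, then best b. */
            int a = 0, b = 0;
            for (int i = 0; i < N; i++)
                if (score[i][b] > score[a][b]) a = i;
            for (int j = 0; j < N; j++)
                if (score[a][j] > score[a][b]) b = j;
            printf("one-at-a-time: a=%d b=%d score=%d\n", a, b, score[a][b]);

            /* Multivariate: search both variables jointly. */
            int ba = 0, bb = 0;
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    if (score[i][j] > score[ba][bb]) { ba = i; bb = j; }
            printf("joint search:  a=%d b=%d score=%d\n", ba, bb, score[ba][bb]);
            return 0;
        }

      The one-at-a-time pass stops at score 6; the joint search finds 9. That is exactly the order-dependence described above.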

      Comment


      • #23
        Originally posted by mtippett View Post
        They accepted that the default is a critical entry bar for people; tuning for peak performance comes second.
        I accept that, but writing a compiler article without investigating, even a little, the main features those compilers offer for optimizing an executable, and how those features affect the code, makes for a vague article from my point of view.

        I know there are a lot of flags, so a huge number of combinations arises; the point is that some of those flags are generic and widely used. For example, on ICC the -fast flag enables the major optimization options -ip, -O3 and -static, which really makes a difference over the defaults. On GCC it is easy to find out what the default optimization flags enable (the Gentoo forums cover this)...
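
        To make this concrete, here is a minimal sketch of the kind of loop where those flag choices typically show up. The loop and the compile lines are illustrative only, not taken from the article:

          /* bench.c - a simple number-crunching loop; aggressive
           * flags usually vectorize it, the defaults usually do not.
           * Illustrative compile lines (flags from the posts above):
           *   gcc bench.c -o bench            # GCC defaults (-O0)
           *   gcc -O3 -flto bench.c -o bench  # aggressive GCC + LTO
           *   icc -fast bench.c -o bench      # -ip, -O3, -static */
          #include <stdio.h>

          #define N (1 << 22)
          static float a[N], b[N], c[N];

          int main(void) {
              for (int i = 0; i < N; i++) { a[i] = i; b[i] = N - i; }
              for (int r = 0; r < 100; r++)      /* hot loop */
                  for (int i = 0; i < N; i++)
                      c[i] += a[i] * b[i];
              printf("%f\n", c[N / 2]);  /* keep the result live */
              return 0;
          }

        Timing the three resulting binaries against each other shows how much of the difference is the flags rather than the compiler.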

        Comment


        • #24
          @staalmannen

          I am not sure that -march=core2 is correctly handled by ICC, so maybe you are generating core2 code on some compilers and generic x86 code on the rest. On ICC I recommend you use the -xSSSE3 and -fast flags as your default flags to optimize code for speed on a Core 2.
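
          A quick way to check what each compiler is actually targeting is to probe its predefined macros. A sketch: GCC and Clang document __SSSE3__ and __SSE2__, and ICC is generally expected to define the same ones; treat the ICC behaviour as an assumption.

            /* isa_check.c - prints which ISA the compiler targets,
             * however it interpreted the -march/-x flags. */
            #include <stdio.h>

            int main(void) {
            #if defined(__SSSE3__)
                puts("SSSE3 code generation enabled (core2-class)");
            #elif defined(__SSE2__)
                puts("SSE2 only (generic x86_64 baseline)");
            #else
                puts("no SSE macros defined: generic x86 code");
            #endif
                return 0;
            }

          On GCC you can also dump every predefined macro with gcc -march=core2 -dM -E - < /dev/null.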

          Comment


          • #25
            Originally posted by Jimbo View Post
            @staalmannen

            I am not sure that -march=core2 is correctly handled by ICC, so maybe you are generating core2 code on some compilers and generic x86 code on the rest. On ICC I recommend you use the -xSSSE3 and -fast flags as your default flags to optimize code for speed on a Core 2.

            This is indeed planned for the last round of "compiler-specific stuff".
            In this round I am also adding flags for PCC and running GCC with LTO.
            If anyone has TCC-specific optimizations or any other tips for that compiler, I am all ears.

            First, I need to finish the -Os runs though.

            Comment


            • #26
              Originally posted by pvtcupcakes View Post
              99% of users use the default GCC configuration. Do you think anyone running Ubuntu packages is going to have -funroll-loops and -march=core2? Only loony Gentoo users like myself use anything but the default.

              Even most users who compile a package or two themselves don't override the CFLAGS in the Makefile.
              That is bullshit.
              If you want to compare compilers, not makefiles, then do just that: compare compilers.

              And as for the car analogy: yeah, you (mtippett) are comparing cars by looking only at first gear and doing nothing else.
              Yet this is exactly what Michael did.

              @mtippett, no, I am certainly not doing your job. What I am doing is using generic flags like -O2 when compiling something myself. That is often a good trade-off: better performance without investing much time in the GCC manual. And believe it or not, the difference can be quite huge depending on the code.

              Comment


              • #27
                Originally posted by staalmannen View Post
                To choose based on the default performance as a starting point and then move onward from there may lead to a sub-optimal end result. Someone mentioned engineering and optimization, and I would like to lend my vote to multivariate approaches as the best ones. This was nicely demonstrated to me during my chemometrics classes back in 2001 or so: you might find a local optimum while moving along one variable, then find another local optimum when you start moving along the next variable, and so on. The point is that the optimum you end up with using this one-variable-at-a-time approach is often not very optimal at all. When the end result depends on the order in which you optimize your variables, you can be pretty sure that there is a real optimum you are missing...
                I agree that there is a risk of running into local minima (or maxima); there are probably lots of them. However, most people are looking at a return-on-investment scenario and rely on the upstream experts (distribution maintainers, compiler developers, etc.) to do the majority of the tuning.

                In your particular case, I applaud the result set that you are building. But as you can see in that result set, some workloads show no meaningful difference between optimization levels (gcc-64; Bullet physics; 1000 convex), and others degrade at the high optimization levels (open64; Bullet; 1000 convex).

                Ultimately it comes down to the workload you care about, and examining that carefully.

                Comment


                • #28
                  Originally posted by mat69 View Post
                  @mtippett, no, I am certainly not doing your job. What I am doing is using generic flags like -O2 when compiling something myself. That is often a good trade-off: better performance without investing much time in the GCC manual. And believe it or not, the difference can be quite huge depending on the code.
                  Doing broad optimizations is more of an intellectual exercise than anything meaningful. -O, -O2 and -O3 are a facade over a collection of options. Each option may or may not benefit a particular workload.

                  Most of us rely on upstream making intelligent choices in the defaults; the -O levels are just such a choice.

                  The request is to focus on one particular workload that you care about and then present it tuned to its maximum. Yes, those numbers will be great. However, the numbers for other workloads will most likely be similar to, or worse than, the untuned results.
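
                  That facade is easy to peel back on GCC: gcc -Q -O2 --help=optimizers lists the individual options an -O level enables, and single members can be overridden, even per function. A hypothetical sketch (GCC-specific attribute; the function and data are invented for illustration):

                    /* facade.c - an -O level is a bundle of options;
                     * GCC lets you override members per function. */
                    #include <stdio.h>

                    /* Force O3 plus unrolling for one hot function while
                     * the rest keeps whatever -O level the Makefile set. */
                    __attribute__((optimize("O3", "unroll-loops")))
                    static long hot_sum(const int *v, long n) {
                        long s = 0;
                        for (long i = 0; i < n; i++)
                            s += v[i];
                        return s;
                    }

                    int main(void) {
                        int v[1000];
                        for (int i = 0; i < 1000; i++) v[i] = i;
                        printf("%ld\n", hot_sum(v, 1000));
                        return 0;
                    }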

                  Comment


                  • #29
                    Originally posted by mtippett View Post
                    Doing broad optimizations is more of an intellectual exercise than anything meaningful. -O, -O2 and -O3 are a facade over a collection of options. Each option may or may not benefit a particular workload.

                    Most of us rely on upstream making intelligent choices in the defaults; the -O levels are just such a choice.

                    The request is to focus on one particular workload that you care about and then present it tuned to its maximum. Yes, those numbers will be great. However, the numbers for other workloads will most likely be similar to, or worse than, the untuned results.
                    Why do "A Linux Compiler Deathmatch" then?
                    Especially as upstream often cares only about a few compilers and their respective settings.

                    Btw, in a deathmatch I suppose everyone is trying their best...

                    Comment


                    • #30
                      Originally posted by mat69 View Post
                      Why do "A Linux Compiler Deathmatch" then?
                      Especially as upstream often cares only about a few compilers and their respective settings.

                      Btw, in a deathmatch I suppose everyone is trying their best...
                      FTA: Started by one of our readers more than a week ago was a compiler deathmatch comparing the performance of GCC, LLVM Clang, PCC (the Portable C Compiler), TCC (Tiny C Compiler), and Intel's C Compiler under Arch Linux. This user did not stop at compiling with these different x86_64 compilers; he also went on to look at compiler performance with different compiler flags, among other options. The results are definitely worth looking at, and here are some more.



                      Michael reported on the results that staalmannen was finding. staalmannen is already going further with the -O? options, but there is still a huge number of other options that could be explored to determine where else optimization can be improved.

                      If people settle on a few metrics and a few conditions, we can certainly have a deathmatch.

                      Comment
