Optimizing Mesa Performance With Compiler Flags


  • #11
    This change is mainly to benefit 32-bit systems, where SSE support can't be assumed by default. With the i965 driver, though, it can more often than not be assumed that an Intel Core 2 processor or newer is in use. (The older Intel processors are generally using the i915 driver.) By setting the -march=core2 flag, i386 builds would now use SSE for floating-point math and could use cmov instructions, plus other performance optimizations.
    [...]
    This patch was ultimately rejected, since it turns out there are still some old Pentium 4s that could be found in an i965 driver configuration where things might break.
    Then why not use something like -march=i686 -msse -msse2? That would allow GCC to use cmov and SSE/SSE2 instructions, and the binaries would still run on a P4.
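
One way to see what a given set of flags actually turns on is to look at the macros the compiler predefines. A minimal sketch, assuming GCC or Clang on x86, where `__SSE__` and `__SSE2__` are defined when -msse / -msse2 (or a -march value that implies them) is in effect. The function names are made up for illustration and are not part of Mesa:

```c
#include <stdio.h>

/* Returns 1 if this file was compiled with SSE code generation enabled. */
int built_with_sse(void)
{
#ifdef __SSE__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if this file was compiled with SSE2 code generation enabled. */
int built_with_sse2(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: print one line describing the build. */
void report_isa(void)
{
    printf("SSE: %d, SSE2: %d\n", built_with_sse(), built_with_sse2());
}
```

Note that on 32-bit x86, GCC keeps using the x87 unit for scalar floating-point math unless -mfpmath=sse is given as well, so -msse -msse2 on their own mostly make the instructions available rather than moving all float math over to them.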


    • #12
      @mark

      It's mainly about the inlining. Yes, it can have that big an effect.

      C++ templates greatly exacerbate that effect: when you have templates calling templates calling templates, you can get thousands of pointless function calls without inlining.


      • #13
        Originally posted by curaga
        @mark

        It's mainly about the inlining. Yes, it can have that big an effect.

        C++ templates greatly exacerbate that effect: when you have templates calling templates calling templates, you can get thousands of pointless function calls without inlining.
        OK, makes sense. But shouldn't the programmer use inline functions or macros in this case?
        I guess I will add the inlining flag to my CXXFLAGS, and to CFLAGS for individual C packages.
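
For comparison, here is roughly what the two source-level options look like in C. This is only a sketch with made-up names: the macro is expanded textually, while `inline` is just a hint, and compilers such as GCC typically inline small static functions at -O2 on their own anyway.

```c
/* Macro version: no call overhead, but the argument is evaluated twice
 * and there is no type checking. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Inline function version: type-checked, argument evaluated once.
 * The compiler decides whether the call is really inlined. */
static inline int square_inline(int x)
{
    return x * x;
}

/* Sums the squares of 0..n-1 using the inline function in the loop. */
int sum_squares(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += square_inline(i);
    return total;
}

/* Same computation through the macro, to show the results match. */
int sum_squares_macro(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += SQUARE_MACRO(i);
    return total;
}
```

Both loops compute the same thing; the difference is only in who decides about the expansion, the preprocessor or the optimizer.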


        • #14
          Originally posted by ryao
          It does not matter if this code is not a bottleneck.
          True. Modern CPUs also have bigger caches than before, so I would expect the inner loop to fit in cache.

          It would still be interesting to see how -Os compares.


          • #15
            Originally posted by ncopa
            True. Modern CPUs also have bigger caches than before, so I would expect the inner loop to fit in cache.

            It would still be interesting to see how -Os compares.
            I did not benchmark -Os, but I used it for some months instead of -O2. I felt no difference, and I sometimes had segfaults that disappeared after switching back to -O2. I guess -Os is only worth looking at if you really need it and know what you are doing.


            • #16
              -Os is slower in some cases. I tried it just now on r200 and immediately saw slower menus in SuperTuxKart: going through the kart chooser, for example, is sluggish, so no go...

              From my experience, -O1 is maybe the best for Mesa stability, but the safe choice is to just go with -O2 and -pipe, which will produce smaller libraries. If you want to play with processor optimisation, then add -march=blabla, but always stick with -O2 if you want to keep driver stability.


              • #17
                Originally posted by mark_
                This affects C also, it looks like a function call is replaced by the function code. This should result in less stack usage but the function has to be so simple that creating a new stack entry costs more performance than executing the function. Seems to be relatively useless.
                Actually, since the function's code would be executed anyway, you should always gain performance from avoiding the new stack entry. The main drawbacks they try to avoid are probably bigger binaries and more memory usage for very large functions.
                And it could potentially allow even more optimization with the "neighbouring" code, since it's not isolated in a function anymore. There are way too many things to consider in compiler optimization.
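
A small sketch of the "neighbouring code" point, with hypothetical names. Once the helper is inlined into a caller that passes a constant, an optimizer is free to fold the comparison away and keep only one arm of the branch. Whether a particular compiler really does so depends on its version and flags.

```c
/* Generic helper: called out of line, the test on `mode` has to be
 * evaluated at run time on every call. */
static inline int apply(int mode, int x)
{
    if (mode == 0)
        return x + 1;
    return x * 2;
}

/* Here `mode` is always the constant 1. After inlining, the compiler can
 * see that and may reduce the loop body to `out[i] = in[i] * 2`. */
int double_all(const int *in, int *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = apply(1, in[i]);
    return n;
}
```

The result is the same with or without inlining; only the amount of work per element can change.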


                • #18
                  It would be nice to have a database/list of programs and their fastest compile flags (depending on the compiler/version of course).


                  • #19
                    The question is indeed whether Mesa is the rate-limiting step (aka the bottleneck) in the whole system here. But it won't hurt to keep my Gentoo CFLAGS as they are: mainly -march set and -O2. In a few cases I actually use -Os, for VIA CPUs or AMD's old Geode LX. A few packages might dislike messing too much with CFLAGS, though.


                    • #20
                      My understanding is that right now the biggest bottleneck in the OSS graphics stack is GEM/TTM. It needs to be replaced, but I don't think anybody has a good idea of what to replace it with.
