Thread: Optimizing Mesa Performance With Compiler Flags

  1. #11
    Join Date
    Apr 2011
    Posts
    38

    This change is mainly to benefit 32-bit systems where SSE support can't be assumed by default, but with the i965 driver, more often than not it can be assumed an Intel Core 2 processor or newer is in use. (The older Intel processors are generally using the i915 driver.) By setting the -march=core2 flag, for i386 builds SSE would now be used for floating-point math and cmov instructions, plus other performance optimizations.
    [...]
    This patch was ultimately rejected since it turns out there are still some old Pentium 4s that could be found in an i965 driver configuration, where things might break.

    Then why not use something like -march=i686 -msse -msse2? That would enable GCC to use cmov and SSE/SSE2 instructions, and the binaries would still run on a P4.
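    One way to check what a given flag set actually turns on is to look at the compiler's predefined macros. A minimal sketch, assuming GCC on x86 (the struct and function names, and the file name isa.c, are hypothetical). Note that on 32-bit x86 GCC does scalar float/double math on the x87 unit by default, so -msse2 alone lets it emit SSE2 instructions (e.g. in vectorized loops) while ordinary floating-point arithmetic stays on x87 unless -mfpmath=sse is also given; __SSE2_MATH__ shows which you got. Also, -march=core2 implies SSSE3, which no Pentium 4 has, which fits with the patch breaking on P4s.

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Hypothetical helper: which SSE-related macros did the compiler define
     * for this translation unit? Compare, with this file saved as isa.c:
     *   gcc -m32 -march=i686 -msse -msse2 -c isa.c
     *   gcc -m32 -march=i686 -msse -msse2 -mfpmath=sse -c isa.c
     *   gcc -m32 -march=core2 -c isa.c
     */
    struct isa_flags {
        int sse, sse2, ssse3;     /* SIMD instruction sets the compiler may emit */
        int sse_math, sse2_math;  /* float / double arithmetic in SSE registers */
    };

    struct isa_flags compiled_isa_flags(void)
    {
        struct isa_flags f = {0, 0, 0, 0, 0};
    #ifdef __SSE__
        f.sse = 1;
    #endif
    #ifdef __SSE2__
        f.sse2 = 1;
    #endif
    #ifdef __SSSE3__
        f.ssse3 = 1;   /* implied by -march=core2; no Pentium 4 supports it */
    #endif
    #ifdef __SSE_MATH__
        f.sse_math = 1;
    #endif
    #ifdef __SSE2_MATH__
        f.sse2_math = 1;
    #endif
        /* Doing scalar double math in SSE registers requires SSE2 itself. */
        assert(!f.sse2_math || f.sse2);
        return f;
    }

    void print_isa_flags(struct isa_flags f)
    {
        printf("SSE=%d SSE2=%d SSSE3=%d SSE_MATH=%d SSE2_MATH=%d\n",
               f.sse, f.sse2, f.ssse3, f.sse_math, f.sse2_math);
    }
    ```

    Calling print_isa_flags(compiled_isa_flags()) from a test program built with each of the command lines above shows the difference; other compilers may define a different set of macros.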

  2. #12
    Join Date
    Feb 2008
    Location
    Linuxland
    Posts
    5,128

    @mark

    It's mainly about the inlining. Yes, it can have that big an effect.

    C++ templates greatly exacerbate that effect: when you have templates calling templates calling templates, you can get thousands of pointless function calls without inlining.
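    In plain C the same pattern shows up as layers of one-line helpers. A minimal sketch with hypothetical names (grid, grid_index, grid_at, grid_get): each element access in grid_sum goes through three function calls when nothing is inlined (for example at -O0, or with GCC's -fno-inline), whereas at -O2 GCC typically inlines small static helpers like these, leaving a plain nested loop. Template-heavy C++ nests this much deeper.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical row-major 2D view over a flat array. */
    typedef struct {
        size_t rows, cols;
        const double *data;
    } grid;

    /* Layer 1: compute the flat index. */
    static size_t grid_index(const grid *g, size_t r, size_t c)
    {
        return r * g->cols + c;
    }

    /* Layer 2: bounds-checked element access. */
    static double grid_at(const grid *g, size_t r, size_t c)
    {
        assert(r < g->rows && c < g->cols);
        return g->data[grid_index(g, r, c)];
    }

    /* Layer 3: a "convenience" wrapper, like a thin template adaptor. */
    static double grid_get(const grid *g, size_t r, size_t c)
    {
        return grid_at(g, r, c);
    }

    /* Without inlining: three calls per element. With inlining: a simple loop. */
    double grid_sum(const grid *g)
    {
        double total = 0.0;
        for (size_t r = 0; r < g->rows; r++)
            for (size_t c = 0; c < g->cols; c++)
                total += grid_get(g, r, c);
        return total;
    }
    ```

    Comparing the assembly from gcc -O0 -S and gcc -O2 -S for this file is a quick way to see the calls disappear.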

  3. #13
    Join Date
    Apr 2011
    Posts
    35

    Quote Originally Posted by curaga
    @mark

    It's mainly about the inlining. Yes, it can have that big an effect.

    C++ templates greatly exacerbate that effect: when you have templates calling templates calling templates, you can get thousands of pointless function calls without inlining.

    OK, makes sense. But shouldn't the programmer use inline functions or macros in this case?
    I guess I will add the inlining flag to my CXXFLAGS, and to CFLAGS for individual C packages.
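    The trade-off between the two options mentioned above can be sketched like this, with hypothetical names. A function-like macro is pure text substitution, so its argument is evaluated as many times as it appears; a static inline function evaluates its argument once and is type-checked. The inline keyword is only a hint, so the optimizer still decides whether to inline, but at -O2 a tiny function like this is normally inlined anyway.

    ```c
    #include <assert.h>

    /* Textual substitution: the argument expression is evaluated twice. */
    #define SQUARE_MACRO(x) ((x) * (x))

    /* Evaluated once and type-checked; the compiler may still inline it. */
    static inline int square_inline(int x)
    {
        assert(x >= -46340 && x <= 46340);  /* keep x * x within a 32-bit int */
        return x * x;
    }

    /* Hypothetical helper with a side effect, to make double evaluation visible. */
    static int calls = 0;

    static int next_value(void)
    {
        return ++calls;  /* returns 1, 2, 3, ... */
    }
    ```

    With calls reset to 0, SQUARE_MACRO(next_value()) calls next_value twice and yields 1 * 2 = 2, while square_inline(next_value()) calls it once and yields 1 * 1 = 1.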

  4. #14

    Quote Originally Posted by ryao
    It does not matter if this code is not a bottleneck.

    True. Modern CPUs also have bigger caches than before, so I would expect the inner loop to fit in cache.

    It would still be interesting to see how -Os compares.

  5. #15
    Join Date
    Apr 2011
    Posts
    35

    Quote Originally Posted by ncopa
    True. Modern CPUs also have bigger caches than before, so I would expect the inner loop to fit in cache.

    It would still be interesting to see how -Os compares.

    I didn't benchmark -Os, but I used it instead of -O2 for some months. I noticed no difference and occasionally got segfaults that disappeared after switching back to -O2. I guess -Os is only worth looking at if you really need it and know what you are doing.

  6. #16
    Join Date
    Feb 2008
    Posts
    988

    -Os is slower in some cases. I tried it just now on r200 and immediately saw slower menus in SuperTuxKart: going through the kart chooser, for example, is sluggish, so no go...

    From my experience, -O1 may be the best for Mesa stability, but the safe choice is to just go with -O2 and -pipe (-pipe only speeds up the build; it does not change the generated code). If you want to play with processor optimisation, add -march=blabla, but always stick with -O2 if you want to keep driver stability.

  7. #17
    Join Date
    Aug 2012
    Posts
    245

    Quote Originally Posted by mark_
    This affects C too; it looks like a function call is replaced by the function's code. This should result in less stack usage, but the function has to be so simple that creating a new stack entry costs more than executing the function. Seems to be relatively useless.

    Actually, since the function's code would be executed anyway, you should always gain performance from avoiding the new stack entry. The main drawbacks they try to avoid are probably bigger binaries and more memory usage when very large functions are inlined.
    It could also allow even more optimization with the "neighbouring" code, since it's no longer isolated in a function. There are way too many things to consider in compiler optimization.
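    A small illustration of that "neighbouring code" point, using hypothetical names: compiled as an ordinary call, bytes_per_pixel() has to handle every format, but once it is inlined into rgba_buffer_size() with the constant FMT_RGBA, the compiler can fold the switch down to 4 and may then turn the multiplication into a shift.

    ```c
    #include <assert.h>

    /* Hypothetical pixel formats selected at run time. */
    enum fmt { FMT_GRAY, FMT_RGB, FMT_RGBA };

    static int bytes_per_pixel(enum fmt f)
    {
        switch (f) {
        case FMT_GRAY: return 1;
        case FMT_RGB:  return 3;
        case FMT_RGBA: return 4;
        }
        assert(0 && "unknown pixel format");
        return 0;
    }

    /* After inlining, bytes_per_pixel(FMT_RGBA) becomes the constant 4:
     * the switch disappears and the multiply can be strength-reduced. */
    long rgba_buffer_size(long width, long height)
    {
        return width * height * bytes_per_pixel(FMT_RGBA);
    }
    ```

    For example, rgba_buffer_size(640, 480) is 640 * 480 * 4 = 1228800 bytes.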

  8. #18
    Join Date
    Apr 2012
    Posts
    118

    It would be nice to have a database/list of programs and their fastest compile flags (depending on the compiler/version of course).

  9. #19
    Join Date
    Mar 2009
    Location
    in front of my box :p
    Posts
    797

    The question is indeed whether Mesa is the speed-limiting step (aka bottleneck) in the whole system here. But it won't hurt to keep my Gentoo CFLAGS as they are: mainly -march set and -O2. In a few cases I actually use -Os, for VIA CPUs or AMD's old Geode LX. A few packages might dislike too much messing with CFLAGS, though.

  10. #20
    Join Date
    Nov 2007
    Posts
    1,353

    My understanding is that right now the biggest bottleneck in the open-source graphics stack is GEM/TTM. It needs to be replaced, but I don't think anybody has a good idea of what to replace it with.
