Reducing The CPU Usage In Mesa To Improve Performance


  • #31
    I barely know anything about all of this, but... Mesa isn't optimized for SSE2? Even in 64 bit builds? One would think this is one of the first things you'd do :/
    That's news to me...



    • #32
      Maybe you can get early access to Civilization: Beyond Earth, and some donated hardware. The game will definitely not be playable without binary drivers but wishful thinking...wishful



      • #33
        I am wondering why audio isn't disabled for OpenGL benchmarks o.0



        • #34
          Originally posted by tarceri View Post
          This is a long shot, but does changing the first if statement from:

          if (aligned_count >= 4) {

          to

          if (aligned_count >= 8) {

          help at all?
          The same with that.
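The code being patched is not shown in the thread, so the following is only a sketch of the kind of structure such a threshold usually gates. It assumes a scalar prologue that runs until the pointer is 16-byte aligned, a 4-wide block loop that is entered only when enough elements remain, and a scalar tail. The names `array_min_max`, `scalar_min_max` and `BLOCK_THRESHOLD` are hypothetical, and the block loop is written in plain C where a SIMD version would use vector loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-off: below this many remaining elements the block loop is
 * skipped. The thread tries 4 and 8; the best value has to be measured. */
#define BLOCK_THRESHOLD 8

/* Scalar helper shared by the prologue, the block loop and the tail. */
static void scalar_min_max(const uint32_t *v, size_t n,
                           uint32_t *min_out, uint32_t *max_out)
{
    for (size_t i = 0; i < n; i++) {
        if (v[i] < *min_out) *min_out = v[i];
        if (v[i] > *max_out) *max_out = v[i];
    }
}

/* Illustrative only: this is not the code from the patch. */
void array_min_max(const uint32_t *v, size_t count,
                   uint32_t *min_out, uint32_t *max_out)
{
    assert(v != NULL || count == 0);
    uint32_t lo = UINT32_MAX, hi = 0;
    size_t aligned_count = count;

    /* Prologue: consume elements one by one until v is 16-byte aligned. */
    while (((uintptr_t)v & 15) != 0 && aligned_count > 0) {
        scalar_min_max(v, 1, &lo, &hi);
        v++;
        aligned_count--;
    }

    /* Block loop: only entered when enough work remains to pay for setup. */
    if (aligned_count >= BLOCK_THRESHOLD) {
        while (aligned_count >= 4) {
            /* A SIMD version would load and reduce these 4 lanes at once. */
            scalar_min_max(v, 4, &lo, &hi);
            v += 4;
            aligned_count -= 4;
        }
    }

    /* Tail: whatever is left over. */
    scalar_min_max(v, aligned_count, &lo, &hi);

    *min_out = lo;
    *max_out = hi;
}
```

Changing the threshold never changes the result, only which loop does the work, so any effect has to be found by measuring.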

          Originally posted by asdfblah View Post
          I barely know anything about all of this, but... Mesa isn't optimized for SSE2? Even in 64 bit builds? One would think this is one of the first things you'd do :/
          That's news to me...
          One is a build-time compiler optimization, the other is a runtime Mesa optimization.
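As a rough illustration of the runtime side, here is one way a library can ship a portable fallback and an SSE4.1 routine together, then pick one when the program runs. This is a sketch under assumptions and not Mesa's actual mechanism: `select_max`, `max_scalar` and `max_sse41` are hypothetical names, and the `target` attribute and `__builtin_cpu_supports` are GCC/Clang extensions on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*max_fn)(const uint32_t *, size_t);

/* Portable fallback, always available. */
static uint32_t max_scalar(const uint32_t *v, size_t n)
{
    uint32_t hi = 0;
    for (size_t i = 0; i < n; i++)
        if (v[i] > hi) hi = v[i];
    return hi;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <smmintrin.h>

/* Built for SSE4.1 regardless of the global -march flags (GCC/Clang). */
__attribute__((target("sse4.1")))
static uint32_t max_sse41(const uint32_t *v, size_t n)
{
    __m128i acc = _mm_setzero_si128();
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_max_epu32(acc, _mm_loadu_si128((const __m128i *)(v + i)));

    /* Reduce the 4 lanes, then finish the leftover elements. */
    uint32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    uint32_t hi = max_scalar(lanes, 4);
    for (; i < n; i++)
        if (v[i] > hi) hi = v[i];
    return hi;
}
#define HAVE_SSE41_PATH 1
#endif

/* Run-time choice: ask the CPU, then return the matching function. */
max_fn select_max(void)
{
#ifdef HAVE_SSE41_PATH
    if (__builtin_cpu_supports("sse4.1"))
        return max_sse41;
#endif
    return max_scalar;
}
```

A build-time optimization such as -march=native would instead fix the instruction set for the whole binary, so the result only runs on CPUs that have it.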

          Originally posted by looserouting View Post
          I am wondering why audio isn't disabled for OpenGL benchmarks o.0
          Why so? Likely you will get the same or similar results; it is better to disable the sound driver at boot time if you just wish to compare sound driver overhead. It can make a tremendous difference even in games if there is CPU overhead: I get a 25% better fps rate in 64-bit Xonotic, and a 30% better fps rate in 32-bit Xonotic, just with the sound driver disabled.

          But Xonotic is a special case: it uses plain ALSA for sound, no OpenAL... so it can't be optimized
          Last edited by dungeon; 28 October 2014, 04:05 AM.



          • #35
            @dungeon

            How exactly are you running the tux racer as a benchmark? I've tried:

            etr -a

            on Fedora 20, but it doesn't do anything; it just opens the game as normal.

            Update: Also, how are you measuring a drop in CPU usage? I just ran it in callgrind, completing the first level, and it says only 0.04% of CPU time is spent in the function the patch optimises. It was called only 1200 times, compared to millions of times in OpenArena.
            Last edited by tarceri; 28 October 2014, 05:01 AM.



            • #36
              Originally posted by asdfblah View Post
              I barely know anything about all of this, but... Mesa isn't optimized for SSE2? Even in 64 bit builds? One would think this is one of the first things you'd do :/
              That's news to me...
              Just because SSE2 is enabled doesn't always mean gcc will know when it's best to use it.



              • #37
                Originally posted by tarceri View Post
                Just because SSE2 is enabled doesn't always mean gcc will know when it's best to use it.
                So there's a performance difference between compiling with -march=native and the SSE2 patch(es)? If so: are there any plans for SSE3, SSE4a and/or 3DNow! patches (could be good for AMD CPUs)?



                • #38
                  Originally posted by tarceri View Post
                  @dungeon

                  How exactly are you running the tux racer as a benchmark? I've tried:

                  etr -a

                  on Fedora 20, but it doesn't do anything; it just opens the game as normal.

                  Update: Also, how are you measuring a drop in CPU usage? I just ran it in callgrind, completing the first level, and it says only 0.04% of CPU time is spent in the function the patch optimises. It was called only 1200 times, compared to millions of times in OpenArena.
                  I benchmark by eye with GALLIUM_HUD=cpu,fps. Somehow the benchmark does not seem to work in etr 0.6 (which is more like ppracer than extremetuxracer), or in extremetuxracer 0.5 and 0.4... I much prefer planetpenguinracer 0.3... huh, how many forks that tuxracer had

                  Originally posted by TAXI View Post
                  So there's a performance difference between compiling with -march=native and the SSE2 patch(es)? If so: are there any plans for SSE3, SSE4a and/or 3DNow! patches (could be good for AMD CPUs)?
                  3DNow! has been abandonware since 2010, I think; it is not recommended for any new code... maybe now is the right time to remove that hardly ever used, half-broken instruction set from Mesa

                  Sometimes it is best to remove all that optimization from Mesa with --disable-asm and hack configure.ac to remove SSE4.1 support... yeah, sometimes you can see better performance if you don't build that
                  Last edited by dungeon; 28 October 2014, 06:00 AM.



                  • #39
                    Originally posted by TAXI View Post
                    So there's a performance difference between compiling with -march=native and the SSE2 patch(es)? If so: are there any plans for SSE3, SSE4a and/or 3DNow! patches (could be good for AMD CPUs)?
                    If anyone's interested, I just got sent an email pointing me to this post [1] about auto-vectorization in gcc, i.e. automated use of SSE/AVX.

                    There are likely to be at most 3 targets for this particular optimisation:

                    SSE2 - Because it's common and can be assumed by default in 64-bit builds.

                    SSE4.1 - Because it includes min/max instructions, which makes it faster, as that's the main thing the function does.

                    AVX2 - Because it has min/max instructions that can compare 8 values at once rather than 4 in SSE4.1.
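To make the SSE2 versus SSE4.1 difference concrete, here is a minimal sketch of an unsigned 32-bit minimum over an array. It assumes the function in question reduces 32-bit unsigned values, which the thread implies but does not show; `array_min_u32` and `min_u32x4` are hypothetical names. With SSE4.1 enabled at build time the 4-lane minimum is a single `_mm_min_epu32`; with only SSE2 it has to be emulated with a compare and a blend; without either, the scalar loop handles everything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif
#if defined(__SSE4_1__)
#include <smmintrin.h>
#endif

#if defined(__SSE4_1__)
/* SSE4.1: one instruction per 4 lanes. */
static inline __m128i min_u32x4(__m128i a, __m128i b)
{
    return _mm_min_epu32(a, b);
}
#elif defined(__SSE2__)
/* SSE2 has no unsigned 32-bit min, so emulate it: flip the sign bit to turn
 * the signed compare into an unsigned one, then blend with and/andnot/or. */
static inline __m128i min_u32x4(__m128i a, __m128i b)
{
    const __m128i bias = _mm_set1_epi32(INT32_MIN);
    __m128i a_gt_b = _mm_cmpgt_epi32(_mm_xor_si128(a, bias),
                                     _mm_xor_si128(b, bias));
    return _mm_or_si128(_mm_and_si128(a_gt_b, b),
                        _mm_andnot_si128(a_gt_b, a));
}
#endif

/* Illustrative only: returns UINT32_MAX for an empty array. */
uint32_t array_min_u32(const uint32_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    uint32_t lo = UINT32_MAX;
    size_t i = 0;

#if defined(__SSE2__)
    __m128i acc = _mm_set1_epi32(-1);   /* all lanes start at UINT32_MAX */
    for (; i + 4 <= n; i += 4)
        acc = min_u32x4(acc, _mm_loadu_si128((const __m128i *)(v + i)));

    /* Reduce the 4 lanes to a single value. */
    uint32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    for (int k = 0; k < 4; k++)
        if (lanes[k] < lo) lo = lanes[k];
#endif

    for (; i < n; i++)      /* scalar tail (or whole array without SSE2) */
        if (v[i] < lo) lo = v[i];
    return lo;
}
```

An AVX2 variant would follow the same shape with 256-bit registers, handling 8 values per step instead of 4.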

                    Also, for anyone curious, here is a list of the intrinsics available [2]

                    [1] http://locklessinc.com/articles/vectorize/
                    [2] https://software.intel.com/sites/lan...trinsicsGuide/



                    • #40
                      Originally posted by dungeon View Post
                      I benchmark by eye with GALLIUM_HUD=cpu,fps. Somehow the benchmark does not seem to work in etr 0.6 (which is more like ppracer than extremetuxracer), or in extremetuxracer 0.5 and 0.4... I much prefer planetpenguinracer 0.3... huh, how many forks that tuxracer had
                      To use your own words, I'm not impressed by that benchmarking technique. As I said previously, the function in question seems to be hardly used at all in extremetuxracer, so if you are really seeing a difference (which would be very difficult to conclude because you're not really comparing anything), it's very unlikely that it's caused by the patch.

                      Originally posted by dungeon View Post
                      Sometimes it is best to remove all that optimization from Mesa with --disable-asm and hack configure.ac to remove SSE4.1 support... yeah, sometimes you can see better performance if you don't build that
                      I'm not trying to be rude here but I seriously think you need to try some real benchmarks before handing out this advice or providing feedback on patches.

                      Also, I'm pretty sure SSE4.1 support in Mesa is currently only used to build one function, which is used in the Intel driver, so it's highly unlikely that removing this from configure.ac will do anything at all, since you are using AMD hardware.

