Greater Radeon Gallium3D Shader Optimization Tests


  • #21
    Originally posted by AnonymousCoward View Post
    No, I'm running the native port. I may well test the Wine version to compare, though.
    Just tested with Wine. The default GLSL backend (in Wine) was crashing and had broken water output, so I used the ARB backend (which also tends to be faster):

    Wine ARB + SB = 32.24 FPS = 49% improvement

    So faster again, but very minor graphical glitches from what I can tell. Keep in mind there may be engine differences as well. Perhaps the Linux version is using a newer version of the HL2 engine, although graphically they don't look dramatically different.

    EDIT: Upon closer investigation, I do think that the Linux version is using a newer engine. I believe that it is actually using HDR for one, whereas the older engine just reports but doesn't use it.

    So I disabled HDR, Bloom and motion blur, and ran it again:

    Native Linux SB = 30.66 FPS

    That might not be enough for it to be comparable, however. I think we'll have to wait for the new engine to make it into the Windows version for these to be directly comparable. But yeah, the engine differences limit these results, obviously. Probably should have checked more closely in the first place.
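
For anyone comparing these numbers, here is a minimal sketch of the percentage arithmetic, using only the two figures quoted in this post (Wine ARB + SB at 32.24 FPS, native SB with HDR, bloom and motion blur off at 30.66 FPS). The function name `percent_change` is just an illustrative helper, not something from the benchmark tooling.

```python
def percent_change(baseline_fps: float, new_fps: float) -> float:
    """Return the change from baseline_fps to new_fps as a percentage."""
    if baseline_fps <= 0:
        raise ValueError("baseline_fps must be positive")
    return (new_fps - baseline_fps) / baseline_fps * 100.0


# Figures quoted in the post above.
native_sb_effects_off = 30.66
wine_arb_sb = 32.24

gap = percent_change(native_sb_effects_off, wine_arb_sb)
print(f"Wine ARB + SB vs native SB (effects off): {gap:+.1f}%")
```

On these figures the gap is roughly 5%, which is small enough that engine differences could plausibly account for it.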
    Last edited by AnonymousCoward; 17 May 2013, 10:49 AM.

    Comment


    • #22
      Alright, last post for now. You can use the newer engine in Windows by opting into the SteamPipe Beta. A quick bench with Wine showed that performance was about the same as Linux, so that was probably the difference. Haven't tested on native Windows, but I suspect it will still be a lot faster.
      Last edited by AnonymousCoward; 17 May 2013, 11:22 AM.

      Comment


      • #23
        Originally posted by AnonymousCoward View Post
        Alright, last post for now. You can use the newer engine in Windows by opting into the SteamPipe Beta. A quick bench with Wine showed that performance was about the same as Linux, so that was probably the difference. Haven't tested on native Windows, but I suspect it will still be a lot faster.
        You might be bumping into something that is using non-accelerated functionality.
        I'd start with tests such as Q3 or the first Half-Life (the original clients for Linux and Windows).
        I am also sure Valve could answer a few questions...

        Comment


        • #24
          Originally posted by AnonymousCoward View Post
          Alright, last post for now. You can use the newer engine in Windows by opting into the SteamPipe Beta. A quick bench with Wine showed that performance was about the same as Linux, so that was probably the difference. Haven't tested on native Windows, but I suspect it will still be a lot faster.
          Sorry, if you still can repeat the tests, could you try to force CPU governor to "performance" instead of "ondemand", because of this. I also hope you set GPU to profile/high before testing? A lot of nuances are still not ironed out as you see... We seriously need a power logic that does this by itself.
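
A quick way to confirm which governor each core is actually using before a benchmark run is to read the standard cpufreq sysfs files. This is a minimal sketch, assuming the usual `/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor` layout; the helper names `read_governors` and `all_performance` are made up for illustration.

```python
from pathlib import Path


def read_governors(cpu_root: str = "/sys/devices/system/cpu") -> dict[str, str]:
    """Map each CPU directory name (e.g. 'cpu0') to its current cpufreq governor."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def all_performance(governors: dict[str, str]) -> bool:
    """True only if at least one CPU was found and every one uses 'performance'."""
    return bool(governors) and all(g == "performance" for g in governors.values())


if __name__ == "__main__":
    found = read_governors()
    if not found:
        print("No cpufreq governor files found (assumed sysfs layout not present).")
    for cpu, governor in found.items():
        print(f"{cpu}: {governor}")
    print("all cores on performance:", all_performance(found))
```

Changing the governor needs root and is not shown here; the sketch only reports the current state.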

          Comment


          • #25
            Originally posted by brosis View Post
            Sorry, if you still can repeat the tests, could you try to force CPU governor to "performance" instead of "ondemand", because of this. I also hope you set GPU to profile/high before testing? A lot of nuances are still not ironed out as you see... We seriously need a power logic that does this by itself.
            I used performance, but to be fair, ondemand hasn't shown any performance drop on my particular system. I think that's either because the HL2 games don't actually use that much CPU, or because ondemand actually works properly on my system for whatever reason.

            As for GPU profiles, I used DynPM and previous benchmarks did not differ from "High" profile, but good point, I'll retest to be certain.
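
For reference, the radeon kernel driver of this era exposes the power management method and profile through sysfs, so the active setting can be checked before a run. This is a minimal sketch, assuming the card is `card0` and the files are `power_method` and `power_profile` under `/sys/class/drm/card0/device`; `read_radeon_pm` is a hypothetical helper name.

```python
from pathlib import Path


def read_radeon_pm(device_dir: str = "/sys/class/drm/card0/device") -> dict[str, str]:
    """Read the radeon power_method and power_profile sysfs files.

    Files that are missing or unreadable are reported as 'unavailable'.
    """
    settings = {}
    for name in ("power_method", "power_profile"):
        try:
            settings[name] = (Path(device_dir) / name).read_text().strip()
        except OSError:
            settings[name] = "unavailable"
    return settings


if __name__ == "__main__":
    for key, value in read_radeon_pm().items():
        print(f"{key}: {value}")
```

As far as I know, `power_profile` only takes effect when `power_method` is set to `profile`, so with `dynpm` selected the profile value may not reflect what the driver is doing.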

            I did run a quick Windows 8 benchmark with SteamPipe, and performance was around 73/74 fps. So Linux looks a bit better when both systems use the new engine.

            I suspect it's as you say and software fallbacks are being used. I wouldn't be surprised if they are using some extensions that are just not supported on the Open Source ATI drivers, and perhaps not with FGLRX either (i.e. currently nvidia-specific extensions).

            I may run some more benchmarks, but I'm going to be quite busy from tomorrow until next Saturday, so might not have any time after today. Anyone who owns the Orange Box can run this bench, BTW, if you wish to check for yourself. It's in the Half Life complete collection as well, although it currently looks pretty expensive (40 USD).

            EDIT: Just ran with performance and "High" GPU profile and there was no change (HDR, Motion Blur enabled):

            R600 SB = 26.19 FPS

            I checked the commandline output and it reports a number of extensions that it doesn't support, including a summary at the end that is presumably the key extensions needed for the game:

            GL_NV_bindless_texture: DISABLED
            GL_AMD_pinned_memory: DISABLED
            GL_EXT_texture_sRGB_decode: AVAILABLE
            GL_NVX_gpu_memory_info: UNAVAILABLE
            GL_ATI_meminfo: UNAVAILABLE
            GL_MAX_SAMPLES_EXT: 8

            I seem to remember 'pinned memory' being a performance issue (and also a stability issue) with FGLRX when enabled, but I believe it does increase performance when working correctly.
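
To compare this summary across drivers, the report lines can be parsed into a small table. This is a minimal sketch that assumes every line has the `NAME: VALUE` shape shown above; the sample text is copied from this post, and `parse_report` and `missing_extensions` are illustrative helper names.

```python
REPORT = """\
GL_NV_bindless_texture: DISABLED
GL_AMD_pinned_memory: DISABLED
GL_EXT_texture_sRGB_decode: AVAILABLE
GL_NVX_gpu_memory_info: UNAVAILABLE
GL_ATI_meminfo: UNAVAILABLE
GL_MAX_SAMPLES_EXT: 8
"""


def parse_report(text: str) -> dict[str, str]:
    """Split each 'NAME: VALUE' line into a dictionary entry."""
    report = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            report[name.strip()] = value.strip()
    return report


def missing_extensions(report: dict[str, str]) -> list[str]:
    """Return the names reported as DISABLED or UNAVAILABLE."""
    return [name for name, status in report.items()
            if status in ("DISABLED", "UNAVAILABLE")]


for name in missing_extensions(parse_report(REPORT)):
    print(name)
```

Running the same parse on output from FGLRX or an nvidia system would show whether the missing set differs, though the list alone does not prove any of these extensions is the cause of the slowdown.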
            Last edited by AnonymousCoward; 17 May 2013, 10:56 PM.

            Comment


            • #26
              Originally posted by AnonymousCoward View Post
              As for GPU profiles, I used DynPM and previous benchmarks did not differ from "High" profile, but good point, I'll retest to be certain.
              The other thing to check is what the final clock rate is when in high profile. I think some of the AMD integrated GPUs are still limited to slower speeds until they get better PM, although I'm not sure if that includes your hardware or not.

              Comment


              • #27
                Originally posted by smitty3268 View Post
                The other thing to check is what the final clock rate is when in high profile. I think some of the AMD integrated GPUs are still limited to slower speeds until they get better PM, although I'm not sure if that includes your hardware or not.
                Yes, this might be it. /sys/kernel/debug/dri/0/radeon_pm_info shows this when 'High' profile is enabled:

                Code:
                default engine clock: 200000 kHz
                current engine clock: 200000 kHz
                default memory clock: 800000 kHz
                However, "rovclock -i" shows lots of different frequencies when I run it multiple times, up to 600+ MHz (close to its maximum frequency), so I'm not sure whether radeon_pm_info is failing to detect the changed frequency or rovclock is wrong:

                Code:
                Video BIOS signature not found.
                Invalid reference clock from BIOS: 387.96 MHz
                
                Video BIOS signature not found.
                Invalid reference clock from BIOS: 618.57 MHz
                Perhaps it's just random noise, as it doesn't even seem to be associated with GPU load. Is there a better utility/sysfs entry to look at?
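
To make the radeon_pm_info figures easier to compare with the MHz values from rovclock, the clock lines can be converted from kHz. This is a minimal sketch that parses only the three lines quoted above; on a live system the text would come from `/sys/kernel/debug/dri/0/radeon_pm_info` (which normally requires root), and `parse_clocks_mhz` is an illustrative helper name.

```python
import re

PM_INFO = """\
default engine clock: 200000 kHz
current engine clock: 200000 kHz
default memory clock: 800000 kHz
"""

# Matches lines such as "current engine clock: 200000 kHz".
CLOCK_LINE = re.compile(r"^\s*(?P<name>[\w ]+ clock):\s*(?P<khz>\d+)\s*kHz\s*$")


def parse_clocks_mhz(text: str) -> dict[str, float]:
    """Return each reported clock converted from kHz to MHz."""
    clocks = {}
    for line in text.splitlines():
        match = CLOCK_LINE.match(line)
        if match:
            clocks[match.group("name")] = int(match.group("khz")) / 1000.0
    return clocks


for name, mhz in parse_clocks_mhz(PM_INFO).items():
    print(f"{name}: {mhz:.0f} MHz")
```

On the quoted output this gives an engine clock of 200 MHz, well below the 600+ MHz that rovclock sometimes reports.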

                Comment


                • #28
                  AnonymousCoward here: I think that this is the issue; the performance difference certainly makes sense. I'm not sure why it didn't occur to me actually, as I remember nouveau having the same issue. I tried to force "Low" profile to test, and it doesn't seem to lower performance. So either the driver is severely bottlenecking the card, or power management is broken. Here's radeontop output in case that is helpful. I suspect there would be high CPU load rather than GPU load if the driver was bottlenecking the card (CPU load is relatively low):

                  Code:
                  radeontop v0.6-4-g244c88e, running on ARUBA, 120 samples/sec

                  Graphics pipe               100.00%
                  Event Engine                  0.00%
                  Vertex Grouper + Tesselator  55.00%
                  Texture Addresser            90.00%
                  Shader Export                96.67%
                  Sequencer Instruction Cache  94.17%
                  Shader Interpolator         100.00%
                  Scan Converter               99.17%
                  Primitive Assembly           56.67%
                  Depth Block                  98.33%
                  Color Block                  94.17%
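
To see at a glance which blocks are saturated, the radeontop lines can be parsed into numbers. This is a minimal sketch that assumes each data line is a block name followed by a percentage, as in the output above; the sample values are copied from this post, and `parse_radeontop` and `blocks_below` are illustrative helper names.

```python
import re

SAMPLE = """\
Graphics pipe 100.00%
Event Engine 0.00%
Vertex Grouper + Tesselator 55.00%
Texture Addresser 90.00%
Shader Export 96.67%
Sequencer Instruction Cache 94.17%
Shader Interpolator 100.00%
Scan Converter 99.17%
Primitive Assembly 56.67%
Depth Block 98.33%
Color Block 94.17%
"""

# Block name (letters, spaces, '+'), then a percentage such as "96.67%".
USAGE_LINE = re.compile(r"^\s*(?P<block>[A-Za-z+ ]+?)\s+(?P<pct>\d+(?:\.\d+)?)%")


def parse_radeontop(text: str) -> dict[str, float]:
    """Map each GPU block name to its reported utilisation in percent."""
    usage = {}
    for line in text.splitlines():
        match = USAGE_LINE.match(line)
        if match:
            usage[match.group("block")] = float(match.group("pct"))
    return usage


def blocks_below(usage: dict[str, float], threshold: float) -> list[str]:
    """Return the blocks whose utilisation is under the threshold."""
    return [block for block, pct in usage.items() if pct < threshold]


usage = parse_radeontop(SAMPLE)
print("below 90%:", blocks_below(usage, 90.0))
```

With a 90% threshold only the Event Engine, Vertex Grouper + Tesselator and Primitive Assembly fall below it; every shading and raster block is close to fully busy, which fits a GPU-bound (or clock-limited) picture rather than a CPU-bound one.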

                  Comment


                  • #29
                    One more bit of info: Steam is detecting only 268.44 MB of VRAM, when in theory it should be supporting 512MB. This could account for some of the performance issue, especially since I run games at 1920x1080. This thread discusses the issue: http://steamcommunity.com/app/221410...8532588748333/

                    EDIT: Never mind, probably just an issue with the way Steam detects available RAM according to that thread.
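
One observation on the 268.44 MB figure: it is exactly what 256 MiB looks like when expressed in decimal megabytes. The sketch below only shows that unit arithmetic; that Steam is reporting a 256 MiB value in decimal units is my assumption, not something confirmed in this thread, and `mib_to_decimal_mb` is an illustrative helper name.

```python
MIB = 1024 * 1024  # bytes in one mebibyte
MB = 1000 * 1000   # bytes in one decimal megabyte


def mib_to_decimal_mb(mib: float) -> float:
    """Convert a size in MiB to decimal megabytes."""
    return mib * MIB / MB


print(f"256 MiB = {mib_to_decimal_mb(256):.2f} MB")  # 256 MiB = 268.44 MB
print(f"512 MiB = {mib_to_decimal_mb(512):.2f} MB")  # 512 MiB = 536.87 MB
```

If that reading is right, the number reflects a 256 MiB value being detected rather than a rounding or measurement error, but it says nothing about where that value comes from.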
                    Last edited by AnonymousCoward; 18 May 2013, 05:31 AM.

                    Comment


                    • #30
                      Mobile APUs are definitely clock-limited, no question about it. You have to modify the kernel to work around it. Delete this if-block.

                      Comment
