Greater Radeon Gallium3D Shader Optimization Tests

  • #16
    Originally posted by brosis View Post
    Paypal: vadimgirlin at gmail dot com
    I will be donating a bit later this month; the guy clearly needs a better GPU card xD.

    Vadim's patches are awesome, radeon starts to match and even outperform fglrx!
    Hmmm, maybe we should be donating towards an even worse card if it's optimisations we're after.

    Disclaimer: The above comment is a joke; any response that fails to recognise this will be ignored by its author.



    • #17
      I've been running the video stress test with Half-Life 2: Lost Coast and am seeing some improvement with the SB backend on my A10-4600M APU. Each figure is the average of 3 runs per configuration, to minimise variance:

      Default backend: 21.62 fps

      LLVM backend: 23.99 fps (graphical glitches on the helicopter) = 11% improvement

      SB backend: 26.31 fps = 21% improvement

      SB + LLVM: 25.74 fps = 19% improvement

      Windows 8: 88.52 fps = 309% improvement

      Well, some nice improvements for both of the alternative backends. I also found that the default backend would occasionally run much slower than normal (about 18 fps instead of 21 or so); I excluded those runs from the averages.
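      For anyone double-checking the percentages, they are just (new − baseline) / baseline. A quick sketch (the `improvement` helper is hypothetical, not from the thread):

      ```shell
      # Hypothetical helper: percentage improvement of a result over a baseline FPS.
      improvement() {
          awk -v new="$1" -v base="$2" 'BEGIN { printf "%.1f%%\n", (new / base - 1) * 100 }'
      }

      improvement 23.99 21.62   # LLVM backend vs. default
      improvement 26.31 21.62   # SB backend vs. default
      ```

      With the numbers above this prints 11.0% and 21.7%, in line with the rounded figures in the post.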

      Unfortunately, these results show that the open-source drivers are still a lot slower than Windows, although perhaps this is a weakness of the Linux port specifically.

      I also tried to run the bench with FGLRX, but I couldn't get it to work; I just get a black screen when starting the game. Probably a configuration issue on my end, though. Portal doesn't work with FGLRX either, although I had it working before.



      • #18
        Originally posted by AnonymousCoward View Post
        I've been running the video stress test with Half Life 2: Lost Coast and am getting some improvement with the SB backend, using my A10-4600M APU. These are an average of 3 replications for each configuration to minimise variance:

        Default backend: 21.62
        ...
        SB backend: 26.31

        Windows 8: 88.52
        I suspect some of the functionality used is still falling back to the CPU. It would surely be nice if we could trace it :/
        Maybe it would be a good idea to test an array of applications (depending on GL level) on both platforms and report the deficiencies found.



        • #19
          Do you use Wine to run HL2: Lost Coast? (HL2 is ported to Linux, but I'm not sure about Lost Coast.)



          • #20
            Originally posted by vljn View Post
            Do you use Wine to run HL2: Lost Coast? (HL2 is ported to Linux, but I'm not sure about Lost Coast.)
            No, I'm running the native port. I may well test the Wine version to compare, though.



            • #21
              Originally posted by AnonymousCoward View Post
              No, I'm running the native port. I may well test the Wine version to compare, though.
              Just tested with Wine. The default GLSL backend (in Wine) was crashing and rendering broken water, so I used the ARB backend (which also tends to be faster):

              Wine ARB + SB = 32.24 FPS = 49% improvement

              So faster again, with only very minor graphical glitches from what I can tell. Keep in mind there may be engine differences as well; perhaps the Linux version is using a newer version of the HL2 engine, although graphically they don't look dramatically different.

              EDIT: Upon closer investigation, I do think the Linux version is using a newer engine. For one, I believe it actually uses HDR, whereas the older engine reports HDR support but doesn't use it.

              So I disabled HDR, bloom and motion blur, and ran it again:

              Native Linux SB = 30.66 FPS

              That might not be enough to make them comparable, however. I think we'll have to wait for the new engine to reach the Windows version before these can be compared directly. But yeah, the engine differences obviously limit these results; I probably should have checked more closely in the first place.
              Last edited by AnonymousCoward; 05-17-2013, 10:49 AM.



              • #22
                Alright, last post for now. You can use the newer engine in Windows by opting into the SteamPipe Beta. A quick bench with Wine showed that performance was about the same as Linux, so that was probably the difference. Haven't tested on native Windows, but I suspect it will still be a lot faster.
                Last edited by AnonymousCoward; 05-17-2013, 11:22 AM.



                • #23
                  Originally posted by AnonymousCoward View Post
                  Alright, last post for now. You can use the newer engine in Windows by opting into the SteamPipe Beta. A quick bench with Wine showed that performance was about the same as Linux, so that was probably the difference. Haven't tested on native Windows, but I suspect it will still be a lot faster.
                  You might be bumping into something that uses non-accelerated functionality.
                  I'd start with tests such as Q3 or the first Half-Life (the original clients for Linux and Windows).
                  I am also sure Valve could answer a few questions...



                  • #24
                    Originally posted by AnonymousCoward View Post
                    Alright, last post for now. You can use the newer engine in Windows by opting into the SteamPipe Beta. A quick bench with Wine showed that performance was about the same as Linux, so that was probably the difference. Haven't tested on native Windows, but I suspect it will still be a lot faster.
                    Sorry, if you can still repeat the tests, could you try forcing the CPU governor to "performance" instead of "ondemand", because of this? I also hope you set the GPU to profile/high before testing? A lot of nuances are still not ironed out, as you can see... We seriously need power-management logic that does this by itself.
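                    For reference, a sketch of that pre-benchmark setup (run as root; the GPU paths assume the radeon driver's "profile" power-management interface on card0, so treat them as an assumption for your system):

                    ```shell
                    # Pin all CPU cores to their highest frequency ("performance" governor):
                    for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
                        echo performance > "$gov"
                    done

                    # Switch the GPU from dynamic reclocking to the fixed "high" profile:
                    echo profile > /sys/class/drm/card0/device/power_method
                    echo high > /sys/class/drm/card0/device/power_profile
                    ```

                    This is a system-configuration fragment, so run it only if those sysfs paths exist on your kernel.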



                    • #25
                      Originally posted by brosis View Post
                      Sorry, if you can still repeat the tests, could you try forcing the CPU governor to "performance" instead of "ondemand", because of this? I also hope you set the GPU to profile/high before testing? A lot of nuances are still not ironed out, as you can see... We seriously need power-management logic that does this by itself.
                      I used performance, but to be fair, ondemand hasn't shown any performance drop on my particular system, either because the HL2 games don't actually use that much CPU, or because ondemand happens to work properly on my system for whatever reason.

                      As for GPU profiles, I used DynPM, and previous benchmarks did not differ from the "high" profile, but good point, I'll retest to be certain.

                      I did run a quick Windows 8 benchmark with SteamPipe, and performance was around 73/74 fps. So Linux looks a bit better when both systems use the new engine.

                      I suspect it's as you say and software fallbacks are being used. I wouldn't be surprised if the game uses some extensions that just aren't supported by the open-source ATI drivers, and perhaps not by FGLRX either (i.e. currently NVIDIA-specific extensions).

                      I may run some more benchmarks, but I'm going to be quite busy from tomorrow until next Saturday, so I might not have any time after today. Anyone who owns The Orange Box can run this bench, BTW, if you wish to check for yourself. It's in the Half-Life complete collection as well, although that currently looks pretty expensive (40 USD).

                      EDIT: Just ran with performance and "High" GPU profile and there was no change (HDR, Motion Blur enabled):

                      R600 SB = 26.19 FPS

                      I checked the command-line output and it reports a number of extensions that aren't supported, including a summary at the end of what are presumably the key extensions for the game:

                      GL_NV_bindless_texture: DISABLED
                      GL_AMD_pinned_memory: DISABLED
                      GL_EXT_texture_sRGB_decode: AVAILABLE
                      GL_NVX_gpu_memory_info: UNAVAILABLE
                      GL_ATI_meminfo: UNAVAILABLE
                      GL_MAX_SAMPLES_EXT: 8
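
                      For what it's worth, you can cross-check which of those extensions Mesa actually exposes with glxinfo (from mesa-utils). A small sketch, with the matching split into a helper so the logic doesn't depend on a running X session:

                      ```shell
                      # check_ext LIST NAME: report whether NAME appears in the
                      # newline-separated extension LIST.
                      check_ext() {
                          if printf '%s\n' "$1" | grep -qx "$2"; then
                              echo "$2: supported"
                          else
                              echo "$2: not supported"
                          fi
                      }

                      # Gather the driver's extension list, one name per line
                      # (empty if glxinfo is unavailable):
                      exts=$(glxinfo 2>/dev/null | tr ',' '\n' | tr -d ' ')

                      for e in GL_NV_bindless_texture GL_AMD_pinned_memory GL_EXT_texture_sRGB_decode; do
                          check_ext "$exts" "$e"
                      done
                      ```

                      That only tells you what the driver advertises, of course, not whether the game's fallback path is fast.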

                      I seem to remember 'pinned memory' being a performance win (but also a stability issue) with FGLRX when enabled; I believe it does increase performance when working correctly.
                      Last edited by AnonymousCoward; 05-17-2013, 10:56 PM.



                      • #26
                        Originally posted by AnonymousCoward View Post
                        As for GPU profiles, I used DynPM and previous benchmarks did not differ from "High" profile, but good point, I'll retest to be certain.
                        The other thing to check is the final clock rate when in the high profile. I think some of the AMD integrated GPUs are still limited to slower speeds until they get better PM, although I'm not sure whether that includes your hardware or not.



                        • #27
                          Originally posted by smitty3268 View Post
                          The other thing to check is the final clock rate when in the high profile. I think some of the AMD integrated GPUs are still limited to slower speeds until they get better PM, although I'm not sure whether that includes your hardware or not.
                          Yes, this might be it. /sys/kernel/debug/dri/0/radeon_pm_info shows this when 'High' profile is enabled:

                          Code:
                          default engine clock: 200000 kHz
                          current engine clock: 200000 kHz
                          default memory clock: 800000 kHz
                          However, "rovclock -i" shows lots of different frequencies when I run it multiple times, up to about 600+ MHz (close to its maximum frequency), so I'm not sure whether radeon_pm_info is failing to detect the changed frequency or rovclock is wrong:

                          Code:
                          Video BIOS signature not found.
                          Invalid reference clock from BIOS: 387.96 MHz
                          
                          Video BIOS signature not found.
                          Invalid reference clock from BIOS: 618.57 MHz
                          Perhaps it's just random noise, as it doesn't even seem to be correlated with GPU load. Is there a better utility/sysfs entry to look at?
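
                          One low-tech option is to just poll the same debugfs file while the game is running (debugfs is usually root-only; the path matches the radeon_pm_info output above):

                          ```shell
                          # Print the reported engine clock once a second while the
                          # benchmark runs; stop with Ctrl-C.
                          while sleep 1; do
                              grep 'current engine clock' /sys/kernel/debug/dri/0/radeon_pm_info
                          done
                          ```

                          If the reported clock never moves off 200000 kHz under load, the reclocking really isn't happening, whatever rovclock claims.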



                          • #28
                            AnonymousCoward here: I think this is the issue; the performance difference certainly makes sense. I'm not sure why it didn't occur to me, actually, as I remember nouveau having the same issue. I tried forcing the "low" profile to test, and it doesn't seem to lower performance. So either the driver is severely bottlenecking the card, or power management is broken. Here's radeontop output in case that is helpful; I suspect there would be high CPU load rather than GPU load if the driver were the bottleneck (CPU load is relatively low):

                            Code:
                            radeontop v0.6-4-g244c88e, running on ARUBA, 120 samples/sec

                            Graphics pipe                 100.00%
                            Event Engine                    0.00%
                            Vertex Grouper + Tesselator    55.00%
                            Texture Addresser              90.00%
                            Shader Export                  96.67%
                            Sequencer Instruction Cache    94.17%
                            Shader Interpolator           100.00%
                            Scan Converter                 99.17%
                            Primitive Assembly             56.67%
                            Depth Block                    98.33%
                            Color Block                    94.17%



                            • #29
                              One more bit of info: Steam is detecting only 268.44 MB of VRAM, when in theory it should support 512 MB. This could account for some of the performance issue, especially since I run games at 1920x1080. This thread discusses the issue: http://steamcommunity.com/app/221410...8532588748333/

                              EDIT: Never mind, it's probably just an issue with how Steam detects available RAM, according to that thread.
                              Last edited by AnonymousCoward; 05-18-2013, 05:31 AM.



                              • #30
                                Mobile APUs are definitely clock-limited, no question about it. You have to modify the kernel to work around it: delete this if-block.

