Radeon Gallium3D R600g Color Tiling Performance

  • Radeon Gallium3D R600g Color Tiling Performance

    Phoronix: Radeon Gallium3D R600g Color Tiling Performance

    With 2D color tiling enabled by default in the R600 Gallium3D Radeon open-source driver as of this week, here are new benchmarks showing off the OpenGL performance impact of the 1D and 2D tiling methods for this common open-source AMD Linux graphics driver.

    http://www.phoronix.com/vr.php?view=18099
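For readers who haven't met tiling before, here is a minimal Python sketch of the idea behind it. The 8x8 tile size and the resulting offsets are illustrative only, not the actual R600 1D/2D macro/micro tile layout:

```python
# Illustrative sketch of linear vs. tiled texel addressing.
# The 8x8 tile size is an arbitrary assumption, not the real
# R600 hardware layout.

TILE_W, TILE_H = 8, 8

def linear_offset(x, y, pitch):
    """Texel offset in a plain row-major (linear) layout."""
    return y * pitch + x

def tiled_offset(x, y, pitch):
    """Texel offset when the surface is split into TILE_W x TILE_H tiles.

    Texels that are neighbours in both x and y land in the same tile,
    which is why tiling improves memory locality for 2D access patterns.
    """
    tiles_per_row = pitch // TILE_W
    tile_index = (y // TILE_H) * tiles_per_row + (x // TILE_W)
    within_tile = (y % TILE_H) * TILE_W + (x % TILE_W)
    return tile_index * (TILE_W * TILE_H) + within_tile

if __name__ == "__main__":
    pitch = 64
    # Two vertically adjacent texels: 64 apart linearly, 8 apart tiled.
    print(linear_offset(3, 0, pitch), linear_offset(3, 1, pitch))  # 3 67
    print(tiled_offset(3, 0, pitch), tiled_offset(3, 1, pitch))    # 3 11
```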

  • #2
    This confirms what I am seeing with my silly "one rotating static texture" test. No big improvement yet when I enable 2D tiling for textures and the framebuffer. Perhaps there is a specific usage scenario where 2D tiling helps and I am not hitting it, though I thought that rasterising a big texture is precisely that.

    (BTW, I verified that it is indeed enabled by poking values directly. The swizzling pattern is insane! :-)



    • #3
      Originally posted by tmikov View Post
      This confirms what I am seeing with my silly "one rotating static texture" test. No big improvement yet when I enable 2D tiling for textures and the framebuffer. Perhaps there is a specific usage scenario where 2D tiling helps and I am not hitting it, though I thought that rasterising a big texture is precisely that.

      (BTW, I verified that it is indeed enabled by poking values directly. The swizzling pattern is insane! :-)
      You won't exhaust GPU VRAM bandwidth with one big texture. You really need to push the bandwidth requirements to see the 2D tiling effect clearly. One big texture is not a test that stresses the GPU.
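The point about one texture not stressing VRAM can be checked with back-of-envelope arithmetic. The texture size, frame rate, and bandwidth figure below are assumed round numbers, not measured HD 4650 specs:

```python
# Back-of-envelope check: reading one texture per frame is nowhere
# near VRAM bandwidth. All numbers are illustrative assumptions.

tex_w, tex_h, bytes_per_texel = 1024, 1024, 4  # one RGBA8 texture
fps = 60
vram_bandwidth = 16e9                          # assume ~16 GB/s VRAM

bytes_per_frame = tex_w * tex_h * bytes_per_texel
demand = bytes_per_frame * fps                 # bytes/s if read once per frame

print(f"demand: {demand / 1e9:.2f} GB/s")                 # ~0.25 GB/s
print(f"utilisation: {demand / vram_bandwidth:.1%}")      # under 2%
```

Even with generous overdraw, this kind of load leaves the memory controller mostly idle, so a layout optimisation like 2D tiling has little to show.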



      • #4
        Even if it's just a few extra FPS here and there, it doesn't seem to cause any regressions, and every little bit counts. I kind of get the impression most of the ATI/AMD open-source drivers are slow because of dozens of little missing things like this. I doubt there are many core problems whose fixes would yield anything higher than, say, a 40% performance increase.



        • #5
          Why a 4650 of all things?



          • #6
            Originally posted by glisse View Post
            You won't exhaust GPU VRAM bandwidth with one big texture. You really need to push the bandwidth requirements to see the 2D tiling effect clearly. One big texture is not a test that stresses the GPU.
            Fair enough. But what would be a good synthetic stress test?

            Also, do you have an idea why the blob is faster? Could it be memory clocks, power management, etc?



            • #7
              Originally posted by Lemonzest View Post
              Why a 4650 of all things?
              Maybe because with such a low-end card any improvements from 2D tiling will be lost, so he can write some more stupid articles?

              4650 a 'midrange' card, my ass. No, back then the 4770 was a midrange card. Nowadays a 5770/6770/7770 is a midrange card. 4650? That is low end by today's standards.



              • #8
                Originally posted by tmikov View Post
                Fair enough. But what would be a good synthetic stress test?

                Also, do you have an idea why the blob is faster? Could it be memory clocks, power management, etc?
                No, it's because it has millions of lines of code, specifically hand-optimized for every imaginable usage scenario under the sun, all tweaked with heuristics over 15 years of development.

                The Mesa devs first try to implement all features in the natural, straightforward way, and then optimise later, once the features are in place and stable.



                • #9
                  Originally posted by pingufunkybeat View Post
                  No, it's because it has millions of lines of code, specifically hand-optimized for every imaginable usage scenario under the sun, all tweaked with heuristics over 15 years of development.

                  The Mesa devs first try to implement all features in the natural, straightforward way, and then optimise later, once the features are in place and stable.
                  Thank you for this generic armchair response, but that's not really what I am looking for. I am trying to find out why the blob is much faster at rendering a single texture.

                  All that talk about 15 years of optimization and special cases for games is utter nonsense unless the fundamental underlying operations are fast. Things like 2D tiling are exactly the fundamental improvements I am talking about. Perhaps there is something else missing that we don't know about.



                  • #10
                    Originally posted by tmikov View Post
                    Thank you for this generic armchair response, but that's not really what I am looking for. I am trying to find out why the blob is much faster at rendering a single texture.
                    In your case, it is about 4ms faster at rendering a single texture.

                    Optimising 4 ms away is really hard work, especially if it is made up of a hundred small delays collected across different parts of the driver. That's what my armchair response was about.



                    • #11
                      Originally posted by pingufunkybeat View Post
                      In your case, it is about 4ms faster at rendering a single texture.

                      Optimising 4 ms away is really hard work, especially if it is made up of a hundred small delays collected across different parts of the driver. That's what my armchair response was about.
                      4 ms is really a very long time, a huge number of CPU instructions. Plus, CPU utilization is not high. This is not about code optimization. I am certain it is not a hundred little things; it must be a couple of big ones.
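For scale, a quick calculation of what 4 ms means. The 3 GHz clock and the 60 FPS frame budget are assumptions for illustration, not figures from the thread:

```python
# How much is 4 ms? Illustrative arithmetic; the 3 GHz CPU clock
# and the 60 FPS frame budget are assumed numbers.

clock_hz = 3e9           # assumed CPU clock
frame_budget_s = 1 / 60  # one frame at 60 FPS

delta_s = 4e-3           # the ~4 ms gap discussed above

cycles = clock_hz * delta_s
share = delta_s / frame_budget_s

print(f"{cycles:,.0f} CPU cycles")       # 12,000,000 cycles
print(f"{share:.0%} of a 60 FPS frame")  # 24% of the frame budget
```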



                      • #12
                        Originally posted by pingufunkybeat View Post
                        In your case, it is about 4ms faster at rendering a single texture.

                        Optimising 4 ms away is really hard work, especially if it is made up of a hundred small delays collected across different parts of the driver. That's what my armchair response was about.
                        Please elaborate: what optimizations could be done on rendering a single image? In such a simple process, pretty much the same "calls" to the GPU should be made, no?

                        The driver's bottleneck is the CPU, since that is where it runs as a program, no? But low CPU usage pretty much eliminates this possibility for this case. So the GPU is the bottleneck, and something must be wrong there. Such a simple case indicates that something is done wrong: a speed-up feature not used, or extra/different use of the GPU. And it doesn't seem like many small things, more like a couple of bigger ones, as mentioned. I doubt it took AMD 15 years to optimize rendering a (single) texture. One possibility, may be invalid though: could it have to do with texture compression? Test it with a simple gradient instead and see there too, tmikov .. xD



                        • #13
                          Originally posted by Rigaldo View Post
                          Please elaborate: what optimizations could be done on rendering a single image? In such a simple process, pretty much the same "calls" to the GPU should be made, no?
                          Like I said, this is for the driver developers to answer; I lack the knowledge. Marek and Alex have already written that all hardware functionality is used (only HiZ is not on by default). If I remember correctly, Jerome Glisse did profile the driver and couldn't find one single big bottleneck, but many small ones. I can't find a link at the moment; perhaps somebody is better at googling.

                          The driver's bottleneck is the CPU, since that is where it runs as a program, no? But low CPU usage pretty much eliminates this possibility for this case.
                          Only if they operate completely asynchronously. If the GPU ever has to wait for the driver before continuing, then no.

                          Even if your processor is mostly idle, a simple cache miss might cause a considerable delay while your GPU is waiting for the next instruction.

                          But again, I'm not a GPU developer. I just don't believe that using less than 100% of CPU all the time means that there are no bottlenecks in the driver. A 1ms delay is a 1ms delay, even if it only happens occasionally.
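A toy model of the argument above, with entirely made-up numbers: a few short driver stalls per frame cut the frame rate noticeably while leaving average CPU usage low, because the CPU is idle during each stall.

```python
# Toy model: occasional driver stalls hurt frame time without
# showing up as high CPU usage. All numbers are assumptions.

frame_work_ms = 10.0   # useful work per frame (assumed)
stall_ms = 1.0         # one driver-induced wait (assumed)
stalls_per_frame = 4   # occasional cache misses / sync points (assumed)

frame_ms = frame_work_ms + stalls_per_frame * stall_ms
fps = 1000.0 / frame_ms

print(f"{fps:.1f} FPS instead of {1000.0 / frame_work_ms:.1f} FPS")
# The CPU sits idle during each stall, so average utilisation stays
# low even though the stalls cost 4 ms of every frame.
```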



                          • #14
                            Originally posted by pingufunkybeat View Post
                            Like I said, this is for the driver developers to answer; I lack the knowledge. Marek and Alex have already written that all hardware functionality is used (only HiZ is not on by default). If I remember correctly, Jerome Glisse did profile the driver and couldn't find one single big bottleneck, but many small ones. I can't find a link at the moment; perhaps somebody is better at googling.


                            Only if they operate completely asynchronously. If the GPU ever has to wait for the driver before continuing, then no.

                            Even if your processor is mostly idle, a simple cache miss might cause a considerable delay while your GPU is waiting for the next instruction.

                            But again, I'm not a GPU developer. I just don't believe that using less than 100% of CPU all the time means that there are no bottlenecks in the driver. A 1ms delay is a 1ms delay, even if it only happens occasionally.
                            Aren't the open-source drivers single-threaded? I wouldn't be surprised that the CPU isn't fully used. It would be interesting to see the results with the CPU running in single-core mode and then compare to Catalyst...



                            • #15
                              2D tiling should show a bigger improvement on bigger cards, since they have more memory channels.
