  • Anyone with HD5870 or HD5850 using recent opensource driver and kernel?

    I was wondering what the current realistic performance of this card is on the latest stack? Lightsmark and Xonotic are the relevant benchmarks for me... 1600, 1920 or similar resolutions; high or ultra settings.

    I couldn't find any information on openbenchmarking or via Google. The only result that comes close is this

    Anyone?

  • #2
    Originally posted by crazycheese View Post
    I was wondering what the current realistic performance of this card is on the latest stack? Lightsmark and Xonotic are the relevant benchmarks for me... 1600, 1920 or similar resolutions; high or ultra settings.

    I couldn't find any information on openbenchmarking or via Google. The only result that comes close is this

    Anyone?
    To my knowledge the hd6970 series has the best open-source performance on the AMD side right now. That is because of the shader compiler: the hd6970 has the most advanced and "easiest" to program VLIW shader core. Technically it is still a joke next to the hd7000 series, but the hd7000 series is not ready to use right now.

    In my benchmark research the hd4000 series is one of the worst performers and the hd6970 is the best right now. People with much weaker 6000-series hardware than your old hd4770 get much better benchmark results; it is 3-4 times faster on similar hardware specifications. The performance difference is "criminal", they just betrayed their hd4000 customers.

    I personally would not waste my money on a dead horse like VLIW... you'll regret it, because AMD will make sure you regret it.



    • #3
      Yes, I know, but I need some real-life FPS numbers. Not much has changed since the 5870.
      GCN is in heavy development right now and unusable, true... Still, extrapolating from how long Evergreen took from launch to its current level of support, it will take at least 1.5 years. A GCN card bought right now will already be outdated in 1.5 years. That is why GCN is excluded from the list for open source.

      Also, Evergreen still has, on the hardware side, good energy consumption (it would get even better if they finally switched to dynamic power profiles), good 3D performance, and the ability to drive multiple displays, which works; and some OpenCL work is going on.

      Yes, this is exactly like 3 years ago, when AMD customers were forced to purchase old cards if they wanted open source, but at least the hardware has stopped sucking now.

      A possible candidate is the 5870 Eyefinity version with 2 GiB of VRAM.



      • #4
        Originally posted by crazycheese View Post
        Yes, I know, but I need some real-life FPS numbers. Not much has changed since the 5870.
        GCN is in heavy development right now and unusable, true... Still, extrapolating from how long Evergreen took from launch to its current level of support, it will take at least 1.5 years. A GCN card bought right now will already be outdated in 1.5 years. That is why GCN is excluded from the list for open source.

        Also, Evergreen still has, on the hardware side, good energy consumption (it would get even better if they finally switched to dynamic power profiles), good 3D performance, and the ability to drive multiple displays, which works; and some OpenCL work is going on.

        Yes, this is exactly like 3 years ago, when AMD customers were forced to purchase old cards if they wanted open source, but at least the hardware has stopped sucking now.

        A possible candidate is the 5870 Eyefinity version with 2 GiB of VRAM.
        "Not much has changed since 5870."

        This is wrong. The 5870 groups 1 big (complex) and 4 little (simple) ALUs per VLIW unit; a 6970 has 4 big complex ALUs, with no little simple ones. In reality the open driver just uses the 1 big ALU and ignores the little simple ones. That makes the hd6970 easier to write a compiler for, because you don't have to care about them. Also, the average utilization per shader cluster is higher with 4D VLIW than with 5D VLIW; the average is ~3.5. That means you lose 5-3.5=1.5 slots with 5D VLIW and only 0.5 with 4D VLIW.
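        The slot arithmetic above is easy to sketch. A toy calculation, taking the ~3.5 average claimed in this post at face value (it is this thread's claim, not measured data):

```python
# Toy VLIW slot arithmetic, using the ~3.5 average fill claimed in the post above.
def wasted_slots(avg_filled, slots_per_unit):
    """Average ALU slots left idle per VLIW instruction group."""
    return slots_per_unit - avg_filled

def utilization(avg_filled, slots_per_unit):
    """Fraction of ALU slots doing useful work."""
    return avg_filled / slots_per_unit

print(wasted_slots(3.5, 5))   # 5D VLIW: 1.5 slots wasted
print(wasted_slots(3.5, 4))   # 4D VLIW: 0.5 slots wasted
print(f"{utilization(3.5, 5):.0%} vs {utilization(3.5, 4):.0%}")  # 70% vs 88%
```

        Same average fill, narrower unit: on these numbers the 4D design wastes a third of what the 5D design does.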

        In other words, you are stupid if you think an hd5870 is an option instead of an hd6000 card.

        If a complex operation hits your pipe, you only get 1/5 of the throughput on the 5D VLIW architecture; 4D VLIW does not have that problem, because all 4 ALUs are complex.

        Catalyst has some dirty shader compiler tricks, cheating by replacing complex shaders with simple shaders; that's why Catalyst is so much faster.

        But even with Catalyst, 4D VLIW wins because of the average utilization of 3.5... which means 5D VLIW is worth nothing.



        • #5
          Hmm, thanks!

          Maybe this is the cause of the open-source radeon driver being slow? I remember someone (Marek?) saying that "we don't have an efficient shader compiler"...

          So the open-source driver simply uses 1 of the 5 units, and the simpler ones are utilized randomly?

          If Catalyst can break or cheat the complex operations into simpler ones, it will approach a 3.5-4 (out of 5) load, compared to a 1-2.5 (out of 5) load (performance) on open source. Wild claim here...

          To support (or refute) this, one should extensively test 4D (0/4) hardware with open source and Catalyst, and compare it to a good 5D (4/1) VLIW.

          Test any HD5xxx or HD64xx-68xx vs. an HD69xx (only three cards available, all high-end). If the gap between open source and Catalyst is much smaller on the HD69xx, then you are correct...
          Last edited by crazycheese; 09-15-2012, 12:31 PM.



          • #6
            Indeed until recently, I believe the open driver only scheduled one of the five units. The LLVM VLIW packetizer should be enabled in current git, so it should now use more units, but it's likely not close to Catalyst-level efficiency.



            • #7
              Originally posted by curaga View Post
              Indeed until recently, I believe the open driver only scheduled one of the five units.
              IIRC the current (non-LLVM) shader compiler always used one ALU for each component the TGSI instruction was working on. For common vertex and fragment/pixel operations this usually meant 4 ALUs used in each instruction, but more complex shaders included relatively more single-component operations, or functions which only the single T unit could handle (integer ops, transcendentals etc.), and so the average ALU utilization went down.

              A more capable compiler could help with the first case by packing multiple single-component operations into a single instruction, but couldn't do much about the second case, where the T unit was required. That said, the second case didn't happen much for graphics; it was mostly compute workloads that justified moving from 4 simple + 1 special ALUs to 4 identical ALUs.

              Q's point about the Cayman shader core being able to execute 4 complex operations in a single instruction is correct in principle, however I don't believe the current compiler is able to pack multiple operations into a single instruction. Not sure if the llvm compiler paths are able to do that yet either.

              When we looked a couple of years ago the average open source compiler utilization was a bit under 3 while the proprietary shader compiler was a bit under 4. With the trend to more complex shaders I imagine both numbers have gone down a bit further since then.
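              The "bit under 3" vs "bit under 4" numbers above fall out of a toy model like the following (my own illustration with made-up instruction counts, not actual compiler behaviour):

```python
# Toy model of ALU packing on a 5-slot VLIW unit (illustrative numbers only).
def avg_alus_per_instruction(vec4_ops, scalar_ops, can_pack):
    alu_work = 4 * vec4_ops + scalar_ops  # total ALU slots of useful work
    if can_pack:
        # A scalar op can ride along in the spare 5th slot of a vec4
        # instruction; leftover scalars still need their own instruction.
        instructions = vec4_ops + max(0, scalar_ops - vec4_ops)
    else:
        # No packing: every op, vector or scalar, is its own instruction.
        instructions = vec4_ops + scalar_ops
    return alu_work / instructions

# A hypothetical shader with 8 vec4 ops and 4 single-component ops:
print(avg_alus_per_instruction(8, 4, can_pack=False))  # 3.0 ALUs/instruction
print(avg_alus_per_instruction(8, 4, can_pack=True))   # 4.5 ALUs/instruction
```

              The more single-component operations a shader contains, the further the unpacked average drops, which matches the trend described above.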
              Last edited by bridgman; 09-15-2012, 02:29 PM.



              • #8
                Originally posted by crazycheese View Post
                Hmm, thanks!

                Maybe this is the cause of the open-source radeon driver being slow?
                This sentence is wrong. That is why hd2000-hd5000 are slow; the hd6000's 4D VLIW is much faster.

                Originally posted by crazycheese View Post
                I remember someone (Marek?) saying that "we don't have an efficient shader compiler"...
                You need more than that: you need per-app shader replacement to fix it.
                Catalyst replaces complex shaders with simple shaders for hd2000-hd5000, for every single app.


                Originally posted by crazycheese View Post
                So the open-source driver simply uses 1 of the 5 units, and the simpler ones are utilized randomly?
                Yes, in my experience they only use the 1 complex FULL ALU, because they do not have per-app shader replacement technology right now. Only some apps ship "simple" shaders for AMD; for those, more than 1 ALU can be used. But the lack of a shader compiler means that if the app only feeds 100 ALUs and your graphics card has 700, the card only uses 100, because there is no compiler to fill all 700.

                Originally posted by crazycheese View Post
                If Catalyst can break or cheat the complex operations into simpler ones, it will approach a 3.5-4 (out of 5) load, compared to a 1-2.5 (out of 5) load (performance) on open source. Wild claim here...
                The average for Catalyst is 3.5: Catalyst uses 3.5 of 5 slots per group on 5D VLIW and 3.5 of 4 on 4D VLIW.
                This means 5D VLIW is useless even with Catalyst.

                Yes, in the worst case the open-source driver uses 1 of 5 slots on 5D VLIW, while with a future shader compiler it could use 3.5 of 4 on 4D VLIW.
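                Taking these claimed figures at face value (1 of 5 in the worst case today vs. 3.5 of 4 with a future compiler; both are this thread's claims, not measurements), the implied gap works out to:

```python
# Implied throughput ratio from the utilization figures claimed above.
worst_case_5d = 1 / 5    # open-source driver on 5D VLIW, claimed worst case
future_4d = 3.5 / 4      # hoped-for future compiler on 4D VLIW

print(f"utilization: {worst_case_5d:.0%} vs {future_4d:.0%}")  # 20% vs 88%
print(f"implied speedup: {future_4d / worst_case_5d:.2f}x")    # 4.38x
```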

                Now you see what? hd2000-hd5000 will never get an improvement, because you need per-app shader replacement to load the 4 simple ALUs per group.

                This means that for the open-source driver it is complete stupidity to buy an hd5000: you will not get any improvement, because they will not build a per-app shader replacement infrastructure.

                You will get a good result with an hd6970 and the future shader compiler.

                Originally posted by crazycheese View Post
                To support (or refute) this, one should extensively test 4D (0/4) hardware with open source and Catalyst, and compare it to a good 5D (4/1) VLIW.
                No need to test; this stuff is technical fact.
                They switched from 5D to 4D, with complex-only ALUs, precisely because even Catalyst CANNOT HANDLE THE COMPLEXITY of shader replacement and compiling for 2 different kinds of ALU.


                Originally posted by crazycheese View Post
                Test any HD5xxx or HD64xx-68xx vs. an HD69xx (only three cards available, all high-end). If the gap between open source and Catalyst is much smaller on the HD69xx, then you are correct...
                No, there are low-end 4D VLIW parts... the second generation of APUs are all 4D VLIW: all FM2-based systems and notebooks.

                An example of low-end 4D VLIW: the AMD A6-3420M with AMD Radeon HD 7470M graphics.



                • #9
                  Originally posted by necro-lover View Post
                  Catalyst replaces complex shaders with simple shaders for hd2000-hd5000, for every single app.
                  No

                  Originally posted by necro-lover View Post
                  The average for Catalyst is 3.5: Catalyst uses 3.5 of 5 slots per group on 5D VLIW and 3.5 of 4 on 4D VLIW.
                  This means 5D VLIW is useless even with Catalyst.
                  No - the average number of ALUs used per instruction is a bit lower for VLIW4 than for VLIW5, since there are a number of cases where the shader compiler could pack a single component operation into the same instruction as a 4-vector operation. The point was that the utilization as a percentage was slightly better with VLIW4.

                  In general VLIW5 was better for pure graphics workloads, but as compute became a larger part of GPU workload (there's a lot of compute hidden in modern graphical apps as well) then VLIW4 became a better fit.
                  Last edited by bridgman; 09-15-2012, 02:36 PM.



                  • #10
                    Originally posted by bridgman View Post
                    Q's point about the Cayman shader core being able to execute 4 complex operations in a single instruction is correct in principle, however I don't believe the current compiler is able to pack multiple operations into a single instruction. Not sure if the llvm compiler paths are able to do that yet either.

                    When we looked a couple of years ago the average open source compiler utilization was a bit under 3 while the proprietary shader compiler was a bit under 4. With the trend to more complex shaders I imagine both numbers have gone down a bit further since then.
                    My argument was also a "future" argument: *in the future, 4D VLIW is much better than the old 5D VLIW cards*.

                    I just don't want him to buy an hd5000 card, because it is technically bullshit in a modern world of more and more complex shaders.

                    If he buys an hd7970 he buys 4 slots and uses 3-3.5 of them, so he loses only 0.5-1; whereas if you buy 5 slots and only use 2-3, you lose 2-3.



                    • #11
                      Yep, I agree that a VLIW4 GPU is a bit more future-proof than a VLIW5.

                      FWIW, I didn't get the impression that crazycheese was planning to buy an HD58xx, just wondering what performance was like these days.



                      • #12
                        Originally posted by bridgman View Post
                        No - the average number of ALUs used per instruction is a bit lower for VLIW4 than for VLIW5, since there are a number of cases where the shader compiler could pack a single component operation into the same instruction as a 4-vector operation. [...]

                        In general VLIW5 was better for pure graphics workloads, but as compute became a larger part of GPU workload (there's a lot of compute hidden in modern graphical apps as well) then VLIW4 became a better fit.
                        What kind of graphics workload? Rasterization or ray tracing? Because a ray-tracing graphics load is a pure compute workload...

                        And right: "(there's a lot of compute hidden in modern graphical apps as well) then VLIW4 became a better fit."

                        Why should he buy hardware for obsolete stuff? The 4D VLIW architecture is better for modern games.

                        Originally posted by bridgman View Post
                        The point was that the utilization as a percentage was slightly better with VLIW4.
                        That was my argument, but in the past I read something about an average usage of 3.5.



                        • #13
                          Originally posted by bridgman View Post
                          Yep, I agree that a VLIW4 GPU is a bit more future-proof than a VLIW5.

                          FWIW, I didn't get the impression that crazycheese was planning to buy an HD58xx, just wondering what performance was like these days.
                          Correct, I was just planning to do it in the near future (1-2 months from now), but only if the results are acceptable. Unfortunately, I was unable to find any benchmarks of this GPU, especially with the open driver, up at openbenchmarking. The VLIW GPU also looks very solid for non-graphical stuff. Besides, there are Marek & co., who were actively seeking ways to improve the driver.

                          Now, I had forgotten that pre-SI GPUs have 5 units per block, yet in a 1/4 config. So it's either the 6950/70 or the 7xxx range left for me. The 7xxx range is unstable and inefficient, so it seems I have plenty of time.

                          I have been reading about the Itanium VLIW implementation today. Besides naming a VLIW compiler (absent in open source) as the most important requirement for VLIW hardware to work efficiently, it mentioned a "special hardware feature" that allows profiling the current execution within the VLIW in order to improve the compiler itself. What makes me wonder is whether AMD is willing to provide the documentation for that mode (in case it was implemented, and hardware simulation wasn't the method used to optimize execution)?.. It would be a real help for the pre-NI driver.

                          Originally posted by necro-lover View Post
                          My argument was also a "future" argument: *in the future, 4D VLIW is much better than the old 5D VLIW cards*.
                          I just don't want him to buy an hd5000 card, because it is technically bullshit in a modern world of more and more complex shaders.
                          If he buys an hd7970 he buys 4 slots and uses 3-3.5 of them, so he loses only 0.5-1; whereas if you buy 5 slots and only use 2-3, you lose 2-3.
                          I got and understood the warning in your first post, n-l! Thanks for it!
                          Last edited by crazycheese; 09-15-2012, 07:27 PM.



                          • #14
                            The Trinity GPUs are also VLIW4, btw.



                            • #15
                              Originally posted by bridgman View Post
                              The Trinity GPUs are also VLIW4, btw.
                              The FM2 desktop Trinitys are not ready, and the notebooks are broken by design because of the power management.

                              Because of this broken power management, I ordered an Intel notebook with Intel HD 4000 graphics for a friend today, for 450€.

                              I'm so sorry, AMD, but only stupid people buy AMD notebook hardware for Linux.

