The Performance-Per-Watt, Efficiency Of GPUs On Open-Source Drivers


  • #21
    Originally posted by Michael View Post
    The HD 5450 was passively cooled while the ASUS HD 4890 has a very large cooler.
    Oh! Well, that explains that.

    Hmm, the choice of passive vs active cooling is kind of annoying. Passive cooling is quieter, but then if it raises the temperature so that the CPU cooler has to work harder, then in the end it might be louder... Argh.

    Comment


    • #22
      Why not use the best combination?

      Passively cooled components + huge silent fan on the case, 20cm+. Temps lower than normal small-fan equipped hw, silence.

      Comment


      • #23
        Originally posted by bridgman View Post
        I think there are a couple of messages here, but none of them are "r600 is a better architecture for Mesa"
        Are you sure about that? I remember hearing VLIW was tuned for graphics while SIMD (GCN) is there to make GPGPU tasks easier/faster. Now I don't want to spread misinformation, so please correct me if I'm wrong as I think you know it best.

        Comment


        • #24
          Originally posted by TAXI View Post
          Are you sure about that? I remember hearing VLIW was tuned for graphics while SIMD (GCN) is there to make GPGPU tasks easier/faster. Now I don't want to spread misinformation, so please correct me if I'm wrong as I think you know it best.
          It's less that VLIW is more tuned for graphics, and more that it is unable to be tuned for anything else.

          SIMD is better for GPGPU, but it can be equally good for graphics. The hardware is probably just more expensive to create than what you could get away with if you were only targeting graphics through a VLIW architecture (and so it may end up being slower if AMD just targets a specific price point).

          Also, VLIW requires more complicated compiler optimizations to work correctly, so if anything SIMD should be the "better" option for Mesa.
          Last edited by smitty3268; 06-07-2014, 08:45 PM.

          Comment


          • #25
            vec4 a = b + c;

            One instruction and one "core" taken on a vector arch; four units taken on a scalar (GCN) arch. As long as the GCN card has fewer than 4x the cores, well-vectorized code will be slower.
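The counting argument above can be written down as a toy model. This is only a sketch of the poster's reasoning: the function names and the unit counts (100 vector units, 300 or 400 scalar units) are made up for illustration and are not real hardware figures.

```python
# Toy model of the "vec4 on vector vs. scalar units" argument.
# All unit counts below are hypothetical, chosen only to show the 4x threshold.
VEC_WIDTH = 4  # components in a vec4


def issue_slots(num_vec4_ops: int, lanes_per_unit: int) -> int:
    """ALU issue slots needed to run `num_vec4_ops` vec4 operations.

    lanes_per_unit=4 models a vector unit that handles a whole vec4 per issue;
    lanes_per_unit=1 models a scalar unit that handles one component per issue.
    """
    # Ceiling division: each op needs enough issues to cover its 4 components.
    per_op = -(-VEC_WIDTH // lanes_per_unit)
    return num_vec4_ops * per_op


def cycles(num_vec4_ops: int, lanes_per_unit: int, units: int) -> int:
    """Cycles to finish, assuming the work spreads perfectly across `units`."""
    slots = issue_slots(num_vec4_ops, lanes_per_unit)
    return -(-slots // units)


vector_cycles = cycles(1000, lanes_per_unit=4, units=100)  # 1000 slots over 100 units
scalar_cycles = cycles(1000, lanes_per_unit=1, units=300)  # 4000 slots over 300 units
print(vector_cycles, scalar_cycles)
```

In this model the scalar design only catches up once it has four times as many units (`cycles(1000, 1, 400)` matches the vector case). It ignores clock speed, scheduling, and memory, so treat it as an illustration of the claim rather than a performance prediction.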

            Comment


            • #26
              Originally posted by smitty3268 View Post
              SIMD is better for GPGPU, but it can be equally good for graphics. The hardware is probably just more expensive to create than what you could get away with if you were only targeting graphics through a VLIW architecture (and so it may end up being slower if AMD just targets a specific price point).
              Right. A VLIW SIMD implementation requires less silicon area than a scalar SIMD implementation for the same graphics performance (since graphics workloads are mostly 3- and 4-vector operations anyways), but it's harder to make optimal use of VLIW hardware on arbitrary compute workloads.

              On the other hand, for compute workloads which *do* fit well with VLIW hardware (basically ones which can be readily modified to make use of 4-vectors) the compute performance per unit area (and hence per-dollar) can be very high.
              Last edited by bridgman; 06-08-2014, 06:30 AM.
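To make the "fits well with VLIW" point concrete, here is a minimal sketch of a greedy bundle packer. It is not how any real shader compiler works; `pack_vliw`, `utilization`, and the op format are hypothetical, and the bundle width of 4 is an assumption modeled on a VLIW4 design.

```python
def pack_vliw(ops, width=4):
    """Greedy, in-order packing of ops into VLIW bundles.

    `ops` is a list of (dest, sources) tuples. An op may join the current
    bundle only if the bundle has a free slot and none of the op's sources
    is written by an op already in that bundle.
    """
    bundles = []
    current, written = [], set()
    for dest, sources in ops:
        depends = any(s in written for s in sources)
        if depends or len(current) == width:
            # Close the current bundle and start a new one.
            bundles.append(current)
            current, written = [], set()
        current.append(dest)
        written.add(dest)
    if current:
        bundles.append(current)
    return bundles


def utilization(bundles, width=4):
    """Fraction of ALU slots that do useful work across all bundles."""
    used = sum(len(b) for b in bundles)
    return used / (len(bundles) * width)


# vec4 a = b + c: four independent component adds fill one bundle.
vec4_add = [("a.x", ("b.x", "c.x")), ("a.y", ("b.y", "c.y")),
            ("a.z", ("b.z", "c.z")), ("a.w", ("b.w", "c.w"))]

# A dependent scalar chain: each step needs the previous result.
chain = [("t1", ("t0",)), ("t2", ("t1",)), ("t3", ("t2",)), ("t4", ("t3",))]

print(utilization(pack_vliw(vec4_add)))  # all four slots used
print(utilization(pack_vliw(chain)))     # one slot used per bundle
```

The 4-vector workload keeps every slot busy, while the dependent chain leaves three of four slots idle in each bundle, which is the sense in which arbitrary compute code is harder to map onto VLIW hardware.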

              Comment


              • #27
                Originally posted by bridgman View Post
                Right. A VLIW SIMD implementation requires less silicon area than a scalar SIMD implementation for the same graphics performance (since graphics workloads are mostly 3- and 4-vector operations anyways), but it's harder to make optimal use of VLIW hardware on arbitrary compute workloads.

                On the other hand, for compute workloads which *do* fit well with VLIW hardware (basically ones which can be readily modified to make use of 4-vectors) the compute performance per unit area (and hence per-dollar) can be very high.
                Can you explain the difference between scalar and vector SIMD? I know that's a big difference between pre-GCN and GCN, but I haven't found a resource that explains exactly what it means, except that scalar SIMD is more flexible and can make good use of a hardware scheduler. The term "scalar SIMD" is itself part of my confusion: the idea seems bizarre, almost an oxymoron given the "multiple data" part.

                Comment


                • #28
                  Originally posted by liam View Post
                  Can you explain the difference between scalar and vector SIMD? I know that's a big difference between pre-GCN and GCN, but I haven't found a resource that explains exactly what it means, except that scalar SIMD is more flexible and can make good use of a hardware scheduler. The term "scalar SIMD" is itself part of my confusion: the idea seems bizarre, almost an oxymoron given the "multiple data" part.
                  I don't know about all the exact terminology, but the major difference between the old VLIW and the new Radeon SI architecture is explained somewhat by AnandTech.
                  Whereas VLIW is all about extracting instruction level parallelism (ILP), a non-VLIW SIMD is primarily about thread level parallelism (TLP).
                  Because the smallest unit of work is the SIMD and a CU has 4 SIMDs, a CU works on 4 different wavefronts at once. As wavefronts are still 64 operations wide, each cycle a SIMD will complete 1/4 of the operations on their respective wavefront, and after 4 cycles the current instruction for the active wavefront is completed.

                  Cayman by comparison would attempt to execute multiple instructions from the same wavefront in parallel, rather than executing a single instruction from multiple wavefronts. This is where Cayman got bursty: if the instructions were in any way dependent, Cayman would have to let some of its ALUs go idle. GCN on the other hand does not face this issue; because each SIMD handles single instructions from different wavefronts, they are in no way attempting to take advantage of ILP, and their performance will be very consistent.
                  http://www.anandtech.com/show/4455/a...-for-compute/3
                  http://www.anandtech.com/show/5261/a...-7970-review/3
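The timing described in the quoted passage can be sketched as simple arithmetic. The constants for wavefront width, SIMDs per CU, and cycles per instruction come from the quote; the 16-lane figure is derived from them, and `cu_cycles` is a hypothetical helper that ignores memory latency and scheduling overhead.

```python
WAVEFRONT_WIDTH = 64        # operations per wavefront (from the quoted text)
SIMDS_PER_CU = 4            # SIMDs in one CU (from the quoted text)
CYCLES_PER_INSTRUCTION = 4  # cycles to finish one instruction (from the quoted text)

# Derived: if a SIMD finishes 1/4 of a 64-wide wavefront per cycle,
# it processes 16 operations per cycle.
LANES_PER_SIMD = WAVEFRONT_WIDTH // CYCLES_PER_INSTRUCTION


def cu_cycles(instructions_per_wavefront: int, wavefronts: int) -> int:
    """Cycles for one CU to run `wavefronts` wavefronts of equal length.

    Simplified model: wavefronts are spread evenly across the SIMDs and each
    SIMD runs its share back to back, one instruction every 4 cycles.
    """
    rounds = -(-wavefronts // SIMDS_PER_CU)  # ceiling division
    return rounds * instructions_per_wavefront * CYCLES_PER_INSTRUCTION


print(LANES_PER_SIMD)    # 16
print(cu_cycles(10, 4))  # 4 wavefronts, one per SIMD
print(cu_cycles(10, 5))  # a fifth wavefront needs a second round
```

Note that the cycle count here does not depend on whether the 10 instructions are independent of each other. That is the "very consistent" behaviour the quote contrasts with Cayman, where dependent instructions leave ALUs idle.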

                  Comment


                  • #29
                    Originally posted by smitty3268 View Post
                    I don't know about all the exact terminology, but the major difference between the old VLIW and the new Radeon SI architecture is explained somewhat by AnandTech.




                    http://www.anandtech.com/show/4455/a...-for-compute/3
                    http://www.anandtech.com/show/5261/a...-7970-review/3
                    Nice find. Thanks. This seems like it forces the hardware to be far more aware of program state than previous iterations. This would take some of the burden off the compiler writers, but it also appears to be more costly when it comes to silicon efficiency.

                    Comment
