
Thread: The Performance-Per-Watt, Efficiency Of GPUs On Open-Source Drivers

  1. #21
    Join Date
    Sep 2008
    Location
    Vilnius, Lithuania
    Posts
    2,521

    Default

    Quote Originally Posted by Michael View Post
    The HD 5450 was passively cooled while the ASUS HD 4890 has a very large cooler.
    Oh! Well, that explains that.

    Hmm, the choice of passive vs. active cooling is kind of annoying. Passive cooling is quieter, but if it raises the temperature so that the CPU cooler has to work harder, it might end up being louder overall... Argh.

  2. #22
    Join Date
    Feb 2008
    Location
    Linuxland
    Posts
    4,991

    Default

    Why not use the best combination?

    Passively cooled components + huge silent fan on the case, 20cm+. Temps lower than normal small-fan equipped hw, silence.

  3. #23
    Join Date
    Mar 2011
    Posts
    374

    Default

    Quote Originally Posted by bridgman View Post
    I think there are a couple of messages here, but none of them are "r600 is a better architecture for Mesa"
    Are you sure about that? I remember hearing VLIW was tuned for graphics while SIMD (GCN) is there to make GPGPU tasks easier/faster. Now I don't want to spread misinformation, so please correct me if I'm wrong as I think you know it best.

  4. #24
    Join Date
    Oct 2008
    Posts
    3,033

    Default

    Quote Originally Posted by TAXI View Post
    Are you sure about that? I remember hearing VLIW was tuned for graphics while SIMD (GCN) is there to make GPGPU tasks easier/faster. Now I don't want to spread misinformation, so please correct me if I'm wrong as I think you know it best.
    It's less that VLIW is more tuned for graphics, and more that it is unable to be tuned for anything else.

    SIMD is better for GPGPU, but it can be equally good for graphics. The hardware is probably just more expensive to create than what you could get away with if you were only targeting graphics through a VLIW architecture (and therefore it may end up being slower if AMD just targets a specific price point).

    Also, VLIW requires more complicated compiler optimizations to work correctly, so if anything SIMD should be the "better" option for Mesa.
    Last edited by smitty3268; 06-07-2014 at 08:45 PM.

  5. #25
    Join Date
    Feb 2008
    Location
    Linuxland
    Posts
    4,991

    Default

    vec4 a = b + c

    One instruction, one "core" taken on a vector arch. Four units taken on a scalar (GCN) arch. As long as the GCN card has fewer than 4x the cores, well-vectorized code will be slower.
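
    Purely as an illustration of that issue-slot math, here is a minimal C sketch (not real GPU ISA; the function names and slot counters are made up for this post) that counts how many issue slots the same vec4 add consumes under a VLIW-style bundle versus a scalar, one-op-per-component design:

        #include <stdio.h>

        typedef struct { float v[4]; } vec4;

        /* VLIW-style: the compiler packs the x/y/z/w adds into one bundle,
           so the whole vec4 add costs a single issue slot. */
        static vec4 vliw_add(vec4 a, vec4 b, int *slots) {
            vec4 r;
            for (int i = 0; i < 4; i++)
                r.v[i] = a.v[i] + b.v[i];
            *slots += 1;
            return r;
        }

        /* Scalar-style (GCN-like): each component is its own operation,
           so the same add consumes four slots. */
        static vec4 scalar_add(vec4 a, vec4 b, int *slots) {
            vec4 r;
            for (int i = 0; i < 4; i++) {
                r.v[i] = a.v[i] + b.v[i];
                *slots += 1;
            }
            return r;
        }

        int main(void) {
            int vliw_slots = 0, scalar_slots = 0;
            vec4 b = {{1, 2, 3, 4}}, c = {{5, 6, 7, 8}};
            vec4 a = vliw_add(b, c, &vliw_slots);
            scalar_add(b, c, &scalar_slots);
            printf("a = %g %g %g %g\n", a.v[0], a.v[1], a.v[2], a.v[3]);
            printf("vliw slots: %d, scalar slots: %d\n", vliw_slots, scalar_slots);
            return 0;
        }

    So unless the scalar design brings at least four times as many units, the already-vectorized code loses out, which is the point above.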

  6. #26
    Join Date
    Oct 2007
    Location
    Toronto-ish
    Posts
    7,386

    Default

    Quote Originally Posted by smitty3268 View Post
    SIMD is better for GPGPU, but it can be equally good for graphics. The hardware is probably just more expensive to create than what you could get away with if you were only targeting graphics through a VLIW architecture (and therefore it may end up being slower if AMD just targets a specific price point).
    Right. A VLIW SIMD implementation requires less silicon area than a scalar SIMD implementation for the same graphics performance (since graphics workloads are mostly 3- and 4-vector operations anyways), but it's harder to make optimal use of VLIW hardware on arbitrary compute workloads.

    On the other hand, for compute workloads which *do* fit well with VLIW hardware (basically ones which can be readily modified to make use of 4-vectors) the compute performance per unit area (and hence per-dollar) can be very high.
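
    For what it's worth, here is a toy illustration in plain C of the kind of restructuring that means (the float4 struct is just a stand-in for the GPU vector type, not any real API): the same scale-by-constant kernel written per-element, and then rewritten over 4-vectors so that the four component multiplies are exactly what fills the slots of a VLIW bundle.

        #include <stdio.h>
        #include <stddef.h>

        typedef struct { float v[4]; } float4;

        /* Scalar version: one multiply per element, so on VLIW hardware most
           slots in each bundle sit idle unless the compiler finds other
           independent work to pack alongside it. */
        static void scale_scalar(float *x, size_t n, float s) {
            for (size_t i = 0; i < n; i++)
                x[i] *= s;
        }

        /* 4-vector version: the data is viewed as float4s, so each step carries
           four independent multiplies that line up with a VLIW bundle's slots. */
        static void scale_vec4(float4 *x, size_t n4, float s) {
            for (size_t i = 0; i < n4; i++)
                for (int c = 0; c < 4; c++)
                    x[i].v[c] *= s;
        }

        int main(void) {
            float  a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
            float4 b[2] = {{{1, 2, 3, 4}}, {{5, 6, 7, 8}}};

            scale_scalar(a, 8, 2.0f);
            scale_vec4(b, 2, 2.0f);

            printf("scalar: %g ... %g\n", a[0], a[7]);
            printf("vec4:   %g ... %g\n", b[0].v[0], b[1].v[3]);
            return 0;
        }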
    Last edited by bridgman; 06-08-2014 at 06:30 AM.

  7. #27
    Join Date
    Jan 2009
    Posts
    1,303

    Default

    Quote Originally Posted by bridgman View Post
    Right. A VLIW SIMD implementation requires less silicon area than a scalar SIMD implementation for the same graphics performance (since graphics workloads are mostly 3- and 4-vector operations anyways), but it's harder to make optimal use of VLIW hardware on arbitrary compute workloads.

    On the other hand, for compute workloads which *do* fit well with VLIW hardware (basically ones which can be readily modified to make use of 4-vectors) the compute performance per unit area (and hence per-dollar) can be very high.
    Can you explain the difference between scalar and vector SIMD? I know it's a big difference between pre-GCN and GCN, but I haven't found a resource that actually explains exactly what it means, except that scalar SIMD is more flexible and can make good use of a hardware scheduler. That's also the very issue with "scalar SIMD": the term seems bizarre, almost an oxymoron given the "multiple data" part.

  8. #28
    Join Date
    Oct 2008
    Posts
    3,033

    Default

    Quote Originally Posted by liam View Post
    Can you explain the difference between scalar and vector SIMD? I know it's a big difference between pre-GCN and GCN, but I haven't found a resource that actually explains exactly what it means, except that scalar SIMD is more flexible and can make good use of a hardware scheduler. That's also the very issue with "scalar SIMD": the term seems bizarre, almost an oxymoron given the "multiple data" part.
    I don't know about all the exact terminology, but the major difference between the old VLIW and the new radeon SI architecture is explained somewhat by Anandtech.
    Whereas VLIW is all about extracting instruction level parallelism (ILP), a non-VLIW SIMD is primarily about thread level parallelism (TLP).
    Because the smallest unit of work is the SIMD and a CU has 4 SIMDs, a CU works on 4 different wavefronts at once. As wavefronts are still 64 operations wide, each cycle a SIMD will complete 1/4 of the operations on its respective wavefront, and after 4 cycles the current instruction for the active wavefront is completed.

    Cayman by comparison would attempt to execute multiple instructions from the same wavefront in parallel, rather than executing a single instruction from multiple wavefronts. This is where Cayman got bursty: if the instructions were in any way dependent, Cayman would have to let some of its ALUs go idle. GCN on the other hand does not face this issue; because each SIMD handles single instructions from different wavefronts, they are in no way attempting to take advantage of ILP, and their performance will be very consistent.
    http://www.anandtech.com/show/4455/a...-for-compute/3
    http://www.anandtech.com/show/5261/a...-7970-review/3
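
    As a sanity check on those numbers, here's a tiny C model of one CU (purely illustrative, not driver or ISA code; the 10-instruction kernel length is an arbitrary assumption): 4 SIMDs of 16 lanes, each assigned its own 64-wide wavefront, so every instruction takes 64/16 = 4 cycles per wavefront and the CU as a whole retires 64 lane-operations per cycle.

        #include <stdio.h>

        /* Toy model of one GCN compute unit as described above: 4 SIMDs of 16
           lanes each, 64-wide wavefronts, one wavefront per SIMD.  The 4/16/64
           figures follow the text; everything else is made up. */
        #define SIMDS_PER_CU    4
        #define LANES_PER_SIMD  16
        #define WAVEFRONT_SIZE  64

        int main(void) {
            int instrs_per_wavefront = 10;                            /* arbitrary kernel length */
            int cycles_per_instr = WAVEFRONT_SIZE / LANES_PER_SIMD;   /* 64/16 = 4 cycles */

            int cycles = instrs_per_wavefront * cycles_per_instr;
            long lane_ops = (long)SIMDS_PER_CU * WAVEFRONT_SIZE * instrs_per_wavefront;

            printf("cycles to finish all 4 wavefronts: %d\n", cycles);             /* 40 */
            printf("lane operations retired: %ld\n", lane_ops);                    /* 2560 */
            printf("lane operations per cycle for the CU: %ld\n", lane_ops / cycles); /* 64 */
            return 0;
        }

    Because each SIMD only ever runs one instruction of its own wavefront at a time, nothing in this model can stall on dependencies between instructions, which is why the throughput stays consistent compared to Cayman's ILP-dependent bursts.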

  9. #29
    Join Date
    Jan 2009
    Posts
    1,303

    Default

    Quote Originally Posted by smitty3268 View Post
    I don't know about all the exact terminology, but the major difference between the old VLIW and the new radeon SI architecture is explained somewhat by Anandtech.
    http://www.anandtech.com/show/4455/a...-for-compute/3
    http://www.anandtech.com/show/5261/a...-7970-review/3
    Nice find. Thanks. This seems like it forces the hardware to be far more aware of program state than previous iterations. This would take some of the burden off the compiler writers, but it also appears to be more costly when it comes to silicon efficiency.
