"Ask ATI" dev thread


  • If he's fine with his 3850's performance, I'd suggest waiting for the 56xx (dunno) or 57xx (October) series.



    • Originally posted by d2kx View Post
      If he's fine with his 3850's performance, I'd suggest waiting for the 56xx (dunno) or 57xx (October) series.
      Yes, I'll do that.



      • Originally posted by bridgman View Post
        I can't really comment on unreleased products, unfortunately.

        My understanding is that the current Folding@home implementation uses essentially the same code for 6xx and 7xx parts, so it does not take advantage of LDS/GDS on the 7xx parts. I imagine that's where the discussion of "calculate twice vs store and re-use" comes from.
        It's strange that ATI didn't work with Folding@home to fix such a high-profile GPGPU application.

        I guess everything will be better and easier to fix once we have an open source OpenCL implementation, and when (if) Folding@home releases its code. Since FAH is based on gromacs, and gromacs is open source, I guess this shouldn't be impossible. I'll go back to my little hole now, and wait for OpenCL and 57xx, in that order.
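To make the "calculate twice vs store and re-use" trade-off concrete, here is a minimal sketch in plain Python. It is not Folding@home code: the 1-D `pair_force` function and both loop structures are hypothetical, and it assumes the discussion refers to pairwise interactions where the force on j from i is the negative of the force on i from j. The "re-use" variant needs writable storage shared between work items, which is the role LDS/GDS would play on 7xx parts.

```python
def pair_force(xi, xj):
    """Toy 1-D pair interaction (hypothetical): force on i due to j."""
    d = xj - xi
    return d / (abs(d) ** 3 + 1e-9)


def forces_recompute(xs):
    """'Calculate twice': each work item i evaluates all of its pairs itself.

    Every pair (i, j) is evaluated twice, but item i only writes out[i],
    so no shared scratch storage is needed.
    """
    n = len(xs)
    out = [0.0] * n
    evals = 0
    for i in range(n):
        for j in range(n):
            if i != j:
                out[i] += pair_force(xs[i], xs[j])
                evals += 1
    return out, evals


def forces_reuse(xs):
    """'Store and re-use': evaluate each pair once and apply it to both ends.

    'out' stands in for shared on-chip memory, because the work on pair
    (i, j) also updates the entry that belongs to j.
    """
    n = len(xs)
    out = [0.0] * n
    evals = 0
    for i in range(n):
        for j in range(i + 1, n):
            f = pair_force(xs[i], xs[j])
            evals += 1
            out[i] += f
            out[j] -= f  # equal and opposite contribution
    return out, evals
```

Both variants produce the same forces up to rounding; the second does half as many `pair_force` evaluations at the cost of scattered writes into shared storage.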



        • When will the documentation for the RS780 series of graphics chips be coming out, along with documentation on handling the power management features of these chips?



          • Scroll down to the "Chipset Guides and Documentation" section:

            http://developer.amd.com/documentati...s/default.aspx

            Most of the information required for power management is already out there - the main issue is that dynamic power management really needs to be implemented in a KMS-enabled DRM so that the PM code can (a) have access to all the required activity information, and (b) avoid hardware access conflicts between PM code (which needs to be in drm) and modesetting code. The issue there is that a couple of register locations are used for both PM and modesetting functions.

            There are a couple of things we still need to document -- some missing bits in the AtomBIOS power-related tables and the on-chip fan controller for sure. They're next on the list after we get interrupts working on 6xx/7xx.
            Last edited by bridgman; 13 October 2009, 05:52 PM.
            Test signature



            • Superscalar vs. VLIW

              I'm not sure this is the right thread, but I think it is better to ask an AMD/ATI dev. Various journalists, portals, and forum members all around the internet call the ATI R600-R800 architecture superscalar.

              But, AFAIK, the ATI architecture is VLIW, not superscalar. Superscalar and VLIW both aim at the same goal, but the implementations are different. A superscalar architecture uses HW dependency checking among the instructions, which means the chip is bigger. VLIW, on the other hand, uses SW dependency checking, so it depends heavily on the compiler, and the chips can be smaller.

              So, it seems to me, ATI chose the VLIW way (HD5870 has 320 VLIW cores) and nVidia the superscalar way (GT200 has 120 superscalar, or 240 scalar cores).

              Do I understand it right? Is the ATI architecture VLIW, and does it rely heavily on the compiler to do instruction dependency checking?



              • To those of you that asked most of the questions in this thread, it doesn't look like AMD is going to ever finish the formal Q&A... So you can probably stop asking questions.
                Michael Larabel
                https://www.michaellarabel.com/



                • Originally posted by next9 View Post
                  I'm not sure this is the right thread, but I think it is better to ask an AMD/ATI dev. Various journalists, portals, and forum members all around the internet call the ATI R600-R800 architecture superscalar.

                  But, AFAIK, the ATI architecture is VLIW, not superscalar. Superscalar and VLIW both aim at the same goal, but the implementations are different. A superscalar architecture uses HW dependency checking among the instructions, which means the chip is bigger. VLIW, on the other hand, uses SW dependency checking, so it depends heavily on the compiler, and the chips can be smaller.

                  So, it seems to me, ATI chose the VLIW way (HD5870 has 320 VLIW cores) and nVidia the superscalar way (GT200 has 120 superscalar, or 240 scalar cores).

                  Do I understand it right? Is the ATI architecture VLIW, and does it rely heavily on the compiler to do instruction dependency checking?
                  Are you Spyhawk on Beyond3D? I just answered the same question there.

                  Anyways, most definitions of superscalar include VLIW as a subset. Some distinguish between "static superscalar" (VLIW) and "dynamic superscalar". I haven't found any definitions of superscalar which exclude VLIW but I'm sure they exist.

                  ATI GPUs are superscalar via VLIW, or just "VLIW" if you don't consider VLIW to be a subset of superscalar. They do depend on having the shader compiler identify instruction level parallelism, but since most graphics operations deal with 3- or 4-element vectors anyways (pixels are almost always RGBA, vertices and normals are either float3 or float4) you can get decently high ALU utilization even with a simple translator like we use in the r600 mesa driver today. The approach is similar to the vector+scalar ALUs we used in r3xx-r5xx, but more general and so more useful for compute workloads.

                  Extracting instruction-level-parallelism in the compiler is much more difficult with a typical CPU workload, where most of the operations are scalar. It's the high proportion of short vectors in a graphics or HPC workload which makes a VLIW approach to superscalar GPU hardware attractive.
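As a rough illustration of why short vectors pack well, here is a toy bundle packer in Python. It is only a sketch, not the r600 shader compiler or the mesa translator: the `pack_vliw` and `utilization` helpers, the op tuples, and the 5-slot bundle width are all assumptions made for the example, and it tracks only read-after-write dependencies (each destination is assumed to be written once).

```python
def pack_vliw(ops, width=5):
    """Greedily pack ops into VLIW bundles of at most 'width' slots.

    ops: list of (name, dest, srcs) in program order.
    An op may not share a bundle with, or precede, the op that
    produces one of its sources.
    """
    bundles = []    # each bundle is a list of op names issued together
    ready_at = {}   # register -> index of the first bundle that may read it
    for name, dest, srcs in ops:
        # Earliest bundle allowed by data dependencies (0 for pure inputs).
        b = max((ready_at.get(s, 0) for s in srcs), default=0)
        # Skip bundles that have no free slot left.
        while b < len(bundles) and len(bundles[b]) >= width:
            b += 1
        if b == len(bundles):
            bundles.append([])
        bundles[b].append(name)
        ready_at[dest] = b + 1
    return bundles


def utilization(bundles, width=5):
    """Fraction of issue slots that hold an op."""
    return sum(len(b) for b in bundles) / (len(bundles) * width)


# A 4-component (RGBA) multiply: four independent ops, one per channel.
vector_ops = [(f"mul.{c}", f"t.{c}", [f"a.{c}", f"b.{c}"]) for c in "rgba"]

# A scalar dependency chain: each op needs the previous result.
scalar_chain = [
    ("add1", "t1", ["a", "b"]),
    ("mul2", "t2", ["t1", "c"]),
    ("add3", "t3", ["t2", "d"]),
    ("mul4", "t4", ["t3", "e"]),
]
```

With these inputs the RGBA multiply fits into a single bundle, while the scalar chain needs four bundles with one op each, which is the difference between graphics-style and typical CPU-style code described above.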
                  Last edited by bridgman; 16 October 2009, 06:59 PM.
                  Test signature



                  • Originally posted by bridgman
                    Are you Spyhawk on Beyond3D? I just answered the same question there.
                    No. I'm not.

                    Anyways, most definitions of superscalar include VLIW as a subset. Some distinguish between "static superscalar" (VLIW) and "dynamic superscalar". I haven't found any definitions of superscalar which exclude VLIW but I'm sure they exist.
                    Many academic presentations claim that superscalar and VLIW are opposite approaches:

                    http://www.haenni.info/thesis/presen...tml/sld006.htm
                    http://csd.ijs.si/courses/trends/tsld008.htm

                    The most important thing is, Eric Demers claimed the same thing:
                    Originally posted by Eric Demers
                    Actually, it's not really superscalar...more like VLIW...
                    http://www.rage3d.com/interviews/atichats/undertheihs/

                    That's why I'm asking: it seems most of the sites just copy and paste the same nonsense.

                    Extracting instruction-level-parallelism in the compiler is much more difficult with a typical CPU workload, where most of the operations are scalar. It's the high proportion of short vectors in a graphics or HPC workload which makes a VLIW approach to superscalar GPU hardware attractive.
                    And what about GPGPU? What about scientific applications? Do they have to be written with VLIW in mind to run fast on Radeon? Or is that just a job for the driver's compiler?



                    • Does Radeon 4200 support OpenCL? Does it support compute shaders in CAL? AMD has made big claims about 4200 being Stream-friendly so I am confused. Is it based on RV7xx SIMDs with shared memory and the whole enchilada?
