Announcement

**d2kx** · 26 September 2009, 01:40 PM

If he's fine with his 3850's performance, I'd suggest waiting for the 56xx (dunno) or 57xx (october) series.

**vrodic** · 26 September 2009, 03:08 PM

Originally posted by d2kx

If he's fine with his 3850's performance, I'd suggest waiting for the 56xx (dunno) or 57xx (october) series.

Yes, I'll do that.

**vrodic** · 26 September 2009, 03:11 PM

Originally posted by bridgman

I can't really comment on unreleased products, unfortunately.

My understanding is that the current F@H implementation uses essentially the same code for 6xx and 7xx parts, so it does not take advantage of LDS/GDS on the 7xx parts. I imagine that's where the discussion of "calculate twice vs store and re-use" comes from.

It's strange that ATI didn't work with folding@home to fix such a high profile GPGPU application.

I guess everything will be better and easier to fix once we have an open-source OpenCL implementation and (if) folding@home releases their code. Since FAH is based on gromacs, and gromacs is open source, I guess this shouldn't be impossible. I'll go back to my little hole now and wait for OpenCL and the 57xx, in that order.

**Dandeloreon** · 13 October 2009, 05:21 PM

When will the documentation for the RS780 series of graphics chips be coming out, along with documentation on handling the power management features of these chips?

**bridgman** · 13 October 2009, 05:26 PM

Scroll down to the "Chipset Guides and Documentation" section :

http://developer.amd.com/documentati...s/default.aspx

Most of the information required for power management is already out there - the main issue is that dynamic power management really needs to be implemented in a KMS-enabled DRM so that the PM code can (a) have access to all the required activity information, and (b) avoid hardware access conflicts between PM code (which needs to be in drm) and modesetting code. The issue there is that a couple of register locations are used for both PM and modesetting functions.
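The register-sharing point can be illustrated with a toy model. This is a Python sketch, not kernel code: the register offset and bit fields below are hypothetical, and the single lock stands in for the idea that one component (the KMS-enabled drm) owns every access to registers that both PM and modesetting touch.

```python
import threading

# Hypothetical register offset and bit fields, for illustration only;
# these are not real Radeon register definitions.
SHARED_REG = 0x100
PM_BITS = 0x00FF    # field owned by power-management code
MODE_BITS = 0xFF00  # field owned by modesetting code


class RegisterFile:
    """Toy stand-in for MMIO registers, with one lock owned by the driver."""

    def __init__(self):
        self._regs = {}
        self._lock = threading.Lock()

    def update_bits(self, offset, mask, value):
        # Read-modify-write under the lock, so a PM update and a modeset
        # update to different fields of the same register cannot interleave
        # and overwrite each other's bits.
        with self._lock:
            old = self._regs.get(offset, 0)
            self._regs[offset] = (old & ~mask) | (value & mask)

    def read(self, offset):
        with self._lock:
            return self._regs.get(offset, 0)


regs = RegisterFile()


def pm_path(n):
    # Hypothetical PM code repeatedly changing its own field.
    for i in range(n):
        regs.update_bits(SHARED_REG, PM_BITS, i & 0xFF)


def modeset_path(n):
    # Hypothetical modesetting code repeatedly changing its own field.
    for i in range(n):
        regs.update_bits(SHARED_REG, MODE_BITS, (i & 0xFF) << 8)


workers = [threading.Thread(target=f, args=(1000,)) for f in (pm_path, modeset_path)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(hex(regs.read(SHARED_REG)))
```

Because every read-modify-write runs under the same lock, the register ends up with both fields intact (this prints `0xe7e7`) regardless of how the threads interleave. If the two paths lived in separate components with no shared lock, one path's write could undo bits the other had just set.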

There are a couple of things we still need to document -- some missing bits in the AtomBIOS power-related tables and the on-chip fan controller for sure. They're next on the list after we get interrupts working on 6xx/7xx.

**next9** · 16 October 2009, 06:16 PM

Superscalar vs. VLIW

I'm not sure this is the right thread, but I think it's better to ask an AMD/ATI dev. Various journalists, portals, and forum members all around the internet call the ATI R600-R800 architecture superscalar.

But AFAIK the ATI architecture is VLIW, not superscalar. Superscalar and VLIW designs achieve the same goals, but the implementations differ. A superscalar architecture uses hardware dependency checking among the instructions, which makes the chip bigger. VLIW uses software dependency checking, so it depends heavily on the compiler, and the chips can be smaller.

So, it seems to me, ATI chose the VLIW way (HD5870 has 320 VLIW cores) and nVidia superscalar way (GT200 has 120 superscalar, or 240 scalar cores).

Do I understand it right? Is the ATI architecture VLIW, relying heavily on the compiler to do instruction dependency checking?

**Michael** · 16 October 2009, 06:18 PM

To those of you that asked most of the questions in this thread, it doesn't look like AMD is going to ever finish the formal Q&A... So you can probably stop asking questions.

**bridgman** · 16 October 2009, 06:32 PM

Originally posted by next9

I'm not sure this is the right thread, but I think it's better to ask an AMD/ATI dev. Various journalists, portals, and forum members all around the internet call the ATI R600-R800 architecture superscalar.

But AFAIK the ATI architecture is VLIW, not superscalar. Superscalar and VLIW designs achieve the same goals, but the implementations differ. A superscalar architecture uses hardware dependency checking among the instructions, which makes the chip bigger. VLIW uses software dependency checking, so it depends heavily on the compiler, and the chips can be smaller.

So, it seems to me, ATI chose the VLIW way (HD5870 has 320 VLIW cores) and nVidia superscalar way (GT200 has 120 superscalar, or 240 scalar cores).

Do I understand it right? Is the ATI architecture VLIW, relying heavily on the compiler to do instruction dependency checking?

Are you Spyhawk on Beyond3D? I just answered the same question there.

Anyways, most definitions of superscalar include VLIW as a subset. Some distinguish between "static superscalar" (VLIW) and "dynamic superscalar". I haven't found any definitions of superscalar which exclude VLIW but I'm sure they exist.

ATI GPUs are superscalar via VLIW, or just "VLIW" if you don't consider VLIW to be a subset of superscalar. They do depend on having the shader compiler identify instruction level parallelism, but since most graphics operations deal with 3- or 4-element vectors anyways (pixels are almost always RGBA, vertices and normals are either float3 or float4) you can get decently high ALU utilization even with a simple translator like we use in the r600 mesa driver today. The approach is similar to the vector+scalar ALUs we used in r3xx-r5xx, but more general and so more useful for compute workloads.

Extracting instruction-level-parallelism in the compiler is much more difficult with a typical CPU workload, where most of the operations are scalar. It's the high proportion of short vectors in a graphics or HPC workload which makes a VLIW approach to superscalar GPU hardware attractive.
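To make the utilization argument concrete, here is a toy Python model (an illustrative sketch, not the r600 mesa compiler) of the compiler doing the dependency checking. It greedily packs scalar ops, in order, into 5-slot bundles, assuming the x/y/z/w/t ALU layout of R600-class parts and ignoring real constraints such as operand read-port limits and ops that can only run in the t slot.

```python
from dataclasses import dataclass

SLOTS = 5  # assumed bundle width: x, y, z, w, t (R600-class VLIW5)


@dataclass(frozen=True)
class Op:
    dst: str               # register written by this scalar op
    srcs: tuple[str, ...]  # registers read by this scalar op


def pack(ops, slots=SLOTS):
    """Greedy in-order packing of scalar ops into VLIW bundles.

    An op may join the current bundle only if it neither reads nor writes a
    register already written in that bundle (RAW/WAW hazards). On VLIW this
    check is the compiler's job; superscalar hardware does it at run time.
    """
    bundles, current, written = [], [], set()
    for op in ops:
        hazard = written & ({op.dst} | set(op.srcs))
        if hazard or len(current) == slots:
            bundles.append(current)
            current, written = [], set()
        current.append(op)
        written.add(op.dst)
    if current:
        bundles.append(current)
    return bundles


def utilization(bundles, slots=SLOTS):
    """Fraction of ALU slots doing useful work."""
    used = sum(len(b) for b in bundles)
    return used / (len(bundles) * slots)


# Pixel shading: out = col * tex, one independent op per RGBA channel.
pixel = [Op(f"out.{c}", (f"col.{c}", f"tex.{c}")) for c in "xyzw"]

# CPU-style scalar code: each op depends on the previous result.
chain = [Op("a", ("a",)) for _ in range(4)]

for name, ops in (("pixel", pixel), ("chain", chain)):
    b = pack(ops)
    print(f"{name}: {len(b)} bundle(s), {utilization(b):.0%} ALU utilization")
```

This prints `pixel: 1 bundle(s), 80% ALU utilization` and `chain: 4 bundle(s), 20% ALU utilization`: the four RGBA channel ops share one bundle, while the dependent scalar chain needs a bundle per op. A real compiler can recover some of that by pulling independent work from elsewhere in the program, which this in-order greedy packer does not attempt.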

**next9** · 17 October 2009, 03:42 AM

Originally posted by bridgman

Are you Spyhawk on Beyond3D? I just answered the same question there.

No. I'm not.

Anyways, most definitions of superscalar include VLIW as a subset. Some distinguish between "static superscalar" (VLIW) and "dynamic superscalar". I haven't found any definitions of superscalar which exclude VLIW but I'm sure they exist.

Many academic presentations can be found claiming that superscalar and VLIW are opposite approaches:


http://www.haenni.info/thesis/presentations/noptimization_html/sld006.htm

Superscalar vs. VLIW

http://csd.ijs.si/courses/trends/tsld008.htm

The most important thing is, Eric Demers claimed the same thing:

Originally posted by Eric Demers

Actually, it's not really superscalar...more like VLIW...


http://www.rage3d.com/interviews/atichats/undertheihs/

That's why I'm asking: it seems most sites just copy and paste the same nonsense.

Extracting instruction-level-parallelism in the compiler is much more difficult with a typical CPU workload, where most of the operations are scalar. It's the high proportion of short vectors in a graphics or HPC workload which makes a VLIW approach to superscalar GPU hardware attractive.

And what about GPGPU and scientific applications? Do they have to be compiled with VLIW in mind to run fast on Radeon, or is it just a matter of the driver's compiler?

**codedivine** · 17 October 2009, 04:54 AM

Does Radeon 4200 support OpenCL? Does it support compute shaders in CAL? AMD has made big claims about 4200 being Stream-friendly so I am confused. Is it based on RV7xx SIMDs with shared memory and the whole enchilada?


"Ask ATI" dev thread
