Does Radeon 4200 support OpenCL? Does it support compute shaders in CAL? AMD has made big claims about 4200 being Stream-friendly so I am confused. Is it based on RV7xx SIMDs with shared memory and the whole enchilada?
According to http://en.wikipedia.org/wiki/Compari....2C_HD_4xxx.29, the integrated HD 4200 GPU is an RV620 core, like my Mobility Radeon 3470, and as such it doesn't support double precision and the other memory-related prerequisites for AMD's OpenCL driver. Real r700 or newer cores are required.
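For anyone who wants to check this from code rather than from the wiki tables: once an OpenCL runtime actually exposes the device, a quick capability query along these lines (plain C against the standard OpenCL host API, error handling trimmed) shows whether cl_khr_fp64 and a real local memory are reported. On an unsupported IGP like the HD4200 the runtime may simply not list a GPU device at all.

#include <stdio.h>
#include <string.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    char extensions[4096] = "";
    cl_ulong local_mem_size = 0;
    cl_device_local_mem_type local_mem_type;

    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS ||
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL) != CL_SUCCESS) {
        fprintf(stderr, "no OpenCL GPU device exposed by the runtime\n");
        return 1;
    }

    clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, sizeof(extensions), extensions, NULL);
    clGetDeviceInfo(device, CL_DEVICE_LOCAL_MEM_SIZE, sizeof(local_mem_size), &local_mem_size, NULL);
    clGetDeviceInfo(device, CL_DEVICE_LOCAL_MEM_TYPE, sizeof(local_mem_type), &local_mem_type, NULL);

    /* cl_khr_fp64 in the extension string is the usual sign of real double support */
    printf("double precision (cl_khr_fp64): %s\n",
           strstr(extensions, "cl_khr_fp64") ? "yes" : "no");
    printf("local memory: %lu bytes (%s)\n",
           (unsigned long)local_mem_size,
           local_mem_type == CL_LOCAL ? "dedicated LDS" : "emulated in global memory");
    return 0;
}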
The second one is particularly interesting, since it distinguishes between "static superscalar" and VLIW, but using those definitions our core falls into the static superscalar bucket because instructions can use the results of the previous instruction.
I think there is a slight trend towards reserving the "superscalar" term for dynamic extraction of instruction-level parallelism and using "VLIW" for compile-time ILP extraction, but it seems to be pretty recent (ie after the chips were designed). Today you can find both definitions fairly easily.
That's why I'm asking: it seems most of the sites just copy and paste the same nonsense.
If the trend towards defining "superscalar" to exclude VLIW I mentioned above is real, I imagine we will shift our usage accordingly (and Eric's comment supports that). In the meantime I think the big question is "which definition of superscalar do you subscribe to?". If you don't consider VLIW to be a subset of superscalar, then we're VLIW. If you do consider VLIW to be a subset of superscalar, then we're superscalar via VLIW. I guess I don't understand all the fuss.
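To make the compile-time ILP point concrete, here is a rough illustration in plain C (not real shader code; the "slot" comments are just shorthand for the wide r600/r700 ALU). The compiler extracts the parallelism at compile time, so it can only fill a VLIW bundle with operations that do not depend on each other.

/* Pack-friendly: four independent multiply-adds can share one bundle. */
float pack_friendly(float a, float b, float c, float d, float k)
{
    float x = a * k + 1.0f;   /* slot 0 */
    float y = b * k + 2.0f;   /* slot 1 */
    float z = c * k + 3.0f;   /* slot 2 */
    float w = d * k + 4.0f;   /* slot 3 */
    return x + y + z + w;
}

/* Dependent chain: each step needs the previous result, so the bundles go
 * out mostly empty and throughput drops to one useful op per issue. */
float dependent_chain(float a, float k)
{
    float x = a * k + 1.0f;
    x = x * k + 2.0f;
    x = x * k + 3.0f;
    x = x * k + 4.0f;
    return x;
}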
And what about GPGPU? What about scientific applications? Do they have to be compiled with VLIW in mind to run fast on a Radeon? Or is it just a matter of the driver's compiler?
The compiler usually seems to be able to optimize to the point where the algorithm is running fetch-limited, ie where further ALU optimization would not make a difference. Tweaking for a specific architecture (whether ours or someone else's) usually seems to focus on optimizing memory accesses more than ALU operations.
There are probably exceptions where tweaking the code to match the ALU architecture can get a speedup but in general it seems that optimizing I/O is what makes the biggest difference on all architectures these days.
Last edited by bridgman; 17 October 2009, 11:55 AM.
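As a concrete (and hypothetical) example of what "optimizing memory accesses rather than ALU operations" tends to mean in practice, compare these two OpenCL C kernel sketches; the arithmetic is identical, only the access pattern differs, and on most GPUs that difference dwarfs any ALU tweaking.

/* Strided reads: neighbouring work-items hit addresses 'stride' floats
 * apart, so fetches cannot be coalesced and the kernel waits on memory. */
__kernel void sum_strided(__global const float *in, __global float *out, int stride)
{
    int i = get_global_id(0);
    out[i] = in[i * stride] + in[i * stride + 1];
}

/* Unit-stride reads: neighbouring work-items touch neighbouring addresses,
 * which the memory controller can merge into wide fetches. The ALU work
 * is the same -- only the access pattern changed. */
__kernel void sum_coalesced(__global const float *in, __global float *out)
{
    int i = get_global_id(0);
    out[i] = in[i] + in[i + get_global_size(0)];
}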
Does Radeon 4200 support OpenCL? Does it support compute shaders in CAL? AMD has made big claims about 4200 being Stream-friendly so I am confused. Is it based on RV7xx SIMDs with shared memory and the whole enchilada?
As Loris said, the HD4200 IGP uses a 3D engine from the RV620, so it has Stream Processors (what we call the unified shaders introduced with r600) and supports the Stream framework (CAL etc.) but does not have all the features from the RV7xx 3D engine. It does not have the per-SIMD LDS, not sure about GDS. I don't believe the OpenCL implementation supports the HD4200, since OpenCL makes heavy use of the shared memory blocks.
Not sure about DX11 Compute Shaders but I believe they will run on the HD4200 hardware. Be aware that there are different levels of Compute Shader support, however (CS 4.0, 4.1, 5.0 IIRC), and Compute Shader 5.0 requires DX11 hardware (ie HD5xxx).
Last edited by bridgman; 17 October 2009, 01:32 PM.
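For what it's worth, the kind of OpenCL kernel that leans on the per-SIMD LDS looks something like this minimal work-group reduction (illustrative only, assumes a power-of-two work-group size); without a real local memory the __local traffic has to be emulated in much slower global memory, which is roughly why the pre-RV7xx parts get left out of the support list.

__kernel void group_sum(__global const float *in, __global float *out,
                        __local float *scratch)
{
    int lid  = get_local_id(0);
    int gid  = get_global_id(0);
    int size = get_local_size(0);

    /* Stage one element per work-item into the shared scratch space. */
    scratch[lid] = in[gid];
    barrier(CLK_LOCAL_MEM_FENCE);

    /* Tree reduction inside the work-group using local memory. */
    for (int offset = size / 2; offset > 0; offset /= 2) {
        if (lid < offset)
            scratch[lid] += scratch[lid + offset];
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    if (lid == 0)
        out[get_group_id(0)] = scratch[0];
}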
If you don't consider VLIW to be a subset of superscalar, then we're VLIW. If you do consider VLIW to be a subset of superscalar, then we're superscalar via VLIW. I guess I don't understand all the fuss.
Great. Now I understand. I prefer engineers over marketing guys, and this seemed to me like:
"Hey, nVidia has a scalar architecture. Let's tell our customers we have a superscalar architecture" - marketing bullshit.
nVidia started to use the term "stream processor". After that, ATI started to use the term "stream processor" too. But an ATI SP and an nVidia SP are different things. A higher number of SPs looks better in marketing material, no matter that it is apples to oranges. That's how it works every day.
That's why I ask a developer or engineer instead of a marketing guy. No matter what definition we use, it is clear how it works.
The compiler usually seems to be able to optimize to the point where the algorithm is running fetch-limited, ie where further ALU optimization would not make a difference. Tweaking for a specific architecture (whether ours or someone else's) usually seems to focus on optimizing memory accesses more than ALU operations.
There are probably exceptions where tweaking the code to match the ALU architecture can get a speedup but in general it seems that optimizing I/O is what makes the biggest difference on all architectures these days.
I think it is clear. Let me ask another question: if VLIW is not the problem for GPGPU, what is the reason for the lower Radeon performance in a popular GPGPU application like Folding@home? I have seen some graphs where a 9600/9800GT was faster than a Radeon HD4890, which does not make sense to me.
Great. Now I understand. I prefer engineers over marketing guys, and this seemed to me like:
"Hey, nVidia has a scalar architecture. Let's tell our customers we have a superscalar architecture" - marketing bullshit.
Yeah, I dread the day when someone develops an architecture that can reasonably be described as "superduperscalar".
For what it's worth, we did talk about the design as "superscalar" inside engineering, it's not just something marketing created. I suspect the tendency to exclude VLIW from the definition of superscalar mostly happened after the unified shader core was designed.
nVidia started to use the term "stream processor". After that, ATI started to use the term "stream processor" too. But an ATI SP and an nVidia SP are different things. A higher number of SPs looks better in marketing material, no matter that it is apples to oranges. That's how it works every day.
That's why I ask a developer or engineer instead of a marketing guy. No matter what definition we use, it is clear how it works.
AFAIK the SPs are relatively similar in terms of what they can do. The tradeoff is partly "a smaller number of SPs at a higher clock speed vs a larger number of SPs at a lower clock speed" and partly "scalar vs superscalar... err... VLIW". Every vendor chooses the approach they think is best, and eventually they converge on something that isn't quite what any of them had in mind at the start.
I think it is clear. Let me ask another question: if VLIW is not the problem for GPGPU, what is the reason for the lower Radeon performance in a popular GPGPU application like Folding@home? I have seen some graphs where a 9600/9800GT was faster than a Radeon HD4890, which does not make sense to me.
Just going from what I have read, the core issue is that the F@H client is running basically the same code paths on 6xx and 7xx rather than taking advantage of the additional capabilities in 7xx hardware. Rather than rewriting the GPU2 client for 7xx and up I *think* the plan is to focus on OpenCL and the upcoming GPU3 client.
The current F@H implementation on ATI hardware seems to have to do the force calculations twice rather than being able to store and re-use them -- storing and re-using is feasible on newer ATI GPUs but not on the earlier 6xx parts. BTW it appears that FLOPs for the duplicated calculations are not counted in the stats.
There also seems to be a big variation in relative performance depending on the size of the protein, with ATI and competing hardware being quite close on large proteins even though we are doing some of the calculations twice. There have been a couple of requests from folding users to push large proteins to ATI users and small proteins to NVidia users, not sure of the status.
There also seem to be long threads about the way points are measured. Some of the discussions (see link, around page 4) imply that the performance difference on small proteins may be a quirk of the points mechanism rather than an actual difference in throughput, but I have to admit I don't fully understand the argument there.
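To illustrate the "doing the force calculations twice" point above, here is a toy plain-C sketch (not the actual GPU2 code; pair_force is just a hypothetical stand-in for the real molecular dynamics terms). The store-and-re-use version does half the pair evaluations by applying each force to both particles, but it has to store and update results for the j side as well, which per the posts above was feasible on newer ATI GPUs but not on the earlier 6xx parts.

#include <math.h>

typedef struct { float x, y, z; } vec3;

/* Toy inverse-square pair force, standing in for the real force terms. */
static vec3 pair_force(const vec3 *pos, int i, int j)
{
    vec3 d = { pos[j].x - pos[i].x, pos[j].y - pos[i].y, pos[j].z - pos[i].z };
    float r2 = d.x * d.x + d.y * d.y + d.z * d.z + 1e-6f;
    float s  = 1.0f / (r2 * sqrtf(r2));
    vec3 f = { d.x * s, d.y * s, d.z * s };
    return f;
}

/* "Compute twice" style: every particle evaluates the force against every
 * other particle, so each pair is computed two times. Caller zeroes f[]. */
void forces_recompute(const vec3 *pos, vec3 *f, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (j != i) {
                vec3 fij = pair_force(pos, i, j);
                f[i].x += fij.x; f[i].y += fij.y; f[i].z += fij.z;
            }
}

/* "Store and re-use" style: each pair is evaluated once and applied to both
 * particles with opposite sign, halving the pair_force calls, at the cost of
 * having to accumulate into both f[i] and f[j]. */
void forces_reuse(const vec3 *pos, vec3 *f, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++) {
            vec3 fij = pair_force(pos, i, j);
            f[i].x += fij.x; f[i].y += fij.y; f[i].z += fij.z;
            f[j].x -= fij.x; f[j].y -= fij.y; f[j].z -= fij.z;
        }
}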
I saw that the 57xx doesn't have double precision floating point support, so the 57xx is out of the question for me. Will AMD's OpenCL implementation support double precision emulation using the GPU hardware?
Also, what are the numbers on integer crunching?
What about the parallel kernel execution support that nVidia has announced?
Does AMD support parallel execution of multiple compute kernels?
Other things I've noticed that are cool about Fermi are ECC support, syscall support, and developer-configurable caching/manageable memory schemes (for SP local memory).
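On the double precision question above: one common software fallback when the hardware lacks fp64 units is "double-single" (float-float) arithmetic, carrying each value as an unevaluated sum of two floats. Whether AMD's OpenCL stack will actually do this I can't say; the sketch below is just the classic Knuth/Dekker two-sum building block in plain C (compile without fast-math, which would optimize the error-term trick away).

#include <stdio.h>

/* A value is carried as hi + lo, giving roughly 48 bits of significand
 * instead of fp64's 53. */
typedef struct { float hi, lo; } dsfloat;

/* Knuth's two-sum: s + e is exactly equal to a + b. */
static dsfloat two_sum(float a, float b)
{
    float s = a + b;
    float v = s - a;
    float e = (a - (s - v)) + (b - v);
    dsfloat r = { s, e };
    return r;
}

/* Add two double-single values and renormalise the result. */
static dsfloat ds_add(dsfloat a, dsfloat b)
{
    dsfloat s = two_sum(a.hi, b.hi);
    float lo  = s.lo + a.lo + b.lo;
    return two_sum(s.hi, lo);
}

int main(void)
{
    dsfloat big = { 1.0e8f, 0.0f };
    dsfloat one = { 1.0f, 0.0f };
    dsfloat r   = ds_add(big, one);

    /* Plain float rounds 1.0e8f + 1.0f back to 1.0e8f; the double-single
     * pair keeps the lost low-order part in r.lo. */
    printf("plain float:   %.1f\n", (double)(1.0e8f + 1.0f));
    printf("double-single: %.1f\n", (double)r.hi + (double)r.lo);
    return 0;
}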
But PhysX is NVidia... not ATI... and most of the time I actually like what ATI does... if they release a driver that works then I'm happy... add a few features here and there and it makes it even better... But no PhysX... unless ATI can get NVidia to release it... but NVidia hasn't even done that for Linux yet (I think).
I'm running Ubuntu 9.10 64-bit right now, and everything worked until I installed the ATI drivers (classic, huh?).
The drivers that came with Ubuntu were fine, but I thought I'd update to the 9.10 release of the ATI drivers, so I installed it, rebooted, and lost acceleration.
I discovered that a /usr/lib64 directory had been created with all the graphics libs. Since Ubuntu doesn't use /usr/lib64, I knew it had to be moved, so I moved it to /usr/lib, rebooted, and this time I didn't get X at all.
It complained about not being able to find an amdpcsdb.default file, so I copied amdpcsdb to amdpcsdb.default and then I got X, but now I have a green watermark in the bottom right-hand corner of the screen saying: