It makes me wonder why this wasn't done nearly 20 years ago. Even though there are many good reasons for the CPU to do the work, having the option to run "raw" C code on the GPU would still make sense, and perhaps CUDA would have seen a lot less adoption.
How AMD Is Taking Standard C/C++ Code To Run Directly On GPUs
Originally posted by habilain View Post
GPUs are Turing complete these days, so "slow" rather than "not possible". I think at the moment it's mainly "this code can be run on the GPU" rather than automatic offloading, which, while it would be nice for the lazy programmer, would probably not be possible on a standard GPU architecture. That's mainly because the GPU and CPU do not share the same memory: offloading a loop from the CPU to the GPU has a high fixed cost, and I don't see a good way to automatically determine when it's worth doing.
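The fixed-cost argument can be made concrete with a back-of-the-envelope model. The bandwidth and launch-overhead numbers below are illustrative assumptions, not measurements, but they show why a compiler can't easily decide this automatically: the answer flips depending on loop size.

```python
# Rough break-even model (illustrative numbers, not measurements) for when
# offloading a loop to a discrete GPU pays off despite the copy cost.
PCIE_BW = 16e9           # bytes/s, assumed PCIe-like transfer bandwidth
LAUNCH_OVERHEAD = 10e-6  # seconds, assumed fixed kernel-launch cost

def offload_wins(n_bytes, cpu_time, gpu_time):
    """True if GPU compute plus transfer/launch overhead beats the CPU."""
    transfer = 2 * n_bytes / PCIE_BW  # copy inputs over, copy results back
    return gpu_time + transfer + LAUNCH_OVERHEAD < cpu_time

# Tiny loop: the fixed cost dominates, so the CPU wins.
print(offload_wins(n_bytes=1e4, cpu_time=5e-6, gpu_time=1e-6))   # False
# Large loop: compute dominates, so offloading wins.
print(offload_wins(n_bytes=1e8, cpu_time=0.5, gpu_time=0.02))    # True
```

On an APU with truly shared memory the `transfer` term would drop out, which is exactly why the shared-memory question comes up later in this thread.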
Guest
Originally posted by ddriver View Post
Bad idea,
This is a great idea, here's what I would like to see happen:
They create a simple, Python-style language with an easy-to-learn, readable syntax that is easily expandable via libraries.
Then add function decorators that let the programmer specify where each part of the code should run, for instance:

@_CPU
def function_one(): ...

@_GPU
def function_two(): ...

@_CPU_128_SIMD
def function_three(): ...
I guess what I am saying is: write a compiler for Python that lets a programmer specify on what hardware each portion of code should be executed.
Last edited by sophisticles; 11 December 2024, 03:02 PM.
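A minimal sketch of what such decorators might look like in plain Python. The _CPU/_GPU names are the commenter's hypothetical API, not a real library; this version only tags each function with a target device for a compiler or dispatcher to inspect, and does not actually generate GPU code:

```python
# Hypothetical sketch of the decorators proposed above. _CPU, _GPU and
# _CPU_128_SIMD are illustrative names; here they merely record a target
# device on the function object for a compiler/runtime to act on later.
def _target(device):
    def decorator(fn):
        fn.target = device
        return fn
    return decorator

_CPU = _target("cpu")
_GPU = _target("gpu")
_CPU_128_SIMD = _target("cpu-simd-128")

@_GPU
def saxpy(a, x, y):
    # Data-parallel loop: the kind of code you'd want on the GPU.
    return [a * xi + yi for xi, yi in zip(x, y)]

@_CPU
def total(values):
    # Mostly-serial reduction: probably better left on the CPU.
    return sum(values)

print(saxpy.target, total.target)  # gpu cpu
print(saxpy(2.0, [1.0, 2.0], [10.0, 20.0]))  # [12.0, 24.0]
```

A real implementation would compile the decorated function's body for the chosen device instead of just labeling it, which is where the hard work lives.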
Originally posted by schmidtbag View Post
It makes me wonder why this wasn't done nearly 20 years ago. Even though there are many good reasons for the CPU to do the work, having the option to run "raw" C code on the GPU would still make sense, and perhaps CUDA would have seen a lot less adoption.
Originally posted by carewolf View Post
Using the GPU integrated into the CPU that uses the same memory?
Even if it is possible, I'm not sure there'd be that much benefit. APUs aren't that powerful, and it would likely still need special code to get decent performance. For example, double-precision floating point tends to be quite slow on a GPU.
Originally posted by habilain View Post
Probably not possible? ROCm doesn't officially support APUs, so I'm pretty sure ROCm doesn't support zero copy. Or if it does, that's weird.
Even if it is possible, I'm not sure there'd be that much benefit. APUs aren't that powerful, and it would likely still need special code to get decent performance. For example, double-precision floating point tends to be quite slow on a GPU.