David Airlie Tries DOOM On CPU-Based Lavapipe Vulkan
Originally posted by c0d1f1ed:
"It's a common misconception that a CPU core can be compared to a GPU core. What most GPUs call a core is actually a 32-bit SIMD lane. A CPU with 64 cores with two AVX-512 SIMD units each actually has 2K of those, and would be capable of 8 SP FMA TFLOPS of throughput @ 2 GHz."

You could either say "the vector unit is four times as wide, which compensates for running at 1/4 the engine clock, so a single SIMD = a CPU core", or say "putting all four SIMDs together compensates for running at 1/4 the engine clock, so a CU = a CPU core". We take the more conservative approach in our marketing blurb and talk about CPU cores and CUs.

For RDNA, each CU has two SIMDs, each with a scalar processor and a 1024-bit vector unit running at full engine clock, so it's probably easiest to say SIMD = CPU core. By that logic a 6900 XT would have 160 cores, each with "AVX-1024".

Originally posted by coder:
"...and GPUs have famously loose memory consistency guarantees."

Last edited by bridgman; 21 April 2021, 10:18 AM.
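The lane arithmetic behind this counting scheme can be sketched quickly. The clock values below are illustrative assumptions (roughly the 6900 XT game clock and the 2 GHz CPU figure quoted above), not official numbers:

```python
# Rough sketch of the "SIMD = CPU core" counting from the comment above.
# The ~1.825 GHz GPU clock and ~2.0 GHz CPU clock are assumptions.

def simd_lanes(simds, vector_bits):
    # each fp32 lane is 32 bits wide
    return simds * (vector_bits // 32)

# RX 6900 XT: 80 CUs x 2 SIMDs = 160 "cores", each with a 1024-bit
# ("AVX-1024") vector unit
gpu_lanes = simd_lanes(simds=80 * 2, vector_bits=1024)

# 64-core CPU with two 512-bit AVX-512 units per core
cpu_lanes = simd_lanes(simds=64 * 2, vector_bits=512)

print(gpu_lanes, cpu_lanes)  # 5120 2048

# Peak single-precision FMA throughput: an FMA counts as 2 ops/lane/cycle
print(f"{gpu_lanes * 2 * 1.825e9 / 1e12:.1f} TFLOPS")  # ~18.7
print(f"{cpu_lanes * 2 * 2.0e9 / 1e12:.1f} TFLOPS")    # ~8.2
```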
Originally posted by c0d1f1ed:
"It's a common misconception that a CPU core can be compared to a GPU core. What most GPUs call a core is actually a 32-bit SIMD lane. A CPU with 64 cores with two AVX-512 SIMD units each actually has 2K of those, and would be capable of 8 SP FMA TFLOPS of throughput @ 2 GHz."

Intel's latest Ice Lake CPUs feature up to 40 cores with 2x AVX-512 FMA units per core. So that'd be the equivalent of 1280 GPU "cores" or "shaders". Though its base frequency is 2.3 GHz, 2.0 GHz probably isn't a bad estimate, since even Ice Lake still clock-throttles under heavy AVX-512 utilization. Assuming 2x 16-wide fp32 FMAs per core per cycle, that amounts to 5.1 TFLOPS @ 2.0 GHz. By comparison, an RTX 3090 is rated at 29.4 TFLOPS and an RX 6900 XT advertises 18.7 TFLOPS.

Memory bandwidth is another area in GPUs' favor. The Ice Lake 8380 has a nominal bandwidth of about 205 GB/s, whereas the RTX 3090 advertises 936 GB/s and the RX 6900 XT has a nominal GDDR6 bandwidth of 512 GB/s (if we counted Infinity Cache, then we'd have to compare it with the CPU's L3 bandwidth).

However, as I mentioned, GPU performance isn't only about the shaders, or else they'd all look like AMD's new CDNA -- with no ROPs, texture samplers, tessellators, RT cores, etc. So we'd really need to look beyond the TFLOPS. I know you, of all people, are well aware of this; I'm just mentioning it for arQon or anyone else who might not be paying attention to that stuff.

GPUs also have other advantages, like much greater SMT (Ampere is 64-way?) and many more registers (Ampere has up to 255 SIMD registers per warp). By comparison, x86 is just 2-way SMT (but has OoO execution), and AVX-512 has just 32 architectural registers per thread. Ampere also has other SIMD refinements you won't find in AVX-512, and GPUs have famously loose memory consistency guarantees.

CPUs are just no match for GPUs at their own game. Intel didn't believe this until two generations of Xeon Phi accelerators couldn't even catch the prior generation of GPUs' compute performance!

Last edited by coder; 20 April 2021, 09:20 PM.
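A quick sanity check of the Ice Lake figure above, using the unit counts from the comment (a back-of-the-envelope sketch, not a measured result):

```python
# Back-of-the-envelope check of the ~5.1 TFLOPS Ice Lake estimate.
# Counts a fused multiply-add (FMA) as 2 floating-point ops.

def sp_fma_tflops(cores, fma_units, lanes_per_unit, clock_ghz):
    return cores * fma_units * lanes_per_unit * 2 * clock_ghz / 1000

# 40-core Ice Lake-SP, 2x AVX-512 FMA units per core, 16 fp32 lanes each,
# ~2.0 GHz assumed sustained clock under heavy AVX-512 load
print(f"{sp_fma_tflops(40, 2, 16, 2.0):.2f} TFLOPS")  # 5.12
```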
cool to see, but hardly a surprising outcome. 16 - or even 64 - general purpose cores are SO far from the 2K-3K specialised EUs/etc in even 7900/700-series GPUs that it's not even funny. The CPU may be 2x faster, but it's still facing a 100+x difference in throughput ability even before you factor in just how much faster those EUs are AT this kind of work.
Congratulations to all involved! I mean - wow! Having something like DOOM run on a pure software stack is pretty much totally insane - whatever the frame rates are. And for software-based rendering they are pretty good in my book, especially if some low-ish hanging fruit can still be identified.
Kudos to all of you
Hey, at least it runs. I've heard that normally a software implementation of a graphics API would crash on a very intensive game...

Originally posted by torsionbar28:
"8266752-core Fugaku is the solution."

Originally posted by Etherman:
"Cool, up to 10 fpm of pure performance."
Originally posted by coder:
"It's interesting to hear how this is progressing.
I wonder if there's a good, generic way to profile JIT code. operf certainly hasn't done me much good, but then I haven't really looked into it, either."

An order of magnitude less than real GPU performance probably isn't unreasonable to hope for, though it probably depends somewhat on how much the app leans on HW features vs. generic shader code.

As for profiling, LLVM has perf integration now; I can at least see in perf report what assembly is eating up CPU, though mapping that back to fragment shader source is always tricky.

Dave.
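For anyone wanting to try this, the perf side of that workflow might look like the following. This is a sketch assuming LLVM was built with perf JIT (jitdump) support enabled; `./game` is a placeholder for whatever binary you're running under lavapipe:

```shell
# 1. Record with an explicit clockid (-k 1, CLOCK_MONOTONIC) so the
#    jitdump timestamps can be correlated with the perf samples
perf record -k 1 -o perf.data ./game

# 2. Fold the jitdump file emitted by LLVM into the recording
perf inject --jit -i perf.data -o perf.jit.data

# 3. JITed shader code now shows up as symbols alongside native code
perf report -i perf.jit.data
```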