As far as I know RDNA3 has/had some issues too.
The main difference vs RDNA2 is that it is a dual-issue architecture, but dual issue doesn't always kick in. And it can't dual issue packed math, so in reduced-precision workloads its peak performance is basically halved. Sadly, we don't have clpeak results for half precision here.
It wouldn't be a problem if the cards were not nerfed vs RDNA2 - for example, we have 4608 cores in the 6800XT vs 3840 cores in the 7800XT.
And it seems to show somewhat. I ran clpeak on my undervolted 6800XT (2.2 GHz max):
- ~19814 GFLOPS in single precision,
- ~38800 GFLOPS in half precision,
- ~3860 GIOPS in integer compute,
- not to mention ~1225 GFLOPS in double precision (RDNA2 has 1:16 FP64 instead of 1:32).
It reaches almost peak theoretical performance (4608 * 2200 MHz * 2 = ~20 TFLOPS). Since it beats the higher-clocked 7800XT, I guess there are still issues with RDNA3 dual issuing, even for FP32.
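For anyone who wants to redo the arithmetic, here is a rough sketch of the peak calculation in Python. The helper name `peak_gflops` is made up for illustration, and it assumes one FMA (counted as 2 FLOPs) per shader per clock, which is the same assumption as the "* 2" above. The inputs are the numbers from this post.

```python
def peak_gflops(shaders: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak in GFLOPS, assuming one FMA (2 FLOPs) per shader per clock."""
    return shaders * clock_mhz * flops_per_clock / 1000.0


# 6800XT capped at 2.2 GHz, as in the run described above
peak = peak_gflops(4608, 2200)

# clpeak single-precision result quoted above
measured = 19814.0

# Fraction of the theoretical peak actually reached
efficiency = measured / peak

print(f"peak: {peak:.0f} GFLOPS, measured: {measured:.0f} GFLOPS, "
      f"efficiency: {efficiency:.1%}")
```

With these inputs the card lands within a few percent of its theoretical FP32 peak.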
Having a look here:
we can see that RDNA3 is faster only for FP32.
I wonder if there is a performance difference when running XeSS (INT8) on RDNA2 vs RDNA3, if the 6900XT can be 30% faster than the 7900XTX.
Please take it with a grain of salt, as I am not a GPU expert in any way.
But these RDNA3 results are way below their advertised performance.
7800XT should push 37 TFLOPS with FP32, not 16.5.
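To put that gap in numbers, here is a quick sketch using only the two figures above. It assumes the advertised 37 TFLOPS already includes the 2x from dual issue, so half of it is treated as the single-issue peak. That split is my assumption, not something measured.

```python
# Figures quoted above for the 7800XT (FP32)
advertised_tflops = 37.0   # advertised peak, assumed to include dual issue
measured_tflops = 16.5     # benchmark result

# Assumption: dual issue doubles the peak, so single-issue peak is half
single_issue_peak = advertised_tflops / 2

frac_of_advertised = measured_tflops / advertised_tflops
frac_of_single_issue = measured_tflops / single_issue_peak

print(f"{frac_of_advertised:.1%} of advertised peak")
print(f"{frac_of_single_issue:.1%} of single-issue peak")
```

If that assumption holds, the measured result sits below even the single-issue peak, which would fit with dual issue barely contributing in this test.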
edit:
As a sidenote, I wonder what they are doing.
- they have CDNA which is strong at compute and at AI (matrix instructions, MFMA rather than WMMA as far as I know) but is not for consumers
- and RDNA which doesn't have WMMA and only does FP16 (2x faster than FP32)
- and RDNA2 which doesn't have WMMA but is strong at general compute and rapid packed math down to INT4 (8x faster than FP32)
- and RDNA3 which is poor at packed math but has WMMA for AI.
- and RDNA4 which again will be something different.
I wonder how they could do FSR using AI if every single one of their GPU generations has different compute properties?