gpuowl might be an interesting inclusion to the test suite. It's GPL-licensed OpenCL, basically the GPU equivalent of Prime95 for PRP testing (P95's main function); it runs on Intel/AMD/Nvidia, including iGPUs, and has some of the characteristics of traditional HPC. As a primality tester it essentially does giant FFTs, so FP64 compute is important, as is internal memory bandwidth, meaning HBM or ways to alleviate bandwidth pressure like Infinity Cache can boost performance quite a bit.
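To make the "giant FFTs" point concrete, here's a toy sketch (Python; the function name is mine, not gpuowl's) of the PRP test it runs: for a Mersenne candidate M = 2^p - 1, compute 3^(2^p) mod M by p successive squarings. gpuowl performs each of those squarings as a multi-million-point FP64 FFT on the GPU, which is where the FP64 and bandwidth demand comes from; in this sketch the squaring is just Python bignum arithmetic.

```python
def mersenne_prp(p: int) -> bool:
    """Base-3 Fermat PRP test for the Mersenne candidate M = 2^p - 1."""
    M = (1 << p) - 1
    x = 3
    for _ in range(p):
        # gpuowl does this squaring mod M as one giant FP64 FFT per iteration
        x = (x * x) % M
    # If M is prime, Fermat gives 3^(M-1) = 1 (mod M), and since
    # 2^p = M + 1, we get 3^(2^p) = 3^2 = 9 (mod M).
    return x == 9

# e.g. p=7 (M=127, prime) passes; p=11 (M=2047 = 23*89) fails
```

Each of the p iterations is one full-length squaring, so at Mersenne-prime search sizes (p in the tens of millions) the workload is tens of millions of multi-million-point FFTs, all FP64 and all streaming through memory.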
Here's a community benchmark for an idea of the type of workload it is (times in yellow are estimates and should be ignored; the columns that matter are at the end): https://docs.google.com/spreadsheets...5kayM8cMNTaNJV
It's interesting as things develop particularly because it taxes FP64 and memory, which are areas currently changing in ways that aren't an obvious "bigger number better". Traditionally AMD cards were better suited because GCN has a 1:16 FP64 ratio compared to Nvidia's 1:64, and AMD didn't skimp on bandwidth; AMD's mid/high-end, as mediocre as it was at a lot of things compared to Nvidia, tended to beat all non-HBM Nvidia offerings here. Then Radeon VII came out with 1:4 or 1:8 FP64 and 1TB/s of HBM in a consumer part and set a new high bar that wasn't matched in a consumer part until RDNA2, which compensated for no HBM and a poorer FP64 ratio by just being that much beefier compute- and clock-wise, and Infinity Cache made a massive difference. Radeon VII and the 6950 XT are roughly the same speed, a 4090 is slower despite being relatively monstrous, and datacentre parts breeze past the best consumer parts but are not cost-effective. RDNA3 is where things get weird: a 7900 XTX is slower than all three of Radeon VII/6950 XT/4090. AMD reduced FP64 to 1:32 with RDNA3, which is a massive blow, but on paper it still has more FP64 than the 6950 XT, and its memory bandwidth is higher too. The difference is that the Infinity Cache and other memory-controller components are now split six ways across the MCDs, which might be the factor (that's all I can think of). If it affects gpuowl it probably affects other workloads.
tl;dr it's an interesting benchmark because it's affected by design choices that are currently diverging between brands, and potentially by the move to MCM; what the spec sheet says doesn't necessarily reflect performance.