Potential new benchmark: gpuowl

  • Potential new benchmark: gpuowl

    gpuowl might be an interesting addition to the test suite. It's GPL-licensed, uses OpenCL, and is basically the GPU equivalent of Prime95 for PRP testing (P95's main function). It runs on Intel/AMD/Nvidia hardware, including iGPUs, and has some of the characteristics of traditional HPC. As a primality tester it essentially does giant FFTs, so FP64 compute is important, as is memory bandwidth; HBM, or ways to alleviate bandwidth pressure like Infinity Cache, can boost performance quite a bit.
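For anyone unfamiliar with what a PRP test on a Mersenne number involves, here is a minimal sketch in plain Python. It uses one common formulation of the base-3 Fermat test (square 3 a total of p times modulo 2^p - 1 and compare against 9); this is an illustration of the idea, not gpuowl's actual code, and the function name `mersenne_prp` is made up for the example. A real implementation performs each squaring with a giant floating-point FFT, which is where the FP64 and memory bandwidth load comes from.

```python
def mersenne_prp(p: int) -> bool:
    """Base-3 Fermat probable-prime check for M_p = 2**p - 1.

    Illustrative sketch only; assumes p >= 3 (for p = 2, M_p = 3 divides the base).
    """
    m = (1 << p) - 1          # the Mersenne number 2^p - 1
    x = 3
    for _ in range(p):        # p modular squarings give 3^(2^p) mod M_p
        x = x * x % m
    # If M_p is prime, Fermat gives 3^(M_p - 1) = 1, hence 3^(2^p) = 3^(M_p + 1) = 9.
    return x == 9 % m


if __name__ == "__main__":
    print(mersenne_prp(13))   # True: 2^13 - 1 = 8191 is prime
    print(mersenne_prp(11))   # False: 2^11 - 1 = 2047 = 23 * 89
```

The loop runs p times, so an exponent around 100 million means on the order of 100 million dependent squarings of a number with 100 million bits, which is why a single test is a long, bandwidth-heavy job.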

    GPU Mersenne primality test: preda/gpuowl on GitHub.

    Here's a community benchmark for an idea of the type of workload it is (times in yellow are estimates to be ignored, columns to care about at the end):

    It's interesting to watch as things develop, particularly because it taxes FP64 and memory, two areas that are currently changing in ways that aren't simply "bigger number better". Traditionally AMD cards were better suited because GCN has a 1:16 FP64 ratio compared to Nvidia's 1:64, and AMD didn't skimp on bandwidth. AMD's mid/high-end cards, as mediocre as they were at a lot of things compared to Nvidia, tended to beat all non-HBM Nvidia offerings here. Then Radeon VII came out with 1:4 or 1:8 FP64 and 1TB/s HBM in a consumer part and set a new high bar that wasn't matched in a consumer part until RDNA2, which compensated for the lack of HBM and the poorer ratio by being that much beefier compute- and speed-wise, and Infinity Cache made a massive difference.

    Radeon VII and 6950XT are roughly the same speed, a 4090Ti is slower despite being relatively monstrous, and datacentre parts breeze past the best consumer parts but are not cost-effective. RDNA3 is where things get weird: a 7900XTX is slower than all three of Radeon VII/6950XT/4090Ti. AMD reduced the FP64 ratio to 1:32 with RDNA3, which is a massive blow, but on paper it still has more FP64 throughput, and its memory bandwidth is higher than the 6950XT's. But now the Infinity Cache and other memory controller components are split six ways, which might be a factor (that's all I can think of). If it affects gpuowl it probably affects other workloads.
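To make the ratio arithmetic concrete, here is a small sketch of how paper FP64 throughput follows from FP32 throughput and the FP64 ratio. The helper names and the 20 TFLOPS input are hypothetical placeholders for illustration, not measured or spec-sheet figures for any of the cards mentioned.

```python
def fp64_tflops(fp32_tflops: float, ratio_denominator: int) -> float:
    """Paper FP64 throughput for a card with a 1:ratio_denominator FP64 ratio."""
    return fp32_tflops / ratio_denominator


def fp32_needed_to_match(fp32_tflops: float, old_den: int, new_den: int) -> float:
    """FP32 throughput a 1:new_den card needs to equal a 1:old_den card's FP64 rate."""
    return fp32_tflops * new_den / old_den


if __name__ == "__main__":
    # Hypothetical 20 TFLOPS FP32 card at a 1:16 ratio.
    print(fp64_tflops(20.0, 16))
    # FP32 throughput a 1:32 card would need just to break even on FP64.
    print(fp32_needed_to_match(20.0, 16, 32))
```

Halving the ratio means the newer card needs double the FP32 throughput just to break even on FP64, and even when it clears that bar on paper, the measured gpuowl result can still fall short if memory behaviour is the real limit.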

    tl;dr it's an interesting benchmark because it's affected by design choices that are currently diverging between brands, and potentially by the move to MCM; what the spec sheet says doesn't necessarily reflect performance.

  • #2
    We over at the Mersenne Forum noticed this new test was added, but may we request a new exponent be tested? 130000007 would do for the next few years. It's a little ahead of the current work and will be a good benchmark for people looking to purchase hardware for running gpuOwl.

    The existing exponent 77936867 would be good to keep as it's used for comparison purposes between hardware by (that page refers to CUDALucas, but benchmarks for gpuOwl are also wanted as it's used on AMD hardware).

    The existing exponent 57885161 is not really useful to benchmark, though it's noteworthy as 2^57885161-1 is prime.