devius A few things. We're trying to estimate GPU performance through ArrayFire. While ArrayFire is a fairly solid library that builds on top of the most optimized libraries available (clMagma, and cuBLAS on the CUDA backend), we're going to see results like Cholesky_f64 occasionally. Here are a few factors I think are at play:
- The 780 Ti looks to be a reskinned, slightly updated GeForce Titan (Kepler). The Kepler Titan, released Feb 2013, is still famous today (in academic circles) as one of the few good cards for FP64. Since Kepler, f64 performance has basically done nothing but go down through strategic choices/design, with the exception of the Pascal Teslas.
- The 780 Ti actually has 336 GB/s of memory bandwidth vs the 320 GB/s in the 1080; it's possible there are compute unit disparities leaning towards the 780 Ti as well.
- Kepler was the first superscalar series, but perhaps vectorized implementations performed a bit better on it (need to confirm clMagma takes such code paths). I've heard this a couple of times before.
- Next, the 780 Ti's 2013 release date is close to the last update of clMagma. Basically there could be a tuning problem in that they didn't take advantage of some later changes, like thread-block size changes or something else, which leads to underutilization of the 1080.
- And wrt AMD's placement: I suspect clMagma supported AMD GPUs but has an inherent tuning bias towards NVIDIA cards. This problem runs deep in academic code, and in linear algebra since it is closely related, although the datedness of clMagma again applies to these cards. OTOH they did quite well on Cholesky_f32, which suggests maybe AMD just sucked hard at FP64. They've been known to since the TeraScale VLIW generations (basically anything GCN or after).
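
To put the bandwidth point in perspective, here's a rough back-of-the-envelope sketch in Python. It only uses the 336 GB/s and 320 GB/s figures quoted above; the function names and the 8192 x 8192 matrix size are my own assumptions for illustration. It gives a lower bound for a single pass over the matrix at peak bandwidth, not an actual Cholesky timing.

```python
# Peak memory bandwidth in GB/s, taken from the figures quoted in this comment.
BANDWIDTH_GB_S = {"780 Ti": 336.0, "1080": 320.0}


def stream_time_ms(n, bytes_per_element, bandwidth_gb_s):
    """Lower bound (in ms) for reading an n x n matrix once at peak bandwidth."""
    total_bytes = n * n * bytes_per_element
    return total_bytes / (bandwidth_gb_s * 1e9) * 1e3


def bandwidth_advantage(card_a, card_b):
    """Ratio of card_a's peak bandwidth to card_b's."""
    return BANDWIDTH_GB_S[card_a] / BANDWIDTH_GB_S[card_b]


if __name__ == "__main__":
    n = 8192            # hypothetical matrix dimension
    f64_bytes = 8       # size of one double-precision element
    for card, bw in BANDWIDTH_GB_S.items():
        t = stream_time_ms(n, f64_bytes, bw)
        print(f"{card}: {t:.3f} ms per pass over a {n}x{n} f64 matrix")
    ratio = bandwidth_advantage("780 Ti", "1080")
    print(f"780 Ti / 1080 bandwidth ratio: {ratio:.2f}")
```

The gap works out to about 5%, so bandwidth on its own probably wouldn't explain a large difference on Cholesky_f64; the other factors would have to carry most of it.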