I saw the Himeno results in the article: https://www.phoronix.com/scan.php?pa...0x-linux&num=8 Something seemed way off, and then I noticed that the Fortran version of Himeno doesn't behave the same way at all. It seems the performance dip has nothing to do with AVX2 capability; the matrix indexing in the C version is just very inefficient. All the scores improve greatly once the indexing is fixed. A blog post here mentions this: https://blogs.fau.de/hager/archives/7850 The same post also shows how to make the benchmark multithreaded with OpenMP, but that's a separate discussion.

I made a repo with the indexing fix so that the C version of Himeno performs closer to the original Fortran variant. With the fix, the performance anomaly on modern AMD CPUs goes away entirely. The repo is here for review: https://github.com/kowsalyaChidambaram/Himeno-Benchmark
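For anyone wondering why index order can matter this much in C, here is a minimal generic sketch. It is not the Himeno source and not the patch from the repo; the array sizes (`NI`, `NJ`, `NK`) and the function names are made up for illustration. C stores multi-dimensional arrays row-major, so the last index is contiguous in memory. Looping over the first index in the innermost loop jumps `NJ * NK` elements on every access, while looping over the last index walks memory with stride 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical grid dimensions, chosen only for this illustration. */
#define NI 8
#define NJ 8
#define NK 8

/* Flat offset of element (i, j, k) in a row-major NI x NJ x NK array.
 * The last index k is the contiguous one in C. */
size_t offset(size_t i, size_t j, size_t k)
{
    assert(i < NI && j < NJ && k < NK);
    return (i * NJ + j) * NK + k;
}

/* Cache-unfriendly order: the innermost loop runs over i, so
 * consecutive accesses are NJ * NK elements apart. */
double sum_strided(const float *p)
{
    double s = 0.0;
    for (size_t k = 0; k < NK; k++)
        for (size_t j = 0; j < NJ; j++)
            for (size_t i = 0; i < NI; i++)
                s += p[offset(i, j, k)];
    return s;
}

/* Cache-friendly order: the innermost loop runs over k, so
 * consecutive accesses are adjacent in memory (stride 1). */
double sum_contiguous(const float *p)
{
    double s = 0.0;
    for (size_t i = 0; i < NI; i++)
        for (size_t j = 0; j < NJ; j++)
            for (size_t k = 0; k < NK; k++)
                s += p[offset(i, j, k)];
    return s;
}
```

Both functions compute the same sum; only the memory access pattern differs. On a grid much larger than the caches, the strided version would typically be noticeably slower, though how much depends on the CPU and compiler. Fortran is column-major, so the "fast" index there is the first one, which is why a straight port of the loop nest between the two languages can end up with the wrong stride.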