Intel MKL-DNN Deep Neural Network Library Benchmarks On Xeon & EPYC


  • #11
    Originally posted by coder View Post
    Assuming it wasn't intentionally rigged to make AMD look bad, my guess is it's just using instructions (probably AVX-512, at that) for which AMD has no equivalent. Then, AMD has to fall back on some scalar code path included for the sake of compatibility.

    If you look at specifically which tests are extremely Intel-biased, they're:
    • deconvolution
    • u8s8f32 (meaning: f32 += unsigned 8-bit * signed 8-bit ?)
    Lacking a key instruction used in the optimized deconvolution code path could break AMD in those benchmarks, and getting good performance on the u8s8f32 tests surely depends on having the right instructions for it.
    MKL-DNN chooses the same code path on Intel and AMD CPUs when they report the same features.
    Since the test was compiled with the -msse4.1 flag, the generated code used only features which both CPUs have.
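    To illustrate the feature-based (rather than vendor-based) dispatch being described, here's a hypothetical sketch using GCC's __builtin_cpu_supports. This is not MKL-DNN's actual dispatcher, just the general shape of picking a kernel from CPUID feature bits, which is the same on Intel and AMD:

    ```c
    #include <stdio.h>

    /* Hypothetical sketch of feature-based kernel dispatch: the choice
     * depends only on which CPUID feature bits are set, not on the
     * vendor string. Which branch runs depends on the host CPU. */
    int main(void) {
        __builtin_cpu_init();  /* initialize CPU feature detection */
        if (__builtin_cpu_supports("avx512f"))
            puts("avx512 kernel");
        else if (__builtin_cpu_supports("avx2"))
            puts("avx2 kernel");
        else if (__builtin_cpu_supports("sse4.1"))
            puts("sse4.1 kernel");
        else
            puts("scalar fallback");
        return 0;
    }
    ```

    With -msse4.1 as the baseline, both CPUs at least reach the sse4.1 branch; the gap appears when only one of them has the higher branches.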

    Comment


    • #12
      Originally posted by Royi View Post
      Also, in DNN accuracy isn’t important, so why not use the Ofast flag which is basically O3 flag with non precise FP.
      I don't know that you can say accuracy is absolutely unimportant. People are willing to make measured tradeoffs, though. Without more domain expertise, I think he should compile it according to the optimized flags in the project's own buildsystem. That way, he'd just be taking the project maintainers' recommendations, with regard to speed/accuracy tradeoffs.
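    For what it's worth, the kind of result change -Ofast licenses is easy to show: -Ofast is -O3 plus -ffast-math, and fast-math lets the compiler reassociate floating-point sums, which FP addition doesn't allow without changing the answer. A minimal illustration (plain C, compiled without fast-math, so both orders are computed as written):

    ```c
    #include <stdio.h>

    /* FP addition is not associative, so the reassociation that
     * -ffast-math permits can change results in the last bits. */
    int main(void) {
        double a = (0.1 + 0.2) + 0.3;  /* one association order */
        double b = 0.1 + (0.2 + 0.3);  /* the other */
        printf("%d\n", a == b);        /* prints 0: the sums differ */
        return 0;
    }
    ```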

      Originally posted by Royi View Post
      Also, why not use AVX2 for compilation?
      Does --enable-multiarch get you that?

      Originally posted by Royi View Post
      Moreover, while MKL is known for discriminating non Intel CPUs this library doesn’t as it chooses code path based only on CPU features.
      Yeah, but if certain cases are built around using specialized datatypes that are currently supported only on Intel CPUs, then the net effect is the same.
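    To make that concrete: the u8s8f32 primitive from the benchmark names exactly such a case, f32 accumulation of unsigned-8-bit times signed-8-bit products. With AVX-512 VNNI, the integer multiply-accumulate part maps to a single vpdpbusd instruction (with a float conversion afterward); a CPU without it is left with something closer to this scalar loop (a hypothetical illustration, not MKL-DNN's actual code):

    ```c
    #include <stdio.h>
    #include <stdint.h>

    /* Scalar sketch of u8s8f32: f32 += unsigned 8-bit * signed 8-bit.
     * Hardware with VNNI does four such multiply-accumulates per lane
     * in one instruction; without it, you pay per element. */
    int main(void) {
        uint8_t u[4] = {10, 20, 30, 40};
        int8_t  s[4] = {-1, 2, -3, 4};
        float acc = 0.0f;
        for (int i = 0; i < 4; i++)
            acc += (float)((int32_t)u[i] * s[i]);  /* widen, multiply, accumulate */
        printf("%g\n", acc);  /* -10 + 40 - 90 + 160 = 100 */
        return 0;
    }
    ```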

      Comment


      • #13
        Originally posted by coder View Post
        Assuming it wasn't intentionally rigged to make AMD look bad, my guess is it's just using instructions (probably AVX-512, at that) for which AMD has no equivalent. Then, AMD has to fall back on some scalar code path included for the sake of compatibility.

        If you look at specifically which tests are extremely Intel-biased, they're:
        • deconvolution
        • u8s8f32 (meaning: f32 += unsigned 8-bit * signed 8-bit ?)
        Lacking a key instruction used in the optimized deconvolution code path could break AMD in those benchmarks, and getting good performance on the u8s8f32 tests surely depends on having the right instructions for it.

        But, the bottom line is that this benchmark really doesn't tell us how the CPUs compare, unless you happen to be running a workload that's dependent on this specific library. So, I would suggest such strongly-biased tests not be included in PTS.
        I don't think it was rigged; I'm just questioning its inclusion as a CPU "performance" metric.
        AVX-512 or not, I have a hard time accepting the 40x performance difference, never mind the tests that AMD actually did win.
        I came to much the same conclusion as you.

        Comment
