Core i9 7900X vs. Threadripper 1950X On Ubuntu 17.10, Antergos, Clear Linux

  • #41
    Originally posted by arjan_intel
    this is not actually correct. FMA is still allowed even at O2;
    I don't know what "allowed even at O2" is supposed to mean. It happens at -O2 if you enable -mfma explicitly or indirectly through -march=...
    However, C99 requires that contractions only happen when there is a corresponding source level expression. GCC ignores this and will contract even where not allowed. Some references:
    https://gcc.gnu.org/c99status.html
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37845#c5

    Originally posted by arjan_intel
    the result is not less accurate than without FMA... it's a little more accurate.
    The result is changed and becomes unpredictable.
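To make the two positions above concrete, here is a minimal sketch of how a fused multiply-add can differ from a separate multiply and add. It is an illustration only, not what GCC emits: the fused result is emulated with exact rational arithmetic and a single rounding, and the function names and input values are assumptions chosen for the demonstration.

```python
from fractions import Fraction


def mul_then_add(a: float, b: float, c: float) -> float:
    """Unfused: the product is rounded to a double, then the sum is rounded again."""
    return a * b + c


def emulated_fma(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: compute a*b+c exactly, then round once.

    Fraction(float) is exact, and float(Fraction) rounds to the nearest double.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


if __name__ == "__main__":
    # Hypothetical inputs: the exact product is 1 - 2**-60, which is not a double.
    a = 1.0 + 2.0**-30
    b = 1.0 - 2.0**-30
    c = -1.0
    print("mul then add:", mul_then_add(a, b, c))
    print("emulated fma:", emulated_fma(a, b, c))
```

With these inputs the unfused version loses the small term when the product is rounded to 1.0, while the single-rounding version keeps it. So the fused result is closer to the exact value, and it is also a different result from the same source expression, which is the reproducibility concern when the compiler decides where to contract.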



    • #42
      Originally posted by arjan_intel

      It's mostly the math-heavy things (libm from glibc, the BLAS library of your choice, etc) where there is a real split in performance between generations. The AVX2+FMA line is a split where performance fundamentally changes.
      (On, say, Skylake CPUs like this Core i9, a floating-point add takes 4 cycles and a floating-point multiply takes 4 cycles, but an FMA (fused multiply-add) ALSO takes 4 cycles. This means that code that does lots of multiplies and adds on FP data can get a significant benefit.)
      So I've just done a quick test, but it wasn't that exciting:
      I rebuilt yasm, glibc (that took a lot longer than I expected), ffmpeg and x264 with -march=native (Haswell) and -O2, and I gained about 20 seconds when encoding a 20-minute video. 20 seconds doesn't look too bad, but that was actually less than 3% of the total time...
      Maybe with -O3 or other flags it would be a bigger percentage, as in Michael's test.
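As a back-of-the-envelope illustration of the quoted latency figures (4 cycles each for add, multiply and FMA), here is a rough sketch of a purely latency-bound dependent chain, such as Horner evaluation of a polynomial where each step is `p = p * x + c`. The model and the function name are assumptions for illustration; it ignores throughput, out-of-order overlap and memory, so it says nothing about how a real encoder behaves.

```python
# Latencies in cycles, taken from the quoted post (Skylake figures).
ADD_LATENCY = 4
MUL_LATENCY = 4
FMA_LATENCY = 4


def horner_chain_cycles(degree: int, use_fma: bool) -> int:
    """Estimate cycles for a dependent chain of `degree` steps of p = p*x + c.

    Each step must wait for the previous one, so latencies simply add up.
    """
    per_step = FMA_LATENCY if use_fma else MUL_LATENCY + ADD_LATENCY
    return degree * per_step


if __name__ == "__main__":
    for fused in (False, True):
        print("use_fma =", fused, "->", horner_chain_cycles(8, fused), "cycles")
```

In this idealized model the fused chain takes half the cycles of the unfused one. Code that is not dominated by such dependent multiply-add chains would see far less of a difference.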
      Last edited by geearf; 09-13-2017, 11:55 AM.



      • #43
        With ffmpeg this is not surprising, since ffmpeg employs runtime CPU detection to automatically switch code paths. With -O3 and potentially other options you may see bigger differences.
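For readers unfamiliar with the idea, here is a minimal sketch of what runtime CPU detection means in general: the program inspects the CPU's feature flags at startup and selects a code path, so hand-optimized paths are used regardless of the -march value given to the compiler. This is not ffmpeg's actual implementation; the kernel names and the `/proc/cpuinfo` parsing are assumptions for illustration (Linux on x86 only).

```python
def read_cpu_flags(path: str = "/proc/cpuinfo") -> set:
    """Return the CPU feature flags listed in /proc/cpuinfo, or an empty set."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    # Line looks like "flags : fpu vme ... avx2 fma ..."
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass  # Not Linux, or the file is unreadable: fall back to no flags.
    return set()


def pick_kernel(flags: set) -> str:
    """Choose the best available code path (hypothetical names) for these flags."""
    if {"avx2", "fma"} <= flags:
        return "avx2_fma"
    if "sse2" in flags:
        return "sse2"
    return "generic"


if __name__ == "__main__":
    print("selected code path:", pick_kernel(read_cpu_flags()))
```

Because the selection happens when the program runs, a binary built with generic flags and one built with -march=native can end up executing the same optimized inner loops.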
