GCC Lands AVX-512 Fully-Masked Vectorization


  • max0x7ba
    replied
    Originally posted by phoronix View Post
    Phoronix: GCC Lands AVX-512 Fully-Masked Vectorization

    Stemming from looking at the generated x264 video encode binary and some performance inefficiencies, SUSE engineers have worked out AVX-512 fully masked vectorization support for the GCC 14 development code...

    https://www.phoronix.com/news/GCC-AV...-Masked-Vector
    Somehow, applying AVX-512 to short vectors and then complaining about the result makes big news and provokes strong opinions.

    Applying AVX-512 to long vectors to speed up matrix multiplications works exceptionally well; even Linus regrets complaining about AVX-512 and keeps quiet.


  • fahrenheit
    replied
    From the commit message, I think the most fun part is that the 128-bit implementation is the fastest one when the iteration count is a power of 2. Additionally, the masked epilog version is faster than the fully masked version. The table also highlights the bug: the 512-bit code often performed poorly, losing to the 256-bit and 128-bit versions.

    One of the motivating testcases is from PR108410 which in turn
    is extracted from x264 where large size vectorization shows
    issues with small trip loops. Execution time there improves
    compared to classic AVX512 with AVX2 epilogues for the cases
    of less than 32 iterations.

    sz   scal    128    256    512   512e   512f
     1   9.42  11.32   9.35  11.17  15.13  16.89
     2   5.72   6.53   6.66   6.66   7.62   8.56
     3   4.49   5.10   5.10   5.74   5.08   5.73
     4   4.10   4.33   4.29   5.21   3.79   4.25
     6   3.78   3.85   3.86   4.76   2.54   2.85
     8   3.64   1.89   3.76   4.50   1.92   2.16
    12   3.56   2.21   3.75   4.26   1.26   1.42
    16   3.36   0.83   1.06   4.16   0.95   1.07
    20   3.39   1.42   1.33   4.07   0.75   0.85
    24   3.23   0.66   1.72   4.22   0.62   0.70
    28   3.18   1.09   2.04   4.20   0.54   0.61
    32   3.16   0.47   0.41   0.41   0.47   0.53
    34   3.16   0.67   0.61   0.56   0.44   0.50
    38   3.19   0.95   0.95   0.82   0.40   0.45
    42   3.09   0.58   1.21   1.13   0.36   0.40


    'size' specifies the number of actual iterations, 512e is for
    a masked epilog and 512f for the fully masked loop. From
    4 scalar iterations on the AVX512 masked epilog code is clearly
    the winner, the fully masked variant is clearly worse and
    its size benefit is also tiny.
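
For anyone unsure what "masked epilog" and "fully masked" mean in the commit message, here is a rough scalar emulation in C. It is only a sketch of the two loop shapes, not what GCC actually emits and not real AVX-512 intrinsics. All names (`VL`, `lane_mask`, `masked_add`, `add_fully_masked`, `add_masked_epilog`) are made up for illustration, and the 16-lane width assumes 32-bit elements in a 512-bit vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VL 16 /* assumed: 16 lanes of 32-bit ints per emulated 512-bit vector */

/* Build a mask with the low `active` bits set (1 <= active <= VL). */
static uint32_t lane_mask(size_t active)
{
    assert(active >= 1 && active <= VL);
    return (1u << active) - 1u;
}

/* Emulate one masked vector add: only lanes whose mask bit is set are
 * read and written, so inactive lanes never touch memory. */
static void masked_add(int32_t *dst, const int32_t *a, const int32_t *b,
                       uint32_t mask)
{
    for (size_t lane = 0; lane < VL; ++lane)
        if (mask & (1u << lane))
            dst[lane] = a[lane] + b[lane];
}

/* Fully masked loop: a single loop in which every iteration computes
 * and applies a mask. Returns the number of vector iterations. */
size_t add_fully_masked(int32_t *dst, const int32_t *a, const int32_t *b,
                        size_t n)
{
    size_t iters = 0;
    for (size_t i = 0; i < n; i += VL, ++iters) {
        size_t remaining = n - i;
        size_t active = remaining < VL ? remaining : VL;
        masked_add(dst + i, a + i, b + i, lane_mask(active));
    }
    return iters;
}

/* Masked epilog: a main loop over full vectors (the all-ones mask stands
 * in for an unmasked add), then at most one masked iteration for the
 * tail. Returns the number of vector iterations. */
size_t add_masked_epilog(int32_t *dst, const int32_t *a, const int32_t *b,
                         size_t n)
{
    size_t i = 0, iters = 0;
    for (; i + VL <= n; i += VL, ++iters)
        masked_add(dst + i, a + i, b + i, 0xFFFFu);
    if (i < n) {
        masked_add(dst + i, a + i, b + i, lane_mask(n - i));
        ++iters;
    }
    return iters;
}
```

Both variants do the same number of vector iterations. The difference is that the fully masked loop pays for mask generation on every iteration, while the epilog variant pays for it only once at the tail, which would be consistent with the 512e column beating 512f in the table.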


  • SofS
    replied
    I would say the problem is not so much AVX-512 itself as how Intel fragmented it. Look at how the subsets are supported (from Wikipedia).

    maim.png

    The clock issue with Intel implementations also did not help.
    Last edited by SofS; 19 June 2023, 11:52 AM.


  • raystriker
    replied
    Originally posted by pWe00Iri3e7Z9lHOX2Qx View Post
    Yes AMD's first implementation was better than Intel's first implementation, but it damn sure better be half a decade after their competitor did it and on a TSMC 5nm node vs Intel 14nm.
    We don't know if it's a node thing though?


  • Anux
    replied
    Originally posted by pWe00Iri3e7Z9lHOX2Qx View Post
    Intel introduces AVX-512 in 2017: Boo! Hiss! Stop with the magic function garbage!
    You left out one important fact: using AVX throttled their whole CPU and everything ran slower, which rendered their implementation useless for mixed workloads.
    TSMC 5nm node vs Intel 14nm.
    The problem was not node-specific; it was just a bad implementation.


  • user556
    replied
    Intel's lacklustre efforts to remedy the power consumption problem probably have more than a little to do with their bet on the E-cores. If those didn't exist, the P-cores would be better for it, IMHO.


  • pWe00Iri3e7Z9lHOX2Qx
    replied
    Originally posted by schmidtbag View Post
    Devil's advocate here:
    AVX512 was very poorly received by the Linux community, primarily because Linus reamed them out for making it. Perhaps Intel was like "fine, then I'm not doing any more work".
    Makes me wonder too if dropping support for it on desktop platforms (I think it was Alder Lake?) was a way to test how much people were going to care if it went away.
    I did find the disparity in the community's reaction pretty silly.

    Intel introduces AVX-512 in 2017: Boo! Hiss! Stop with the magic function garbage!

    AMD adds AVX-512 in 2022: Yay! Amaze balls!

    Yes AMD's first implementation was better than Intel's first implementation, but it damn sure better be half a decade after their competitor did it and on a TSMC 5nm node vs Intel 14nm.


  • Linuxxx
    replied
    Originally posted by Joe2021 View Post
    Dear Mr. Intel,

    I am sorry to point it out so directly, but your policy regarding AVX512 is no longer comprehensible. Solve this issue asap. You can't even blame AMD or someone else for the situation you are in - AVX512 is YOUR child and YOU had all the opportunities to create compelling products with it. I do not care for the reasons you failed in this regard. Just fix it. Just deliver.

    Sincerely,

    Mr. Customer.
    Working great on my Rocket Lake...


  • Anux
    replied
    Originally posted by schmidtbag View Post
    AVX512 was very poorly received by the Linux community, primarily because Linus reamed them out for making it.
    But that has nothing to do with its success, else C++ or Nvidia would also have taken another route.

    Their dropping AVX512 has more to do with their shortcomings in process nodes and their inability to develop efficient hardware.


  • schmidtbag
    replied
    Originally posted by Joe2021 View Post
    Dear Mr. Intel,

    I am sorry to point it out so directly, but your policy regarding AVX512 is no longer comprehensible. Solve this issue asap. You can't even blame AMD or someone else for the situation you are in - AVX512 is YOUR child and YOU had all the opportunities to create compelling products with it. I do not care for the reasons you failed in this regard. Just fix it. Just deliver.

    Sincerely,

    Mr. Customer.
    Devil's advocate here:
    AVX512 was very poorly received by the Linux community, primarily because Linus reamed them out for making it. Perhaps Intel was like "fine, then I'm not doing any more work".
    Makes me wonder too if dropping support for it on desktop platforms (I think it was Alder Lake?) was a way to test how much people were going to care if it went away.
