SUSE Linux Enterprise / openSUSE Leap Pursuing x86_64-v2 Optimized Libraries


  • #11
    Originally posted by Aryma View Post
    Does it really make a big difference whether a lib or software is compiled with or without AVX2?
    The main beneficiaries are the usual suspects:
    • software A/V codecs
    • HPC
    • machine learning
    • some graphics & imaging code
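    As a rough illustration (a hypothetical kernel, not something from the article), the win comes from dense inner loops the compiler can auto-vectorize once the baseline includes AVX2/FMA:

    Code:
    /* saxpy.c -- hypothetical example; actual gains depend entirely on the workload. */
    void saxpy(float *restrict y, const float *restrict x, float a, int n)
    {
        /* At -O3, GCC/Clang auto-vectorize this loop with whatever SIMD the
         * baseline allows: 128-bit SSE for plain x86-64 or x86-64-v2,
         * 256-bit AVX2 + FMA for x86-64-v3. */
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }
    /* Build (GCC 11+ / Clang 12+ understand the level names):
     *   gcc -O3 -march=x86-64-v2 -c saxpy.c
     *   gcc -O3 -march=x86-64-v3 -c saxpy.c
     */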



    • #12
      Anyone have any idea of the expected performance gains?



      • #13
        Originally posted by Slartifartblast View Post
        Anyone have any idea of the expected performance gains?
        Michael sometimes benchmarks -march=native, in addition to other compiler options. The delta between that and the default (which is currently equivalent to v1 or baseline x86-64) should give you an upper bound on the benefit.

        Here's a recent example:


        It should be noted that some of those benchmarks use hand-optimized, architecture-specific code paths selected by runtime CPU detection, so they show less of a difference than you might expect. However, that also shows that where the biggest wins exist for the greatest number of users, we already tend to be harnessing the additional capabilities of our CPUs.
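        For the curious, here's a minimal sketch of that runtime-dispatch pattern using GCC's function multi-versioning (hand-written codec paths do the equivalent with their own CPUID checks); it's an illustration, not how any particular package is built:

        Code:
        /* dispatch.c -- sketch of runtime CPU dispatch, assuming GCC 6+ on glibc. */
        #include <stdio.h>

        /* GCC emits one clone per target and an ifunc resolver picks the best
         * one at load time, so an AVX2 machine runs the AVX2 clone even though
         * the binary itself targets the baseline. */
        __attribute__((target_clones("default", "avx2")))
        void scale(float *buf, float s, int n)
        {
            for (int i = 0; i < n; i++)
                buf[i] *= s;
        }

        int main(void)
        {
            float buf[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
            scale(buf, 2.0f, 8);
            printf("%g\n", buf[0]);   /* prints 2 */
            return 0;
        }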

        All of that is to say that I don't expect a very notable improvement from most software, or for most users. On the flip side, this could motivate more packages to build with more aggressive auto-vectorization options, providing a little more upside than I'm expecting. Whatever the case, I expect Michael will publish benchmarks putting this new capability in the best light possible*.

        * People forget that he cherry-picks which benchmarks to publish, and I think he tends to select ones that show the greatest impact from the independent variable.



        • #14
          Originally posted by coder View Post
          Michael sometimes benchmarks -march=native, in addition to other compiler options. The delta between that and the default (which is currently equivalent to v1 or baseline x86-64) should give you an upper bound on the benefit.
          Thanks. From reading that, I believe V3 brings a much more marked improvement than V2.
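          For reference, v2 is roughly the Nehalem feature set (SSE3/SSSE3/SSE4.1/SSE4.2, POPCNT, CMPXCHG16B), while v3 adds AVX, AVX2, FMA, BMI1/2, F16C, LZCNT and MOVBE, which is where most of the vectorization headroom comes from. A quick way to spot-check your own CPU (a sketch using GCC's feature builtins, not the full psABI level definitions):

          Code:
          /* level_check.c -- rough sketch; only tests a few headline features per level. */
          #include <stdio.h>

          int main(void)
          {
              __builtin_cpu_init();
              int v2 = __builtin_cpu_supports("sse4.2") && __builtin_cpu_supports("popcnt");
              int v3 = v2 && __builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma");
              int v4 = v3 && __builtin_cpu_supports("avx512f");
              printf("~v2: %s  ~v3: %s  ~v4: %s\n",
                     v2 ? "yes" : "no", v3 ? "yes" : "no", v4 ? "yes" : "no");
              return 0;
          }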



          • #15
            Originally posted by Slartifartblast View Post
            I believe V3 brings a much more marked improvement than V2.
            Agreed.

            I was a little disappointed that there's not a step for AVX1, but that would make for a pretty uneven jump in terms of the years between steps. V1 maps to about 2005, V2 -> 2009, Sandybridge launched in 2011, V3 -> 2013, and V4 started in 2016 but didn't hit mainstream desktop until 2021. So, the intervals would go: 4, 2, 2, 3+ or even 6, 2, 3+ if you simply moved V2 from Nehalem to Sandybridge. Instead, they're 4, 4, 3+ years.

            Anyway, since Sandybridge is now a decade old, I guess it shouldn't need special treatment.

