
Queued Linux Patches To Better Track AVX-512, Allowing For More Optimal Task Placement


  • Queued Linux Patches To Better Track AVX-512, Allowing For More Optimal Task Placement

    Phoronix: Queued Linux Patches To Better Track AVX-512, Allowing For More Optimal Task Placement

    After going through several rounds of patch review in recent months, a patch series providing for tracking AVX-512 usage of tasks and exporting it to user-space is poised to be part of the upcoming Linux 5.1 kernel...

    http://www.phoronix.com/scan.php?pag...12-Usage-Tasks

  • #2
    If the task placement was already optimal, how did they achieve more optimal task placement? I guess the task placement has only improved...



    • #3
      AVX-512 has been talked about since 2013, hence I am a bit curious why it took Intel so long to get these kinds of ISA-specific improvements and optimizations into crucial parts of the software ecosystem. I would guess Intel would feed early hardware to their own various software teams and other ISVs to start this kind of work sooner rather than later.



      • #4
        Originally posted by ms178
        AVX-512 has been talked about since 2013, hence I am a bit curious why it took Intel so long to get these kinds of ISA-specific improvements and optimizations into crucial parts of the software ecosystem. I would guess Intel would feed early hardware to their own various software teams and other ISVs to start this kind of work sooner rather than later.
        Good question. The only reason I can come up with is "we have no reason to push development on this" due to Intel having practically no competition from anyone at the time (I don't think Nvidia was even much of a threat in the server market back then). From what I can tell, Intel deliberately held it off to give themselves some performance leverage at the last minute. Now that Intel is being attacked from multiple angles, AVX-512 is kinda their "secret weapon", since Intel's AVX performance is currently better than AMD's.



        • #5
          Originally posted by schmidtbag
          Good question. The only reason I can come up with is "we have no reason to push development on this" due to Intel having practically no competition from anyone at the time (I don't think Nvidia was even much of a threat in the server market back then). From what I can tell, Intel deliberately held it off to give themselves some performance leverage at the last minute. Now that Intel is being attacked from multiple angles, AVX-512 is kinda their "secret weapon", since Intel's AVX performance is currently better than AMD's.
          That wouldn't at all surprise me. On the business side, it makes sense to hold something back if you're already in the lead. If anyone starts to close in on that lead, you essentially have a magic bullet waiting for them. That's true of any business or industry. It sucks for consumers.

          I have to imagine that Intel being out of magic bullets is why they're getting into the discrete GPU market -- if AMD starts taking CPU numbers, take GPU numbers from Nvidia.



          • #6
            Originally posted by ms178
            AVX-512 has been talked about since 2013, hence I am a bit curious why it took Intel so long to get these kinds of ISA-specific improvements and optimizations into crucial parts of the software ecosystem. I would guess Intel would feed early hardware to their own various software teams and other ISVs to start this kind of work sooner rather than later.
            It's not about using the vectorised instructions, which have been there for ages, just about tuning core usage, which is a small detail in the optimisation with some additional code complexity.



            • #7
              Originally posted by feydun
              It's not about using the vectorised instructions, which have been there for ages, just about tuning core usage, which is a small detail in the optimisation with some additional code complexity.
              AVX-512 is only present on consumer parts in the latest generation. Before that, it was restricted to some high-end Xeons and Xeon Phi.



              • #8
                Originally posted by Spacefish
                AVX-512 is only present on consumer parts in the latest generation. Before that, it was restricted to some high-end Xeons and Xeon Phi.
                It's only on Skylake-X and Xeons; there is no support on normal consumer CPUs.
                I have a 7900X, but I didn't get any nice improvements with AVX-512. x265 runs a bit better, but that's with low AVX offsets, so the CPU clocks higher than normal.



                • #9
                  Originally posted by Timon&Pumba
                  If the task placement was already optimal, how did they achieve more optimal task placement? I guess the task placement has only improved...
                  If I understand it correctly, it's about allowing userland software with a better understanding of the performance characteristics of the task it's performing to have access to the information it needs to do its own tuning of CPU affinities.
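
To make that idea concrete, here is a rough sketch of what such userland tuning might look like. It assumes the usage data is exposed as an `AVX512_elapsed_ms` field in a per-task `/proc/<pid>/arch_status` file, which is how later kernel documentation describes it; this thread does not name the interface, so treat the path and field name as assumptions. The helper names, the 1000 ms threshold, and the policy of pinning recent AVX-512 users to a chosen CPU set are purely hypothetical.

```python
import os
from typing import Iterable, Optional


def parse_avx512_elapsed_ms(arch_status_text: str) -> Optional[int]:
    """Return ms since the task last used AVX-512, or None if unknown.

    Assumes lines of the form "AVX512_elapsed_ms:<tab>8"; a negative
    value is treated as "no AVX-512 usage recorded".
    """
    for line in arch_status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "AVX512_elapsed_ms":
            try:
                elapsed = int(value.strip())
            except ValueError:
                return None
            return elapsed if elapsed >= 0 else None
    return None


def read_avx512_elapsed_ms(pid: int) -> Optional[int]:
    """Read the (assumed) per-task file; None if it is missing or unreadable."""
    try:
        with open(f"/proc/{pid}/arch_status") as f:
            return parse_avx512_elapsed_ms(f.read())
    except OSError:
        return None


def is_recent_avx512_user(elapsed_ms: Optional[int], threshold_ms: int = 1000) -> bool:
    """Hypothetical policy: 'recent' means AVX-512 was used within threshold_ms."""
    return elapsed_ms is not None and elapsed_ms <= threshold_ms


def pin_if_avx512(pid: int, avx_cpus: Iterable[int], threshold_ms: int = 1000) -> bool:
    """Pin the task to avx_cpus if it recently used AVX-512 (Linux only)."""
    if is_recent_avx512_user(read_avx512_elapsed_ms(pid), threshold_ms):
        os.sched_setaffinity(pid, set(avx_cpus))
        return True
    return False
```

A job scheduler could call something like `pin_if_avx512(pid, {6, 7})` periodically to keep heavy AVX-512 tasks on a couple of cores, so the frequency drop does not spread to everything else. Whether that actually helps depends on the workload.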



                  • #10
                    What I get from the patch is that they want to maintain higher clock speeds when AVX-512 context switches aren't needed (properly written AVX-512 code) but were done speculatively because it wasn't known whether the program had cleared the AVX-512 registers or not. From the wording that says something like "real world loads like linpack", I wouldn't be surprised if it's done to boost benchmark scores.
