Intel Xeon Max Performance Delivers A Powerful Combination With AMX + HBM2e

  • Intel Xeon Max Performance Delivers A Powerful Combination With AMX + HBM2e

    Phoronix: Intel Xeon Max Performance Delivers A Powerful Combination With AMX + HBM2e

    The Intel Xeon Max 9480 flagship Sapphire Rapids CPU with HBM2e memory tops out at 56 cores / 112 threads, so how can it compete with the latest AMD EPYC processors hitting 96 cores for Genoa (or 128 cores with the forthcoming Bergamo)? Besides the on-package HBM2e that is unique to the Xeon Max family, the other ace Xeon Max shares with the rest of the Sapphire Rapids line-up is support for the Advanced Matrix Extensions (AMX). Today's benchmarks show precisely how HBM2e and AMX combine to let the Intel Xeon Max compete with -- and outperform -- AMD's EPYC 9554 and 9654 processors in AI workloads that effectively leverage AMX and the onboard HBM2e memory.
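    For readers who haven't touched AMX: it adds per-core 2D tile registers plus a tile matrix-multiply unit. A minimal sketch of what that looks like at the intrinsics level, assuming GCC/Clang with -mamx-tile -mamx-bf16 on a recent Linux kernel (an illustration, not code from the article):

    ```cpp
    // Minimal AMX sketch: one BF16 tile multiply, C (fp32) += A (bf16) * B (bf16).
    // Assumes GCC/Clang with -mamx-tile -mamx-bf16 on a Sapphire Rapids-class CPU.
    #include <immintrin.h>
    #include <cstdint>
    #include <sys/syscall.h>
    #include <unistd.h>

    #define ARCH_REQ_XCOMP_PERM 0x1023  // Linux: request a dynamic XSAVE feature
    #define XFEATURE_XTILEDATA  18      // ...specifically the AMX tile-data state

    // 64-byte configuration blob consumed by _tile_loadconfig (LDTILECFG).
    struct alignas(64) TileConfig {
        uint8_t  palette_id;
        uint8_t  start_row;
        uint8_t  reserved[14];
        uint16_t colsb[16];  // bytes per row for each tile register
        uint8_t  rows[16];   // rows for each tile register
    };

    int main() {
        // The kernel refuses AMX until the process requests the tile state.
        if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA) != 0)
            return 1;

        TileConfig cfg{};
        cfg.palette_id = 1;
        for (int t = 0; t < 3; ++t) { cfg.rows[t] = 16; cfg.colsb[t] = 64; }
        _tile_loadconfig(&cfg);

        // A and B are 16x32 bf16 tiles (64 bytes/row); C is a 16x16 fp32 tile.
        // NB: for correct math, B must be pre-packed in the VNNI pair layout.
        alignas(64) static uint16_t a[16][32] = {}, b[16][32] = {};
        alignas(64) static float    c[16][16] = {};

        _tile_loadd(1, a, 64);    // load A into tile register 1 (64-byte stride)
        _tile_loadd(2, b, 64);    // load B into tile register 2
        _tile_zero(0);            // clear the fp32 accumulator tile
        _tile_dpbf16ps(0, 1, 2);  // the AMX dot-product: C += A * B
        _tile_stored(0, c, 64);   // write the result back to memory
        _tile_release();          // free the tile state
        return 0;
    }
    ```

    In practice you'd rarely write this by hand; libraries like oneDNN emit AMX code paths automatically, which is how OpenVINO's CPU plugin picks it up.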


  • #2
    It would be interesting to see OpenVINO GPU performance for reference, to see if running these kinds of workloads even makes sense on CPUs. Neural network inference is usually done on GPUs, so it's a little strange to see Intel trying to capture this market on CPUs.

    OpenVINO does support GPU compute, mostly targeting Intel devices. I would not be surprised if comparable performance can be achieved on a GPU that's several times cheaper than Xeon Max parts.
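    For anyone curious what that comparison would involve: in OpenVINO the target is just a device string, so the CPU (AMX) and GPU paths run the same model. A minimal sketch against the OpenVINO 2.0 C++ API, with "model.xml" as a placeholder IR path:

    ```cpp
    // Sketch: compile and run the same OpenVINO model on a chosen device.
    // "model.xml" is a placeholder path to an OpenVINO IR model.
    #include <openvino/openvino.hpp>

    int main() {
        ov::Core core;
        auto model = core.read_model("model.xml");

        // The only change between a CPU run (AMX-accelerated on Sapphire
        // Rapids via oneDNN) and an Intel GPU run is this device string.
        auto compiled = core.compile_model(model, "GPU");

        auto request = compiled.create_infer_request();
        request.infer();  // static-shape inputs are auto-allocated by the runtime
        return 0;
    }
    ```

    OpenVINO's benchmark_app exposes the same switch as -d CPU / -d GPU, which is presumably all a GPU-vs-Xeon-Max comparison would need.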



    • #3
      I'm surprised HBM2e could make that big of a performance difference.



      • #4
        Originally posted by Shnatsel View Post
        It would be interesting to see OpenVINO GPU performance for reference, to see if running these kinds of workloads even makes sense on CPUs. Neural network inference is usually done on GPUs, so it's a little strange to see Intel trying to capture this market on CPUs.

        OpenVINO does support GPU compute, mostly targeting Intel devices. I would not be surprised if comparable performance can be achieved on a GPU that's several times cheaper than Xeon Max parts.
        It's been on my TODO list, albeit not a priority, since it's just consumer GPUs that I have access to for testing.
        Michael Larabel
        https://www.michaellarabel.com/



        • #5
          Originally posted by Michael View Post
          It's been on my TODO list albeit not a priority since it's just all consumer GPUs I have access to for testing.
          If you have an Intel Arc A770, that's cut from the same cloth as their Arctic Sound server GPUs. It'd be worth comparing to the Xeon Max + AMX, especially since it should be well-supported by OpenVINO.

          Their Data Center GPU Flex 170 is basically an A770.
          Their Data Center GPU Flex 140 is basically 2x A380 on a single card.

          Of course, these dGPUs have only up to 16 GiB of RAM. So, they won't be useful for large training workloads. They're mostly for smaller inferencing workloads, like face recognition. However, I think most/all of your OpenVINO benchmarks should run well on them.
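          For anyone wanting to check what OpenVINO would actually see on such a box, the runtime can enumerate its devices; a tiny sketch using the standard ov::Core API:

          ```cpp
          // Sketch: list the devices OpenVINO can target on this machine,
          // e.g. "CPU" plus "GPU" (or "GPU.0"/"GPU.1" with multiple cards).
          #include <openvino/openvino.hpp>
          #include <iostream>

          int main() {
              ov::Core core;
              for (const auto& device : core.get_available_devices())
                  std::cout << device << ": "
                            << core.get_property(device, ov::device::full_name)
                            << "\n";
          }
          ```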
          Last edited by coder; 07 July 2023, 11:17 AM.



          • #6
            Originally posted by coder View Post
            If you have an Intel Arc A770, that's cut from the same cloth as their Arctic Sound server GPUs. It'd be worth doing, especially since it should be well-supported by OpenVINO.
            Right, but basically either way someone will complain "but it's a consumer GPU; with ABC, you would have seen XYZ instead"... or "why no pro cards?", etc. So among the dozens of other test ideas floating around at any given time, with limited time and resources it's just not as high a priority as some other tests/articles.
            Michael Larabel
            https://www.michaellarabel.com/



            • #7
              Originally posted by Michael View Post
              Right but basically either way someone will complain "but it's a consumer GPU, with ABC, you would have seen XYZ instead".... or "why no pro cards?", etc....
              I just gave you an "out" for that. Compare the A770's specs to the Flex 170 - they're the same thing. As for "comparing with XYZ GPU", I think OpenVINO is a good excuse to stick with Intel GPUs.

              Anyway, it was just a suggestion. I know you're busy, but I think a lot of us would like to see it.



              • #8
                Originally posted by schmidtbag View Post
                I'm surprised HBM2e could make that big of a performance difference.
                I'm sure there are cases where it makes far more of a difference than others. Clearly it's a winner when it comes to raw memory bandwidth. The real question is whether the physical location of the HBM2e, far closer to the cores than off-package DIMMs, is what's really making the difference. There should be a very real latency benefit from not having all of those wires and traces running through the motherboard, compared to just an interposer.

                With that said, I'd love to see some latency numbers for these CPUs when run only on DDR5 and only on HBM2e. As far as we know, HBM2e latency falls somewhere between that of a typical LLC and external memory. If that's the case, cache misses should theoretically be less expensive, which could impact a lot of different kinds of workloads.
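                A pointer-chasing microbenchmark is the usual way to get those numbers. A rough sketch (my own illustration, not from the article), to be run once with the buffer bound to DDR5 and once to HBM2e, sized well past the LLC:

                ```cpp
                // Pointer-chase latency sketch: a random single-cycle
                // permutation (Sattolo's algorithm) defeats prefetching,
                // so each dependent load exposes the full memory latency.
                #include <chrono>
                #include <iostream>
                #include <numeric>
                #include <random>
                #include <vector>

                int main() {
                    // 1 GiB of indices: far beyond any LLC.
                    const size_t n = (1ull << 30) / sizeof(size_t);
                    std::vector<size_t> next(n);
                    std::iota(next.begin(), next.end(), size_t{0});

                    // Sattolo's shuffle: one big cycle over every element.
                    std::mt19937_64 rng{42};
                    for (size_t k = n - 1; k > 0; --k) {
                        std::uniform_int_distribution<size_t> pick(0, k - 1);
                        std::swap(next[k], next[pick(rng)]);
                    }

                    size_t i = 0;
                    for (size_t s = 0; s < n; ++s) i = next[i];  // warm pages

                    const size_t steps = 100'000'000;
                    auto t0 = std::chrono::steady_clock::now();
                    for (size_t s = 0; s < steps; ++s) i = next[i];
                    auto t1 = std::chrono::steady_clock::now();

                    double ns =
                        std::chrono::duration<double, std::nano>(t1 - t0).count();
                    std::cout << "avg dependent-load latency: " << ns / steps
                              << " ns (sink: " << i << ")\n";
                }
                ```

                In HBM flat mode the Xeon Max exposes its HBM2e as separate memory-only NUMA nodes, so numactl --membind should be enough to pin the buffer to DDR5 or HBM2e for the two runs.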



                • #9
                  Originally posted by Shnatsel View Post
                  It would be interesting to see OpenVINO GPU performance for reference, to see if running these kinds of workloads even makes sense on CPUs. Neural network inference is usually done on GPUs, so it's a little strange to see Intel trying to capture this market on CPUs.

                  OpenVINO does support GPU compute, mostly targeting Intel devices. I would not be surprised if comparable performance can be achieved on a GPU that's several times cheaper than Xeon Max parts.
                  This actually isn't true. Most inference is still done on CPUs. Some larger LLMs need GPUs to be efficient (à la the popular LLM models).


                  HBM matters a ton for AI workloads because they are memory-bandwidth bound in a lot of cases. So memory size and bandwidth are key drivers.
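                  To make "bandwidth bound" concrete, a back-of-the-envelope roofline check; the peak-throughput and HBM figures below are rough assumptions of mine, not measured numbers (only the DDR5 channel math is exact):

                  ```cpp
                  // Roofline back-of-the-envelope: a kernel is bandwidth
                  // bound when its arithmetic intensity (FLOPs per byte
                  // moved) falls below the machine balance, i.e. peak
                  // FLOP/s divided by memory bandwidth.
                  #include <iostream>

                  int main() {
                      const double peak_flops = 100e12;  // assumed: ~100 TFLOP/s bf16 via AMX
                      const double ddr5_bw    = 307.2e9; // 8 channels x DDR5-4800 (38.4 GB/s)
                      const double hbm_bw     = 1e12;    // assumed: ~1 TB/s aggregate HBM2e

                      // Anything below ~326 FLOPs/byte starves on DDR5...
                      std::cout << "balance on DDR5:  "
                                << peak_flops / ddr5_bw << " FLOPs/byte\n";
                      // ...while the bar drops to ~100 FLOPs/byte on HBM2e.
                      std::cout << "balance on HBM2e: "
                                << peak_flops / hbm_bw << " FLOPs/byte\n";
                      return 0;
                  }
                  ```

                  Under those assumptions, any kernel sitting between the two intensities flips from bandwidth-bound on DDR5 to compute-bound on HBM2e, which would explain outsized gains from the extra bandwidth.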



                  • #10
                    There is a smol typo on the 2nd page:

                    moving to HBM caching and them HBM-only mode

