Intel Xeon Max Performance Delivers A Powerful Combination With AMX + HBM2e

  • Intel Xeon Max Performance Delivers A Powerful Combination With AMX + HBM2e

    Phoronix: Intel Xeon Max Performance Delivers A Powerful Combination With AMX + HBM2e

    The Intel Xeon Max 9480 flagship Sapphire Rapids CPU with HBM2e memory tops out at 56 cores / 112 threads, so how can it compete with the latest AMD EPYC processors hitting 96 cores for Genoa (or 128 cores with the forthcoming Bergamo)? Besides the on-package HBM2e that is unique to the Xeon Max family, the other ace Xeon Max shares with the rest of the Sapphire Rapids line-up is support for the Advanced Matrix Extensions (AMX). Today's benchmarks show precisely how HBM2e and AMX combine to let the Intel Xeon Max compete with -- and outperform -- AMD's EPYC 9554 and 9654 processors in AI workloads that effectively leverage AMX and the onboard HBM2e memory.
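    For readers who haven't touched AMX: it adds per-core 2D tile registers plus a tile matrix-multiply unit. A minimal sketch of what that looks like at the intrinsics level, assuming GCC/Clang with -mamx-tile -mamx-bf16 on a recent Linux kernel (an illustration, not code from the article):

    ```cpp
    // Minimal AMX sketch: one BF16 tile multiply, C (fp32) += A (bf16) * B (bf16).
    // Assumes GCC/Clang with -mamx-tile -mamx-bf16 on a Sapphire Rapids-class CPU.
    #include <immintrin.h>
    #include <cstdint>
    #include <sys/syscall.h>
    #include <unistd.h>

    #define ARCH_REQ_XCOMP_PERM 0x1023  // Linux: request a dynamic XSAVE feature
    #define XFEATURE_XTILEDATA  18      // ...specifically the AMX tile-data state

    // 64-byte configuration blob consumed by _tile_loadconfig (LDTILECFG).
    struct alignas(64) TileConfig {
        uint8_t  palette_id;
        uint8_t  start_row;
        uint8_t  reserved[14];
        uint16_t colsb[16];  // bytes per row for each tile register
        uint8_t  rows[16];   // rows for each tile register
    };

    int main() {
        // The kernel refuses AMX until the process requests the tile state.
        if (syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, XFEATURE_XTILEDATA) != 0)
            return 1;

        TileConfig cfg{};
        cfg.palette_id = 1;
        for (int t = 0; t < 3; ++t) { cfg.rows[t] = 16; cfg.colsb[t] = 64; }
        _tile_loadconfig(&cfg);

        // A and B are 16x32 bf16 tiles (64 bytes/row); C is a 16x16 fp32 tile.
        // NB: for correct math, B must be pre-packed in the VNNI pair layout.
        alignas(64) static uint16_t a[16][32] = {}, b[16][32] = {};
        alignas(64) static float    c[16][16] = {};

        _tile_loadd(1, a, 64);    // load A into tile register 1 (64-byte stride)
        _tile_loadd(2, b, 64);    // load B into tile register 2
        _tile_zero(0);            // clear the fp32 accumulator tile
        _tile_dpbf16ps(0, 1, 2);  // the AMX dot-product: C += A * B
        _tile_stored(0, c, 64);   // write the result back to memory
        _tile_release();          // free the tile state
        return 0;
    }
    ```

    In practice you'd rarely write this by hand; libraries like oneDNN emit AMX code paths automatically, which is how OpenVINO's CPU plugin picks it up.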


  • #2
    It would be interesting to see OpenVINO GPU performance for reference, to see if running these kinds of workloads even makes sense on CPUs. Neural network inference is usually done on GPUs, so it's a little strange to see Intel trying to capture this market on CPUs.

    OpenVINO does support GPU compute, mostly targeting Intel devices. I would not be surprised if comparable performance can be achieved on a GPU that's several times cheaper than Xeon Max parts.
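    For anyone curious what that comparison would involve: in OpenVINO the target is just a device string, so the CPU (AMX) and GPU paths run the same model. A minimal sketch against the OpenVINO 2.0 C++ API, with "model.xml" as a placeholder IR path:

    ```cpp
    // Sketch: compile and run the same OpenVINO model on a chosen device.
    // "model.xml" is a placeholder path to an OpenVINO IR model.
    #include <openvino/openvino.hpp>

    int main() {
        ov::Core core;
        auto model = core.read_model("model.xml");

        // The only change between a CPU run (AMX-accelerated on Sapphire
        // Rapids via oneDNN) and an Intel GPU run is this device string.
        auto compiled = core.compile_model(model, "GPU");

        auto request = compiled.create_infer_request();
        request.infer();  // static-shape inputs are auto-allocated by the runtime
        return 0;
    }
    ```

    OpenVINO's benchmark_app exposes the same switch as -d CPU / -d GPU, which is presumably all a GPU-vs-Xeon-Max comparison would need.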



    • #3
      I'm surprised HBM2e could make that big of a performance difference.



      • #4
        Originally posted by Shnatsel View Post
        It would be interesting to see OpenVINO GPU performance for reference, to see if running these kinds of workloads even makes sense on CPUs. Neural network inference is usually done on GPUs, so it's a little strange to see Intel trying to capture this market on CPUs.

        OpenVINO does support GPU compute, mostly targeting Intel devices. I would not be surprised if comparable performance can be achieved on a GPU that's several times cheaper than Xeon Max parts.
        It's been on my TODO list, albeit not a priority, since it's just consumer GPUs that I have access to for testing.
        Michael Larabel
        https://www.michaellarabel.com/



        • #5
          Originally posted by Michael View Post
          It's been on my TODO list albeit not a priority since it's just all consumer GPUs I have access to for testing.
          If you have an Intel Arc A770, that's cut from the same cloth as their Arctic Sound server GPUs. It'd be worth comparing to the Xeon Max + AMX, especially since it should be well-supported by OpenVINO.

          Their Data Center GPU Flex 170 is basically an A770.
          Their Data Center GPU Flex 140 is basically 2x A380 on a single card.

          Of course, these dGPUs have only up to 16 GiB of RAM. So, they won't be useful for large training workloads. They're mostly for smaller inferencing workloads, like face recognition. However, I think most/all of your OpenVINO benchmarks should run well on them.
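          For anyone wanting to check what OpenVINO would actually see on such a box, the runtime can enumerate its devices; a tiny sketch using the standard ov::Core API:

          ```cpp
          // Sketch: list the devices OpenVINO can target on this machine,
          // e.g. "CPU" plus "GPU" (or "GPU.0"/"GPU.1" with multiple cards).
          #include <openvino/openvino.hpp>
          #include <iostream>

          int main() {
              ov::Core core;
              for (const auto& device : core.get_available_devices())
                  std::cout << device << ": "
                            << core.get_property(device, ov::device::full_name)
                            << "\n";
          }
          ```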
          Last edited by coder; 07 July 2023, 11:17 AM.



          • #6
            Originally posted by coder View Post
            If you have an Intel Arc A770, that's cut from the same cloth as their Arctic Sound server GPUs. It'd be worth doing, especially since it should be well-supported by OpenVINO.
            Right, but basically either way someone will complain "but it's a consumer GPU; with ABC, you would have seen XYZ instead"... or "why no pro cards?", etc. So among the dozens of other test ideas floating around at any given time, with limited time and resources it's just not as high a priority as some other tests/articles.
            Michael Larabel
            https://www.michaellarabel.com/



            • #7
              Originally posted by Michael View Post
              Right but basically either way someone will complain "but it's a consumer GPU, with ABC, you would have seen XYZ instead".... or "why no pro cards?", etc....
              I just gave you an "out" for that. Compare the A770's specs to the Flex 170 - they're the same thing. As for "comparing with XYZ GPU", I think OpenVINO is a good excuse to stick with Intel GPUs.

              Anyway, it was just a suggestion. I know you're busy, but I think a lot of us would like to see it.



              • #8
                Originally posted by schmidtbag View Post
                I'm surprised HBM2e could make that big of a performance difference.
                I'm sure there are cases where it makes far more of a difference than others. Clearly it's a winner when it comes to raw memory bandwidth. The real question is whether the physical location of the HBM2e, far closer to the cores than off-package DIMMs, is what's really making the difference. There should be a very real latency benefit from not having all of those wires and traces running through the motherboard, compared to just an interposer.

                With that said, I'd love to see some latency numbers for these CPUs when run only on DDR5 and only on HBM2e. As far as we know, HBM2e latency falls somewhere between that of a typical LLC and external memory. If that's the case, cache misses should theoretically be less expensive, which could impact a lot of different kinds of workloads.
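                A pointer-chasing microbenchmark is the usual way to get those numbers. A rough sketch (my own illustration, not from the article), to be run once with the buffer bound to DDR5 and once to HBM2e, sized well past the LLC:

                ```cpp
                // Pointer-chase latency sketch: a random single-cycle
                // permutation (Sattolo's algorithm) defeats prefetching,
                // so each dependent load exposes the full memory latency.
                #include <chrono>
                #include <iostream>
                #include <numeric>
                #include <random>
                #include <vector>

                int main() {
                    // 1 GiB of indices: far beyond any LLC.
                    const size_t n = (1ull << 30) / sizeof(size_t);
                    std::vector<size_t> next(n);
                    std::iota(next.begin(), next.end(), size_t{0});

                    // Sattolo's shuffle: one big cycle over every element.
                    std::mt19937_64 rng{42};
                    for (size_t k = n - 1; k > 0; --k) {
                        std::uniform_int_distribution<size_t> pick(0, k - 1);
                        std::swap(next[k], next[pick(rng)]);
                    }

                    size_t i = 0;
                    for (size_t s = 0; s < n; ++s) i = next[i];  // warm pages

                    const size_t steps = 100'000'000;
                    auto t0 = std::chrono::steady_clock::now();
                    for (size_t s = 0; s < steps; ++s) i = next[i];
                    auto t1 = std::chrono::steady_clock::now();

                    double ns =
                        std::chrono::duration<double, std::nano>(t1 - t0).count();
                    std::cout << "avg dependent-load latency: " << ns / steps
                              << " ns (sink: " << i << ")\n";
                }
                ```

                In HBM flat mode the Xeon Max exposes its HBM2e as separate memory-only NUMA nodes, so numactl --membind should be enough to pin the buffer to DDR5 or HBM2e for the two runs.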



                • #9
                  Originally posted by Shnatsel View Post
                  It would be interesting to see OpenVINO GPU performance for reference, to see if running these kinds of workloads even makes sense on CPUs. Neural network inference is usually done on GPUs, so it's a little strange to see Intel trying to capture this market on CPUs.

                  OpenVINO does support GPU compute, mostly targeting Intel devices. I would not be surprised if comparable performance can be achieved on a GPU that's several times cheaper than Xeon Max parts.
                  This actually isn't true. Most inference is still done on CPUs. Some larger LLMs need GPUs to be efficient (à la the popular LLM models).


                  HBM matters a ton for AI workloads because they are memory-bandwidth bound in a lot of cases. So memory size and bandwidth are key drivers.
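                  To make "bandwidth bound" concrete, a back-of-the-envelope roofline check; the peak-throughput and HBM figures below are rough assumptions of mine, not measured numbers (only the DDR5 channel math is exact):

                  ```cpp
                  // Roofline back-of-the-envelope: a kernel is bandwidth
                  // bound when its arithmetic intensity (FLOPs per byte
                  // moved) falls below the machine balance, i.e. peak
                  // FLOP/s divided by memory bandwidth.
                  #include <iostream>

                  int main() {
                      const double peak_flops = 100e12;  // assumed: ~100 TFLOP/s bf16 via AMX
                      const double ddr5_bw    = 307.2e9; // 8 channels x DDR5-4800 (38.4 GB/s)
                      const double hbm_bw     = 1e12;    // assumed: ~1 TB/s aggregate HBM2e

                      // Anything below ~326 FLOPs/byte starves on DDR5...
                      std::cout << "balance on DDR5:  "
                                << peak_flops / ddr5_bw << " FLOPs/byte\n";
                      // ...while the bar drops to ~100 FLOPs/byte on HBM2e.
                      std::cout << "balance on HBM2e: "
                                << peak_flops / hbm_bw << " FLOPs/byte\n";
                      return 0;
                  }
                  ```

                  Under those assumptions, any kernel sitting between the two intensities flips from bandwidth-bound on DDR5 to compute-bound on HBM2e, which would explain outsized gains from the extra bandwidth.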



                  • #10
                    There is a smol typo on the 2nd page:

                    moving to HBM caching and them HBM-only mode

