Compute Runtime 23.30.26918.9 Released For Intel's Open-Source GPU Compute Stack



    Phoronix: Compute Runtime 23.30.26918.9 Released For Intel's Open-Source GPU Compute Stack

    Intel engineers have published their Compoute Runtime 23.30.26918.9 that provides their open-source Level Zero and OpenCL support for use on Windows and Linux platforms with Intel integrated/discrete graphics hardware...


  • #2
    Broadwell support. This. This is why after, what, 15 years since the introduction of AMD's Fusion SoCs, 8 years after the HSA Foundation, 8 years after Zen and RDNA, CDNA and Infinity Architecture and ROCm... it's not just about great hardware, it's also about great software and APIs.

    I just wonder if my trusty AMD Bristol Ridge APUs in my aging but still reliable and running desktop and laptop will finally see some of their potential be exploited, not by AMD's work, but by Intel and their oneAPI and now the UXL framework? Along with the Linux kernel's continuing work on Heterogeneous Memory Management? I mean, after all, Bristol Ridge was AMD's pinnacle and last hurrah of the Fusion Era SoCs and was, like its Carrizo forefather, fully HSA conformant and zero-copy capable, with 48-bit unified memory addressing for both the 4 CPU cores and the integrated GPU.



    • #3
      Michael

      Typo

      "Intel engineers have published their Compoute Runtime".

      ... should probably be "Compute"



      • #4
        Originally posted by Jumbotron:
        Broadwell support. This. This is why after, what, 15 years since the introduction of AMD's Fusion SoCs, 8 years after the HSA Foundation, 8 years after Zen and RDNA, CDNA and Infinity Architecture and ROCm... it's not just about great hardware, it's also about great software and APIs.

        I just wonder if my trusty AMD Bristol Ridge APUs in my aging but still reliable and running desktop and laptop will finally see some of their potential be exploited, not by AMD's work, but by Intel and their oneAPI and now the UXL framework? Along with the Linux kernel's continuing work on Heterogeneous Memory Management? I mean, after all, Bristol Ridge was AMD's pinnacle and last hurrah of the Fusion Era SoCs and was, like its Carrizo forefather, fully HSA conformant and zero-copy capable, with 48-bit unified memory addressing for both the 4 CPU cores and the integrated GPU.
        Ivy Bridge and Haswell support would be the sweet, sweet icing on this cake.

        On the AMD side, you might be right, given oneAPI's ability to leverage other OpenCL devices. With rusticl providing OpenCL, you may be good.
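
        A quick way to see whether oneAPI can actually reach those devices is to enumerate SYCL platforms; a rusticl-backed OpenCL device should show up in the list if the runtime is installed. A minimal sketch using the stock SYCL 2020 device-enumeration API, nothing specific to this hardware assumed:

            #include <sycl/sycl.hpp>
            #include <iostream>

            int main() {
                // List every platform and device the oneAPI runtime can see,
                // including OpenCL back-ends such as rusticl.
                for (const auto &p : sycl::platform::get_platforms()) {
                    std::cout << p.get_info<sycl::info::platform::name>() << "\n";
                    for (const auto &d : p.get_devices())
                        std::cout << "  " << d.get_info<sycl::info::device::name>() << "\n";
                }
            }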

        However, with your GFX80x integrated GPU, ROCm should work unofficially. From there you can leverage TensorFlow on ROCm, etc., for ML and your other needs.
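
        If you want to check what ROCm actually sees on a gfx80x part, a small HIP query is enough. A hedged sketch using the standard HIP runtime API; on unofficially supported hardware it may simply report no devices:

            #include <hip/hip_runtime.h>
            #include <cstdio>

            int main() {
                int n = 0;
                if (hipGetDeviceCount(&n) != hipSuccess || n == 0) {
                    printf("No HIP devices visible\n");  // likely outcome on unsupported parts
                    return 1;
                }
                for (int i = 0; i < n; ++i) {
                    hipDeviceProp_t prop;
                    hipGetDeviceProperties(&prop, i);
                    // gcnArchName reports the GFX target, e.g. "gfx803"
                    printf("Device %d: %s (%s)\n", i, prop.name, prop.gcnArchName);
                }
                return 0;
            }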



        • #5
          Long ago I had discussions here with Bridgman of AMD about this. Carrizo and Bristol Ridge at one time "technically" had "unofficial" support for ROCm, but practically speaking it didn't work. At least I never could get it working, so blame me, not the stack. But I heard about failures from others as well. Last I looked, which was probably a year or more ago, AMD made it official and killed off support for the Fusion Era APUs.

          I know the whole Bulldozer/Piledriver/Excavator line of CPUs and APUs had its detractors, particularly over the way AMD decided to use Clustered Multithreading and a shared L2 cache between all 4 cores. But Carrizo, and particularly Bristol Ridge with how AMD used GPU masks to get more onto that 28nm die, is really a marvel of engineering. And the fact that it actually has unified memory addressing at 48 bits between the CPU cores and the GPU. As long as they keep running I'll see what happens going forward, just for interest's sake. Performance-wise, even with the Linux kernel work and Intel's APIs, it wouldn't be worth writing home about. But it would be nice to see some usefulness at the margins given their infrastructure.



          • #6
            One of the challenges with supporting older hardware is the "fat binary" model we use, where compiling an application generates GPU code for each of the supported GFX generations. The binaries get big pretty quickly and even sub-addressing them becomes a problem, although I haven't been able to spend enough time looking at it to see if that is a common issue or just a corner case.

            I don't think asking users to compile the apps they want to use themselves is a viable option for older hardware, but the alternative seems to be compiling multiple times for different ranges of hardware. That seems clunky as well, because it would require some way of matching CPU/GPU binaries with hardware at the distro or execution level.
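
            For illustration, this is roughly what the fat binary model looks like from the user side: one trivial HIP source built with one code object per target. The architectures in the comment are arbitrary examples for this sketch, not a statement of what is or will be supported:

                // fat.cpp - built as a "fat binary", e.g.:
                //   hipcc --offload-arch=gfx803 --offload-arch=gfx900 \
                //         --offload-arch=gfx1030 fat.cpp -o fat
                // Each --offload-arch embeds another per-GFX code object,
                // which is why binary size grows with every generation supported.
                #include <hip/hip_runtime.h>

                __global__ void scale(float *x, float a, int n) {
                    int i = blockIdx.x * blockDim.x + threadIdx.x;
                    if (i < n) x[i] *= a;
                }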

            A separate challenge is keeping the shader compiler working with older hardware. I think the answer there is to go back to a point where pre-Vega hardware (basically Polaris back to APUs) was working well and fork the code so the older support doesn't get broken. There's a complication in that we are continuing to add new features, but my impression is that if we could run common frameworks on older hardware, that would go a long way toward making people happy even if we didn't support the latest bleeding-edge CUDA features.

            I was really impressed with how well HSA worked on the older hardware in the Kaveri / Carrizo / Bristol Ridge range (I am still running mostly Kaveri systems). Using IOMMUv2 was a bit of a learning curve for us, but I was just blown away with how well it all worked... GPUs executing out of pageable memory over a decade ago. We (AMD and NVIDIA) are doing that today with HMM, Managed Memory and page faulting from GPU page tables, but it is only just becoming as broadly usable as the IOMMUv2 support on APUs was a decade ago. Unfortunately, during the transition from "HSA" to "ROCm" the hardware targets and priorities also changed from APUs to dGPUs. We did try to keep the APU code paths working, but in the end decided that it would make more sense to move APUs to a dGPU programming model (using GPU page tables rather than IOMMUv2) to let us keep a wider range of hardware working.
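
            As a rough sketch of what that dGPU-style programming model looks like from user code today, a HIP managed allocation gives the CPU and GPU one pageable pointer, with migration driven by page faults (assuming a system where hipMallocManaged is actually backed by HMM):

                #include <hip/hip_runtime.h>
                #include <cstdio>

                __global__ void add_one(int *d, int n) {
                    int i = blockIdx.x * blockDim.x + threadIdx.x;
                    if (i < n) d[i] += 1;
                }

                int main() {
                    const int n = 1 << 20;
                    int *d = nullptr;
                    hipMallocManaged((void **)&d, n * sizeof(int)); // one pointer for CPU and GPU
                    for (int i = 0; i < n; ++i) d[i] = i;           // CPU touches the pages first
                    add_one<<<(n + 255) / 256, 256>>>(d, n);        // GPU faults pages over as needed
                    hipDeviceSynchronize();
                    printf("d[42] = %d\n", d[42]);                  // CPU reads the migrated result
                    hipFree(d);
                    return 0;
                }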

            There is ongoing work to improve "official" consumer hardware support and that looks promising, but I don't feel like we have a crisp answer for supporting older hardware. In principle it should be simple: migrate older hardware from the somewhat commercially focused release/support model to something more like what we do with open source graphics drivers. But there are some specific challenges related to the fat binary model that still seem like they need more work.