hipSYCL Sees Work-In-Progress Support For Intel oneAPI Level Zero Backend


    Phoronix: hipSYCL Sees Work-In-Progress Support For Intel oneAPI Level Zero Backend

    hipSYCL, the innovative implementation of Khronos' SYCL for targeting CPUs and GPUs by integrating with existing toolchains, is seeing work on supporting Intel oneAPI Level Zero for running directly off Intel graphics hardware...


  • #2
    I still find it funny that Intel sponsors this work; let's wait and see if it gains any traction. I just hope that RDNA users aren't left in the dust. At least ROCm support is supposed to be added for Navi sometime this year (https://github.com/RadeonOpenCompute...ment-770574455); let's wait and see here too whether that also means full support for HIP.

    Comment


    • #3
      That is the main reason why I bought a used 1080 Ti card when my Radeon RX480 died a few months ago. The new RDNA2 GPUs weren't released yet, and I was afraid that there would be no ROCm support - after all, I wanted to use TensorFlow. It used to work - to some degree - with the RX480 using the ROCm backend.

      The 1080 Ti works fine, though it is a little memory-limited. And I'm stuck with X11 (Wayland/GNOME + NVIDIA is highly unstable for me). I'll wait and see how well RDNA2-based cards work with TensorFlow in half a year from now. If the GPUs are available for reasonable prices and ROCm works fine, then I'll replace the GPU and go OSS again.

      Comment


      • #4
        Originally posted by oleid View Post
        That is the main reason why I bought a used 1080 Ti card when my Radeon RX480 died a few months ago.
        I hope you got a good deal - did you take advantage of the people panic-selling their 1080 Tis? Those people should have waited a couple of months longer.

        To be honest, I would need it mostly for video-related tasks, but RDNA2 seems rather game-centric. While these cards would probably be fine for my modest needs, a card that does well at both gaming and video/compute work for around 300 EUR would be my preference.

        Comment


        • #5
          Curious if hipSYCL will eventually enable running ZLUDA on AMD GPUs, that'd be sweet.

          Comment


          • #6
            Originally posted by ms178 View Post

            I hope you got a good deal - did you take advantage of the people panic-selling their 1080 Tis? Those people should have waited a couple of months longer.
            Something like that, yes. Right at that time, the 3080 was released.

            Comment


            • #7
              Every SYCL project targets different backends: triSYCL, hipSYCL, ComputeCpp, Intel LLVM SYCL, sycl-gtx.

              AFAIK sycl-gtx is the only implementation that works on Windows and Linux for Intel/Nvidia/AMD GPUs. It appears to be based on someone's thesis and is not actively developed.

              I would really like to know why hipSYCL targets OpenMP for CPUs and ROCm/CUDA (and now oneAPI Level Zero) for GPUs instead of targeting something agnostic like OpenCL/SPIR-V first. I understand abstraction makes it difficult to optimize, but GPGPU is a mess.
              • OpenCL 1.x: currently the only thing that works cross-platform (OS/device)
              • OpenCL 2.x: Nvidia chooses not to support it
              • OpenCL 3.x: Nvidia chooses not to support it
              • CUDA: Nvidia only (ROCm tries to support it too)
              • ROCm: requires Linux and only works on a select few GPUs
              • SYCL: hoping this is what everyone targets, but adoption is very slow in both drivers and applications
              I know these things take time, but from what I can tell it's not moving in the right direction. Many people working on important libraries do not have the latest overpriced GPU, and they do not all run the same OS. People are not going to be motivated to port their code to SYCL if you need a PhD just to get your code to run on your GPU.

              My objective is not to rant. I would genuinely like to know the technical reasons why something like sycl-gtx is not being used. What are some other ways that SYCL adoption could be accelerated? Could Vulkan be used as a backend?

              Comment


              • #8
                Originally posted by Jabberwocky View Post
                Every SYCL project targets different backends: triSYCL, hipSYCL, ComputeCpp, Intel LLVM SYCL, sycl-gtx.

                AFAIK sycl-gtx is the only implementation that works on Windows and Linux for Intel/Nvidia/AMD GPUs. It appears to be based on someone's thesis and is not actively developed.

                I would really like to know why hipSYCL targets OpenMP for CPUs and ROCm/CUDA (and now oneAPI Level Zero) for GPUs instead of targeting something agnostic like OpenCL/SPIR-V first. I understand abstraction makes it difficult to optimize, but GPGPU is a mess.
                • OpenCL 1.x: currently the only thing that works cross-platform (OS/device)
                • OpenCL 2.x: Nvidia chooses not to support it
                • OpenCL 3.x: Nvidia chooses not to support it
                • CUDA: Nvidia only (ROCm tries to support it too)
                • ROCm: requires Linux and only works on a select few GPUs
                • SYCL: hoping this is what everyone targets, but adoption is very slow in both drivers and applications
                I know these things take time, but from what I can tell it's not moving in the right direction. Many people working on important libraries do not have the latest overpriced GPU, and they do not all run the same OS. People are not going to be motivated to port their code to SYCL if you need a PhD just to get your code to run on your GPU.

                My objective is not to rant. I would genuinely like to know the technical reasons why something like sycl-gtx is not being used. What are some other ways that SYCL adoption could be accelerated? Could Vulkan be used as a backend?
                The new Level Zero backend already uses SPIR-V as its kernel format, so it's Level Zero+SPIR-V. The reason we don't use OpenCL for the runtime part is that, in practice, OpenCL+SPIR-V is just as limited to Intel devices: nobody else supports SPIR-V ingestion in their OpenCL implementations. Additionally, some aspects of SYCL 2020 such as unified shared memory (USM) require OpenCL extensions that only Intel implements. So using OpenCL would accomplish nothing but take away control that we need in hipSYCL, while simultaneously making it more difficult to access certain functionality that we also need.

                It's also important to remember that there are not many CPU vendors besides Intel that even have an OpenCL implementation for their CPUs. hipSYCL and its OpenMP backend can run on any CPU for which an OpenMP compiler exists - Intel, AMD, Power, ARM, probably RISC-V, you name it. Some of these architectures may not be very relevant for the regular consumer machine, but are relevant in high performance computing.

                It would be nice if there were a common low-level API that works everywhere and that SYCL implementations could just use, but unfortunately that just doesn't exist in practice. OpenCL 1.2 does not define an IR, so there's no format that SYCL implementations could compile to and feed into OpenCL - except OpenCL C, but compiling to C code brings its own difficulties and I'm skeptical about how robust something like this could be. OpenCL 1.2 is also missing functionality that SYCL 2020 needs, e.g. pointer kernel arguments.

                The nice thing is that SYCL does not need explicit hardware vendor support (to achieve performance, functionality, robustness etc), because we can just implement it on top of whatever compute model a hardware vendor prefers. This is exactly the strategy that hipSYCL follows and this is what makes SYCL effectively immune to the adoption friction that OpenCL suffers from.

                Once SYCL is widespread, it might be possible to leverage that to try to revive OpenCL/SPIR-V... but who knows.

                sycl-gtx is not widely used because it cannot ingest regular SYCL code. It needs special macros to express basics like if statements, and is probably more of an experiment.

                My understanding is that Vulkan has a different execution model than, for example, OpenCL or other compute-oriented models. This affects things like pointer arithmetic. While it might be possible to implement a subset of SYCL using Vulkan, AFAIK it is not yet possible to express everything that SYCL can do in Vulkan. What you would get out of this would be something with substantial caveats and probably far less robust than what current SYCL implementations provide.

                I don't quite know why you think that you would need a PhD to get your code running on a GPU with SYCL. There's no way around some runtime library from the vendor that SYCL (or any other portable model) can tie into. This means you need to install CUDA for NVIDIA, ROCm for AMD, etc. It's exactly the same as with OpenCL. The fact that AMD does not support all their GPUs in their primary GPGPU platform, ROCm, is not a SYCL problem, but an AMD problem of their own making that they need to fix for their own benefit.

                Both DPC++ and ComputeCpp support Windows, although only for Intel hardware.
                hipSYCL has received a lot of patches for Windows support (for the CPU+NVIDIA backends) recently, and we are now in the process of adding Windows to CI: https://github.com/illuhad/hipSYCL/pull/476
                However, for anything except the CPU backend this is experimental and not straightforward. But this is not a hipSYCL issue; the problem is that the layers below hipSYCL, such as clang's CUDA toolchain and plugin infrastructure, are not well maintained/supported on Windows. DPC++ and its NVIDIA support also suffer from that. This is something that clang needs to address upstream.
                Last edited by illuhad; 07 March 2021, 10:13 AM.

                Comment


                • #9
                  Originally posted by polarathene View Post
                  Curious if hipSYCL will eventually enable running ZLUDA on AMD GPUs, that'd be sweet.
                  "I have this car engine and this plane engine, can I attach the plane engine to the car engine to have lift off?" ;-)

                  ZLUDA is a solution that takes a CUDA program, intercepts all CUDA runtime calls and maps them to Level Zero calls, and recompiles embedded CUDA PTX kernels to SPIR-V to execute on Intel hardware.

                  hipSYCL is a SYCL implementation, so it takes SYCL code as input, compiles to various formats (SPIR-V/amdgcn/PTX) for its various backends, and at runtime manages the hardware using Level Zero/ROCm/CUDA.

                  I don't see how the two could work together in a meaningful way to achieve what you want. You cannot put ZLUDA in front of hipSYCL, because hipSYCL wants SYCL code as input. You might be able to run a binary compiled with hipSYCL through ZLUDA to make hipSYCL's CUDA backend run on Intel, but this would probably be far inferior in functionality, stability and performance to just implementing Level Zero support directly in hipSYCL. As the Phoronix article says, that is what we are working on at the moment.

                  If you want to run your CUDA applications on AMD, you would need to add an AMD backend to ZLUDA (if ZLUDA is designed to allow for a multi-backend architecture, which I don't know), or just implement something like ZLUDA but for AMD: Intercept CUDA API calls, map to HIP/ROCm, recompile kernels to amdgcn.
                  Last edited by illuhad; 07 March 2021, 10:17 AM.

                  Comment


                  • #10
                    Originally posted by illuhad View Post
                    ZLUDA is a solution that takes a CUDA program, intercepts all CUDA runtime calls and maps them to Level Zero calls, and recompiles embedded CUDA PTX kernels to SPIR-V to execute on Intel hardware.

                    hipSYCL is a SYCL implementation, so it takes SYCL code as input, compiles to various formats (SPIR-V/amdgcn/PTX) for its various backends, and at runtime manages the hardware using Level Zero/ROCm/CUDA.

                    You cannot put ZLUDA in front of hipSYCL because hipSYCL wants SYCL code as input.
                    Ah ok my bad!

                    I was sure I had read in the past that hipSYCL and Intel oneAPI were quite similar - basically SYCL 2020, but with Intel tacking on a few third-party libs or something in addition. I misread the article and thought it was about being able to compile (transpile?) oneAPI projects to alternative backends like CUDA and ROCm... which you've just pointed out it's not.

                    I wouldn't have had any major concerns about it not being super efficient. As long as it was still faster than a CPU by a reasonable amount - even if the GPU performed at 25-50% of what it might have achieved with a native ROCm implementation - I'd still be happy that I could use proprietary software relying on CUDA with an AMD GPU.

                    So this sort of thing (like a ZLUDA port to hipSYCL, if that makes any sense) wouldn't ever be possible with hipSYCL? Is hipSYCL sort of like ArrayFire in that sense? (ArrayFire is more of a C++ API with a few wrapper libs in other languages that JIT-compiles kernels for various GPU compute backends.)

                    Comment
