**Iolaum** · 29 April 2019, 04:59 PM

Does this have anything to do with triSYCL?

**Madgemade** · 30 April 2019, 05:07 AM

This is all well and good, but, who is actually going to use this? At the end of the day CUDA is so dominant in the compute space that AMD pretty much gave up on OpenCL. Once they got HIP working it became possible to run many CUDA programs and now they are seeing some real use in industry and packages like Tensorflow are working.

The original benefit of OpenCL was its cross-platformness. SYCL doesn't have this (in practice). Maybe it is better than CUDA, but that isn't anything like enough. If it only runs on Intel then it's a dead end. It's not like this hasn't been tried before. AMD made the HC C++ API (developed from C++ AMP), it was quite nice, very easy to use. But it is now deprecated in favour of HIP because nobody used it.

Away from GPUs SYCL makes a lot more sense. Intel is a big player in the FPGA space now. SYCL will be a lot of use there. I just don't know why they don't make it clearer that while it might be "open-source" that doesn't mean other vendors are going to give it any time at all whatsoever. I highly doubt that it will be anything other than Intel only. AMD seem to be moving away from OpenCL and concentrating their efforts on HIP. Nvidia doesn't even need a mention: they are the dominant player, so why would they help out Intel?

**illuhad** · 01 May 2019, 04:18 PM

Originally posted by Madgemade

This is all well and good, but, who is actually going to use this? At the end of the day CUDA is so dominant in the compute space that AMD pretty much gave up on OpenCL. Once they got HIP working it became possible to run many CUDA programs and now they are seeing some real use in industry and packages like Tensorflow are working.

Tensorflow has already been ported to SYCL and is working.

Originally posted by Madgemade

The original benefit of OpenCL was its cross-platformness. SYCL doesn't have this (in practice). Maybe it is better than CUDA, but that isn't anything like enough. If it only runs on Intel then it's a dead end. It's not like this hasn't been tried before. AMD made the HC C++ API (developed from C++ AMP), it was quite nice, very easy to use. But it is now deprecated in favour of HIP because nobody used it.

Why is SYCL not cross-platform in practice? Like OpenCL, it's an open standard with many implementations. The implementation that this article is about (Intel LLVM SYCL) targets OpenCL SPIR-V devices. While it's true that SPIR-V capable OpenCL is unfortunately not yet widespread outside of Intel-world, that doesn't mean that SYCL itself isn't cross-platform:
There's triSYCL which runs on CPUs with OpenMP and Xilinx FPGAs, there's ComputeCpp which runs on SPIR and SPIR-V devices and there's hipSYCL which runs directly on top of CUDA/HIP and targets NVIDIA CUDA devices and AMD ROCm GPUs. So, no matter what hardware you have, there's a very high chance that there is a SYCL implementation that supports it.

Originally posted by Madgemade

Away from GPUs SYCL makes a lot more sense. Intel is a big player in the FPGA space now. SYCL will be a lot of use there. I just don't know why they don't make it clearer that while it might be "open-source" that doesn't mean other vendors are going to give it any time at all whatsoever.

Well, other vendors only need to have a SPIR-V capable OpenCL implementation, which comes automatically if they support recent OpenCL versions at some point. That's the only requirement to have LLVM SYCL run on these devices. So, if OpenCL-next gets traction, SYCL comes automatically.
In the meantime, projects like hipSYCL allow running SYCL directly on top of closed platforms like CUDA. So, even if NVIDIA doesn't like SYCL, there's nothing they can do about people running SYCL on their GPUs.

**Madgemade** · 01 May 2019, 04:48 PM

Originally posted by illuhad

In the meantime, projects like hipSYCL allow running SYCL directly on top of closed platforms like CUDA. So, even if NVIDIA doesn't like SYCL, there's nothing they can do about people running SYCL on their GPUs.

I'm glad to hear that Tensorflow is ported over and that Intel is making good progress. The trouble is that it just isn't very inviting (at least to me) to program something and then have to put it through so many layers (i.e. SYCL -> HIP -> CUDA) when you could just use CUDA to start with. Also seems like a recipe for many bugs with code being converted multiple times. Just like OpenCL was, it will still be seen as the secondary, inferior option to CUDA that you only use if you don't have any choice, regardless of its merit, as Nvidia have the mindshare.
If SPIR-V can be forced upon Nvidia (by weaving it into Vulkan in games) then SYCL might work out well and be the next generation of OpenCL (with proper support).

Originally posted by illuhad

Why is SYCL not cross-platform in practice?

What I meant was that in theory it's great but in reality it doesn't have support in place. As you say it has little support outside of Intel. Without Nvidia it will not be able to get traction.

As far as I am concerned, there is no reason to use SYCL yet. Yes it works on Intel GPUs, but they are incredibly slow iGPUs that no serious compute programmer will ever use. Yes it works on CPUs but why use it over C++ or any other normal language? For GPUs HIP is the best option at the moment as it's very similar to CUDA (SYCL is not) and is cross-platform.
Once Intel bring out a GPU with Compute performance on the same level as Nvidia and AMD that will all change.
In my opinion, though, it would make more sense to get HIP to work on top of SYCL, then it would be easy to take existing HIP/CUDA code and run it straight away. Who has time to completely reprogram their CUDA application into SYCL? This is the approach AMD took with HIP in the first place and it is paying off. No matter how much better SYCL might be, people naturally get entrenched with what they are used to and don't like change; trying to force it isn't going to pay off.

**illuhad** · 02 May 2019, 06:27 AM

Originally posted by Madgemade

The trouble is that it just isn't very inviting (at least to me) to program something and then have to put it through so many layers (i.e. SYCL -> HIP -> CUDA) when you could just use CUDA to start with. Also seems like a recipe for many bugs with code being converted multiple times.

To be fair, HIP is an extremely thin layer. For the most part it's just

Code:

#define hipStuff cudaStuff

on nvidia. Because of that I feel that hipSYCL is more like SYCL->HIP/CUDA->Hardware. Other SYCL implementations are SYCL->OpenCL->Hardware, so in that sense, hipSYCL doesn't have any more layers than other SYCL implementations.
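
A minimal sketch of what such a macro-thin layer looks like, in plain C++ so that it compiles without any GPU toolkit. Everything here is an assumption made for illustration: `fakeCudaMalloc`, `fakeCudaFree`, `portableMalloc`, `portableFree` and `allocate_and_release` are hypothetical names, and the stand-in functions only mimic the general shape of a vendor runtime call. The real HIP headers are more involved than this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins for a vendor runtime, so the sketch builds without
// the CUDA toolkit. Host malloc/free play the role of device allocation.
inline int fakeCudaMalloc(void** ptr, std::size_t bytes) {
    *ptr = std::malloc(bytes);
    return *ptr != nullptr ? 0 : 1;  // 0 means success in this toy runtime
}
inline int fakeCudaFree(void* ptr) {
    std::free(ptr);
    return 0;
}

// The "portability layer": on this (pretend NVIDIA) platform each portable
// name is only an alias for the vendor name, so it adds no runtime cost.
#define portableMalloc fakeCudaMalloc
#define portableFree   fakeCudaFree

// Application code is written once against the portable names.
inline bool allocate_and_release(std::size_t bytes) {
    assert(bytes > 0);
    void* p = nullptr;
    if (portableMalloc(&p, bytes) != 0) return false;
    return portableFree(p) == 0;
}
```

Because the preprocessor resolves the aliases, the compiled program calls the vendor functions directly, which is the sense in which the layer is "thin".
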
I can see though that the source-to-source transformation that hipSYCL uses behind the scenes may make people skeptical, although it generally works well (including complex applications). In any case, this transformation will be replaced soon (order of days) by a small clang plugin which allows the clang CUDA/HIP frontend to directly ingest SYCL code and compile for NVIDIA and AMD GPUs. This is not any more prone to bugs than e.g. Intel's LLVM SYCL implementation.

I think there are actually a lot of reasons to use SYCL features on top of CUDA/HIP instead of using raw CUDA/HIP directly. You get automatic resource management, automatic memory transfers, implicit task graph based programming with stuff like automatic overlap of compute and memory copies (no, CUDA 10 graphs don't really compare), no need for __device__ and __host__ attributes, true out-of-order queues (unavailable in CUDA), and since SYCL is pure C++ you can also run on CPU for debugging/development (not possible with CUDA). If you want to have such features, you can of course try to implement that directly in CUDA, but then you are effectively reimplementing hipSYCL.
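
The "implicit task graph" idea can be sketched with a toy model in plain C++. This is not the SYCL API: `Task`, `overlaps`, `derive_dependencies`, `example_tasks` and `check_example` are hypothetical names invented here, and a real runtime derives the same kind of information from the access modes that kernels request on buffers. The sketch only shows how ordering constraints can fall out of declared reads and writes, without the programmer stating them.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model, not the SYCL API: a task only declares which buffers it touches.
struct Task {
    std::string name;
    std::set<std::string> reads;
    std::set<std::string> writes;
};

// True if the two sets share at least one buffer name.
inline bool overlaps(const std::set<std::string>& a,
                     const std::set<std::string>& b) {
    for (const auto& x : a)
        if (b.count(x) != 0) return true;
    return false;
}

// For tasks given in submission order, return edges (earlier, later) meaning
// that "later" must wait for "earlier". Tasks without an edge between them
// (directly or through other tasks) could run out of order or overlap.
inline std::vector<std::pair<std::size_t, std::size_t>>
derive_dependencies(const std::vector<Task>& tasks) {
    std::vector<std::pair<std::size_t, std::size_t>> edges;
    for (std::size_t later = 0; later < tasks.size(); ++later) {
        for (std::size_t earlier = 0; earlier < later; ++earlier) {
            const Task& a = tasks[earlier];
            const Task& b = tasks[later];
            const bool conflict = overlaps(a.writes, b.reads)    // read-after-write
                               || overlaps(a.reads, b.writes)    // write-after-read
                               || overlaps(a.writes, b.writes);  // write-after-write
            if (conflict) edges.emplace_back(earlier, later);
        }
    }
    return edges;
}

// Two independent kernels followed by one that consumes both results.
inline std::vector<Task> example_tasks() {
    return {
        {"scale_a", {"a"}, {"a_scaled"}},
        {"scale_b", {"b"}, {"b_scaled"}},
        {"add", {"a_scaled", "b_scaled"}, {"sum"}},
    };
}

// Usage: "add" waits for both scale kernels, which do not wait for each other.
inline void check_example() {
    const auto edges = derive_dependencies(example_tasks());
    assert(edges.size() == 2);
    assert(edges[0].first == 0 && edges[0].second == 2);
    assert(edges[1].first == 1 && edges[1].second == 2);
}
```

In this model nobody writes "run add after scale_a"; the ordering is a consequence of the declared accesses, which is what leaves a runtime free to overlap the two independent kernels.
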

Of course you are right in saying that people will stick to what they know and that it will be very difficult to establish something new. But by that logic we will be stuck with CUDA for eternity. CUDA is over 10 years old now, and the programming model clearly shows its age. Even NVIDIA is slowly starting to push people to other higher level programming models, in particular OpenACC.

To be clear, I'm not saying that everybody should jump on the SYCL bandwagon immediately. I also don't think we are quite ready yet because we need some more time to get more implementations to standard conformance. At the moment, only ComputeCpp is fully conformant while the others (Intel SYCL, triSYCL, hipSYCL) are still work in progress. However, I think that at the moment things look promising and SYCL implementations are making good progress.

Originally posted by Madgemade

In my opinion, though, it would make more sense to get HIP to work on top of SYCL, then it would be easy to take existing HIP/CUDA code and run it straight away. Who has time to completely reprogram their CUDA application into SYCL?

That's an interesting idea. Unfortunately, I don't think it's easily possible (at least for the general case) with pure standard SYCL 1.2.1, because SYCL 1.2.1 has inherited from OpenCL an abstracted treatment of memory with buffer objects, while HIP operates directly on pointers. There's an interesting extension in Intel SYCL that adds pointers similarly to HIP/CUDA which would be very helpful here, but it's not yet in the standard (though hopefully it will be at some point). Additionally, implementing HIP/CUDA requires a lot of compiler support to correctly parse kernel invocations and treat all those attributes like __shared__ and __constant__ correctly. If you want to have HIP on top of SYCL you need to have a CUDA frontend running on top of a SYCL compiler frontend...
By the way, what you are proposing - having existing code and running it straight away with SYCL - is possible with hipSYCL for AMD and NVIDIA GPUs. Anything that's available in HIP/CUDA is also available in hipSYCL since code is compiled like any other regular HIP/CUDA code. Of course this won't help the adoption of Intel hardware (I am not Intel, so I don't really care), but it helps people moving from HIP or CUDA to SYCL.
Also, I expect to see some source transformation tools soon to help in the transition from CUDA to SYCL.
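
The buffer-versus-pointer mismatch mentioned in this reply can be illustrated with a toy contrast in plain C++. Neither half uses the real HIP or SYCL API: `double_with_pointers`, `ToyBuffer`, `double_with_buffer` and `check_styles_agree` are hypothetical names for this sketch, and ordinary host memory stands in for device memory. The assumption being modelled is that pointer-style code owns a raw address and issues every copy itself, while buffer-style code only reaches the data through a wrapper that copies results back when it goes out of scope.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Pointer style (the general shape of HIP/CUDA code): the programmer holds a
// raw "device" pointer and performs each transfer explicitly.
inline std::vector<int> double_with_pointers(const std::vector<int>& host) {
    const std::size_t n = host.size();
    int* dev = new int[n];                            // stands in for a device allocation
    std::copy(host.begin(), host.end(), dev);         // explicit host-to-device copy
    for (std::size_t i = 0; i < n; ++i) dev[i] *= 2;  // the "kernel"
    std::vector<int> result(dev, dev + n);            // explicit device-to-host copy
    delete[] dev;
    return result;
}

// Buffer style (the general shape of SYCL 1.2.1 code): the buffer wraps host
// data, no raw device address is exposed, and the data is written back to the
// host when the buffer is destroyed.
class ToyBuffer {
public:
    explicit ToyBuffer(std::vector<int>& host) : host_(host), device_(host) {}
    ~ToyBuffer() { host_ = device_; }                 // implicit copy back to the host
    ToyBuffer(const ToyBuffer&) = delete;
    ToyBuffer& operator=(const ToyBuffer&) = delete;
    std::vector<int>& access() { return device_; }    // stands in for an accessor
private:
    std::vector<int>& host_;
    std::vector<int> device_;
};

inline std::vector<int> double_with_buffer(std::vector<int> host) {
    {
        ToyBuffer buf(host);
        for (int& x : buf.access()) x *= 2;           // the "kernel"
    }  // buffer destroyed here: results flow back into `host`
    return host;
}

// Usage: both styles compute the same result.
inline void check_styles_agree() {
    const std::vector<int> in{1, 2, 3};
    const std::vector<int> expected{2, 4, 6};
    assert(double_with_pointers(in) == expected);
    assert(double_with_buffer(in) == expected);
}
```

Code written in the first style can store `dev` anywhere and do arithmetic on it, which is the part that has no direct counterpart when memory is only reachable through buffer objects.
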

**Madgemade** · 02 May 2019, 10:17 AM

Originally posted by illuhad

I think there are actually a lot of reasons to use SYCL features on top of CUDA/HIP instead of using raw CUDA/HIP directly. You get automatic resource management, automatic memory transfers, implicit task graph based programming with stuff like automatic overlap of compute and memory copies (no, CUDA 10 graphs don't really compare), no need for __device__ and __host__ attributes, true out-of-order queues (unavailable in CUDA), and since SYCL is pure C++ you can also run on CPU for debugging/development (not possible with CUDA).

Much of this was true of AMD's HC, which was based on C++ AMP. At least that goes for automatic memory transfers (reminds me of cudaDeviceSynchronize which is not even available in HIP, except that in HC you didn't even need to make a call, it was all automatic), being able to run all device code on the host, and having no attributes except for [[hc]], which was mostly not needed and did not prevent code from being run on the CPU (it just couldn't contain code that the GPU couldn't also run). The format of HC looks similar to SYCL, but the way lambdas are used seems more complex in SYCL.
HC sounds good; however, it seems that few people even know/knew of its existence and it's now deprecated. Only AMD knows why it was deprecated, presumably because it wasn't in any way cross-platform and for most uses HIP could be used instead.

Originally posted by illuhad

In any case, this transformation will be replaced soon (order of days) by a small clang plugin which allows the clang CUDA/HIP frontend to directly ingest SYCL code and compile for NVIDIA and AMD GPUs.

This sounds interesting, could you link it? I'm not up to date with llvm progress on Nvidia, presumably this still requires nvcc to generate for Nvidia targets? From what I can tell it should be possible to allow for compiling straight from SYCL to AMDGPU ISA thanks to open-source, so if that's on the way to being implemented then it won't matter if AMD is not interested in helping out directly.
Of course there is much more to it than that (probably beyond my understanding). Ideally device binaries wouldn't be used at all, because they lack code for newer devices (those released after compilation, so they are no good in the long term), which is a problem with HIP/HC for AMD, not to mention the bloated size when many copies for the different ISAs are needed.
I was going to say something about SPIR-V but in reality I don't actually know enough to say anything meaningful. It looks to me like it should be supported already as part of Vulkan, but I couldn't really find out. The whole area of Vulkan for compute is something that I have been meaning to look into for a while but not yet found time.
If my limited knowledge is correct... Vulkan is a runtime for SPIR-V so does that mean: [SYCL/HIP -> SPIR-V -> (anything that runs Vulkan)] is possible (at least in theory)? That would seem like the ideal route as then SYCL could run on all major GPUs. If that has been the idea all along then sorry for being obtuse, it's still not a year since I wrote my first GPU program (in HC).

**illuhad** · 02 May 2019, 05:42 PM

Originally posted by Madgemade

Much of this was true of AMD's HC, which was based on C++ AMP. At least that goes for automatic memory transfers (reminds me of cudaDeviceSynchronize which is not even available in HIP, except that in HC you didn't even need to make a call, it was all automatic), being able to run all device code on the host, and having no attributes except for [[hc]], which was mostly not needed and did not prevent code from being run on the CPU (it just couldn't contain code that the GPU couldn't also run). The format of HC looks similar to SYCL, but the way lambdas are used seems more complex in SYCL.
HC sounds good; however, it seems that few people even know/knew of its existence and it's now deprecated. Only AMD knows why it was deprecated, presumably because it wasn't in any way cross-platform and for most uses HIP could be used instead.

I believe that SYCL was indeed inspired in part by C++AMP. The main difference is probably SYCL's out-of-order processing based on implicit task graphs. I would also suspect that the reason for HC deprecation was a lack of users due to lack of cross platform support. C++AMP itself probably was always prone to portability issues due to not being neutral enough, coming from Microsoft.

Originally posted by Madgemade

This sounds interesting, could you link it?

Sure! See here for some discussions: https://github.com/illuhad/hipSYCL/issues/42
and here for the initial pull request: https://github.com/illuhad/hipSYCL/pull/60

It's not yet ready for prime-time, but hopefully will be very soon.

Originally posted by Madgemade

I'm not up to date with llvm progress on Nvidia, presumably this still requires nvcc to generate for Nvidia targets?

No, nvcc is not needed at all (thank god, it's a horrible compiler with terrible support for modern C++). clang/LLVM compiles CUDA directly to nvptx, i.e. the CUDA IR. There are still some components needed from the CUDA toolkit (which is expected - it's a closed platform, so at some point you just have to interface with closed source stuff). In particular, these are the libdevice library, which contains bitcode for device builtins, and ptxas (the PTX assembler), which clang invokes to translate nvptx into the top-secret, chip-specific SASS binary. This is then the binary that is actually executed on the GPU.
You can find more information here: http://llvm.org/docs/CompileCudaWithLLVM.html#id12

Originally posted by Madgemade

From what I can tell it should be possible to allow for compiling straight from SYCL to AMDGPU ISA thanks to open-source, so if that's on the way to being implemented then it won't matter if AMD is not interested in helping out directly.

Yes

Originally posted by Madgemade

Of course there is much more to it than that (probably beyond my understanding). Ideally device binaries wouldn't be used at all, because they lack code for newer devices (those released after compilation, so they are no good in the long term), which is a problem with HIP/HC for AMD, not to mention the bloated size when many copies for the different ISAs are needed.

Yes, this would indeed be a problem if you're interested in distributing binaries. I work in HPC, where software usually is open source, so personally I don't have to worry about that.

SPIR, SPIR-V or some other IR can help, nvptx (or hsail?) with some conservative architecture settings will also work but at some expense of performance.

Originally posted by Madgemade

I was going to say something about SPIR-V but in reality I don't actually know enough to say anything meaningful. It looks to me like it should be supported already as part of Vulkan, but I couldn't really find out. The whole area of Vulkan for compute is something that I have been meaning to look into for a while but not yet found time.
If my limited knowledge is correct... Vulkan is a runtime for SPIR-V so does that mean: [SYCL/HIP -> SPIR-V -> (anything that runs Vulkan)] is possible (at least in theory)? That would seem like the ideal route as then SYCL could run on all major GPUs. If that has been the idea all along then sorry for being obtuse, it's still not a year since I wrote my first GPU program (in HC).

SPIR-V is part of Vulkan as well as recent versions of OpenCL. The problem is that OpenCL and Vulkan speak two different dialects of SPIR-V, which prevents a simple ingestion of OpenCL kernels compiled to SPIR-V into Vulkan. However, work is underway to converge the two (for example, have a look at the clspv project). If I remember correctly, Vulkan shaders have recently received support for pointers, which was as far as I know one of the main differences to OpenCL kernels. Long story short: There are active developments in the Khronos world to allow for Vulkan and OpenCL interoperability at the SPIR-V level. Once this is in place, OpenCL can be implemented on top of Vulkan (which many people have expressed interest in), and then SPIR-V based SYCL implementations should run on all GPUs out of the box.

So, SYCL on top of Vulkan seems very possible in the future.

I'm not sure if SYCL could in principle already run directly on top of Vulkan with the current state of Vulkan compute shaders (instead of OpenCL). It's likely that some parts of the SYCL spec cannot be implemented with Vulkan SPIR-V, but I'm not familiar enough with the Vulkan side of things and which restrictions it puts exactly on its shaders.

In any case, even if we get SYCL on top of Vulkan, this will only be part of the way. We also need good tools like debuggers and profilers for GPGPU applications in order to properly develop SYCL applications on Vulkan. I'm not sure if current graphics profilers will suffice here. Tooling and libraries are among the aspects that make it desirable to interface with HIP/CUDA, at least for the moment.

Intel Continues Working On Their SYCL Compiler For Upstreaming To LLVM