Tensor LLVM Extensions Proposed For Targeting AI Accelerators, Emerging Hardware

  • Tensor LLVM Extensions Proposed For Targeting AI Accelerators, Emerging Hardware

    Phoronix: Tensor LLVM Extensions Proposed For Targeting AI Accelerators, Emerging Hardware

    Intel, Amazon AWS, IBM, Qualcomm, and UIUC researchers have been collaborating over a proposed "Tensor LLVM Extensions" (TLX) to make this open-source compiler infrastructure more suitable for targeting AI accelerators and other emerging classes of hardware...

  • #2
    This is a big deal, right? The dream is to just download a project built for Nvidia or whatever, and have it run on other hardware with minimal tweaks.

    • #3
      Yup, indeed. As long as the high-level libraries (PyTorch, etc.) are written to use that as a backend, it should help make such code more portable across platforms.
      Now we still need to persuade those devs who insist on only writing proprietary CUDA code.
      (Though even on that front, AMD has been developing portable compiler tooling that can ingest CUDA; rough sketch below.)
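      For the curious, that's presumably AMD's HIP / hipify tooling. The reason it can work at all is that the CUDA runtime API maps almost one-to-one onto HIP, so most host and kernel code ports mechanically. A toy sketch (real HIP calls, but the kernel and sizes are just illustrative):

      ```cpp
      // Toy HIP example: porting from CUDA is mostly a mechanical rename
      // of the runtime API (cudaMalloc -> hipMalloc, etc.).
      #include <hip/hip_runtime.h>

      __global__ void scale(float* x, float a, int n) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i < n) x[i] *= a;
      }

      int main() {
          const int n = 1024;
          float* d_x = nullptr;
          hipMalloc((void**)&d_x, n * sizeof(float));      // was: cudaMalloc
          hipMemset(d_x, 0, n * sizeof(float));            // was: cudaMemset
          scale<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n);   // same launch syntax
          hipDeviceSynchronize();                          // was: cudaDeviceSynchronize
          hipFree(d_x);                                    // was: cudaFree
          return 0;
      }
      ```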

      • #4
        Isn't that what OpenCL was supposed to do?

        • #5
          Originally posted by brucethemoose View Post
          This is a big deal, right? The dream is to just download a project built for Nvidia or whatever, and have it run on other hardware with minimal tweaks.
          Except that almost nobody uses stock LLVM directly to target compute workloads on Nvidia hardware. The normal way to develop for their stack is their CUDA API and compiler; nvcc's device compiler is itself built on a modified LLVM (via NVVM), but that's almost beside the point.

          For a portable solution, you'd first need to be using a hardware-agnostic host API, like OpenCL, Vulkan, or oneAPI. Then your device code would be compiled to something like SPIR-V that supports the new operations, the backend would have to be LLVM-based, and you'd be limited to whatever operators LLVM supports, which isn't going to be a 1:1 match to what the hardware actually provides. That mismatch will cost some efficiency, and how much depends on the hardware target and which operators you actually need.

          I don't mean to pour cold water on this news, but it's merely a building block in a grander scheme. It takes us mostly back to the picture we had of device-portability before tensor instructions came onto the scene, some ~4 years ago.

          What impresses me is that there's enough similarity between the devices & their native operations that we can even seriously talk about something like this. Compared with vector operations, I think there's greater variety in how vendors have approached tensors.
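          Just to make that portable path concrete, here's roughly what it looks like from the host side with OpenCL: a vendor-neutral host API, device code shipped as SPIR-V, and the driver's (typically LLVM-based) compiler doing the final lowering to the native ISA. That last step is where common tensor operations would either hit real tensor instructions or fall back to something slower. Error handling is stripped and the file/kernel names are made up:

          ```cpp
          // Sketch of the host side of the "portable" pipeline: OpenCL host API +
          // device code precompiled to SPIR-V (e.g. with clang + llvm-spirv).
          // Error handling omitted; "kernels.spv" / "my_tensor_kernel" are made up.
          #define CL_TARGET_OPENCL_VERSION 300
          #include <CL/cl.h>
          #include <fstream>
          #include <iterator>
          #include <vector>

          int main() {
              cl_platform_id platform;
              clGetPlatformIDs(1, &platform, nullptr);

              cl_device_id device;
              clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);

              cl_int err;
              cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, &err);

              // Load the SPIR-V module produced offline.
              std::ifstream f("kernels.spv", std::ios::binary);
              std::vector<char> spirv((std::istreambuf_iterator<char>(f)),
                                      std::istreambuf_iterator<char>());

              // The driver's compiler lowers the SPIR-V to the device's native ISA here
              // (requires OpenCL 2.1+ or the cl_khr_il_program extension).
              cl_program prog = clCreateProgramWithIL(ctx, spirv.data(), spirv.size(), &err);
              clBuildProgram(prog, 1, &device, "", nullptr, nullptr);

              cl_kernel kernel = clCreateKernel(prog, "my_tensor_kernel", &err);
              // ...set args, enqueue, read back results as usual...
              return 0;
          }
          ```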
          Last edited by coder; 14 November 2021, 06:12 PM.

          • #6
            Originally posted by DrYak View Post
            Yup, indeed. As long as the high-level libraries (PyTorch, etc.) are written to use that as a backend, it should help make such code more portable across platforms.
            Now we still need to persuade those devs who insist on only writing proprietary CUDA code.
            This is only the backend support. We'll still need SPIR-V support, and then (for those not directly targeting SPIR-V) extensions added to something like OpenCL.

            This is an important first step, but only the first.

            • #7
              Originally posted by MadeUpName View Post
              Isn't that what OpenCL was supposed to do?
              OpenCL* created a portable host API, a common device language, and a common intermediate representation; without those pieces, it wouldn't even matter whether the device compiler supported a common set of tensor operations.

              To round out the picture, oneAPI builds on OpenCL's foundations, though I honestly can't (yet) say much else about it. WebGPU sits a level further up the stack, but (I think) is more agnostic about whatever sits between it and the hardware.

              * OpenCL was itself influenced by OpenGL and prior GPU compute languages & toolchains (including CUDA). Vulkan borrowed and extended OpenCL's SPIR intermediate representation.

              • #8
                Originally posted by MadeUpName View Post
                Isn't that what OpenCL was supposed to do?
                This should open up tensor extensions, natively, to any language that uses LLVM. Think Swift, Rust, Haskell, Fortran, Julia, and many others, without going through C.

                • #9
                  Originally posted by vegabook View Post
                  This should open up tensor extensions, natively, to any language that uses LLVM. Think Swift, Rust, Haskell, Fortran, Julia, and many others, without going through C.
                  Whatever language you use to write the device code, you'll probably be limited to some form of intrinsics, library functions, or a narrow range of idioms to get decent utilization out of the hardware. The most sensible approach is probably to build a set of library primitives on top of whatever common instructions get defined, and use those to implement more useful, high-level operations.
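                  To illustrate (with a completely made-up builtin name; this is not anything TLX or Clang actually defines): the compiler exposes some fixed-size tile operation, and the useful high-level stuff gets built as library primitives on top of it:

                  ```cpp
                  // Hypothetical sketch only: "__hypothetical_tensor_mma_16x16" is an
                  // invented stand-in for whatever tensor intrinsic the compiler exposes.
                  #include <cstddef>

                  // Pretend intrinsic: C[16x16] += A[16x16] * B[16x16], row-major,
                  // with explicit leading dimensions (lda/ldb/ldc).
                  extern "C" void __hypothetical_tensor_mma_16x16(
                      const float* a, std::size_t lda,
                      const float* b, std::size_t ldb,
                      float* c, std::size_t ldc);

                  // The kind of library primitive a frontend (Rust, Julia, Fortran, ...)
                  // would actually call: a blocked matmul where each inner call ideally
                  // maps onto one native tensor instruction. Assumes m, n, k are
                  // multiples of the tile size.
                  void matmul_tiled(const float* a, const float* b, float* c,
                                    std::size_t m, std::size_t n, std::size_t k) {
                      constexpr std::size_t T = 16;  // tile size dictated by the hardware op
                      for (std::size_t i = 0; i < m; i += T)
                          for (std::size_t j = 0; j < n; j += T)
                              for (std::size_t p = 0; p < k; p += T)
                                  __hypothetical_tensor_mma_16x16(a + i * k + p, k,
                                                                  b + p * n + j, n,
                                                                  c + i * n + j, n);
                  }
                  ```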

                  • #10
                    Originally posted by coder View Post
                    Except that almost nobody uses stock LLVM directly to target compute workloads on Nvidia hardware. The normal way to develop for their stack is their CUDA API and compiler; nvcc's device compiler is itself built on a modified LLVM (via NVVM), but that's almost beside the point.

                    For a portable solution, you'd first need to be using a hardware-agnostic host API, like OpenCL, Vulkan, or oneAPI. Then your device code would be compiled to something like SPIR-V that supports the new operations, the backend would have to be LLVM-based, and you'd be limited to whatever operators LLVM supports, which isn't going to be a 1:1 match to what the hardware actually provides. That mismatch will cost some efficiency, and how much depends on the hardware target and which operators you actually need.

                    I don't mean to pour cold water on this news, but it's merely a building block in a grander scheme. It takes us mostly back to the picture we had of device-portability before tensor instructions came onto the scene, some ~4 years ago.

                    What impresses me is that there's enough similarity between the devices & their native operations that we can even seriously talk about something like this. Compared with vector operations, I think there's greater variety in how vendors have approached tensors.
                    I was thinking this is something PyTorch, TensorFlow and so on could use on their end to help make projects more portable. If LLVM is targeting tensor operations (and not generic compute like the other APIs), wouldn't that have less overhead than, say, the Vulkan-based ncnn backend lots of projects use now?
