LLVM Working On "HIPSPV" So AMD HIP Code Can Turn Into SPIR-V And Run On OpenCL


  • #11
    Originally posted by tildearrow View Post
    What I mean is CUDA on AMD hardware
    I'll bet that was AMD's original goal. However, the Oracle/Google lawsuit then resulted in a ruling that Oracle could indeed copyright the Java API (overturning decades of precedent), and AMD probably then shifted to building a translator instead of simply replicating the CUDA API.
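
    To make the "translator" idea concrete: HIP deliberately mirrors CUDA's API one-to-one, so AMD's hipify tools can port code mostly by renaming calls rather than by reimplementing Nvidia's runtime behind Nvidia's own names. A minimal sketch of what ported code looks like (my own illustration, not AMD's docs):

    Code:
    #include <hip/hip_runtime.h>  // drop-in replacement for cuda_runtime.h

    // Kernel source is unchanged from CUDA: same __global__, same index math.
    __global__ void axpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    void runAxpy(int n, float a, const float* x, float* y) {
        float *dx, *dy;
        hipMalloc(&dx, n * sizeof(float));                          // was cudaMalloc
        hipMalloc(&dy, n * sizeof(float));
        hipMemcpy(dx, x, n * sizeof(float), hipMemcpyHostToDevice); // was cudaMemcpy
        hipMemcpy(dy, y, n * sizeof(float), hipMemcpyHostToDevice);
        axpy<<<(n + 255) / 256, 256>>>(n, a, dx, dy);               // same launch syntax
        hipMemcpy(y, dy, n * sizeof(float), hipMemcpyDeviceToHost);
        hipFree(dx);                                                // was cudaFree
        hipFree(dy);
    }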



    • #12
      Originally posted by StillStuckOnSI View Post
      Intel are the only GPU maker actually pushing SPIR-V and OpenCL these days, but they don't have the hardware to back it up.
      DG2 should launch in a couple of months. Meanwhile, Intel's implementation on their iGPUs is decent.

      There's also Ponte Vecchio, though I'm not sure when it's supposed to ship (i.e. outside of the special HPC deployments already in progress).



      • #13
        Originally posted by coder View Post
        I'll bet that was AMD's original goal. However, the Oracle/Google lawsuit then resulted in a ruling that Oracle could indeed copyright the Java API (overturning decades of precedent), and AMD probably then shifted to building a translator instead of simply replicating the CUDA API.
        Screw this and the unfair greedy monopoly NVIDIA has.
        Google took the risky path, and easily took half of the entire mobile OS usage share.
        And even after the lawsuit Google still exists.
        I wish AMD had just taken the risk, because only 5% of people would be happy with translation tools, but 95% would be satisfied with a compatibility layer.



        • #14
          Originally posted by tildearrow View Post
          Screw this and the unfair greedy monopoly NVIDIA has.
          Google took the risky path, and easily took half of the entire mobile OS usage share.
          And even after the lawsuit Google still exists.
          I don't know if they actually had to pay damages, since they appealed the ruling. I believe it actually got overturned recently.

          Remember that AMD wasn't in good financial shape at the time HIP was first introduced. They weren't & still aren't Google.

          Lastly, what Google did was considered safe at the time they did it -- not risky! Remember the whole "decades of precedent" part?

          Originally posted by tildearrow View Post
          I wish AMD had just taken the risk, because only 5% of people would be happy with translation tools, but 95% would be satisfied with a compatibility layer.
          I don't. Even if AMD had implemented their own version of CUDA, it's still an API that Nvidia 100% controls. And, at any time, they can throw a wrench in it that causes crap performance, bugs, etc. on AMD's hardware.

          I wish AMD had just stuck to OpenCL and maybe focused on providing translation tools & optimized runtime libraries for it.



          • #15
            Originally posted by coder View Post
            Vulkan compute shaders only have precision requirements equivalent to GLSL's, which are (in some cases much) looser than OpenCL's.

            So, even if you had the tooling to run OpenCL SPIR-V compute kernels on any Vulkan device, the results (often?) would be unsatisfactory if not downright unusable.
            Most desktop GPUs implement higher precision than the Vulkan spec requires. I'm sure it would be relatively easy to create a high-precision Vulkan extension that provides additional guarantees, if needed.

            Also, the arithmetic operations needed for basically all path-tracing operations are accurate enough under Vulkan's guarantees that I doubt there will be issues -- the core operations needed are add, sub, mul, div, fma, sqrt, and rsqrt -- all of those have very good accuracy requirements in the Vulkan spec. The less accurate operations are the trigonometric and exponential/logarithmic operations, which are rarely used in path tracing (probably only for procedural textures, assuming all transformation matrices are pre-computed by the CPU) and could use a custom more-accurate implementation if necessary, rather than using Vulkan's built-in operations.
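
            To illustrate that claim with a sketch of my own (not Cycles code): the innermost work of a path tracer is intersection tests like the ray-sphere test below, which touches only add, sub, mul, fma, and sqrt -- exactly the operations with tight bounds in the Vulkan spec.

            Code:
            #include <cmath>

            struct Vec3 { float x, y, z; };

            inline float dot(Vec3 a, Vec3 b) {
                return std::fma(a.x, b.x, std::fma(a.y, b.y, a.z * b.z));
            }

            // Distance along the ray to the first hit, or -1 on a miss.
            // ro/rd: ray origin and (normalized) direction; c/r: sphere center/radius.
            float hitSphere(Vec3 ro, Vec3 rd, Vec3 c, float r) {
                Vec3 oc{ro.x - c.x, ro.y - c.y, ro.z - c.z};
                float b = dot(oc, rd);
                float disc = std::fma(b, b, r * r - dot(oc, oc)); // b^2 - (|oc|^2 - r^2)
                if (disc < 0.0f) return -1.0f;                    // ray misses the sphere
                return -b - std::sqrt(disc);                      // nearest intersection
            }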
            Last edited by programmerjake; 19 December 2021, 10:48 PM.



            • #16
              Originally posted by programmerjake View Post
              Most desktop GPUs implement higher precision than the Vulkan spec requires. I'm sure it would be relatively easy to create a high-precision Vulkan extension that provides additional guarantees, if needed.
              That's a pretty big backtrack. Your original post seemed aimed at running these kernels on virtually all Vulkan-capable GPUs. If you're going to require an extension, how is that really better than simply requiring OpenCL SPIR-V support?

              Originally posted by programmerjake View Post
              Also, the arithmetic operations needed for basically all path-tracing operations are accurate enough under Vulkan's guarantees that I doubt there will be issues -- the core operations needed are add, sub, mul, div, fma, sqrt, and rsqrt -- all of those have very good accuracy requirements in the Vulkan spec.
              Depends on what it's for. But, we're talking about compute workloads, not graphics, because that's what CUDA & HIP are for.

              Here's what the specs actually say:
              Some of the more glaring examples are exp()/exp2() and atan()/atan2()/asin()/acos(). In the latter case, Vulkan allows up to 4096 ULP of error vs. OpenCL allowing only 6. That's about 683x as much error tolerance! The former is data-dependent, but I think the worst case is 173 vs. 3 ULP, or about 58x as much.

              So, you can perhaps now appreciate that these aren't simply hand-wavy differences, where one could blindly take CUDA/HIP code and run it under Vulkan with full faith in the accuracy of the results. Hardware that's designed specifically for graphics (and maybe also deep learning) probably doesn't have abundant precision, as that would be wasteful of die area and power.
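
              To make those ULP figures concrete, here's a minimal host-side sketch (my own illustration) of the usual way a kernel's output gets checked against a correctly-rounded reference:

              Code:
              #include <cstdint>
              #include <cstring>

              // Distance between two floats in units in the last place (ULPs).
              // Works because same-sign IEEE-754 floats are ordered the same way
              // as their bit patterns; assumes no NaN/Inf inputs.
              int64_t ulpDistance(float a, float b) {
                  int32_t ia, ib;
                  std::memcpy(&ia, &a, sizeof ia);  // reinterpret the raw bits
                  std::memcpy(&ib, &b, sizeof ib);
                  int64_t d = (int64_t)ia - (int64_t)ib;
                  return d < 0 ? -d : d;
              }
              // An atan() result within 6 ULP of the reference meets OpenCL's bound;
              // Vulkan would accept anything up to 4096 ULP off.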

              Originally posted by programmerjake View Post
              The less accurate operations are the trigonometric and exponential/logarithmic operations, which are rarely used in path tracing (probably only for procedural textures, assuming all transformation matrices are pre-computed by the CPU) and could use a custom more-accurate implementation if necessary, rather than using Vulkan's built-in operations.
              Or, instead of writing a custom GPU math library (probably with abysmal performance compared to vendor-native implementations), maybe just stick to running GPU compute kernels via GPU compute APIs?



              • #17
                (The forum doesn't display any quotes inside a quote, so I manually adjusted it)
                Originally posted by programmerjake View Post
                Most desktop GPUs implement higher precision than the Vulkan spec requires. I'm sure it would be relatively easy to create a high-precision Vulkan extension that provides additional guarantees, if needed.
                Originally posted by coder View Post
                That's a pretty big backtrack. Your original post seemed aimed at running these kernels on virtually all Vulkan-capable GPUs. If you're going to require an extension, how is that really better than simply requiring OpenCL SPIR-V support?
                Because the Vulkan drivers are often higher quality, and because they work in places where OpenCL might not work well at all. Also, that extension wouldn't be required; its absence would just enable whatever precision mitigations are needed -- almost certainly only for the trig/exp/log functions, if those prove too inaccurate for Cycles (which has yet to be seen; it's quite possible the trig inaccuracies cause no problems whatsoever).
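
                As a sketch of that fallback logic (my own illustration -- the precision extension and the custom_* replacements are hypothetical; nothing like this is shipping today):

                Code:
                #include <string>

                // Decide at pipeline-build time which math implementation the kernels
                // get: trust the driver's built-in trig/exp/log only when the
                // (hypothetical) precision extension guarantees tight bounds,
                // otherwise compile against custom replacements.
                std::string mathPrelude(bool hasHighPrecisionExt) {
                    if (hasHighPrecisionExt)
                        return "";                       // driver built-ins are good enough
                    return "#define sin custom_sin\n"    // custom_* are hypothetical
                           "#define exp custom_exp\n"    // replacements, like the sin
                           "#define log custom_log\n";   // sketched later in this post
                }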

                Originally posted by programmerjake View Post
                Also, the arithmetic operations needed for basically all path-tracing operations are accurate enough under Vulkan's guarantees that I doubt there will be issues -- the core operations needed are add, sub, mul, div, fma, sqrt, and rsqrt -- all of those have very good accuracy requirements in the Vulkan spec.
                Originally posted by coder View Post
                Depends on what it's for. But, we're talking about compute workloads, not graphics, because that's what CUDA & HIP are for.
                Cycles is much more like a graphics workload in its accuracy requirements; last-bit precision is usually not necessary. Vulkan actually provides identical precision to OpenCL for add, sub, mul, div, and rsqrt -- most of the operations I mentioned. Also, I did reread the spec when I wrote the list of operations you quoted above... I'm quite familiar with Vulkan's accuracy requirements -- I've been working on building a GPU for the last 3 years as part of the Libre-SOC project and have been the primary source of Vulkan expertise for the project.
                Originally posted by coder View Post
                Here's what the specs actually say:
                Some of the more glaring examples are exp()/exp2() and atan()/atan2()/asin()/acos(). In the latter case, Vulkan allows up to 4096 ULP of error vs. OpenCL allowing only 6. That's about 683x as much error tolerance! The former is data-dependent, but I think the worst case is 173 vs. 3 ULP, or about 58x as much.
                Yup, Vulkan has pretty atrocious accuracy requirements for the trig/log/exp functions, which is exactly why I stated that they are less accurate and suggested using custom implementations if/when that causes problems. On the GPUs with awful accuracy, those custom implementations are likely very similar to whatever the vendor actually uses for OpenCL anyway, so they would have similar performance.
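
                For a flavor of what such a custom implementation looks like, here's a rough sketch of my own (not Libre-SOC or vendor code; a production version would use minimax coefficients and a float-only argument reduction):

                Code:
                #include <cmath>

                // sin() rebuilt from the tightly-specified ops (add/mul/fma), so its
                // accuracy no longer depends on the driver's loose trig bounds.
                // The reduction is done in double only to keep the sketch short.
                float customSin(float x) {
                    double k = std::nearbyint((double)x * 0.6366197723675814); // round(x / (pi/2))
                    float r = (float)((double)x - k * 1.5707963267948966);     // r in [-pi/4, pi/4]
                    float r2 = r * r;
                    // Taylor cores on the reduced range, evaluated with fma:
                    float s = r * std::fma(r2, std::fma(r2, std::fma(r2,
                                  -1.9841270e-4f, 8.3333333e-3f), -1.6666667e-1f), 1.0f);
                    float c = std::fma(r2, std::fma(r2, std::fma(r2, std::fma(r2,
                                  2.4801587e-5f, -1.3888889e-3f), 4.1666667e-2f), -0.5f), 1.0f);
                    switch ((long long)k & 3) {  // quadrant picks sin/cos and sign
                        case 0:  return s;
                        case 1:  return c;
                        case 2:  return -s;
                        default: return -c;
                    }
                }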
                Originally posted by coder View Post
                So, you can perhaps now appreciate that these aren't simply hand-wavy differences, where one could blindly take CUDA/HIP code and run it under Vulkan with full faith in the accuracy of the results.
                Yup, I wasn't advocating blind translation, but rather the more reasonable approach of adapting to Vulkan's quirks, one of which is the accuracy.
                Originally posted by coder View Post
                Hardware that's designed specifically for graphics (and maybe also deep learning) probably doesn't have abundant precision, as that would be wasteful of die area and power.
                Well, where it will likely actually matter for Cycles, Vulkan already has very strong requirements, so the mere fact that a GPU can implement Vulkan, and not just OpenGL ES 2 (which has very poor requirements, by contrast), means its floating-point hardware has good enough accuracy for the operations Cycles likely needs for path tracing. (This comes from my experience, having written several ray tracers and path tracers myself.)
                Originally posted by programmerjake View Post
                The less accurate operations are the trigonometric and exponential/logarithmic operations, which are rarely used in path tracing (probably only for procedural textures, assuming all transformation matrices are pre-computed by the CPU) and could use a custom more-accurate implementation if necessary, rather than using Vulkan's built-in operations.
                Originally posted by coder View Post
                Or, instead of writing a custom GPU math library (probably with abysmal performance compared to vendor-native implementations), maybe just stick to running GPU compute kernels via GPU compute APIs?
                Last edited by programmerjake; 21 December 2021, 12:56 AM.



                • #18
                  Originally posted by programmerjake View Post
                  Cycles is much more like a graphics workload in its accuracy requirements; last-bit precision is usually not necessary. Vulkan actually provides identical precision to OpenCL for add, sub, mul, div, and rsqrt -- most of the operations I mentioned. Also, I did reread the spec when I wrote the list of operations you quoted above... I'm quite familiar with Vulkan's accuracy requirements -- I've been working on building a GPU for the last 3 years as part of the Libre-SOC project and have been the primary source of Vulkan expertise for the project.

                  Yup, Vulkan has pretty atrocious accuracy requirements for the trig/log/exp functions, which is exactly why I stated that they are less accurate and suggested using custom implementations if/when that causes problems. On the GPUs with awful accuracy, those custom implementations are likely very similar to whatever the vendor actually uses for OpenCL anyway, so they would have similar performance.
                  Yup, I wasn't advocating blind translation, but rather the more reasonable approach of adapting to Vulkan's quirks, one of which is the accuracy. Well, where it will likely actually matter for Cycles, Vulkan already has very strong requirements, so the mere fact that a GPU can implement Vulkan, and not just OpenGL ES 2 (which has very poor requirements, by contrast), means its floating-point hardware has good enough accuracy for the operations Cycles likely needs for path tracing. (This comes from my experience, having written several ray tracers and path tracers myself.)
                  See here.

                  All information that we can make public about plans for other GPUs is public, that’s all I can say about that. Vulkan has limitations in how you can write kernels, in practice you can’t currently use pointers for example. But also, GPU vendors will recommend certain platforms for writing production renderers, provide support around that, and various renderers will use it. Choosing a different platform means you will hit more bugs and limitations, have slower or no access to certain features, ar...

                  Vulkan compute isn't ready for Cycles. It currently has too many limitations, driver bugs, and gaps in support. If it were easy, then Otoy would have released Octane with the backend they finished (but it barely worked on anything outside of Nvidia because of driver bugs).

                  See here.

                  And here.


                  • #19
                    Originally posted by Boland View Post

                    See here.

                    All information that we can make public about plans for other GPUs is public, that’s all I can say about that. Vulkan has limitations in how you can write kernels, in practice you can’t currently use pointers for example. But also, GPU vendors will recommend certain platforms for writing production renderers, provide support around that, and various renderers will use it. Choosing a different platform means you will hit more bugs and limitations, have slower or no access to certain features, ar...

                    Vulkan compute isn't ready for Cycles. It currently has too many limitations, driver bugs, and gaps in support. If it were easy, then Otoy would have released Octane with the backend they finished (but it barely worked on anything outside of Nvidia because of driver bugs).

                    See here.

                    And here.

                    Well, that's disappointing...

                    As for the part about Vulkan's limitations around pointers: that's addressed by VK_KHR_vulkan_memory_model, which is a requirement of Vulkan 1.2.

