Announcement

**Daktyl198** · 07 March 2024, 04:22 PM

Originally posted by Jabberwocky View Post

I have tested ZLUDA and recent ZLUDA forks extensively. IMO the gap between rocm and CUDA is still wide open. ZLUDA just shows us another way the gap can be closed. It doesn't close the gap by any means yet. Nobody has stepped up to say they are going to close that gap. I am doubtful about zluda in the long run.

I agree with you about the legal stuff. Nvidia isn't happy about it. They will probably implement something that makes it more difficult for zluda to work.

The gap in performance is still there, I just meant that ZLUDA surpassed "native" ROCm codepaths sometimes by quite a bit, and even got within 10% performance difference with NVidia cards that cost far more. ZLUDA is never going to be the best at performance, but at the very least it showcases how much performance is left on the table with AMD and ROCm when it comes to 3rd party applications. It's basically calling out every single 3rd party application for using a sub-par implementation of ROCm.

**Jabberwocky** · 07 March 2024, 06:48 PM

Originally posted by Daktyl198 View Post

The gap in performance is still there, I just meant that ZLUDA surpassed "native" ROCm codepaths sometimes by quite a bit, and even got within 10% performance difference with NVidia cards that cost far more. ZLUDA is never going to be the best at performance, but at the very least it showcases how much performance is left on the table with AMD and ROCm when it comes to 3rd party applications. It's basically calling out every single 3rd party application for using a sub-par implementation of ROCm.

Well said. Yes, it's really awesome for telling AMD look what your hardware can do if you had good software.

Not only that but Zluda is helping support, for example Windows pytorch support where rocm is still lagging behind. It's not easy to get zluda+rocm to run your pytorch apps on Windows but it's easier than using rocm by itself.

I hope I'm wrong about zluda in the long run. It would be amazing to have things like DLSS running on AMD.

**blackiwid** · 07 March 2024, 07:50 PM

Originally posted by pWe00Iri3e7Z9lHOX2Qx View Post

I do wish AMD would actually lead the way with some important new GPU tech. We get free and open source versions of similar tech a while later which is great in and of itself. But it's always so reactionary to what NVIDIA is doing. G-SYNC, ray tracing, DLSS. "Hey let's try to do those same things almost as good but for free and open source!" isn't usually a strategy to surpass your competition.

Do you really would want AMD Gsync the fast and ugly way like Nvidia did it with some spezial AMD only specific Monitors that cost 200 Dollars extra? That was the only way Nvidia could be faster.

Yes with FSR they slept, probably thought the mining boom never ends and tried to compete on that front with Nvidia ignoring gaming. With ray tracing well even Nvidia <700 dollar cards all are not good enough for Raytracing and if so then in absurd Museum resolutions like 1080p.

If with the current gen would be like planed 10-20% faster and had less power issues for the same cost, nobody would cared, even with the suprising problems they had with the current gen, they gain market share in Desktop gaming, their real problem is their crappy offering for laptops and or the marketing around it.

Regarding Desktop Gaming they win on the for most people most important factor their bang for a buck ratio, there they kick Nvidias ass. Also they basically are often equal currently with Nvidia because if you compare similarily priced gpus istead of namewise a rx 7700xt beats a 4060 ti with raytracing on depending on if you want the 8 or 16gb version the Nvidia is only slightly cheaper or more expensive. yet the 7700xt is faster Raytracing and 20% faster than the 4060 ti in raster so you have to come up with to much power draw living in a country with expensive electrecity and playring each day 15hours or so... or like some niche things otherwise the 7700xt is just better in every single aspect (including Raytracing which was the topic).

Also Nvidia now is behind on the Handheld front, that is a innovation the Steam deck they did together with steam, now you could argue that Nvidia was first with the Switch but it's such a weak hardware that Nvidia can't be happy that they are the china cheap slow BS hardware and AMD has the strong hardware in this sector, and they only had 1 modell while AMD now has a whole new class of devices, every oem one after another releases a handheld or multiple and except one that got bribed by intel all use AMD hardware.

They basically created Vulkan by first creating Metal and then let the Vulcan people nearly 1:1 copy it and make it a open standard. Also they have the technology now to on driver level activate FSR while DLSS is not supported by all games and those that don't can't use it at all at the moment... so it's not like they only copy Nvidia that is not true.

**AmericanLocomotive** · 07 March 2024, 10:13 PM

It really boggles my mind how the ROCM team and the AMD 3D-Driver teams (both closed and open source) seem to exist in two completely different universes.

- 3D Driver team: Mostly has their stuff together, understands how creating goodwill with the community pays dividends, seems to be working hard to improve hardware support (especially for new/upcoming GPUs), things seem very streamlined (the drivers just work, for the most part)

- ROCM: Very weird and limited hardware support. Doesn't seem to understand that consumer goodwill will spill over into enterprise sales wins. ROCM seems largely broken with huge issues. Number one complaint I see is that it seems nearly impossible to install. Documentation is spotty and rough. Like what? You should just be able to install it as a package on every modern distribution. Why does it seem that anyone who wants to play with ROCM needs to get a PhD in the arcane arts to get ROCM to install?

**coder** · 08 March 2024, 04:30 AM

Originally posted by Railander View Post

it's a typo and i can't edit that comment.

So, the part about Nvidia making only GPUs was a typo? ...because that's the part I was talking about.

**coder** · 08 March 2024, 04:34 AM

Originally posted by sobrus View Post

Can you please provide source for this?

This:

Nvidia updates GeForce EULA to prohibit data center use

https://www.datacenterdynamics.com/en/news/nvidia-updates-geforce-eula-to-prohibit-data-center-use/

Instead, it wants you to buy the significantly more expensive Tesla

Contrary to what that article says, you didn't necessarily have to use a Tesla card - they would permit Quadro cards, also.

**coder** · 08 March 2024, 04:42 AM

Originally posted by mb_q View Post

Nvidia made a ton of money on it, but nevertheless the idea of GPGPU is stupid, eventually some ASIC will always come, offering a leap in efficiency and speed. IMHO this is what AMD is doing, investing in Xilinx, putting those "tensor cores" into Ryzens... It should be noted that the chip development process takes several years, the current tech was probably more influenced by the crypto bubble and shader-based ray-tracing than the AI bubble.

You're wrong on every point.

None of the purpose-built AI processors are hard-wired ASICs. Excluding the GPU-derived architectures, the rest are all based on DSP-like programmable cores.
The XDNA IP from Xilinx, that AMD integrated into their latest APUs aren't hard-wired or even FPGAs. They're based on DSP cores (see above).
Nvidia was early on the AI bandwagon. They've been crowing about it for more than a decade. This has a lot to do with why they're in the lead. Their Tensor "cores" first featured in Volta, which had no ray tracing hardware.

**sobrus** · 08 March 2024, 04:51 AM

Originally posted by coder View Post

This:

https://www.datacenterdynamics.com/e...ta-center-use/

Contrary to what that article says, you didn't necessarily have to use a Tesla card - they would permit Quadro cards, also.

Thanks, that's a new level of limitations, basically they can forbid you from anything they want. Usually, they only crippled FP64 performance, or other features.
But I don't know if TinyBox is actually data center use. It's an AI box. The issue they had with nvidia is that GeForce cards have P2P functionality blocked, which hampers multi GPU usage. And Radeon still supports it (yet).

I have no doubts, that AMD will do similar things to protect their DataCenter offerings. In fact, they have been doing it for years, just took a little different way than nVidia. And RDNA3 has FP64 performance halved compared to RDNA2. All companies do their best to separate professional and consumer usage, just take different approach.

Anyway, I still think it's a change for AMD. We do have AI boom now, and ppl want to run stable diffusion on their rigs. Showing that TinyBox (which is exactly AI machine build out of toys) uses AMD hardware, would be a huge PR win for AMD, while on hardware level this is still nowhere near their CDNA offerings. And if TinyCorp helps them with ironing out bugs, something that they apparently haven't been able to do for several years, that's even better

**coder** · 08 March 2024, 05:30 AM

Originally posted by sobrus View Post

Usually, they only crippled FP64 performance, or other features.

People like to quote that, but it hasn't actually happened since Kepler. No consumer Nvidia GPU since then ever had substantial fp64 performance, in the hardware. I mean, there was Titan V, but at $3k, can you really call that a gaming card? Plus, they didn't even cripple its fp64 performance, just disabled one stack of HBM2.

More recently, what they've been crippling is the performance of tensor ops needed for training. I think those are the dot products with fp32-accumulate.

The other thing these GPU manufacturers all like to restrict is virtualization pass-thru. That's how they dissuade people from using gaming cards in the cloud.

Come to think of it, I think Nvidia might also restrict the number of NVENC engines consumer cards can use.

Originally posted by sobrus View Post

I don't know if TinyBox is actually data center use. It's an AI box.

It's rackmount and burns a non-trivial amount of power. It therefore would typically go into an air conditioned server room, at least. But, it's also AMD and not Nvidia, so the whole "datacenter" thing is probably moot.

Originally posted by sobrus View Post

RDNA3 has FP64 performance halved compared to RDNA2.

That's probably just because they don't want to waste die space on a feature gamers don't use, rather than including it in the hardware but restricting access to it.

Originally posted by sobrus View Post

All companies do their best to separate professional and consumer usage, just take different approach.

The main difference is memory capacity.

Originally posted by sobrus View Post

Showing that TinyBox (which is exactly AI machine build out of toys) uses AMD hardware, would be a huge PR win for AMD,

Eh, I can see some potential value in it. If Tiny upstreams their firmware mods, then maybe AMD will integrate some of those changes into their Pro cards.

Originally posted by sobrus View Post

while on hardware level this is still nowhere near their CDNA offerings. And if TinyCorp helps them with ironing out bugs, something that they apparently haven't been able to do for several years, that's even better

The practical reality is that AMD probably can't meet the demand for CDNA GPUs through the end of the year. That's apparently how long production capacity of HBM is currently booked up. So long as AMD's CDNA products remain oversubscribed, people doing AI on their RDNA GPUs is basically pure win.

Plus, as of now, I think AMD's investors are probably very impatient for AMD to start cashing in more on the AI boom. So, it would look really bad if AMD brushes off apparent offers by Tiny to help optimize their RDNA GPUs for AI.

**sobrus** · 08 March 2024, 09:30 AM

Originally posted by coder View Post

People like to quote that, but it hasn't actually happened since Kepler. No consumer Nvidia GPU since then ever had substantial fp64 performance

That's true, but is this actually cut out from the chip, or just locked? I was under impression, that they just lock it, since making different chips is costly. Usually, each nVidia GPU generation consists of only a few chips. And the same chips, differently configured, were used in Tesla or Titan.

And it's actually quite understandable that they crippled it - you've got CUDA on consumer cards, but with sluggish FP64.
AMD, on the other hand, can leave it at 1:2, as there is no software to run on these cards anyway. Save for OpenCL which is nowhere as popular as it should be.

Originally posted by coder View Post

That's probably just because they don't want to waste die space on a feature gamers don't use, rather than including it in the hardware but restricting access to it.

Either this if they don't plan any Pro version of this chip, or just locked it, because together with WMMA and 24GB memory this would make nice Titan out of it.
Isn't it, that on hardware level, FP64 is 1:2 of FP32 and if it's lower - it's been intentionally crippled?

Originally posted by coder View Post

Eh, I can see some potential value in it. If Tiny upstreams their firmware mods, then maybe AMD will integrate some of those changes into their Pro cards.

I wish they opened this ecosystem to the point community could provide fixes, optimizations, and bring it to any Radeon chip it can run on. Just like let's say RADV. But I don't set my hopes too high. Maybe Intel will push them a bit.

Announcement

Tiny Corp At "70%" Confidence For AMD To Open-Source Some Relevant GPU Firmware

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment