RadeonSI Gallium3D Driver Adds Navi Wave32 Support


  • RadeonSI Gallium3D Driver Adds Navi Wave32 Support

    Phoronix: RadeonSI Gallium3D Driver Adds Navi Wave32 Support

    One of the new features to the RDNA architecture with Navi is support for single cycle issue Wave32 execution on SIMD32. Up to now the RadeonSI code was using just Wave64 but now there is support in this AMD open-source Linux OpenGL driver for Wave32...


  • #2
    Is there a Wave32 game?



    • #3
      So the architecture supports both 32- and 64-wide wavefronts? Is there a performance benefit to using 32 on the ALU side? To me, 64 has advantages over 32 for memory access patterns, but I agree that for some workloads 32 could be a better fit.



      • #4
        I was thinking it is 32-only now, but who knows, 64 might be left in for something.

        Starting with the architecture itself, one of the biggest changes for RDNA is the width of a wavefront, the fundamental group of work. GCN in all of its iterations was 64 threads wide, meaning 64 threads were bundled together into a single wavefront for execution. RDNA drops this to a native 32 threads wide. At the same time, AMD has expanded the width of their SIMDs from 16 slots to 32 (aka SIMD32), meaning the size of a wavefront now matches the SIMD size. This is one of AMD’s key architectural efficiency changes, as it helps them keep their SIMD slots occupied more often. It also means that a wavefront can be passed through the SIMDs in a single cycle, instead of over 4 cycles on GCN parts.
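        The cycle counts in that overview fall out of simple ceiling division of wavefront width by SIMD width; a quick sketch (illustrative arithmetic only, not anything from the driver):

```python
# Quick sketch: cycles to issue one wavefront through a SIMD unit,
# assuming one lane per SIMD slot per cycle (numbers from the quoted overview).
def issue_cycles(wave_size, simd_width):
    """Ceiling of wave_size / simd_width."""
    return -(-wave_size // simd_width)

print(issue_cycles(64, 16))  # GCN: wave64 on SIMD16 -> 4 cycles
print(issue_cycles(32, 32))  # RDNA: wave32 on SIMD32 -> 1 cycle
print(issue_cycles(64, 32))  # RDNA: wave64 on SIMD32 -> 2 cycles
```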

        Comment


        • #5
          Originally posted by mannerov View Post
          So the architecture supports both 32- and 64-wide wavefronts? Is there a performance benefit to using 32 on the ALU side? To me, 64 has advantages over 32 for memory access patterns, but I agree that for some workloads 32 could be a better fit.
          I think it is now easier to keep all the ALUs utilized, with less penalty when you have to perform some operations, at the cost of a worse memory access pattern. (I can't tell for sure unless AMD releases Navi architecture documents like it did for Vega.) It's the AMD fine-wine problem: theoretically GCN's wave64 has higher peak performance, but it is hard to fully utilize the entire GCN core. Like how VLIW was a good idea until we found that optimizing for VLIW is a nightmare, so we moved back to CISC and now even RISC.



          • #6
            Originally posted by artivision View Post
            Is there a Wave32 game?
            It has nothing to do with the game... it is purely in how the driver handles the code. That said, most compute code does care and is optimized for wave64, so AMD retained support for running both; the way they did this is that wave64 code executes across 2 cycles instead of one. Up until now the open-source graphics driver has been issuing wave64 code, running in this two-cycle mode. With wave32 code being issued we will see better ALU utilization, as previously, with wave64 code, a lot of the compute resources could go underutilized in many cases.



            • #7
              Originally posted by dungeon View Post
              I was thinking it is 32-only now, but who knows, 64 might be left in for something.


              https://www.anandtech.com/show/14528...-5700-series/2
              It can definitely still do wave64... that is what the entire driver has been doing up until now on Navi. It just takes 2 cycles instead of one to do the work, but the Navi CUs are 2x as fast to begin with, so you only notice a small overhead for doing this with compute code, while graphics shaders that sometimes have difficulty filling a wave64, and may use only a quarter of its resources, can now be broken down into wave32 for better utilization since the resources are scheduled more finely.

              They give a fairly detailed high-level overview of how all this works in the tech slides. I'm surprised people are commenting on this without bothering to read it: https://www.techpowerup.com/256660/a...tc#g256660-121

              Doing this is the main reason they got a 1.25x IPC improvement... so the current Mesa code we've seen benchmarked so far has not taken advantage of this at all.
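              The "quarter of a wave64" case is just a lane-occupancy ratio; a toy model (function name and numbers are illustrative, nothing to do with the actual driver):

```python
# Toy model of lane utilization: how full are the waves launched for a
# given number of live invocations at a given wavefront size?
def lane_utilization(active_threads, wave_size):
    """Fraction of occupied lanes across the waves launched for this work."""
    waves = -(-active_threads // wave_size)  # ceiling division
    return active_threads / (waves * wave_size)

print(lane_utilization(16, 64))  # 0.25: only a quarter of a wave64 is busy
print(lane_utilization(16, 32))  # 0.5: the same work packed as wave32
```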
              Last edited by cb88; 20 July 2019, 12:00 PM.



              • #8
                Originally posted by cb88 View Post
                They give a fairly detailed high-level overview of how all this works in the tech slides. I'm surprised people are commenting on this without bothering to read it
                I had read it, of course, but understood from this example that there is no native wave64 anymore, just 2x wave32.



                Last edited by dungeon; 20 July 2019, 12:30 PM.



                • #9
                  Originally posted by dungeon View Post

                  I had read it, of course, but understood from this example that there is no native wave64 anymore, just 2x wave32.



                  https://www.anandtech.com/Gallery/Album/7172#12
                  The driver doesn't know... that's the whole point: it's executed natively, just in a different way, across 2 cycles. It's not doing emulation or anything like that... similar to how AVX instructions on Zen 1 take 1 or 2 cycles depending on length (128-bit is single cycle, 256-bit takes 2 cycles). There is native support for the instructions, but the hardware is not set up to execute them in a single cycle.
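                  The "native, just over 2 cycles" idea can be sketched as splitting a wide op into SIMD-width passes (a toy illustration, not how any real hardware or driver is written):

```python
# Toy illustration of "double pumping": an op wider than the SIMD runs
# natively, just split across multiple passes -- no emulation involved.
def run_wave(op, lanes, simd_width=32):
    """Apply op to every lane, simd_width lanes per pass; return (results, passes)."""
    results, passes = [], 0
    for start in range(0, len(lanes), simd_width):
        results.extend(op(x) for x in lanes[start:start + simd_width])
        passes += 1
    return results, passes

_, p32 = run_wave(lambda x: x + 1, list(range(32)))  # 1 pass, like wave32
_, p64 = run_wave(lambda x: x + 1, list(range(64)))  # 2 passes, like wave64
```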



                  • #10
                    Originally posted by cb88 View Post

                    It has nothing to do with the game... it is purely in how the driver handles the code. That said, most compute code does care and is optimized for wave64, so AMD retained support for running both; the way they did this is that wave64 code executes across 2 cycles instead of one. Up until now the open-source graphics driver has been issuing wave64 code, running in this two-cycle mode. With wave32 code being issued we will see better ALU utilization, as previously, with wave64 code, a lot of the compute resources could go underutilized in many cases.
                    If it was like that, then why did they keep compatibility with wave64 at the expense of more transistors? They could just execute everything as 32 threads.
