Announcement

**bsp2020** · 07 July 2016, 01:12 PM

Originally posted by duby229

Are you sure they are in order? The diagram for a compute unit doesn't look at all like an in order architecture. Look at the diagram in the post above this one and then compare it to an out of order architecture. They look very similar.

https://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf

Search for "in-order", "To preserve in-order execution, each instruction must also come from a different wavefront;" in CU FRONT-END section. This is about instruction execution.
Search for "out-of-order", "Tasks complete out-of-order, which releases resources earlier, but they must be tracked in the ACE for correctness." in SYSTEM ARCHITECTURE. This is about Asynchronous Compute.

I don't think there are any throughput-optimized architectures that execute instructions out-of-order. In fact, the whole point of building a throughput-optimized architecture is to spend more transistors on calculation instead of on tracking instruction execution order.

**Oguz286** · 07 July 2016, 01:13 PM

Originally posted by duby229

I wrote a reply, but it's in the mod queue.

EDIT: Basically the gist was that for AMD architectures it takes a compute unit to be a core.

http://images.anandtech.com/doci/4455/GCN-CUTh.png

Take a good look at this diagram. You can clearly see that the front end, fetch, and decode sit at the compute unit level. The stream processors by themselves aren't capable of doing anything. The logic needed to function exists at the compute unit level, which means the compute unit is the core.

I know, that's what I meant. Both NVIDIA and AMD call each ALU a core. The RX 480 has 2304 "cores", the same as my GTX 780. They should be called ALUs because that's what they are.
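To put numbers on the "ALU vs. core" distinction, here is a minimal sketch. It assumes the GCN layout described in AMD's whitepaper (4 SIMDs per compute unit, 16 lanes per SIMD); the function name is made up for illustration.

```python
# Assumed GCN layout: 4 SIMD units per compute unit, 16 lanes (ALUs) per SIMD.
SIMDS_PER_CU = 4
LANES_PER_SIMD = 16


def compute_units(alu_count):
    """Convert a marketing 'core' (ALU) count into a compute unit count."""
    alus_per_cu = SIMDS_PER_CU * LANES_PER_SIMD
    if alu_count % alus_per_cu != 0:
        raise ValueError("ALU count is not a whole number of compute units")
    return alu_count // alus_per_cu


if __name__ == "__main__":
    # RX 480: 2304 advertised "cores"
    print(compute_units(2304))
```

Counted this way, the unit that owns a front end is the compute unit, and the advertised "core" count is just the number of ALU lanes inside those units.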

Originally posted by Passso

So this will be 480 VS 1060. At last a real battle, the winner will get my money.

Fight!

HADOUKEN!

**juno** · 07 July 2016, 01:32 PM

Originally posted by atomsymbol

GTX 1060: 1280 ALUs * 1.5GHz = 1900
RX 480: 2304 ALUs * 1.2GHz = 2700

Equal performance in Windows would mean GTX 1060 has 2700/1900=1.4 IPC advantage over RX 480.

Actually, the 1060 runs at 1.7 GHz.
However, the 480 is obviously still ahead in terms of FLOPS. It has been like this for a while now, but it hasn't been a big problem for AMD because they were able to compensate for it with wider GPUs. By 'wider' I basically mean more ALUs. AMD's GPUs are also denser, so they don't get much bigger (= more expensive).
Now Nvidia has this Maxwell shrink and raises clocks to a completely new level without extraordinarily high voltages, while AMD has its GCN shrink and can't raise clocks much even with quite high voltages. That's what really hurts them, imho. And that's what I don't understand. They had this problem before, and now it is even worse. Did they think 14 LPP would fix all that automatically?
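The back-of-the-envelope numbers above can be redone as a small sketch. It uses only the ALU counts and clocks quoted in this thread; the function names are made up for illustration, and "throughput" here is simply ALUs times clock (counting a fused multiply-add as two FLOPs would double both sides and leave the ratio unchanged).

```python
def raw_throughput(alus, clock_ghz):
    """ALUs x clock in GHz = billions of ALU operations per second (rough)."""
    return alus * clock_ghz


def per_alu_advantage_needed(alus_a, clock_a, alus_b, clock_b):
    """How much more work per ALU per clock GPU A needs to match GPU B."""
    return raw_throughput(alus_b, clock_b) / raw_throughput(alus_a, clock_a)


if __name__ == "__main__":
    # Figures quoted in the thread: GTX 1060 = 1280 ALUs, RX 480 = 2304 ALUs at 1.2 GHz.
    for gtx_clock in (1.5, 1.7):
        ratio = per_alu_advantage_needed(1280, gtx_clock, 2304, 1.2)
        print(f"GTX 1060 at {gtx_clock} GHz needs about {ratio:.2f}x per ALU")
```

With the higher clock, the per-ALU advantage the 1060 would need for equal performance shrinks noticeably compared with the 1.5 GHz estimate.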

**bug77** · 07 July 2016, 01:34 PM

Originally posted by dungeon

Ha, ha, marketing does that always

One of the reasons I do my best to ignore marketing and wait for actual product launches and reviews.
Edit: Yes, I know most reviews start by regurgitating all the marketing slides I try to ignore.

**bridgman** · 07 July 2016, 01:41 PM

Originally posted by atomsymbol

I don't understand the meaning of in-order and out-of-order in the context of a SIMD processor. Radeon GPUs cannot execute other instructions while waiting for data to arrive from memory for example?

They cannot execute other instructions from the same instruction stream (the usual definition of out-of-order processing), but they can switch to another thread on the next clock and execute from that instruction stream instead. Each SIMD has 10 program counters associated with it (10 threads), and with 4 SIMDs per CU that gives a total of 40 threads per CU.

What makes the terminology tricky is that the other thread may be executing the same shader program but with different data and a different program counter, but IMO that does not count as "out of order processing"... "block multithreading" is probably a good description.
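The behaviour described above can be illustrated with a toy model. This is only a sketch under loud assumptions: one issue slot per cycle, a made-up load latency of 8 cycles, round-robin selection, and hypothetical names throughout. It is not how the real GCN scheduler is implemented; it only shows that each stream stays strictly in order while the SIMD hides memory stalls by switching streams.

```python
SIMDS_PER_CU = 4      # from the post: 10 program counters per SIMD, 40 threads per CU
WAVES_PER_SIMD = 10


def run_simd(programs, load_latency=8):
    """Issue at most one instruction per cycle, always in program order per stream.

    programs: list of instruction lists, each instruction is "alu" or "load".
    Returns (cycles_used, trace) where trace holds (cycle, stream, pc, op).
    """
    n = len(programs)
    pcs = [0] * n          # one program counter per instruction stream
    ready_at = [0] * n     # first cycle at which each stream may issue again
    trace = []
    cycle = 0
    last = -1
    while any(pc < len(prog) for pc, prog in zip(pcs, programs)):
        # Round-robin: pick the next stream that is not waiting on memory.
        for offset in range(1, n + 1):
            w = (last + offset) % n
            if pcs[w] < len(programs[w]) and ready_at[w] <= cycle:
                op = programs[w][pcs[w]]
                trace.append((cycle, w, pcs[w], op))
                pcs[w] += 1
                if op == "load":
                    # This stream stalls until the data arrives (assumed latency).
                    ready_at[w] = cycle + 1 + load_latency
                last = w
                break
        cycle += 1
    return cycle, trace


if __name__ == "__main__":
    program = ["load", "alu", "load", "alu"]
    one, _ = run_simd([program])
    ten, _ = run_simd([program] * WAVES_PER_SIMD)
    print("threads per CU:", SIMDS_PER_CU * WAVES_PER_SIMD)
    print("1 stream :", one, "cycles for", len(program), "instructions")
    print("10 streams:", ten, "cycles for", len(program) * WAVES_PER_SIMD, "instructions")
```

In this model a single stream spends most of its cycles idle waiting on loads, while ten streams keep the issue slot busy, even though no stream ever executes its own instructions out of order.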

**bridgman** · 07 July 2016, 01:45 PM

Originally posted by atomsymbol

I don't understand the meaning of in-order and out-of-order in the context of a SIMD processor. Radeon GPUs cannot execute other instructions while waiting for data to arrive from memory for example?

Bah... forum software ate my post again (not moderated, just started processing the post, then stopped, redrew the screen again, and my post was gone forever).

So... GCN GPUs cannot execute other instructions from the same instruction stream while waiting for data to arrive from memory. They can, however, switch to another instruction stream on the next clock cycle and continue executing seamlessly. I believe "block multiplexing" is the usual name for this.

Each SIMD has 10 program counters associated with it, for a total of 40 threads per CU.

**bridgman** · 07 July 2016, 01:48 PM

Auggh, that's two posts eaten one after another. Each SIMD is associated with 10 program counters, so 40 threads per CU. When waiting for memory etc... the shader core switches to another instruction stream on the next clock cycle, allowing it to continue execution but not on the same instruction stream.

**bridgman** · 07 July 2016, 01:48 PM

Bleah... two posts eaten, third post auto-moderated. I bet this post goes through just fine since it has no useful content...

EDIT - yep

**psycho_driver** · 07 July 2016, 02:10 PM

Originally posted by justmy2cents

unless it performs at least 2x as fast as 480, there is no way i would ever buy it. OSS drivers, ftw.

At 1080p the 2x will probably be about right. It will be interesting to see if there are 1080p results this go around.

Also, AMD was the first company in this industry I saw using goofy graphs like that back around the time of the bulldozer launch.

**duby229** · 07 July 2016, 02:17 PM

Originally posted by atomsymbol

I don't understand the meaning of in-order and out-of-order in the context of a SIMD processor.

Radeon GPUs cannot execute other instructions while waiting for data to arrive from memory for example?

Look at the block diagram for a compute unit. You can clearly see the execution path. There can be at least 4 SIMD operations in flight, and it certainly looks like they can be issued out of order. Several people here said that's not the case, but it sure does look like it.

NVIDIA Announces The GeForce GTX 1060, Linux Tests Happening