AMD Publishes Open-Source Driver Support For Vega 20


  • #21
    Originally posted by DrYak View Post
    Is there any key difference between a machine learning card vs. a gaming GPU ?
    A traditional GPU has a number of specialized units, designed for graphics workloads: ROPs, texture engines, geometry processors, tessellation engines, etc. For machine learning, these would usually sit idle, wasting die space (which costs $).

    Originally posted by DrYak View Post
    Probably the pricing would be different (more expensive, because "Enterprise" target)
    The reason to target machine learning first is that those customers are less price-sensitive, so they can more easily foot the bill for a big chip on a new process node. That's what Nvidia did with its leading-edge Pascal and Volta chips: GP100 and GV100 are so expensive to make that they'll never be viable mass-market gamer GPUs.

    Originally posted by DrYak View Post
    UVD would also be useful on a AI-only card, given that lots of AI intelligence is done on image/video recognition.
    Yes. Well said.

    • #22
      Originally posted by coder View Post
      ROPs
      Indeed, thanks for pointing that out.

      Originally posted by coder View Post
      texture engines
      Back when I was writing CUDA code, textures made nice look-up tables (even without using any of the interpolation capability).

      Originally posted by coder View Post
      geometry processors, tessellation engines, etc.
      I was under the impression that these are computed on the main SIMD engines of GPUs?
      (i.e., modern GPUs just have unified shaders, not separate pixel, geometry, and tessellation shaders.)

      • #23
        Originally posted by DrYak View Post
        (i.e., modern GPUs just have unified shaders, not separate pixel, geometry, and tessellation shaders.)
        Modern GPUs still have vertex, tessellation control, tessellation evaluation, geometry, and pixel shaders.

        The tessellation itself is done by fixed-function tessellation units outside the shaders; it happens exactly between the two tessellation shader stages.

        • #24
          Originally posted by DrYak View Post
          Back when I was writing CUDA code, textures made nice look-up tables (even without using any of the interpolation capability).
          Most of the hardware in a texture engine is for doing decompression, fancy interpolation, etc. I'm not even sure normal shader memory accesses would go through a texture engine, in fact.
          Last edited by coder; 16 May 2018, 08:29 PM.

          • #25
            Originally posted by marek View Post
            Modern GPUs still have vertex, tessellation control, tessellation evaluation, geometry, and pixel shaders.
            As logical constructs, but we only have so much visibility into how that's actually implemented. I assume most of the programmable shaders actually run on the SMT/SIMD cores.

            Details aside, the point is that there's a significant amount of graphics-specific hardware on modern GPUs. Not the majority, but probably enough to cost real $ (in terms of dies per wafer and yield).

            If AMD thinks they're going to do enough business in the machine learning & HPC markets, it's probably worth their while to target them with a specialized bit of silicon. Remember, AMD does a lot of semi-custom products for folks like Sony, Microsoft, Intel, Tesla, etc. So, you'd imagine their designs are fairly customizable.

            • #26
              Originally posted by coder View Post
              As logical constructs, but we only have so much visibility into how that's actually implemented. I assume most of the programmable shaders actually run on the SMT/SIMD cores.

              Details aside, the point is that there's a significant amount of graphics-specific hardware on modern GPUs. Not the majority, but probably enough to cost real $ (in terms of dies per wafer and yield).

              If AMD thinks they're going to do enough business in the machine learning & HPC markets, it's probably worth their while to target them with a specialized bit of silicon. Remember, AMD does a lot of semi-custom products for folks like Sony, Microsoft, Intel, Tesla, etc. So, you'd imagine their designs are fairly customizable.
              Yes, all shaders run on the same CUs. I wouldn't use the term SIMD or SIMT, because GCN shaders have both vector and scalar instructions, and both vector and scalar registers, and there are both vector units (SIMDs) and scalar units. The vector unit operates on 64 elements (threads). The scalar unit does work that is common for a group of 64 threads, and then broadcasts the results to the threads.
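
              To make the vector/scalar split a bit more concrete, here is a minimal CUDA-style sketch (kernel and parameter names are made up for illustration). I'm assuming the usual mapping, where wave-uniform values end up in scalar registers and per-lane values on the 64-wide vector unit, but the actual assignment is the compiler's decision, not something expressed in the source.

              Code:
              // Sketch: wave-uniform vs. per-lane work (hypothetical kernel).
              __global__ void scale_rows(const float *in, float *out,
                                         int row_len, float gain)
              {
                  // Uniform across the whole wavefront/warp: on GCN this kind
                  // of value would typically be computed once on the scalar
                  // unit and kept in scalar registers.
                  int row_base = blockIdx.x * row_len;

                  // Per-lane (per "thread"/work item): handled by the vector unit.
                  int lane = threadIdx.x;

                  if (lane < row_len)
                      out[row_base + lane] = in[row_base + lane] * gain;
              }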

              • #27
                Originally posted by marek View Post
                Yes, all shaders run on the same CUs. I wouldn't use the term SIMD or SIMT, because GCN shaders
                GCN CUs are SMT with scalar and SIMD pipes. Simple as that.

                And it's plain disinformation to call the SIMD lanes "threads". They're not threads in the classical CPU sense of the word. Using that terminology, you can't even talk about SMT; it would have to be SMW, because the thing that actually corresponds to a thread is what Nvidia calls a warp (AMD calls them wavefronts).

                • #28
                  They are not CPU threads, of course. They are threads in the GPU sense, and the GPU sense is fairly arbitrary between vendors.

                  • #29
                    Originally posted by coder View Post
                    And it's plain disinformation to call the SIMD lanes "threads". They're not threads in the classical CPU sense of the word. Using that terminology, you can't even talk about SMT; it would have to be SMW, because the thing that actually corresponds to a thread is what Nvidia calls a warp (AMD calls them wavefronts).
                    AFAIK "threads" is NVidia terminology although it has become pretty broadly used - officially I think we call them "work items".

                    Agree that a warp/wavefront is arguably closest to what we think of as a CPU thread (since each warp/wavefront has its own program counter); however, the argument for applying the "thread" term to an individual work item is that branching in the shader program logically happens at the thread/work-item level.

                    The result of branching is that the SIMD can execute more than one path through the shader program, with some threads/work items enabled on one path and the rest enabled on another path.
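
                    As a small illustration, here is a CUDA-style sketch (kernel name and values are made up); a branch like this makes the SIMD walk both paths, with a different subset of lanes/work items active on each:

                    Code:
                    // Sketch: a divergent branch. Lanes where the condition holds
                    // take the first path, the rest take the second; the hardware
                    // runs both paths, masking off the inactive lanes each time.
                    __global__ void clamp_or_halve(float *data, float limit)
                    {
                        int i = blockIdx.x * blockDim.x + threadIdx.x;

                        if (data[i] > limit)
                            data[i] = limit;           // only the lanes over the limit
                        else
                            data[i] = data[i] * 0.5f;  // the remaining lanes
                    }
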
                    Last edited by bridgman; 18 May 2018, 01:40 AM.

                    • #30
                      Originally posted by coder View Post
                      I'm not even sure normal shader memory accesses would go through a texture engine, in fact.
                      Back in those days, in CUDA it was left to the programmer's choice (a rough sketch of both approaches follows below):

                      - normal pointer math is direct memory access (straight to video RAM, thus with some latency, which you can hide by keeping many warps in flight, and you need to watch out how you access things so the requests can be grouped/coalesced on the memory bus)

                      - accessing textures through special fetch functions goes through the texture engine, and benefits from its caching and fast access. But the cost is that textures are opaque objects (due to the cache-friendly tiling/swizzling): you can read them with the fetch functions, but you can't directly read/write the underlying memory; you can only upload them in advance with copy/bind functions on the CPU side. So they were mostly useful as fast, cached look-up tables.
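
                      Roughly what that looked like, as a sketch using the old CUDA texture reference API (names are made up, error handling omitted; the newer texture object API replaces this, but the direct-vs-texture distinction is the same):

                      Code:
                      #include <cuda_runtime.h>

                      // Old-style texture reference: an opaque, read-only, cached
                      // view of device memory (deprecated in modern CUDA).
                      texture<float, cudaTextureType1D, cudaReadModeElementType> lut_tex;

                      __global__ void apply_lut(const int *in, float *out, int n)
                      {
                          int i = blockIdx.x * blockDim.x + threadIdx.x;
                          if (i >= n) return;

                          // Direct memory access: plain pointer math into video RAM.
                          int key = in[i];

                          // Texture fetch: goes through the texture engine and its
                          // cache; the table was uploaded and bound on the CPU side.
                          out[i] = tex1Dfetch(lut_tex, key);
                      }

                      // Host side: upload the look-up table, then bind it to the
                      // texture reference so the kernel can fetch from it.
                      void setup_lut(const float *host_lut, int entries, float **dev_lut)
                      {
                          cudaMalloc((void **)dev_lut, entries * sizeof(float));
                          cudaMemcpy(*dev_lut, host_lut, entries * sizeof(float),
                                     cudaMemcpyHostToDevice);
                          cudaBindTexture(nullptr, lut_tex, *dev_lut,
                                          entries * sizeof(float));
                      }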
