AMD Nearing Full OpenCL 2.0 Support With ROCm 2.0 Compute Stack

  • #21
    Originally posted by coder View Post
    That's not exactly clear, since graphics versions of the P100 and V100 have both been released. So, we don't know whether there are actually two versions of the silicon, or if the server versions simply don't use the graphics units. I've read some hints which suggest it's actually the same silicon.
    I didn't say there are graphics and compute versions of GP100 and GV100. I said there are graphics top dogs like GM200, GP102 and GT102, and now there are HPC top dogs, GP100 and GV100. The HPC chips obviously still have the graphics hardware, though.

    Originally posted by coder View Post
    They might, but AMD hasn't said as much. The performance numbers they have revealed didn't suggest anything like that.
    Yeah, we can't tell yet. What we know is that it has new instructions specifically for DL stuff.

    Originally posted by coder View Post
    I don't recall hearing anything about that, but it would be great for them.
    This info was leaked a long time ago, and basically all the other information from that leak has been confirmed in the meantime. There is also the xGMI link, which I forgot to mention in the previous post, as the counterpart to NVLink.
    It's long overdue as well. Their last GPU with 1/2-rate FP64 performance was Hawaii in 2013, which was outstanding for its time (2.5 TeraFLOPS in <300 watts) but is obviously very outdated nowadays.
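    As a rough sanity check of that figure (the shader count and clock below are my ballpark assumptions for the FirePro-class Hawaii parts, not quoted specs):
Code:
# Peak FLOPS = shaders * 2 ops per FMA * clock; FP64 is a fixed fraction of FP32.
shaders   = 2816     # assumed Hawaii stream-processor count
clock_ghz = 0.93     # assumed FirePro-class clock, in GHz
fp64_rate = 0.5      # Hawaii's 1/2-rate double precision

fp32_tflops = shaders * 2 * clock_ghz / 1000
fp64_tflops = fp32_tflops * fp64_rate
print(f"FP32 ~ {fp32_tflops:.2f} TFLOPS, FP64 ~ {fp64_tflops:.2f} TFLOPS")
# -> roughly 5.2 / 2.6 TFLOPS, in line with the ~2.5 TFLOPS figure above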

    Originally posted by coder View Post
    I sure hope AMD saw this one coming.

    I'm still left wondering where, exactly, they would put the RT cores. Their GPUs get less graphics performance per mm^2 than Nvidia's. Unless that situation changes, I don't see how they could justify burning extra silicon on RT hardware. They can't afford to release something that's unable to be cost-competitive with Nvidia. I think that's already happened, with Vega. Its price hasn't dropped to match current GeForce pricing. [checks current prices] ...although it might be getting close.
    There have been some papers suggesting that the actual RT stuff doesn't cost much die space at all. It does only some of the work, the accelerated ray intersection. The rest is "just" more compute, where AMD is in good shape already.
    Radeons also haven't been cost-competitive with GeForces for years now. They always have to sell larger dies to compete with smaller ones, and Nvidia is still able to sell at higher prices. Look at their margins.

    Originally posted by coder View Post
    GT106? I hadn't heard about that. According to this, RTX 2070 uses basically the same silicon as RTX 2080.



    Also, you forgot Polaris20.
    Polaris20 is Polaris10, "born" from GlobalFoundries' minor fabrication progress, not a new chip. I think Nvidia's RTX 2070 uses TU106 (sorry, they are actually called TU, not GT), which is a little weird, but we can't be 100% sure yet. The source given on Wikipedia doesn't have actual info on this.



    • #22
      Originally posted by juno View Post
      There have been some papers suggesting that the actual RT stuff doesn't cost much die space at all. It does only some of the work, the accelerated ray intersection.
      I hope so.

      Originally posted by juno View Post
      The rest is "just" more compute, where AMD is in good shape already.
      Heh, sadly not. The tensor cores used for denoising the global illumination rays are good for like 58 to 106 fp16 TFLOPS (ranging from RTX 2070 to 2080 Ti; base clocks). Even 7 nm Vega won't be in that league.
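      For reference, the usual back-of-the-envelope behind such numbers (the 64 FP16 FMAs per tensor core per clock is from Nvidia's Turing material; the core count and clock here are ballpark assumptions, so it only lands in the same neighbourhood as the figures above):
Code:
# Peak FP16 tensor throughput = tensor cores * FMAs/clock * 2 ops/FMA * clock.
tensor_cores  = 544    # assumed RTX 2080 Ti tensor-core count
fma_per_clock = 64     # per tensor core, per Nvidia's Turing whitepaper
clock_ghz     = 1.5    # ballpark clock assumption

tflops = tensor_cores * fma_per_clock * 2 * clock_ghz / 1000
print(f"~{tflops:.0f} FP16 TFLOPS")   # ~104, same ballpark as the high end above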



      • #23
        Originally posted by coder View Post
        Heh, sadly not. The tensor cores used for denoising the global illumination rays are good for like 58 to 106 fp16 TFLOPS (ranging from RTX 2070 to 2080 Ti; base clocks). Even 7 nm Vega won't be in that league.
        That's a different story. Denoising is not part of the ray tracing algorithm; it's an additional step that reduces the required number of samples per pixel. While denoising on tensor cores is interesting, it's limited and costly to implement (it may require training for each game or even each scene, deploying all the trained networks, and so on), and it isn't the only possible optimisation. We also haven't seen real results from it; it might tend to produce artifacts, or it might look perfect, but we don't know. What we do know is that the Battlefield devs decided not to use it, for whatever reason.
        Temporal anti-aliasing is a natural denoiser that every modern game implements, and it also helps reduce the number of samples per pixel. Additional denoising could be achieved through compute shaders; the new Battlefield is doing that instead of using tensor cores. The cost could be reduced further on AMD with async compute. Another possibility is to put more RT hardware, capable of more rays/s, onto the chip so there won't be as much noise, trading tensor space for RT space.
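        To illustrate what I mean by a "natural denoiser", here's a toy sketch of temporal accumulation only; real TAA also reprojects with motion vectors and clamps the history, which I'm leaving out:
Code:
# Toy temporal accumulation: blend each new noisy sample into a history buffer.
import numpy as np

rng = np.random.default_rng(0)
true_radiance = 0.5                                  # ground-truth pixel value
samples = rng.normal(true_radiance, 0.2, size=64)    # one noisy sample per frame

alpha = 0.1                                          # history blend factor
history = samples[0]
for s in samples[1:]:
    history = (1.0 - alpha) * history + alpha * s    # exponential moving average

print(f"last raw sample error: {abs(samples[-1] - true_radiance):.3f}")
print(f"accumulated error:     {abs(history - true_radiance):.3f}")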
        I won't be overly optimistic, but there is no reason to be overly pessimistic either. I'm just curious and excited about the new things to come.



        • #24
          Originally posted by juno View Post
          That's a different story. Denoising is not part of the ray tracing algorithm; it's an additional step that reduces the required number of samples per pixel.
          If you're doing global illumination, you've got noise.

          Originally posted by juno View Post
          Temporal anti-aliasing is a natural denoiser that every modern game implements, and it also helps reduce the number of samples per pixel.
          TAA has its own issues, and isn't strong enough to begin to deal with GI noise.

          Originally posted by juno View Post
          Additional denoising could be achieved through compute shaders,
          A lot of people have been working on GI denoising filters, for a long time. It's not exactly a solved problem, which is why it's a ripe target for deep learning.

          Originally posted by juno View Post
          Another possibility is to put more RT hardware, capable of more rays/s, onto the chip so there won't be as much noise, trading tensor space for RT space.
          You underestimate how many more rays you'd need. I was just talking with an author of an upcoming ray tracing book, who mentioned seeing a with/without-denoising comparison on a film render that used 1600 rays per pixel. He had to agree that denoising was definitely still needed, even at that ridiculous sample count. Granted, film renders are probably at 8K, but a lot of gamers are coming to expect 4K gaming and >60 Hz framerates. You're not going to get there by simply adding more rays.
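          To put a number on that: Monte Carlo noise falls off as 1/sqrt(samples), so every halving of the noise costs 4x the rays. A quick synthetic illustration (not from any real renderer):
Code:
# Standard error of a Monte Carlo estimate shrinks as 1/sqrt(spp).
import numpy as np

rng = np.random.default_rng(1)
for spp in (100, 400, 1600, 6400):
    estimates = rng.random((2000, spp)).mean(axis=1)   # many trials of spp samples each
    print(f"{spp:5d} spp -> noise ~ {estimates.std():.4f}")
# Each 4x increase in spp only halves the noise.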

          Now, no one is saying they need to use ray tracing for global illumination. There are other options, none of which are quite as elegant or accurate.

          Originally posted by juno View Post
          I won't be overly optimistic, but there is no reason to be overly pessimistic either. I'm just curious and excited about the new things to come.
          I write what I think and expect, although I'm ready to be surprised. AMD's hardware strategy isn't going to be influenced by my posts, nor do I think I'm influencing anyone not to buy current gen AMD hardware out of concerns for competitive prospects of future products. Let's be reasonable.

          AMD surprised us with Zen, and I am hoping they can work a similar miracle in their GPU division. That said, ray tracing might be too recent a development for NGG. We'll see. But they're going to have to answer Nvidia's tensor cores with something. So, even if we accept deep learning is the answer to GI denoising, that still doesn't necessarily put AMD out of the running.



          • #25
            The "team of champions vs. a champion team" thing is an apt, true, and commonly used sporting adage.

            AMD, CPU & GPU, warts and all, are a team. Intel and Nvidia are ~competitors. In the real world, there are considerable advantages to a single ecosystem for CPU & GPU.

            It is very likely TR and Epyc will swamp Intel as the AI compute platform. That will mean new owners taking a hard look at AMD as a GPU solution as well.


            As an outsider looking in, as you note, I can't comment further on the "necessity" of CUDA/tensor/AVX etc., but AI sounds like it involves vast amounts of costly memory, and Vega addresses this with the HBCC processor/controller.

            In the IT business, just like any other business, you have to try to sell what is on the truck, not what is planned to be coming out of the factories in the future.

            "
              • Even with that 8GB of VRAM under the new Vega GPU micro-architecture, Vega's included HBCC/HBC technology can use that 8GB of VRAM as a last-level cache and leverage the regular system DRAM as a secondary VRAM pool, with Vega's HBCC able to perform the swapping in the background to and from that 8GB of HBM2-based VRAM cache, effectively increasing the VRAM size to whatever the system DRAM size may be.

                The Vega GPU IP also includes, in that HBCC/HBC/memory-controller subsystem, the ability for Vega micro-architecture based GPU SKUs to manage the GPU's own virtual memory paging swap space of up to 512TB of total GPU virtual memory address space.

                So that's 512TB addressable into any system DRAM and onto the system's memory swap space on any NVM/SSD or hard-drive storage devices attached. And Vega's HBCC/HBC manages all that rather effortlessly.

                There are also the Radeon SSG SKUs, which make use of their own NVM stores included on the PCIe card for that SKU's needs, and there will be Vega micro-architecture based "SSG"-branded variants for both acceleration and AI workloads.

                8GB of VRAM is nothing to laugh at if that 8GB of GPU VRAM is actually a VRAM cache that can leverage up to 512TB of virtual memory address space across regular system DRAM or SSD/NVM, and only keep in that 8GB VRAM cache the data/textures the GPU actually requires for its more immediate needs. "
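            What that quote boils down to is a small, fast pool (the 8GB of HBM2) acting as a cache in front of a much larger virtual address space, with pages swapped in on demand. A toy sketch of that general idea (plain demand paging with LRU eviction; purely an illustration, not AMD's actual HBCC design):
Code:
# Toy demand-paging cache: a small "HBM" pool fronting a much larger backing store.
from collections import OrderedDict

PAGE_SIZE = 64 * 1024              # assumed page granularity for the sketch
HBM_PAGES = 8                      # stand-in for the small on-card pool

class ToyHBCC:
    def __init__(self, backing_store):
        self.backing = backing_store     # the "huge" pool: system DRAM / NVM / SSD
        self.cache = OrderedDict()       # resident pages, least recently used first

    def read(self, page_id):
        if page_id in self.cache:                # hit in the fast pool
            self.cache.move_to_end(page_id)
            return self.cache[page_id]
        data = self.backing[page_id]             # miss: fetch from the big, slow pool
        if len(self.cache) >= HBM_PAGES:
            self.cache.popitem(last=False)       # evict the coldest page
        self.cache[page_id] = data
        return data

backing = {i: bytes(PAGE_SIZE) for i in range(1024)}   # stand-in for a huge address space
gpu = ToyHBCC(backing)
for page in (0, 1, 2, 0, 512, 0):
    gpu.read(page)
print("resident pages:", list(gpu.cache))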

