**LukeP** · 09 October 2018, 07:57 PM

I too think they are about right from what I've seen, but something is fishy. A TC can do 1024 calcs per clock, and there are 500/640 of them. Think about that. They are not being fully occupied, because you should get 5X (100 TF vs 20 TF) performance for matrix multiplication, which is the bulk of deep learning processing time.

**audir8** · 10 October 2018, 02:59 AM

Originally posted by LukePoga

I too think they are about right from what I've seen, but something is fishy. A TC can do 1024 calcs per clock, and there are 500/640 of them. Think about that. They are not being fully occupied, because you should get 5X (100 TF vs 20 TF) performance for matrix multiplication, which is the bulk of deep learning processing time.

From here: https://www.nvidia.com/content/dam/e...Whitepaper.pdf

"Eight Tensor Cores in an SM perform a total of 512 FP16 multiply and accumulate operations per clock, or 1024 total FP operations per clock."

There are 8 TCs in an SM, and there are 68 SMs in a 2080 Ti. So it's 640 TCs in the Titan V and 544 in a 2080 Ti.
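To make that arithmetic concrete, here is a small Python sketch. The per-SM figures come from the whitepaper quote; the 80-SM count for the Titan V is only inferred from 640 / 8, the per-SM rate is assumed to be the same on both cards, and the 1.5 GHz clock is a placeholder assumption, not a spec for either card.

```python
# Per-SM figures taken from the Turing whitepaper quote above.
TENSOR_CORES_PER_SM = 8
FP_OPS_PER_SM_PER_CLOCK = 1024  # 512 FP16 multiply-accumulates, counted as 2 ops each

# SM counts: 68 is from the post; 80 is inferred from 640 TCs / 8 (assumption).
SM_COUNTS = {"RTX 2080 Ti": 68, "Titan V": 80}

# Placeholder clock for illustration only; real boost clocks differ per card.
ASSUMED_CLOCK_GHZ = 1.5


def tensor_core_count(sm_count):
    """Total tensor cores on a chip with the given number of SMs."""
    return sm_count * TENSOR_CORES_PER_SM


def peak_tensor_tflops(sm_count, clock_ghz):
    """Theoretical peak FP16 tensor throughput in TFLOPS at the given clock."""
    ops_per_clock = sm_count * FP_OPS_PER_SM_PER_CLOCK
    # ops/clock * (clock_ghz * 1e9 clocks/s) / 1e12 ops per TFLOP
    return ops_per_clock * clock_ghz / 1000.0


if __name__ == "__main__":
    for name, sms in SM_COUNTS.items():
        print(name, tensor_core_count(sms), "TCs,",
              round(peak_tensor_tflops(sms, ASSUMED_CLOCK_GHZ), 1),
              "TFLOPS peak at the assumed clock")
```

Under these assumptions the 2080 Ti has 85% of the Titan V's tensor cores (544 vs 640), so core count alone would not account for a 2x gap.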

**LukeP** · 22 October 2018, 03:28 AM

Any updates on why the 2080 Ti is running at half speed? Keen to know if it will be fixed before purchasing.

**coder** · 22 October 2018, 09:39 PM

Originally posted by LukeP

Any updates on why the 2080 Ti is running at half speed? Keen to know if it will be fixed before purchasing.

This is probably not the best resource for deep learning benchmarks.

I recommend finding a site that's exclusively focused on deep learning. Or perhaps Nvidia's own developer forums.

**LukeP** · 22 October 2018, 09:44 PM

I already posted on devtalk. No one seems to know why.

One possibility (speculation only) is that Nvidia pulled a swifty and put in inference cores, not the full tensor cores. In that case it would be false advertising of TFLOPS, which could result in mass refunds. Hopefully that's not true and it will be resolved at some stage.

it really would be a good news article IMHO lol.

**coder** · 23 October 2018, 12:55 AM

Originally posted by LukeP

I already posted on devtalk. No one seems to know why.

Have others reproduced Michael's results?

I'd be curious whether the same test setup can reproduce any official results from Nvidia.

**LukeP** · 23 October 2018, 01:11 AM

You can run the CUDA 10 example cublasTensorCore and it's the same result: the 2080 Ti is half as fast as the Titan V.

I don't have either card to test myself. I'm waiting for these peculiarities to be explained before I buy one.
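For anyone comparing numbers from a GEMM benchmark, the usual way to turn a timing into throughput is to count 2·m·n·k floating-point operations per matrix multiply. A minimal sketch follows; the function name is made up for illustration, and this is not code from the CUDA sample or a measurement from either card.

```python
def gemm_tflops(m, n, k, seconds, iterations=1):
    """Achieved TFLOPS for `iterations` multiplies of an (m x k) by (k x n) matrix.

    Each output element needs k multiplies and k adds, so one GEMM
    is counted as 2 * m * n * k floating-point operations.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    flops = 2.0 * m * n * k * iterations
    return flops / seconds / 1e12
```

If one card takes twice as long as another on the same problem size, this reports exactly half the TFLOPS, which is the kind of ratio being discussed here.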

**LukeP** · 25 October 2018, 05:58 AM

Originally posted by coder

Have others reproduced Michael's results?

I'd be curious whether the same test setup can reproduce any official results from Nvidia.

I was trying very hard to find out why the 2080 Ti tensor cores were half as fast as the Titan V tensor cores.

The reason is that they can only do FP32 accumulate at half speed. Titan V tensor cores and in fact Quadro RTX tensor cores(!!) do it at full speed.

So they did gimp the tensor cores for the consumer models of RTX.

https://www.purepc.pl/image/mini_rec...fikacja_16.png

It is documented in their Turing Whitepaper.
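A rough way to picture the effect: if FP16 multiply with FP32 accumulate runs at half the tensor core rate, the effective peak for that mode is halved. The sketch below is a simplified model only. The rate table just encodes the claim in this post, and the 100 TFLOPS input is the round number used earlier in the thread, not an official spec.

```python
# Relative tensor core rate for FP16 multiply with FP32 accumulate,
# as claimed in this post (not independently verified here).
FP32_ACCUMULATE_RATE = {
    "GeForce RTX 2080 Ti": 0.5,  # half speed
    "Titan V": 1.0,              # full speed
    "Quadro RTX": 1.0,           # full speed
}


def effective_peak_tflops(card, full_rate_tflops):
    """Scale a full-rate tensor peak by the card's FP32-accumulate rate."""
    return full_rate_tflops * FP32_ACCUMULATE_RATE[card]


if __name__ == "__main__":
    # 100.0 is a round illustrative figure, not a measured value.
    for card in FP32_ACCUMULATE_RATE:
        print(card, effective_peak_tflops(card, 100.0))
```

With the same nominal peak, the consumer card ends up at half the Titan V's figure in this mode, which matches the 2x gap reported in the benchmarks.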

**coder** · 25 October 2018, 05:25 PM

Originally posted by LukeP

I was trying very hard to find out why the 2080 Ti tensor cores were half as fast as the Titan V tensor cores.

The reason is that they can only do FP32 accumulate at half speed. Titan V tensor cores and in fact Quadro RTX tensor cores(!!) do it at full speed.

Thanks for following up with this. Now that you mention it, I think I do remember reading about the reduced accumulator. However, I didn't know the Quadro RTX models lacked this limitation.

Let's hope they unlock it on the Titan card, although I fear they probably won't.

**JanW** · 06 November 2018, 08:37 AM

Question about the RTX 2080Ti which may be a wee bit off-topic here:
There have been recent reports about the cards degrading rapidly (see link). I have a machine with dual RTX 2080Ti on order to use for machine learning. Now I'm worried my cards will be RIP before I know it.

phoronix : Have you been hitting your RTX 2080Ti sufficiently hard since you got it to assess whether there is an issue with your sample, Michael?

I know that this would be rather anecdotal evidence with a sample size of 1, just trying to be reassured here.

NVIDIA GeForce RTX 2080 Ti To GTX 980 Ti TensorFlow Benchmarks With ResNet-50, AlexNet, GoogLeNet, Inception, VGG-16
