Originally posted by AlB80
View Post
If computation were as cheap as you say, why is there a performance difference between the RX 6800 XT and 6900 XT, even though both use the same memory bus width and clock speed? Same for the RX 5700 vs. RX 5700 XT and RX 5600 vs. RX 5600 XT and RX 5500 vs. 5500 XT? I guess the people who buy the more expensive versions are paying for nothing and all the benchmarks are Fake News? And it's not just AMD -- Nvidia RTX GPUs with the same bandwidth span all the way from the RTX 2060 Super to the RTX 2080!
If computation were free, then they'd just make each GPU die with enough compute power that it's entirely bandwidth-constrained, and the only spec you'd need to look at would be its memory bandwidth. Sadly, that's not the case.
Given that you have a fixed amount of compute and a fixed amount of bandwidth, the task of a GPU programmer is to focus on optimizing whichever is the bottleneck. If your workload is computationally cheap, then you're bandwidth-limited and your goal is to manage data movement to minimize the amount of time the compute sits idle. That's different than saying computation is free -- just that, in your case, it's not the main bottleneck. Now, were you computing digits of pi or rendering fractals, you wouldn't even care about bandwidth -- it would be all about optimizing compute.
It also happens to be the case that graphics, in particular, is very bandwidth-intensive, which results in a lot of emphasis being placed on data movement. However, just because they drill it into you to optimize your data movement to keep the compute resources busy doesn't mean that compute isn't also a bottleneck, in many cases. The fact is that there's only a fixed amount of compute power and so, as a programmer, your challenge for milking the most performance from it is going to be to ensure that the compute engines stay busy. That's different than saying that if there were more compute engines, your code wouldn't run any faster. Maybe so, if you're doing some extremely cheap computation that's very bandwidth-intensive, but it's not necessarily so. There are reasons they deploy the compute:bandwidth ratio that they do, and it's about striking a balance.
Comment