Originally posted by linuxgeex
First of all, the depth of pipelining doesn't help IPC at all. If anything, it can make things worse: everything in flight in the pipeline has to be tracked somewhere, and the deeper the pipeline, the more bookkeeping you need. The size of the reorder buffer might help. Still, given that it takes hundreds of cycles to reach main memory, you really don't want to go to main memory twice in series (the second load depends on the first, so the two can't be parallelized). Even the largest ROBs can't hide that.
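A minimal C sketch of the dependent-versus-independent load point, assuming a 64-bit machine and an array much larger than the last-level cache. All names here (`make_cycle`, `cycle_len`, `chase`, `sum_independent`, `ns_per_step`) are hypothetical, made up for this illustration. `chase` follows a random pointer cycle, so every load waits for the previous one (the "main memory twice, in series" case); `sum_independent` computes its addresses from the loop counter, so the out-of-order core is free to overlap the misses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Build next[] as one random cycle through all n slots (Sattolo's
   algorithm): following next[] from slot 0 visits every slot once before
   returning to 0, in an order a hardware prefetcher has trouble guessing. */
size_t *make_cycle(size_t n, unsigned seed) {
    assert(n > 1);
    size_t *next = malloc(n * sizeof *next);
    if (next == NULL) return NULL;
    for (size_t i = 0; i < n; i++) next[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;  /* j < i keeps it a single cycle */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
    return next;
}

/* Steps needed to get back to slot 0; equals n for a single cycle. */
size_t cycle_len(const size_t *next) {
    size_t i = next[0], len = 1;
    while (i != 0) {
        i = next[i];
        len++;
    }
    return len;
}

/* Dependent loads: the address of load k+1 is the value returned by
   load k, so this chain never has more than one miss outstanding. */
size_t chase(const size_t *next, size_t steps) {
    size_t i = 0;
    for (size_t s = 0; s < steps; s++) i = next[i];
    return i;
}

/* Independent loads: each address comes from the loop counter via a
   multiplicative hash (the constant is an arbitrary choice for this
   sketch), so no load waits on an earlier one and misses can overlap. */
size_t sum_independent(const size_t *next, size_t n, size_t steps) {
    size_t sum = 0;
    for (size_t s = 0; s < steps; s++)
        sum += next[(size_t)(((unsigned long long)s * 2654435761ULL) % n)];
    return sum;
}

/* Convert a clock() interval into nanoseconds per step. clock() reports
   processor time, which is adequate for a single-threaded loop. */
double ns_per_step(clock_t t0, clock_t t1, size_t steps) {
    return (double)(t1 - t0) * 1e9 / CLOCKS_PER_SEC / (double)steps;
}
```

To use it, build the cycle with something like n = steps = 1 << 24 (a 128 MiB array on a 64-bit system, well past typical L3 sizes), bracket each of `chase` and `sum_independent` with `clock()` calls, and print the returned values so the compiler can't delete the loops. The dependent chase should cost far more per load than the independent version; the exact ratio depends on the machine.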
Second, a branch mispredict costs far less than a trip all the way to main memory, so it won't matter here.
Third, SMT is irrelevant here: it's a multi-threaded throughput technique, not a single-threaded latency one. If anything, it can make single-threaded latency worse.
Please read:
1. https://people.freebsd.org/~lstewart.../cpumemory.pdf
2. Hennessy and Patterson, any recent edition
Most likely, the last time you looked at these issues, main memory latency (measured in CPU cycles) was much, much lower than it is now. Indeed, you mention gcc 1.0, and yes, in 1987 we barely had L1 caches. There's a reason we have L3 and even L4 caches these days: the gap between processor speed and memory speed has grown considerably. But you haven't been keeping up.