Originally posted by mdedetrich
For what it's worth, my take on why/where SMT makes sense is quite different from yours, although we might end up with the same conclusion.
The primary value of SMT in a modern CPU is the ability to build a very wide (high peak IPC) core that can efficiently execute both well-optimized code and older / less optimized code. Making good use of a wide core's execution resources can be done in a few different ways:
- optimize the code so that a reasonably deep OOO execution engine can find enough ready-to-execute micro-ops in a single instruction stream to keep most/all of the execution resources busy
- use SMT so that, on average, each instruction stream only has to supply enough ready-to-execute micro-ops to keep 1/2 or 1/4 of the execution resources busy. Put differently, you have to find enough ready-to-execute micro-ops across 2 or 4 instruction streams to keep most/all of the execution resources busy
- significantly expand the OOO capabilities of the core to improve the chances of finding enough ready-to-execute micro-ops in a single instruction stream. Apple's M1 is the poster child for this, but so far the ARM-designed cores are more in line with x86-64: IIRC the Neoverse V1 core in Graviton3 has a 256-entry reorder buffer, the same as Zen 3.
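To put rough numbers on the second option, here is a toy back-of-the-envelope model, not a real pipeline simulator. Everything in it is an assumption made up for illustration: the function name `utilization`, the 8-wide core, and the idea that each thread offers a uniformly random number of ready micro-ops per cycle averaging `mean_ready`. It ignores dependencies, caches, and front-end limits entirely.

```python
import random

def utilization(width, threads, mean_ready, cycles=10_000, seed=0):
    """Fraction of issue slots filled in a toy SMT model.

    Hypothetical model: every cycle each hardware thread offers a
    uniformly random number of ready micro-ops in [0, 2 * mean_ready],
    and the core issues at most `width` of them in total.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    issued = 0
    for _ in range(cycles):
        # Ready micro-ops summed across all threads this cycle.
        ready = sum(rng.randint(0, 2 * mean_ready) for _ in range(threads))
        # The core cannot issue more than its width.
        issued += min(width, ready)
    return issued / (width * cycles)

if __name__ == "__main__":
    # Assumed 8-wide core running code that averages 2 ready micro-ops
    # per cycle per thread (i.e. poorly optimized code).
    for n in (1, 2, 4):
        print(f"{n} thread(s): {utilization(8, n, 2):.2f} of issue slots used")
```

In this model a single stream that averages 2 ready micro-ops per cycle fills about a quarter of an 8-wide core, and adding threads raises that without any one stream getting better. Real cores will behave differently because threads compete for caches, the ROB, and the front end.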