Originally posted by mdedetrich
Furthermore, one can do comparative analysis of such APIs, to identify specific features and characteristics which have such consequences.
The problem here is that you're taking a limited set of data and imputing meaning that it's simply insufficient to support. In other words, you're merely speculating about why SMT hasn't featured more prominently in ARM cores. You're not allowing for the possibility that you're wrong, but that possibility is very real.
I think there's more to be learned by looking at the cases where Intel has and hasn't employed it. Specifically, none of their E-cores have had it since the original Atom, which was an in-order core with 2-way SMT. A notable exception is the modified Silvermont core they used in the 2nd Gen (KNL) Xeon Phi, which is an OoO core with 4-way SMT. This suggests the driving factor in whether to employ SMT is probably power efficiency. That aligns with the data you cited about ARM cores, since all of ARM's own cores, as well as Apple's, have been mobile-first.
Another noteworthy data point is that Xeon Phi scaled up to 72 cores, which is roughly an order of magnitude beyond the core counts we see in phone SoCs.
It's interesting that you pose micro-ops as the solution to "x86/64 instructions generally doesn't contain control order flow", after already positioning SMT as a solution to this same problem.
I think the main reason Intel adopted micro-ops is the complex and multi-faceted nature of x86 instructions. Aspects like memory operands and address arithmetic are easier to manage and optimize if you break them into separate operations.
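To make that concrete, here's a toy sketch of splitting a single read-modify-write instruction such as `add [rbx + rcx*4 + 8], rax` into simpler steps. Everything in it is an assumption for illustration: the `MemOperand` type, the `crack_add_mem_reg` function and the textual micro-op names are hypothetical, and none of it reflects any real decoder's internal micro-op format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MemOperand:
    """Hypothetical model of an x86 memory operand: [base + index*scale + disp]."""
    base: str
    index: Optional[str] = None
    scale: int = 1
    disp: int = 0


def crack_add_mem_reg(dst: MemOperand, src: str) -> List[str]:
    """Split `add [mem], reg` into address, load, ALU and store steps.

    The returned strings are made-up names for illustration only.
    """
    # Build the address expression from whichever operand parts are present.
    addr = dst.base
    if dst.index is not None:
        addr += f" + {dst.index}*{dst.scale}"
    if dst.disp:
        addr += f" + {dst.disp}"
    return [
        f"tmp_addr = {addr}",          # address arithmetic
        "tmp_val = load tmp_addr",     # memory read
        f"tmp_val = tmp_val + {src}",  # the actual add
        "store tmp_addr, tmp_val",     # memory write
    ]


uops = crack_add_mem_reg(MemOperand("rbx", "rcx", 4, 8), "rax")
for uop in uops:
    print(uop)
```

Once the pieces are separate like this, the address calculation, the load, the add and the store can each be scheduled on their own, which is the sense in which they become easier to manage and optimize.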
There are lots of ways to crack this nut, but it is an issue. In Intel's Tremont E-core, they employed two parallel 3-wide decoders that can concurrently decode instruction streams from different branch targets. That's not as good as SMT, but it gets you part way there.