QEMU 2.11-RC1 Released: Drops IA64, Adds OpenRISC SMP & More


  • #11
    Originally posted by jacob

    There is no way compiler tech could ever be where Itanium needed it, because scheduling instructions for such an ISA in the general case is equivalent to the Turing machine halting problem. It was a horrible idea from the start and frankly I'm amazed that someone managed to get HP and Intel to take it seriously at all.

    Now don't get me wrong: an explicitly parallel core is GOOD(tm), but the parallel dispatch must occur dynamically, not at compile time. All modern CPUs do that and it's called hyperthreading. An instruction queue is continuously filled from two (or more) *INDEPENDENT* program threads and, after translation, these instructions are fed to a two-way (or more) internal EPIC/VLIW core. Because the internal dispatcher, unlike the compiler, has a runtime view of the pipelines and current latencies, it can issue individual instructions on the fly as needed. That's the right way to do it.

    As for the Alpha, it is important to remember that there was nothing magical about it. It was a stock RISC ISA, just like MIPS, SPARC etc. What made it so fast had nothing to do with its design; it came from the fact that the core was largely drawn by hand and hand-optimised to the last degree. That, and a "dumb" (and thus quick) but large cache, allowed it to reach 500MHz at a time when 100MHz was the norm.
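    The static-vs-dynamic scheduling argument in the quoted post can be made concrete with a toy model (my own sketch with assumed numbers, not anything from this thread): a VLIW compiler has to commit to one schedule, but a load's real latency (cache hit vs. miss) is only known at run time, so a dynamic dispatcher that overlaps independent work with the outstanding load wins on a miss.

    ```python
    # Toy latency model (assumed numbers): one load, 8 independent ops,
    # and 1 op that depends on the load. The hypothetical compiler placed
    # 2 independent ops in the load shadow, assuming a 2-cycle cache hit.

    def static_cycles(load_latency, n_indep, slots_in_shadow=2):
        """Cycles for a fixed compile-time schedule: only the ops placed
        in the load shadow overlap with it; any remaining load latency
        is a stall, then the leftover ops and the dependent op run."""
        filled = min(n_indep, slots_in_shadow)
        stall = max(0, load_latency - filled)
        remaining = n_indep - filled
        return filled + stall + remaining + 1  # +1: the dependent op

    def dynamic_cycles(load_latency, n_indep):
        """Cycles for a dynamic dispatcher that keeps issuing independent
        ops (1 per cycle here) while the load is still outstanding."""
        return max(load_latency, n_indep) + 1  # +1: the dependent op

    print(static_cycles(2, 8), dynamic_cycles(2, 8))    # cache hit:  9  9
    print(static_cycles(10, 8), dynamic_cycles(10, 8))  # cache miss: 17 11
    ```

    On a hit both schemes tie, because the compiler's guess was right; on a miss the static schedule stalls for the cycles it could not have planned for, which is the gap the post is pointing at.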
    All good points. The Alpha did introduce new features and technologies that made their way into later architectures though, if only the EV7 interconnect, which AMD licensed as their HyperTransport* link in the K8 (amd64). The team working on it did know a few things about CPU design; I guess that was my point, rather than anything magical.

    I'm not entirely convinced though. The trend in mainstream CPU design has certainly been to increase the complexity of on-die run-time management of program execution while simplifying the execution units themselves and adding more of them. This creates a bottleneck: dispatch can only occur as fast as the dispatcher can handle fetching, decoding, scheduling/reordering etc., even with those stages parallelised as much as possible. That in turn limits the number of execution units that can usefully be integrated to improve performance. Simplifying the CPU by performing operations like instruction scheduling at compile time is part of reducing the overhead of program-flow management, so that ideally each execution unit could eventually act as a node in a self-organising network, like the neurons and synapses of a biological brain. This network-on-a-chip concept is actually what's used in the Sunway TaihuLight supercomputer.
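    The dispatch-bottleneck point above can be sketched with a back-of-the-envelope model (my own illustration, assumed numbers): once the front end's dispatch width is saturated, adding further execution units stops raising sustained throughput.

    ```python
    def sustained_ipc(dispatch_width, num_units, unit_throughput=1):
        """Upper bound on sustained instructions per cycle: capped by
        whichever is smaller, the front-end dispatch rate or the combined
        retirement rate of the execution units."""
        return min(dispatch_width, num_units * unit_throughput)

    # With a hypothetical 4-wide dispatcher, units beyond 4 sit idle:
    for units in (2, 4, 8, 16):
        print(units, sustained_ipc(dispatch_width=4, num_units=units))
    # 2 2 / 4 4 / 8 4 / 16 4
    ```

    The saturation at 4 IPC regardless of unit count is the limit the paragraph describes, and the motivation for pushing scheduling work out of the dispatcher.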

    * This seems to have been revised from history; certainly there's no mention of it on the relevant Wikipedia entry. But I'm certain it was the case, and there are still Google hits to back it up.