Linux Kernel Orphans Itanium Support, Linus Torvalds Acknowledges Its Death


  • #21
    Originally posted by CommunityMember View Post
    One of the problems with IA64 was that it was too far ahead of compiler technology of the time,
    Nonsense. VLIW chips and compilers to optimize for them weren't particularly novel in the late 90s. I spent a couple of years writing C for an embedded VLIW chip and had little trouble approaching its theoretical performance limits. But IA64 is not VLIW -- it's EPIC (Explicitly Parallel Instruction Computing). The ISA maps out the data dependencies so the CPU doesn't have to ascertain them at runtime, precisely to simplify scheduling at runtime.

    Now, Itanium's architects made a decision to spend their silicon budget on more execution units, but they certainly could've gone out-of-order. And not having to divine the data-dependencies at runtime should give IA64 an efficiency advantage (if slight) over other ISAs of our modern era, such as ARMv8-A and RISC-V.
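
    To make that concrete, here's a hypothetical C sketch (not actual IA-64 code): the compiler already knows which of these operations are independent, so it can mark them as a single instruction group and the core never has to rediscover that on the fly.

    /* Hypothetical example for illustration only. */
    long madd2(long a, long b, long c, long d)
    {
        /* The two multiplies have no data dependence on each other, so an
         * EPIC compiler can place them in the same instruction group (no
         * stop bit between them) and the core can issue them together
         * without doing any dependency checking of its own. */
        long x = a * b;   /* group 1 */
        long y = c * d;   /* group 1 -- independent of x */

        /* The add needs both results, so it belongs to a later group. */
        return x + y;     /* group 2 */
    }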

    Originally posted by CommunityMember View Post
    Another problem with IA64 was that Intel was unwilling to take the leap of faith and fully commit and put it on their most advanced lithography and displace existing (and profitable) x86 processors which were already supply constrained, so all the IA64 processors were a generation or two or more behind in speeds and feeds.
    True, Intel already had a history of building impressive chips and then cancelling them to focus back on x86. There was the i860, and rumor had it that its successor was indeed quite impressive. I gather there were others, before that.*

    After Itanium's underwhelming debut/reception, Intel quickly got distracted by their x86 race with AMD and never really gave IA64 the TLC it needed to properly mature.

    * This reminds me that Intel even made ARM chips, for a time. Branded StrongARM, I think they were part of an acquisition from DEC/Compaq and eventually sold off again.



    • #22
      Originally posted by coder View Post
      Nonsense. VLIW chips and compilers to optimize for them weren't particularly novel in the late 90s. I spent a couple of years writing C for an embedded VLIW chip and had little trouble approaching its theoretical performance limits. But IA64 is not VLIW -- it's EPIC (Explicitly Parallel Instruction Computing). The ISA maps out the data dependencies so the CPU doesn't have to ascertain them at runtime, precisely to simplify scheduling at runtime.

      Now, Itanium's architects made a decision to spend their silicon budget on more execution units, but they certainly could've gone out-of-order. And not having to divine the data-dependencies at runtime should give IA64 an efficiency advantage (if slight) over other ISAs of our modern era, such as ARMv8-A and RISC-V.
      No, it just doesn't work, because the data-dependencies at runtime often depend on runtime predictions. You simply can't solve it statically.



      • #23
        Originally posted by CommunityMember View Post

        One of the problems with IA64 was that it was too far ahead of compiler technology of the time, and to get good performance advantages with it required compiler capabilities that were not widely available (hand assembly could show impressive results, but that is not practical for large code bases). Another problem with IA64 was that Intel was unwilling to take the leap of faith and fully commit and put it on their most advanced lithography and displace existing (and profitable) x86 processors which were already supply constrained, so all the IA64 processors were a generation or two or more behind in speeds and feeds.
        One of the biggest issues with IA64 was the unpredictability of data paths. People thought that leaving the job of allocating registers and logic units to the compiler would help save transistors on the CPU die. But in fact only part of that can be done at compile time. You can't know ahead of time when a cache miss will occur, so in practice a core stalls most of the time waiting for data. That's why the dynamic approach used by Transmeta was extremely successful, achieving performance near its x86 competitors while still being more power efficient.
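
        As a minimal, hypothetical C illustration of that point: in a pointer-chasing loop like the one below, each load's address comes out of the previous load, and whether it hits or misses in cache depends entirely on how the data happens to be laid out at runtime -- nothing a compiler can schedule around statically.

        /* Hypothetical example for illustration only. */
        struct node { struct node *next; long value; };

        long sum_list(const struct node *n)
        {
            long sum = 0;
            while (n != NULL) {
                sum += n->value;   /* must wait for the load of n->value */
                n = n->next;       /* the next address is known only after this
                                      load completes; a cache miss here stalls an
                                      in-order core outright */
            }
            return sum;
        }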



        • #24
          Originally posted by Weasel View Post
          No, it just doesn't work, because the data-dependencies at runtime often depend on runtime predictions. You simply can't solve it statically.
          And data caches are the most unpredictable of them all.

          Until 2000 it was ok to have a processor design which stalled on cache misses. After that, it wasn't ... EPIC is in-order, and yes, that means stalling on cache misses.



          • #25
            Originally posted by phuclv View Post
            One of the biggest issues with IA64 was the unpredictability of data paths. People thought that leaving the job of allocating registers and logic units to the compiler would help save transistors on the CPU die. But in fact only part of that can be done at compile time. You can't know ahead of time when a cache miss will occur, so in practice a core stalls most of the time waiting for data. That's why the dynamic approach used by Transmeta was extremely successful, achieving performance near its x86 competitors while still being more power efficient.
            That is an important point that doesn't get mentioned much - Itanium was arguably the last of the in-order CPUs to still have high performance expectations, but by the time it came to market OOO CPUs had come to dominate because of the ever-growing gap between CPU clocks and DRAM speeds, and the associated dependency on caches to maintain performance.



            • #26
              Originally posted by coder View Post
              * This reminds me that Intel even made ARM chips, for a time. Branded StrongARM, I think they were part of an acquisition from DEC/Compaq and eventually sold off again.
              Yup, I had a PDA powered by StrongARM -- the SA-1110 processor, to be exact. It was the Compaq iPAQ, given to me at work (I worked for Compaq at the time), and like most PDAs of the era it was more of a novelty item than a useful business tool. Still, in the 2000/2001 timeframe it was fun having a color screen and Wi-Fi on a device that fit into a jacket pocket.



              • #27
                Originally posted by rmfx View Post
                Hey Intel, time to switch to RISC-V. You won't waste half your transistors dealing with a totally bloated, obsolete ISA like x86_64, and support isn't going away anytime soon like your unloved IA64.
                The decoder hasn't really been a big part of the silicon budget for about a decade now. Sure, it took up a lot of die space when they began "RISC-ifying" x86 with heavy use of superscalar execution and other associated features, but that was a long time ago.



                • #28
                  Originally posted by Weasel View Post
                  No, it just doesn't work, because the data-dependencies at runtime often depend on runtime predictions. You simply can't solve it statically.
                  What's an example of a data-dependency that a CPU would use as a runtime instruction scheduling constraint that can't be determined at compile-time?



                  • #29
                    Originally posted by vladpetric View Post
                    And data caches are the most unpredictable of them all.

                    Until 2000 it was ok to have a processor design which stalled on cache misses. After that, it wasn't ... EPIC is in-order, and yes, that means stalling on cache misses.
                    Again, EPIC is not inherently in-order. That's merely how they implemented it. You could build an out-of-order IA64 CPU, and it would probably be as fast or faster than anything we have today.



                    • #30
                      Originally posted by bridgman View Post
                      That is an important point that doesn't get mentioned much - Itanium was arguably the last of the in-order CPUs to still have high performance expectations, but by the time it came to market OOO CPUs had come to dominate because of the ever-growing gap between CPU clocks and DRAM speeds, and the associated dependency on caches to maintain performance.
                      An important point that doesn't get mentioned much - GPUs are in-order! And you're a GPU guy!

                      Also, I would point out that VLIW (in-order, for those who don't know) has remained very popular in DSPs and AI processors, mostly on the basis of their applications having tight loops and fairly predictable latencies (with on-chip memories or cache prefetchers able to help).
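
                      For example (a hypothetical sketch, not vendor code), the bread-and-butter DSP kernel looks something like this: the addresses are a simple function of the loop index and the four partial sums are independent, so a compiler can schedule it entirely at build time and keep an in-order machine's issue slots busy.

                      /* Hypothetical example for illustration only. */
                      float dot(const float *a, const float *b, int n)
                      {
                          /* Four independent accumulators expose instruction-level
                           * parallelism for a static scheduler, and the access pattern
                           * is perfectly predictable, so prefetching is easy. */
                          float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
                          int i;
                          for (i = 0; i + 3 < n; i += 4) {
                              s0 += a[i]     * b[i];
                              s1 += a[i + 1] * b[i + 1];
                              s2 += a[i + 2] * b[i + 2];
                              s3 += a[i + 3] * b[i + 3];
                          }
                          for (; i < n; i++)
                              s0 += a[i] * b[i];
                          return (s0 + s1) + (s2 + s3);
                      }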

                      Now, I'm not arguing that Itanium's lack of OoO wasn't a problem. Quite the contrary: I think we're all in agreement that it was its primary flaw.

