Itanium IA-64 Was Busted In The Upstream, Default Linux Kernel Build The Past Month


  • Space Heater
    replied
    Originally posted by cb88 View Post

    Actually, RDNA1/2 has brought back some of the VLIW design... and it's likely that the same is true of CDNA. They are calling it Super SIMD.
    Can you explain how this is like VLIW? I'm not seeing where RDNA is breaking with the GCN ISA and *requiring* that compilers now statically schedule independent instruction pairs/bundles. If anything, this sounds like superscalar concepts being mapped to SIMD, where the hardware assumes responsibility for issuing multiple instructions and checking for hazards.



  • vladpetric
    replied
    Originally posted by cb88 View Post

    No, it really was the Pentium III... which morphed into the Pentium M and later the Conroe and Core architectures. The P4 is exactly what the Transmeta CPUs were good at combating, since you could not fit a P4 in a tablet-sized device. As far as the rest of the comment goes, you clearly have no idea what you are talking about... most CPUs have to do a static form of code morphing anyway, but they cannot do runtime optimizations because of patents; that is coming, though.

    The Transmeta CPU is actually an x86 CPU with a VLIW front end exposed; all the hardware around the VLIW core, including the registers, is essentially an x86 CPU. The whole point was to get rid of hardware implementations of things that could be done in software *better* and with lower power use. Modern CPUs get around some of this by clock-gating inactive silicon... Another advantage of VLIW is that you can implement fast paths for all the instructions at low cost, since implementing an instruction is the job of the software. Intel, on the contrary, tends to force instructions to be slow to try to get people to stop using them.
    If you're gonna pull an ad hominem, at the very least get it right (it's still an ad hominem, but at least a correct one). Yeah, I only have a PhD in microarchitecture, with a dissertation cited in two Intel preliminary patents.

    The whole idea that software can do instruction scheduling better has been shown to be patently false (billions of dollars wasted later, I should add). As I said earlier in this thread, compilers (or the translation software in Transmeta, for that matter) can't handle the dynamic behavior of the memory hierarchy. Stalling when you have misses to main memory on the order of ~500 cycles is a terrible idea, and stalling is exactly what in-order processors do, including VLIW; see the sketch at the end of this post.

    The problem is that many people (primarily compiler writers) don't want to accept that OoO instruction scheduling is a really good idea (even mobile processors and the Atoms do that today), so they completely ignore all the factual evidence. Much like the political parties of today: when the other side says something meaningful, they just ignore it.

    So yeah, make VLIW great again!
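
    To make the stall argument concrete, here is a minimal C sketch (the arrays and names are hypothetical): the indirect load will frequently miss all the way to DRAM, and the question is what the core does during those hundreds of cycles.

    ```c
    #include <stddef.h>

    /* Hypothetical example: big[] is far larger than the caches, so
     * big[idx[i]] misses often. An in-order pipeline (VLIW included) stalls
     * at the first use of the missing value; an OoO core keeps issuing the
     * independent small[i] work underneath the outstanding miss. */
    long sum_indirect(const long *big, const int *idx,
                      const long *small, size_t n)
    {
        long dep = 0, indep = 0;
        for (size_t i = 0; i < n; i++) {
            long v = big[idx[i]];  /* likely miss to main memory */
            dep   += v;            /* dependent on the miss */
            indep += small[i];     /* independent work; an OoO scheduler overlaps
                                      it with the miss at run time, while a compiler
                                      can only hoist a fixed, static amount of it */
        }
        return dep + indep;
    }
    ```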



  • cb88
    replied
    Originally posted by vladpetric View Post

    Transmeta's performance was lackluster (actually its competitor was the P4, a mediocre core, and it didn't manage to make inroads against that). Code morphing - kinda cool, though in the end not that helpful (20 years later, we have open-as-in-free instruction sets anyway...). VLIW - bad.
    No, it really was the Pentium III... which morphed into the Pentium M and later the Conroe and Core architectures. The P4 is exactly what the Transmeta CPUs were good at combating, since you could not fit a P4 in a tablet-sized device. As far as the rest of the comment goes, you clearly have no idea what you are talking about... most CPUs have to do a static form of code morphing anyway, but they cannot do runtime optimizations because of patents; that is coming, though.

    The Transmeta CPU is actually an x86 CPU with a VLIW front end exposed; all the hardware around the VLIW core, including the registers, is essentially an x86 CPU. The whole point was to get rid of hardware implementations of things that could be done in software *better* and with lower power use. Modern CPUs get around some of this by clock-gating inactive silicon... Another advantage of VLIW is that you can implement fast paths for all the instructions at low cost, since implementing an instruction is the job of the software. Intel, on the contrary, tends to force instructions to be slow to try to get people to stop using them.
    Last edited by cb88; 18 January 2021, 07:35 PM.



  • vladpetric
    replied
    Originally posted by cb88 View Post

    Not entirely true; counterexamples are Transmeta (good transistor density relative to power and performance) and Nvidia's VLIW ARM CPUs, which are quite fast. In fact, I won't be the least bit surprised to see VLIW with embedded runtime optimization resurface once the patents expire (imminently).

    Transmeta's last CPU was roughly a competitor to a P3 and even had SSE3 (the TM88xx chips), so in theory it could run up to Windows 10 etc... if you had enough RAM, which granted is unlikely.

    The kicker with Transmeta's code morphing was that it could optimize the code as it was running... similar to how a Java virtual machine does, except it could do it for any code.
    Transmeta's performance was lackluster (actually its competitor was the P4, a mediocre core, and it didn't manage to make inroads against that). Code morphing - kinda cool, though in the end not that helpful (20 years later, we have open-as-in-free instruction sets anyway...). VLIW - bad.



  • cb88
    replied
    Originally posted by Rallos Zek View Post

    How so? I'll disagree, simply because all major GPU architectures have moved away from being VLIW-based to SIMD/RISC-based architectures, because they saw that VLIW sucks for GPU compute (i.e. CUDA/OpenCL). And it's 2021, and making a compiler work for VLIW/EPIC is still as shit as it was 20 years ago.
    Actually, RDNA1/2 has brought back some of the VLIW design... and it's likely that the same is true of CDNA. They are calling it Super SIMD.



  • cb88
    replied
    Originally posted by jabl View Post

    This. A VLIW-style architecture might work well for a DSP, where you can carefully tune the code for the exact workload, but for a general-purpose architecture it's a massive failure. Compilers were never able to efficiently pack instructions into bundles for general-purpose code, leading to lots of NOPs and thus wasted instruction bandwidth.

    And once you go to OoO HW, that instruction encoding with bundles etc. is just a waste.
    Not entirely true; counterexamples are Transmeta (good transistor density relative to power and performance) and Nvidia's VLIW ARM CPUs, which are quite fast. In fact, I won't be the least bit surprised to see VLIW with embedded runtime optimization resurface once the patents expire (imminently).

    Transmeta's last CPU was roughly a competitor to a P3 and even had SSE3 (the TM88xx chips), so in theory it could run up to Windows 10 etc... if you had enough RAM, which granted is unlikely.

    The kicker with Transmeta's code morphing was that it could optimize the code as it was running... similar to how a Java virtual machine does, except it could do it for any code.
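
    A toy sketch of that idea in C (all names, sizes and the translate_block() helper are hypothetical, not Transmeta's actual interfaces): translated blocks are cached by guest address and re-translated with heavier optimization once they turn out to be hot, much like a tiered JIT.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Toy sketch of a code-morphing translation cache. Guest x86 blocks are
     * translated to native code once, cached by guest address, and
     * re-translated with heavier optimization once they become hot. */

    #define TCACHE_SLOTS 1024u

    typedef void (*host_code_fn)(void);

    /* hypothetical translator provided by the code-morphing layer */
    host_code_fn translate_block(uint64_t guest_pc, int opt_level);

    struct tcache_entry {
        uint64_t     guest_pc; /* address of the original x86 block */
        host_code_fn native;   /* translated code for that block    */
        unsigned     hits;     /* drives tier-up of hot blocks      */
    };

    static struct tcache_entry tcache[TCACHE_SLOTS];

    host_code_fn lookup_or_translate(uint64_t guest_pc)
    {
        struct tcache_entry *e = &tcache[guest_pc % TCACHE_SLOTS];

        if (e->native == NULL || e->guest_pc != guest_pc) {
            /* cold or conflicting entry: quick, cheap translation first */
            e->guest_pc = guest_pc;
            e->hits     = 0;
            e->native   = translate_block(guest_pc, /*opt_level=*/0);
        } else if (++e->hits == 1000) {
            /* hot block: spend time on real optimization, like a JIT tier-up */
            e->native = translate_block(guest_pc, /*opt_level=*/2);
        }
        return e->native;
    }
    ```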
    Last edited by cb88; 18 January 2021, 05:23 PM.



  • Rallos Zek
    replied
    Originally posted by zexelon View Post
    Taken as a whole, I would say Itanium was a very impressive engineering undertaking, and in some ways it was quite successful. It was a market failure, yes, but so are many architectures. I would not fully discount the tech in the Itanium. I am sure that in the probably not-too-distant future, pieces of it will be resurrected. Tech developed in GPUs is solving a lot of the issues with Itanium. The one issue that cannot be fixed is backwards compatibility.

    Probably pieces of Itanium will eventually be unknowingly included in Xeon Phi or some future Intel "accessory" processor or GPU.
    How so? I'll disagree, simply because all major GPU architectures have moved away from being VLIW-based to SIMD/RISC-based architectures, because they saw that VLIW sucks for GPU compute (i.e. CUDA/OpenCL). And it's 2021, and making a compiler work for VLIW/EPIC is still as shit as it was 20 years ago.



  • Space Heater
    replied
    Originally posted by WorBlux View Post
    RISC-V does follow with predicated instructions, and even x86 got CMOV, which can let the compiler eliminate some branches in code.
    Since when does RISC-V have any form of predicated move/select? They seem to be religiously against predication and claim that branch prediction is always better.



  • vladpetric
    replied
    Originally posted by jabl View Post
    IIRC 32-bit ARM had lots of support for predication, but they got rid of almost all of it for AArch64, leaving only csel, roughly similar to cmov on x86.

    So while it has its uses for poorly predictable branches, "the more the merrier" probably isn't the answer either.
    They're great for min/max. Beyond that...
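
    This is easy to see in C: compilers commonly lower the ternaries below to a cmov on x86-64 or a csel on AArch64 instead of a branch (the function names are just illustrative).

    ```c
    #include <stdint.h>

    /* Branchless selects of this shape are what cmov (x86) and csel (AArch64)
     * are made for: no branch to mispredict, just a data-dependent move. */
    static inline int64_t min64(int64_t a, int64_t b)
    {
        return (a < b) ? a : b;             /* compare + conditional select */
    }

    static inline int64_t clamp64(int64_t x, int64_t lo, int64_t hi)
    {
        int64_t lower = (x > lo) ? x : lo;  /* max(x, lo) */
        return min64(lower, hi);            /* min(., hi), still no branches */
    }
    ```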



  • vladpetric
    replied
    Originally posted by WorBlux View Post

    Sorry, my mistake, I got confused by what libreSoC was trying to do with it.
    I think it's a good idea to add cmovs, at least. Hope they succeed.

