Itanium IA-64 Was Busted In The Upstream, Default Linux Kernel Build The Past Month


  • WorBlux
    replied
    Originally posted by microcode View Post

    Yeah, the dual I$/symmetric basic block thing is super cool; I have very low expectations of Mill in practice, though: it is absolutely perfect vaporware, no offense to the wise man.

    The thing that tells me it's not going to happen is that, as far as I can tell, they haven't shown a functioning demo of any form. Not a simulator, not an implementation on FPGA, and zero tapeouts to date.
    We'll see; I've been following it fairly closely. It's been slow, but I'm expecting another 4-5 patents* to drop this year along with some sort of announcement. It's definitely still active, if not particularly visible. *At least one dealing with coherence, and one dealing with scalable vectors/streams.

    I understand the skepticism, but I wouldn't count them out quite yet.
    Last edited by WorBlux; 19 January 2021, 06:17 PM.



  • microcode
    replied
    Originally posted by WorBlux View Post

    If you haven't looked at the Mill, I'd suggest you do. The answer is pretty weird: elided no-ops, an implicit destination register, model-specific entropy-optimized binary encoding, a split instruction stream, and dual I$. The bigger question is whether they'll ever get enough funding, and whether their load solutions are enough to overcome cache nondeterminism.
    Yeah, the dual I$/symmetric basic block thing is super cool; I have very low expectations of Mill in practice, though: it is absolutely perfect vaporware, no offense to the wise man.

    The thing that tells me it's not going to happen is that, as far as I can tell, they haven't shown a functioning demo of any form. Not a simulator, not an implementation on FPGA, and zero tapeouts to date.



  • WorBlux
    replied
    Originally posted by microcode View Post
    In reality, Intel had money out the wazoo at that time; if compiler engineers could have figured out how to run general-purpose code quickly on an in-order VLIW without mountains of NOP slots and I$ abuse, Intel would have hired them.

    If there is any place in general-purpose software for VLIW, it is as a supplement or assist to OoO, rather than a replacement for it.
    If you haven't looked at the Mill, I'd suggest you do. The answer is pretty weird: elided no-ops, an implicit destination register, model-specific entropy-optimized binary encoding, a split instruction stream, and dual I$. The bigger question is whether they'll ever get enough funding, and whether their load solutions are enough to overcome cache nondeterminism.



  • bofkentucky
    replied
    Originally posted by f0rmat View Post

    That is what I remember, too. What is fascinating to me is that at the time Intel introduced the Itanium, the vastly overwhelming majority of code out there was x86, with some 16-bit thrown in there. Why Intel thought that everybody would drop all of their x86 code and transfer to Itanium was a mystery to me at the time. Especially since at the time, AMD was providing some serious competition to them. Not only had they just introduced the first true 64-bit processor, they had also just recently had the first processor that broke the 1 GHz threshold.
    Itanium was targeted at the legacy UNIX vendors that had Alpha, MIPS, SPARC, PA-RISC, and POWER as live architectures at that point, with 64-bit capability already shipping or fast approaching on the roadmap. Now we're down to POWER, and the patent pool that was Alpha ended up influencing the Athlon enough that AMD was able to squeeze out the toehold that was AMD64.



  • microcode
    replied
    In reality, Intel had money out the wazoo at that time; if compiler engineers could have figured out how to run general-purpose code quickly on an in-order VLIW without mountains of NOP slots and I$ abuse, Intel would have hired them.

    If there is any place in general-purpose software for VLIW, it is as a supplement or assist to OoO, rather than a replacement for it.
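
    To make the NOP-slot problem concrete, here's a rough illustration. The C below is ordinary dependent scalar code; the comments sketch how a hypothetical 3-issue in-order VLIW (a made-up encoding, not IA-64 or any real machine) would have to pad its bundles, which is exactly the I$ bloat being talked about:

        #include <stdio.h>

        /* A dependent chain: each step needs the previous result, so a
         * hypothetical 3-slot in-order VLIW compiler can rarely fill more
         * than one slot per bundle here, e.g.:
         *
         *   bundle 0: [ t = a * b ] [ nop ] [ nop ]
         *   bundle 1: [ t = t + c ] [ nop ] [ nop ]
         *   bundle 2: [ t = t * t ] [ nop ] [ nop ]
         *
         * Two thirds of the fetched instruction bytes are padding, while an
         * out-of-order core keeps its fetch stream dense and discovers the
         * (lack of) parallelism at run time. */
        static int dependent_chain(int a, int b, int c)
        {
            int t = a * b;   /* result feeds the next op */
            t = t + c;       /* ...which feeds the next */
            t = t * t;
            return t;
        }

        int main(void)
        {
            printf("%d\n", dependent_chain(2, 3, 4));   /* (2*3+4)^2 = 100 */
            return 0;
        }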



  • zexelon
    replied
    Nelson: Well, I pretty much have to agree with the majority of your assessment here! I think you are spot on with your consideration of the energy failings of Itanium. In recent decades this has become a metric more important than raw performance for a large swath of applications, and you are bang on about the rise of ARM being largely related to this.

    I don't think a mobile version of Itanium would ever have been a feasible proposition... not without basically a total redesign, and I think the very idea at its core is probably not able to match the power usage of, say, ARM.



  • Nelson
    replied

    Originally posted by zexelon View Post

    I would suggest this assessment might be a bit unfair. The architecture is actually incredibly elegant, very well implemented (in hardware) and amazingly flexible for future development.

    It had several key failures though:
    1. ...
    2. Turns out it's borderline impossible to write an effective compiler. The whole architecture turns many commonly accepted computer engineering paradigms on their head... it moved all the scheduling, parallelism, and hardware complexity into the compiler... a genius idea for the hardware engineers, since it theoretically made the chip cheaper to produce. However, it made the compiler severely more complicated to write... and as several architectures in history have shown, the very best, most amazing CPU turns out to be useless if you can't compile software for it!
    Personally I think point 1 may have been the key one. If they could have made the market excited about it and got more CPU designers and manufacturers on board, it would have spread the risk, and development of the compilers would perhaps have progressed further!

    This is all bonus work for the concepts of the RISC-V group... maybe someday we will see an Unobtanium-V group.
    I don't know that I'd describe it as elegant. It lacked bold leadership in design and had all of the hallmarks of design by committee. Intel's IA-32 was getting beat up by RISC-ish designs due to its lack of registers, so IA-64 got 128 general registers and 128 floating-point registers; it wasn't even clear that more than the 32 RISC chips generally had were needed, but more is more. EPIC sounded neat too, but it might not have been bold enough: it was 3 instructions wide, and instruction fetchers at the time were already starting to get pretty good at multiple dispatch.

    Around that time, profile-guided optimizations to prevent branch mispredicts were sort of the hotness in the compiler world. Merced generally didn't have branch mispredicts because, thanks to predication, it could just take both branches and throw out the results from the path not taken. It drank power like nobody's business.
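
    For anyone who hasn't stared at this stuff, here's roughly what that "compute both sides and throw one away" (if-conversion) looks like at the C level. This is just a sketch with made-up function names, not Merced code or anything a real IA-64 compiler emitted:

        #include <stdio.h>

        /* Branchy version: a mispredicted branch here costs a pipeline
         * flush on a conventional speculative core. */
        static int clamp_branchy(int x, int limit)
        {
            if (x > limit)
                return limit;
            return x;
        }

        /* If-converted version: compute both outcomes and select one with
         * a predicate.  On a predicated machine both "sides" issue and the
         * losing result is simply discarded -- no branch, no mispredict,
         * but you burn the work (and the power) for the path you didn't need. */
        static int clamp_predicated(int x, int limit)
        {
            int take_limit = (x > limit);                 /* predicate: 0 or 1 */
            return take_limit * limit + (1 - take_limit) * x;
        }

        int main(void)
        {
            printf("%d %d\n", clamp_branchy(7, 5), clamp_predicated(7, 5));  /* 5 5 */
            printf("%d %d\n", clamp_branchy(3, 5), clamp_predicated(3, 5));  /* 3 3 */
            return 0;
        }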

    Building a compiler for it wasn't super difficult; I think that is overblown. Compilers were already reordering and scheduling instructions, and loop unrolling was very popular, but Itanium was a move in a different direction from where compilers had been going that didn't seem to offer up all that much. SGI's guys and maybe the IBM Toronto compiler group had already demonstrated solutions to some of the problems compilers had with RISC chips that people thought were too difficult to solve a decade earlier, and those RISC chips were simpler and more power efficient, while Itanium never really matched the high end, let alone beat it. Since it was expensive and had good but not great performance, it never really got the compiler attention of other platforms; I think of it as more of a general tooling shortage. Take Java: I think BEA JRockit ran on Itanium, but I don't remember the Sun JVM ever being really current, it was always built for IA-64 quite a bit later. Intel got the Cygnus guys to port stuff to it, but it was kind of abandoned since no one ever used it.
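
    As an aside, for anyone who wasn't around for it: the loop unrolling compilers were already doing back then looks, at the source level, roughly like the transformation below (illustration only, with made-up names; the compiler did this on its intermediate representation, not on your C):

        /* Original loop: one add per iteration, plus loop overhead. */
        long sum_simple(const int *a, long n)
        {
            long s = 0;
            for (long i = 0; i < n; i++)
                s += a[i];
            return s;
        }

        /* Unrolled by 4: fewer branches per element and more independent
         * work per iteration for the instruction scheduler to interleave. */
        long sum_unrolled(const int *a, long n)
        {
            long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
            long i = 0;
            for (; i + 4 <= n; i += 4) {
                s0 += a[i];
                s1 += a[i + 1];
                s2 += a[i + 2];
                s3 += a[i + 3];
            }
            for (; i < n; i++)   /* leftover elements */
                s0 += a[i];
            return s0 + s1 + s2 + s3;
        }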

    Maybe the biggest failing was that it wasn't an enterprise-grade part at the time: it didn't meet the RAS requirements that the big buyers needed, and only HP bought in. Without a substantial high-end play, it was just a very expensive part that didn't bring enough performance to justify the cost. And it really sucked at running your existing software. The lesson, which basically goes back to Gordon Moore, was that you had to be cheaper and faster to really take over; cheaper *or* faster might win or compete, but you really needed both. Just a couple of years after Merced was released, power-specific parts became a thing, and ARM has made progress by focusing on the power envelope and by making cheaper and more efficient parts.




  • Space Heater
    replied
    Originally posted by cb88 View Post

    GCN isn't a single ISA anyway... it's a design architecture really; each version of the GCN ISA is not binary compatible with the others in all cases.

    It's kind of pointless to discuss this at the ISA level as we don't know the internals; when the Radeon GPU code leaked a while back, it turned out to be wildly different internally than you would think. You can think of RDNA as VLIW2 with a hardware scheduler that handles the VLIW bit in hardware.
    The whole concept of VLIW is that it is done at the ISA level; the name "Very Long Instruction Word" provides a strong hint that it relates to the software-hardware interface. VLIW architectures punt the complexity of grouping multiple independent instructions to software, and they do this by exposing hardware details (e.g. bundle size, which permutations of instruction types are allowed in the same bundle, etc.) in the ISA. Therefore, with VLIW architectures the compiler is responsible for creating bundles of independent instructions, not the hardware. If the hardware is handling "the VLIW bit" then it's not VLIW-like at all; instead it's like a traditional superscalar processor.
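
    A toy way to picture "the compiler does the grouping": the C sketch below packs a straight-line stream of ops into fixed 3-slot bundles, refusing to put dependent ops in the same bundle and padding the rest with NOPs. The slot count, the op format, and the crude dependence check are all invented for illustration; no real VLIW ISA is this simple, and a real compiler would also reorder ops to fill slots rather than pack them strictly in program order:

        #include <stdio.h>

        #define SLOTS 3   /* hypothetical bundle width, fixed by the "ISA" */

        typedef struct {
            const char *text;   /* mnemonic, for printing */
            int dst;            /* register written, -1 if none */
            int src1, src2;     /* registers read, -1 if none */
        } Op;

        /* True if b reads or writes a register that a writes (a very crude
         * dependence check -- real compilers track far more than this). */
        static int depends(const Op *a, const Op *b)
        {
            return a->dst != -1 &&
                   (b->src1 == a->dst || b->src2 == a->dst || b->dst == a->dst);
        }

        /* Greedy bundling in program order: fill up to SLOTS ops per bundle,
         * starting a new bundle whenever the next op depends on one already
         * in it.  Unused slots become explicit NOPs -- the padding VLIW is
         * notorious for. */
        static void bundle(const Op *ops, int n)
        {
            int i = 0;
            while (i < n) {
                const Op *picked[SLOTS];
                int used = 0;
                while (i < n && used < SLOTS) {
                    int conflict = 0;
                    for (int k = 0; k < used; k++)
                        if (depends(picked[k], &ops[i]))
                            conflict = 1;
                    if (conflict)
                        break;
                    picked[used++] = &ops[i++];
                }
                printf("{ ");
                for (int k = 0; k < SLOTS; k++)
                    printf("%-15s", k < used ? picked[k]->text : "nop");
                printf("}\n");
            }
        }

        int main(void)
        {
            Op ops[] = {
                { "add r3,r1,r2", 3, 1, 2 },
                { "mul r4,r3,r3", 4, 3, 3 },   /* depends on the add above */
                { "sub r6,r5,r1", 6, 5, 1 },   /* independent of the mul */
                { "add r7,r6,r4", 7, 6, 4 },   /* depends on both */
            };
            bundle(ops, 4);
            return 0;
        }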



  • WorBlux
    replied
    Originally posted by Space Heater View Post

    Since when does RISC-V have any form of predicated move/select? They seem to be religiously against predication and claim that branch prediction is always better.
    Again, my mistake. When Libre-SOC was looking at proposing a RISC-V vector extension, they considered overloading all the branch instructions as predicates. I got confused for a minute.



  • cb88
    replied
    Originally posted by Space Heater View Post

    Can you explain how this is like VLIW? I'm not seeing where RDNA is breaking with the GCN ISA and *requiring* that compilers now statically schedule independent instruction pairs/bundles. If anything, this sounds like superscalar concepts being mapped to SIMD, where the hardware assumes responsibility for issuing multiple instructions and checking for hazards.
    GCN isn't a single ISA anyway... it's a design architecture really; each version of the GCN ISA is not binary compatible with the others in all cases.

    It's kind of pointless to discuss this at the ISA level as we don't know the internals; when the Radeon GPU code leaked a while back, it turned out to be wildly different internally than you would think. You can think of RDNA as VLIW2 with a hardware scheduler that handles the VLIW bit in hardware.

