
Tachyum Gets FreeBSD Running On Their Prodigy ISA Emulation Platform For AI / HPC


  • #11
    It's too bad that they're so full of shit, because the technology is actually cool.

    • #12
      Originally posted by sinepgib View Post

      Why, exactly?
No shuffling data around over slow, power-inefficient buses like PCIe. And it eliminates redundancy, as all the accelerators in a core can share the same decoders, the same cache and so on. If the design uses heterogeneous cores (that all still share an ISA), sharing the LLC and other resources is still easier.

      On the software side, no more split code between different devices, and less software infrastructure to maintain. And I assume compiler optimization and language support would be easier/better when a whole app is targeting a single ISA.
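
      To make that contrast concrete, here's a minimal C sketch of the structural difference: a discrete accelerator forces staging copies across the bus, while a unified-ISA core operates on the data in place. The memcpy standing in for a PCIe transfer and all the function names are illustrative assumptions, not any real accelerator API.

      Code:
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      /* Stand-in compute kernel: what the "accelerator" actually does. */
      static void kernel(float *v, size_t n) {
          for (size_t i = 0; i < n; i++) v[i] = v[i] * 2.0f + 1.0f;
      }

      /* Discrete-accelerator model: host buffer -> "device" buffer ->
         compute -> copy back. The two memcpy calls model the PCIe
         upload and download. */
      static void run_discrete(float *host, float *dev, size_t n) {
          memcpy(dev, host, n * sizeof *dev);
          kernel(dev, n);
          memcpy(host, dev, n * sizeof *host);
      }

      /* Unified-ISA model: the accelerator shares the core's address
         space and caches, so the data never moves. */
      static void run_unified(float *host, size_t n) {
          kernel(host, n);
      }

      int main(void) {
          size_t n = 1 << 20;
          float *host = malloc(n * sizeof *host);
          float *dev  = malloc(n * sizeof *dev);
          for (size_t i = 0; i < n; i++) host[i] = (float)i;

          run_discrete(host, dev, n);
          run_unified(host, n);
          printf("host[42] = %f\n", host[42]);

          free(host);
          free(dev);
          return 0;
      }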



      • #13
        Originally posted by soulsource View Post
        What I'm missing here are tech details on the hardware.
        I have no idea what they're doing, but I'm imagining you could implement a variable-ISA CPU by having a sufficiently general backend, sitting behind a frontend that's implemented using FPGA-like configurability (although more specialized to the task of instruction decoding than a normal FPGA would be).

        Of course, there's no way that'd be faster or more efficient than the fully hard-wired implementations it'd be competing with, so I doubt that's what it really is.
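
        For what it's worth, the cheapest software analogue of that frontend/backend split is a table-driven decoder in front of a fixed execution backend: swapping the decode table "switches ISAs" without touching the execution units. A toy C sketch, with entirely invented opcodes and micro-ops:

        Code:
        #include <stdint.h>
        #include <stdio.h>

        /* The fixed backend executes a small set of micro-ops. */
        typedef enum { UOP_NOP, UOP_ADD, UOP_SUB } uop_t;

        /* Two different "guest ISAs" that encode add/sub with different
           opcodes; unlisted opcodes decode to UOP_NOP. */
        static const uop_t isa_a[256] = { [0x01] = UOP_ADD, [0x29] = UOP_SUB };
        static const uop_t isa_b[256] = { [0x58] = UOP_ADD, [0x90] = UOP_SUB };

        /* Hard-wired backend: never changes, whatever the guest ISA is. */
        static int execute(uop_t u, int a, int b) {
            switch (u) {
            case UOP_ADD: return a + b;
            case UOP_SUB: return a - b;
            default:      return a;
            }
        }

        /* The "reconfigurable" part is only which table decode uses. */
        static int run(const uop_t *decode, const uint8_t *code, size_t n) {
            int acc = 0;
            for (size_t i = 0; i < n; i++)
                acc = execute(decode[code[i]], acc, 1);
            return acc;
        }

        int main(void) {
            const uint8_t prog_a[] = { 0x01, 0x01, 0x29 }; /* add, add, sub in ISA A */
            const uint8_t prog_b[] = { 0x58, 0x58, 0x90 }; /* same program in ISA B  */
            printf("%d %d\n", run(isa_a, prog_a, 3), run(isa_b, prog_b, 3));
            return 0;
        }

        Both programs compute the same result under different encodings; the "reconfiguration" is just a table swap. A hardware version would of course have to do this at wire speed, which is exactly where the efficiency question comes in.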

        • #14
          Originally posted by timofonic View Post
          current architecture mess is a total disaster,
You mean because of x86?

          Originally posted by timofonic View Post
prices skyrocketing,
          That's a fab capacity problem, mostly. More players in the market wouldn't hurt, but to the extent they're just contending for pieces of the same size pie, it's not going to make a huge difference in pricing.

          • #15
            Originally posted by microcode View Post
            It's too bad that they're so full of shit, because the technology is actually cool.
            What technology? I visited their website, but it doesn't really say how they do it. Did you look through patents or something?

            • #16
              Originally posted by brucethemoose View Post
No shuffling data around over slow, power-inefficient buses like PCIe. And it eliminates redundancy, as all the accelerators in a core can share the same decoders, the same cache and so on. If the design uses heterogeneous cores (that all still share an ISA), sharing the LLC and other resources is still easier.
This sounds better than it is. You talk about these things as if they're comparable, but unless you're using a really small deep learning model, the overhead of shipping your data over PCIe or CXL is negligible compared with the time inference takes. We're talking microseconds vs. milliseconds, at least.
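
              A back-of-envelope check on those orders of magnitude, as a small C program (the PCIe bandwidth, batch size, and model latency are rough assumptions, not measurements):

              Code:
              #include <stdio.h>

              int main(void) {
                  /* Assumed figures, for illustration only. */
                  double pcie_gbps   = 32.0;                     /* PCIe 4.0 x16, ~32 GB/s */
                  double batch_bytes = 8 * 224 * 224 * 3 * 4.0;  /* 8 fp32 224x224 images  */
                  double infer_ms    = 10.0;                     /* assumed model latency  */

                  double xfer_us = batch_bytes / (pcie_gbps * 1e9) * 1e6;
                  printf("transfer: %.1f us, inference: %.1f ms (ratio ~%.0fx)\n",
                         xfer_us, infer_ms, infer_ms * 1000.0 / xfer_us);
                  return 0;
              }

              With those numbers the transfer is roughly 150 microseconds against 10 milliseconds of compute, i.e. the bus cost disappears into the noise.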

And when you're running a deep learning workload, it would thrash the hell out of your system-level cache, making it next to useless for CPU jobs. IMO, you really don't want these things in the same package. They're only integrated that way on mobile platforms as a matter of cost, size, and convenience.

              Originally posted by brucethemoose View Post
              On the software side, no more split code between different devices, and less software infrastructure to maintain. And I assume compiler optimization and language support would be easier/better when a whole app is targeting a single ISA.
              Intel tried that with Xeon Phi, remember? It went so poorly they killed it and then spent the last 5+ years beefing up their iGPUs into what's now known as Ponte Vecchio (Xe-HPC).

Now, someone is probably going to point to Fujitsu's A64FX, which is a legit point. However, it's not great at truly general-purpose compute. It tries to be a hybrid between a CPU and a GPU, but it isn't as good at either workload as the corresponding dedicated processor. Its main benefits are that you only need a single toolchain and it can tackle a broader scope of problems than a traditional GPU.

              • #17
                Originally posted by coder View Post
                I have no idea what they're doing, but I'm imagining you could implement a variable-ISA CPU by having a sufficiently general backend, sitting behind a frontend that's implemented using FPGA-like configurability (although more specialized to the task of instruction decoding than a normal FPGA would be).

                Of course, there's no way that'd be faster or more efficient than the fully hard-wired implementations it'd be competing with, so I doubt that's what it really is.
I'm of course a bit biased here. Some years ago I worked in materials research, on the topic of magnetic semiconductors, to be precise. Those would have the potential to allow fast runtime switching of a logic gate's behaviour. If their tech were something in that direction, I'd be quite hyped.
(Not that I'd actually expect magnetic semiconductors to work at room temperature anytime soon, if ever, but there might be other materials I'm not aware of that offer similar potential.)

                • #18
BTW, I'm not really sure what technology they used, but I'm reminded of a reconfigurable computing company called Stretch, Inc. One thing I've heard is that they were so fast at reconfiguring their on-chip logic that they'd do it several times in the course of compressing a single H.264 video frame. It seems they've been acquired.

https://www.maxlinear.com/stretchinc

                  • #19
                    Originally posted by coder View Post
BTW, I'm not really sure what technology they used, but I'm reminded of a reconfigurable computing company called Stretch, Inc. One thing I've heard is that they were so fast at reconfiguring their on-chip logic that they'd do it several times in the course of compressing a single H.264 video frame. It seems they've been acquired.

                    https://www.maxlinear.com/stretchinc
                    That sounds like a stretch.

                    • #20
If you remember Transmeta's Crusoe/Efficeon (Linus Torvalds worked on it, too), that was also a "morphing" CPU, with the same performance as Intel's but at a fraction of the power consumption, on a smaller and cheaper chip. It used profiling and JIT recompilation of code pages in RAM, so the final stage of optimization effectively fed the in-order VLIW architecture.

The problem was that the CPU was slow at ordinary tasks. It needed a single task running in a loop for the optimizer to pay off, e.g. playing a video (software-decoded at the time) or playing a game. On random tasks (opening a .doc file in Word, running the spell checker, loading pages in the web browser, where each page has different scripts that mostly run only on page load), it was slow, because unoptimized code used only a fraction of the VLIW units. Then the next Intel CPU matched that low power consumption for notebooks, so Transmeta wasn't needed anymore.

PS: One example they showed was Quake 3 rendered with two CPU ISAs used for a single frame: x86 plus some "Java CPU" ISA (Sun had CPUs with hardware-accelerated Java, for example), switching ISAs during the rendering of each frame.
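
                      For anyone curious, that profile-then-retranslate loop looks roughly like this toy C sketch: interpret a block, count executions, and "translate" it once it crosses a hotness threshold. One-off code never crosses the threshold, which is exactly the "slow at random tasks" failure mode. Everything here (threshold, block granularity, function names) is an invented stand-in; the real Code Morphing Software worked on x86 code pages and emitted VLIW.

                      Code:
                      #include <stdio.h>

                      #define HOT_THRESHOLD 50
                      #define NBLOCKS 4

                      static unsigned exec_count[NBLOCKS];
                      static int      translated[NBLOCKS];  /* 1 once a block is "JITed" */

                      static void interpret(int block)  { (void)block; /* slow path: decode every time */ }
                      static void translate(int block)  { (void)block; /* one-time optimization pass   */ }
                      static void run_native(int block) { (void)block; /* fast path: optimized code    */ }

                      static void execute_block(int block) {
                          if (translated[block]) { run_native(block); return; }
                          interpret(block);
                          if (++exec_count[block] >= HOT_THRESHOLD) {
                              translate(block);  /* pays off only if the block runs again */
                              translated[block] = 1;
                          }
                      }

                      int main(void) {
                          /* A tight loop (block 0) gets translated after 50 runs;
                             one-off blocks stay interpreted forever. */
                          for (int i = 0; i < 1000; i++) execute_block(0);
                          for (int b = 1; b < NBLOCKS; b++) execute_block(b);

                          for (int b = 0; b < NBLOCKS; b++)
                              printf("block %d: %u interpreted runs, translated=%d\n",
                                     b, exec_count[b], translated[b]);
                          return 0;
                      }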
                      Last edited by Ladis; 06 April 2022, 07:18 PM.
