Tachyum Gets FreeBSD Running On Their Prodigy ISA Emulation Platform For AI / HPC


  • coder
    replied
    BTW, I'm not really sure what technology they used, but I'm reminded of a reconfigurable computing company called Stretch, Inc. One thing I've heard is that they could reconfigure their on-chip logic so quickly that they'd do it up to several times in the course of compressing a single H.264 video frame. It seems they've since been acquired.



  • soulsource
    replied
    Originally posted by coder View Post
    I have no idea what they're doing, but I'm imagining you could implement a variable-ISA CPU by having a sufficiently general backend, sitting behind a frontend that's implemented using FPGA-like configurability (although more specialized to the task of instruction decoding than a normal FPGA would be).

    Of course, there's no way that'd be faster or more efficient than the fully hard-wired implementations it'd be competing with, so I doubt that's what it really is.
    I'm of course a bit biased here. Some years ago I worked in materials research, on the topic of magnetic semiconductors to be precise. Those would have the potential to allow fast runtime switching of a logic gate's behaviour. If their tech were something in that direction, I'd be quite hyped.
    (Not that I would actually expect magnetic semiconductors to work at room temperature anytime soon, or ever, but there might be other materials I'm not aware of that offer similar potential.)



  • coder
    replied
    Originally posted by brucethemoose View Post
    No shuffling data around over slow, power-inefficient buses like PCIe. And it eliminates redundancy, as all the accelerators in a core can share the same decoders, the same cache and so on. If the design uses heterogenous cores (that all still share an ISA), then it still makes sharing LLC and other resources easier.
    This sounds better than it is. You talk about these things as if they're comparable, but unless you're running a really small deep learning model, the overhead of shipping your data over PCIe or CXL is negligible compared with the time the inference itself takes. We're talking about microseconds vs. milliseconds, at least.

    And when you're running a deep learning workload, it would thrash the hell out of your system-level cache, making it next to useless for CPU jobs. IMO, you really don't want these things in the same package; on mobile platforms they end up there purely as a matter of cost, size, and convenience.
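    As a back-of-envelope sanity check on that microseconds-vs-milliseconds claim (all numbers below are illustrative assumptions, not measurements):

    ```python
    # Back-of-envelope: PCIe transfer time vs. inference time.
    # All numbers are illustrative assumptions, not measurements.

    PCIE4_X16_BPS = 32e9         # ~32 GB/s practical bandwidth, PCIe 4.0 x16
    input_bytes = 4 * 1024**2    # assumed 4 MiB input tensor (e.g. an image batch)
    inference_ms = 10.0          # assumed inference latency for a mid-sized model

    transfer_us = input_bytes / PCIE4_X16_BPS * 1e6
    print(f"transfer:  {transfer_us:.0f} us")            # ~131 us
    print(f"inference: {inference_ms * 1000:.0f} us")    # 10000 us
    print(f"transfer overhead: {transfer_us / (inference_ms * 1000):.1%}")
    ```

    Under those assumptions the transfer is about 1% of the end-to-end time, which only stops being negligible for very small models or very large inputs.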

    Originally posted by brucethemoose View Post
    On the software side, no more split code between different devices, and less software infrastructure to maintain. And I assume compiler optimization and language support would be easier/better when a whole app is targeting a single ISA.
    Intel tried that with Xeon Phi, remember? It went so poorly they killed it and then spent the last 5+ years beefing up their iGPUs into what's now known as Ponte Vecchio (Xe-HPC).

    Now, someone is probably going to point to Fujitsu's A64FX, which is a legit point. However, it's not great at truly general-purpose compute. It tries to be a hybrid between a CPU and a GPU, but it's worse at each workload than the corresponding dedicated processor. Its main benefits are that you only need a single toolchain and that it can tackle a broader scope of problems than a traditional GPU.



  • coder
    replied
    Originally posted by microcode View Post
    It's too bad that they're so full of shit, because the technology is actually cool.
    What technology? I visited their website, but it doesn't really say how they do it. Did you look through patents or something?



  • coder
    replied
    Originally posted by timofonic View Post
    current architecture mess is a total disaster,
    You mean because of x86?

    Originally posted by timofonic View Post
    prices skyrocketing,
    That's mostly a fab-capacity problem. More players in the market wouldn't hurt, but to the extent they're just contending for pieces of the same-size pie, it's not going to make a huge difference in pricing.



  • coder
    replied
    Originally posted by soulsource View Post
    What I'm missing here are tech details on the hardware.
    I have no idea what they're doing, but I'm imagining you could implement a variable-ISA CPU by having a sufficiently general backend, sitting behind a frontend that's implemented using FPGA-like configurability (although more specialized to the task of instruction decoding than a normal FPGA would be).

    Of course, there's no way that'd be faster or more efficient than the fully hard-wired implementations it'd be competing with, so I doubt that's what it really is.
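    To make that frontend idea concrete, here's a toy sketch (entirely hypothetical, no relation to whatever Tachyum actually does): the backend exposes a fixed set of micro-ops, and an "ISA" is nothing more than a loadable decode table mapping opcodes onto them, so swapping the table swaps the ISA.

    ```python
    # Toy model of a variable-ISA frontend: a fixed micro-op backend
    # behind a swappable decode table. Entirely hypothetical.

    def backend_add(state, a, b, dst):
        state[dst] = state[a] + state[b]

    def backend_sub(state, a, b, dst):
        state[dst] = state[a] - state[b]

    # Two "ISAs" that differ only in which opcode maps to which micro-op.
    isa_a = {0x01: backend_add, 0x02: backend_sub}
    isa_b = {0x10: backend_sub, 0x20: backend_add}  # same backend, new encoding

    def run(decode_table, program, state):
        for opcode, a, b, dst in program:
            decode_table[opcode](state, a, b, dst)  # "decode" is a table lookup
        return state

    regs = {"r0": 7, "r1": 5, "r2": 0}
    run(isa_a, [(0x01, "r0", "r1", "r2")], regs)   # r2 = 7 + 5
    print(regs["r2"])  # 12
    run(isa_b, [(0x10, "r0", "r1", "r2")], regs)   # r2 = 7 - 5
    print(regs["r2"])  # 2
    ```

    In real silicon the "table" would be configurable decode logic rather than a dict, and (as said above) it would almost certainly lose to a hard-wired decoder on speed and power.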



  • brucethemoose
    replied
    Originally posted by sinepgib View Post

    Why, exactly?
    No shuffling data around over slow, power-inefficient buses like PCIe. And it eliminates redundancy, as all the accelerators in a core can share the same decoders, the same cache and so on. If the design uses heterogenous cores (that all still share an ISA), then it still makes sharing LLC and other resources easier.

    On the software side, no more split code between different devices, and less software infrastructure to maintain. And I assume compiler optimization and language support would be easier/better when a whole app is targeting a single ISA.
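    A minimal illustration of the "split code" point (all names here are made up): with a discrete accelerator, the same kernel tends to exist once per device behind a dispatch layer, and that duplicated infrastructure is exactly what a single-ISA core would let you drop.

    ```python
    # Toy illustration of duplicated per-device kernels plus a dispatch
    # layer. All names are hypothetical; on real hardware the "device"
    # path would involve a separate toolchain and explicit data transfers.

    def saxpy_host(a, xs, ys):
        """Host (CPU) implementation."""
        return [a * x + y for x, y in zip(xs, ys)]

    def saxpy_device(a, xs, ys):
        """Stand-in for an accelerator kernel (separate build, copy in/out)."""
        return [a * x + y for x, y in zip(xs, ys)]

    def saxpy(a, xs, ys, device="host"):
        # The dispatch layer itself is part of the maintenance burden.
        impl = saxpy_host if device == "host" else saxpy_device
        return impl(a, xs, ys)

    print(saxpy(2.0, [0.0, 1.0, 2.0], [1.0, 1.0, 1.0]))  # [1.0, 3.0, 5.0]
    ```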





  • microcode
    replied
    It's too bad that they're so full of shit, because the technology is actually cool.



  • sinepgib
    replied
    Originally posted by brucethemoose View Post

    but it would be advantageous to have those various components as part of the same core, using the same ISA.
    Why, exactly?



  • brucethemoose
    replied
    Originally posted by sinepgib View Post

    It's dubious that you can have all three. Look at general-purpose CPUs. You know why they don't excel at AI, gfx, HPC, etc., all of their uses? Because specialization leads to better thermals, silicon use, power consumption and ISA properties; that's why the current trend is adding more units. GPUs excel at what they do, which is transformation of floating-point values in a highly parallel, dependency-free (or close to it) setting. The price is that they're completely useless for driving a system. You can't have your cake and eat it too; whoever says you can is either delusional or a liar.
    Now, you can make things cheaper. If you assume you have specialized hardware, you can simplify your general-purpose CPUs, for example. For the tasks that do remain on your CPU, you could also probably make it somewhat faster. But 10x lower power consumption and 1/3 of the price while being faster? Nope, really unlikely to come true unless we're talking about a major breakthrough. I guess the 1/3-cost claim leans on not needing as many chips, but scale alone will probably still leave that assertion false.
    There's definitely room for a more unified ISA, which is what this is going for.

    Right now you shuffle data to the CPU, or GPU, or tensor processor or whatever with its own instructions... but it would be advantageous to have those various components as part of the same core, using the same ISA.

    AMD pushed this but kinda fumbled it, particularly in the AI space. Nvidia and Intel are going there as well (with Tegra/AMX), albeit very slowly, as they are heavily invested in their own discrete niches.

