GCC 10 Compiler Drops IBM Cell Broadband Engine SPU Support
-
Originally posted by torsionbar28:
This has been the trend lately in the Linux kernel world. Linux used to be a choice OS for re-purposing old hardware. Not so much any more. Probably NetBSD is the only OS left that targets all kinds of old and obscure hardware.

Originally posted by torsionbar28:
Personally, I think the kernel and toolchain folks should keep support for old hardware
-
Originally posted by coder:
So, are any indie devs still making any PS3 games or demos? I don't know if the PS3 will ever reach this sort of vintage status, but devs are still working with far more obscure hardware:
-
It's OK if newer versions of GCC don't have Cell support, as long as there's a place to collect any patches made for pre-9.x compilers. If you want to do any development on the PS3, you're going to be stuck on something circa Ubuntu 10.04 anyway. Maybe there could be a "historical" Linux distro that focuses on things like this. If you throw plain-text patches and a Gentoo ebuild on GitHub, anybody will be able to rebuild it with a single command and add more patches. Is it any easier with other distros?
In a naive way, you could say both the Cell and the Itanic asked a bit too much of compiler writers, who nowadays have no problem writing all the compilers that make up TensorFlow. The Cell was the first HSA platform that was a moderate mainstream success, because Sony stuck with it, and the amount of development ultimately done for it is nothing to sneeze at. Really, I'm looking forward to more HSA architectures with things like the aforementioned SYCL.
-
Originally posted by audir8:
In a naive way, you could say both the Cell and the Itanic asked a bit too much of compiler writers, who nowadays have no problem writing all the compilers that make up TensorFlow.

Originally posted by audir8:
The Cell was the first HSA platform that was a moderate mainstream success, because Sony stuck with it, and the amount of development ultimately done for it is nothing to sneeze at. Really, I'm looking forward to more HSA architectures with things like the aforementioned SYCL.
But the thing is, I don't even know how meaningful that is when the SPEs have to use DMA to even touch main memory. Other than avoiding the need to lock memory pages, I don't know how much you even gain by it.
-
Originally posted by coder:
Really? The SPEs are just 2-way, in-order, with a few K of local RAM that should have fairly low, consistent latency. That's a pretty far cry from anything VLIW-like. And, at the time, VLIW chips and their compilers had already been around for decades. I believe AMD (if not also Nvidia) GPUs were even VLIW-based around then:
https://en.wikipedia.org/wiki/TeraSc...e)#TeraScale_1
Very interesting. I had no idea.
Originally posted by coder:
But the thing is, I don't even know how meaningful that is when the SPEs have to use DMA to even touch main memory. Other than avoiding the need to lock memory pages, I don't know how much you even gain by it.
I think better abstractions at every level matter a lot; they eventually lead to more speed, more correct programs, more optimizations, and lower development time.
-
Originally posted by coder:
But the thing is, I don't even know how meaningful that is when the SPEs have to use DMA to even touch main memory. Other than avoiding the need to lock memory pages, I don't know how much you even gain by it.
-
Originally posted by LoveRPi:
Locking is one of the areas that create huge bottlenecks in large-scale systems. DMA and explicit synchronization were by design. Think about the cache-coherency problems from locking on 64+ core systems. On these large-scale systems, you have to explicitly synchronize your workload to optimize the performance of your application, so the Cell just enforced this from a design perspective.
I also found this in the Cell IBM redbook on page 73: http://www.redbooks.ibm.com/redbooks/pdfs/sg247575.pdf
3.7.4 Multi-SPE software cache

We want to define a large software cache that gathers LS [256 KB SPE Local Storage] space from multiple participating SPEs.

Forces
We want to push the software cache a bit further by allowing data to be cached not necessarily in the SPE that encounters a "miss" but also in the LS of another SPE. The idea is to exploit the high EIB bandwidth.

Solution
We do not have a solution for this yet. The first step is to look at the cache coherency protocols (MESI, MOESI, and MESIF) that are in use today on multiprocessor systems and try to adapt them to the Cell/B.E. system.
I actually do remember reading about this when the Cell came out; treating the SPEs as individual processors was common because doing any synchronization was so hard. CS has come a long way since the Cell, and hardware cache coherency probably is necessary in a few workloads, but you are still free to do coarse- or fine-grained locking, or to use lock-free algorithms and data structures built with atomics as needed.
Java's LongAdder reduces contention by keeping several copies of a variable, and processors providing hardware cache coherency between cores are no different from a distributed system providing Consistency and Availability from the CAP theorem. If hardware cache coherency is too slow, you can move toward a more lock-free solution, but at least you'll have something working. The more work the hardware and compiler can do, the better, IMO.