OpenBLAS 0.3.20 Adds Support For Russia's Elbrus E2000, Arm Neoverse N2/V1 CPUs
Originally posted by coder:
This is a structure used for instruction scheduling, specifically when you're executing them out of order.
And not to get too distracted here: the evolution of my thought process went from "surprised Elbrus haven't done something similar to the new M1" to "hmmm, these are quite similar", while some random other poster was insisting that "long instruction words that contain multiple RISC instructions that run in parallel" is a terrible design that will never compete with x86.
Originally posted by mSparks:
Well, at least we got past the M1 not being "long instruction words that contain multiple RISC instructions that run in parallel"
So, back to my earlier question.
In contrast, with VLIW the idea is to encode opportunities for parallelism directly in the ISA, allowing the CPU to dispense with all this OoO machinery.
They call it "Ultra wide instruction arch". Doesn't mention that either.
Originally posted by jabl:
In contrast, with VLIW the idea is to encode opportunities for parallelism directly in the ISA, allowing the CPU to dispense with all this OoO machinery.
Or
They run in parallel and therefore have no order.
Reasonably sure there is no middle ground there.
Originally posted by mSparks:
Isn't that the whole point of VLIW?
It's a bit hard to explain with words, but try to think of a VLIW instruction stream in terms of a 2D grid. The Y-axis would be time (i.e. instruction cycle), while the X-axis would be "slots". Each slot has certain restrictions on it, such as which types of execution units it can target.
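As an illustration, that grid picture can be sketched in a few lines of Python. The slot layout, unit names, and op names below are invented for the example, not taken from any real ISA:

```python
# Toy model of a VLIW instruction stream as a 2D grid:
# rows (Y-axis) are issue cycles, columns (X-axis) are slots.
# Each slot is restricted to certain execution-unit types.
SLOT_UNITS = [
    {"ALU"},            # slot 0: integer ALU only
    {"ALU", "MUL"},     # slot 1: ALU or multiplier
    {"LOAD", "STORE"},  # slot 2: memory unit
]

def bundle_ok(bundle):
    """Check that every op in a bundle targets a unit its slot allows."""
    return all(op == "NOP" or op in SLOT_UNITS[i]
               for i, op in enumerate(bundle))

stream = [
    ("ALU", "MUL", "LOAD"),   # cycle 0: three ops issue together
    ("ALU", "NOP", "STORE"),  # cycle 1: slot 1 is padded with a NOP
]
assert all(bundle_ok(b) for b in stream)
```

Each row issues together in one cycle; when a slot can't be filled, it gets a NOP, which is one of the classic code-density costs of VLIW.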
Anyway, that's the basic idea. There's been lots written about it. VLIW is a basic category of CPU ISA that's been around for at least 40 years.
You can find several modern examples of it.
BTW, something else mentioned in this thread is EPIC, which is a term Intel introduced with IA64 and the Itanium. It's a hybrid between VLIW and RISC. @tux3v has suggested (some) ELBRUS processors might be EPIC, rather than VLIW.
Last edited by coder; 24 February 2022, 10:34 PM.
Originally posted by mSparks:
Either they run in a sequence and therefore have an order,
Or
They run in parallel and therefore have no order.
Reasonably sure there is no middle ground there.
Originally posted by coder:
Do soldiers marching in formation have an order? I'd say they do. They don't have a linear sequence, but they have an order you can express in 2 dimensions.
"Out of Order" literally means running them in parallel on the same clock cycles -> the ordering was removed - they were taken "out of order" because the ordering wasn't important, and there were spare execution units available to run them in parallel (and if there weren't, taking them out of order would have no benefit).
afaict VLIW makes it explicit and bundles instructions together where the order/sequence they are executed in is not important - in your diagram, all the opcodes in each instruction are run "out of order", for example there is no left-to-right ordering - or no?
Last edited by mSparks; 24 February 2022, 10:39 PM.
Originally posted by mSparks:
Not without a general to put them in order.
With VLIW, the general is called the compiler. He tells the soldiers where to go, in advance. He knows enough about each soldier to guess which ones might get held up by certain obstacles, and arranges them accordingly.
Not a perfect analogy, but I tried.
: )
Originally posted by mSparks:
"Out of Order" literally means running them in parallel on the same clock cycles -> the ordering was removed - they were taken "out of order" because the ordering wasn't important, and there were spare execution units available to run them in parallel (and if there weren't, taking them out of order would have no benefit).
If the CPU were in-order, then whatever comes after that memory read instruction would have to wait until the read completes, even if the next instruction didn't use the result of the read.
Once you understand how a CPU can reorder instructions serially, then it's a simple step to see how the same principles apply for dynamically assigning them to run in parallel multiple execution units.
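That point can be made concrete with a toy timeline model. The instruction names and latencies here are made up, and it models a single-issue machine purely to show the stall:

```python
# Toy single-issue timeline. A load takes 4 cycles; ALU ops take 1.
# "add" is independent of the load; "use" consumes the load's result.
LAT = {"load": 4, "add": 1, "use": 1}
DEPS = {"use": {"load"}}  # use must wait for load; add depends on nothing

def finish_times(order):
    """Issue ops one per cycle in the given order; respect data deps."""
    done, t = {}, 0
    for op in order:
        start = max([t] + [done[d] for d in DEPS.get(op, ())])
        done[op] = start + LAT[op]
        t = start + 1  # next op can issue no earlier than a cycle later
    return done

in_order = finish_times(["load", "use", "add"])   # add stuck behind use
reordered = finish_times(["load", "add", "use"])  # add slips ahead
assert reordered["add"] < in_order["add"]
```

In program order, the independent add is stuck behind the stalled use; reordering lets it finish during the load's latency, which is exactly what OoO hardware does on the fly, and the same principle extends to issuing to multiple execution units at once.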
Originally posted by mSparks:
afaict VLIW makes it explicit and bundles instructions together where the order/sequence they are executed in is not important - in your diagram, all the opcodes in each instruction are run "out of order", for example there is no left-to-right ordering - or no?
As long as the compiler doesn't break these dependencies, it can schedule the instructions into slots of those instruction words. This is the same thing an out-of-order CPU is trying to do, on-the-fly.
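A minimal sketch of that static scheduling, assuming a toy two-slot machine and invented op names and dependencies:

```python
# Toy VLIW list scheduler: pack ops into 2-slot bundles, never issuing
# an op in the same or an earlier bundle than anything it depends on.
def schedule(ops, deps, width=2):
    placed, bundles = set(), []
    while len(placed) < len(ops):
        # an op is ready once all its dependencies are in earlier bundles
        ready = [o for o in ops
                 if o not in placed and deps.get(o, set()) <= placed]
        bundle = ready[:width]
        bundles.append(bundle + ["nop"] * (width - len(bundle)))
        placed |= set(bundle)
    return bundles

ops = ["a", "b", "c", "d"]
deps = {"c": {"a"}, "d": {"a", "b"}}
print(schedule(ops, deps))  # -> [['a', 'b'], ['c', 'd']]
```

A real compiler also weighs latencies and slot restrictions, but the core job is the same one an out-of-order scheduler does in hardware, only done once, ahead of time.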
I'm not sure how clear that is. I'm no professor. Again, you can find a lot written about these subjects, if you're interested.
Last edited by coder; 24 February 2022, 11:58 PM.
Originally posted by coder:
Look, I'm not trying to have a debate about pure VLIW vs OoO. I'm just trying to understand why you said VLIW scales poorly with frequency. If your point was just about memory latencies, then I simply wanted confirmation that's what you were talking about.
In the same scenario, OoO will stall each "subtask" individually, and only for the minimum required time.
This is not true. VLIW has better power efficiency, if you can keep it from stalling, because it avoids scheduler overhead. So, for signal-processing applications, which tend to have regular data access patterns, it can be a significant win. There are lots of DSPs and AI chips that use VLIW. Older GPUs also did, until they figured out that wide SIMD + SMT was a better solution (but still in-order!).
VLIW can't compete with OoO on general-purpose single-threaded performance, and it can't compete with scalar SIMD on GPU-like tasks. There is no niche for it. There is an alternative solution of at least the same efficiency for every task VLIW tried to solve.
Also, you're limited in your thinking. You only talk about classical VLIW, not EPIC. EPIC saves less runtime overhead than VLIW, but still allows for things like OoO and speculative execution. Compared with classical OoO, you save on having to detect data dependencies.
Originally posted by jabl:
and commits the results in the original program order.
For weak ordering, they just have to ensure the write is performed after it has been confirmed (speculative execution marks some ops as 'not yet certain to execute'). With separate cache lines, a weakly ordered CPU (Arm, for example) can write a second piece of data to its L1 cache while still waiting for the previous write's cache synchronisation.