Qualcomm Sampling 10nm 48-Core Server SoC
Originally posted by L_A_G View Post
Sure, we don't have the exact details on this, but Xeon Phi didn't just rely on loads of cores with beefy vector instruction units; they also had really heavy SMT (or Hyper-Threading, as Intel likes to call it) to maximize the utilization of those vector units. I've never heard of anyone creating an ARM core with SMT, so if they've done that, this could basically be a Xeon Phi knockoff with ARM cores rather than Atom cores. If this is the case then this thing may have a point for compute loads, but I'm not so sure it's all that great a thing, seeing how the Xeon Phi failed to sell all that well despite how hard Intel tried to push it.
But more generally, ARM as a corporation is against SMT and will not support it in its designs. SMT is not a sign of strength; it is a sign of weakness. It says that your core is so expensive in area that you need to add even more complexity to extract maximum value from it; ARM believes that its cores are small enough that if you want more throughput you just add more of them. SMT has never delivered the performance naive users expect, largely because the single most constrained resource on a CPU is the L1 caches, and SMT halves their effective size. So Intel SMT gives you about the equivalent of a 25% speed boost. ARM's answer would be: if you want 5 CPUs' worth of performance, stick 5 CPUs on the die rather than 4 with SMT.
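As a back-of-envelope sketch of the trade-off argued above (the ~25% SMT figure comes from the post; everything else here, including the assumption of equal per-core throughput and equal area cost, is purely illustrative):

```python
# Toy throughput model for the SMT-vs-more-cores argument.
# Assumption (illustrative only): each physical core delivers 1.0
# "units" of throughput, and enabling 2-way SMT on a core yields
# roughly a 25% boost (the figure claimed above), not 2x.

def total_throughput(cores: int, smt_boost: float = 0.0) -> float:
    """Aggregate throughput of `cores` physical cores,
    each uplifted by `smt_boost` if SMT is enabled."""
    return cores * (1.0 + smt_boost)

# Intel-style: 4 cores with SMT (~25% uplift each)
with_smt = total_throughput(4, smt_boost=0.25)

# ARM-style: spend the area on a 5th plain core instead
more_cores = total_throughput(5)

print(with_smt, more_cores)  # both come out to 5.0 under these assumptions
```

Under these (deliberately simple) numbers the two approaches tie on throughput, which is exactly the post's point: the extra core buys the same aggregate performance without SMT's complexity or the halved effective L1 per thread.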
If you look at who supports SMT (Intel, Oracle, IBM), it's hard not to conclude that it's there mainly as a workaround for stupid SW licensing rules. Since ARM (at least right now...) isn't running software of that sort, it doesn't need the workaround.
So why did Broadcom add it to Vulcan? No idea. Vulcan came out of a networking team, and there may be something about network processors (packet classification and deep inspection, that sort of thing) that makes SMT valuable in that context and its flaws (giving each thread a much smaller effective L1) much less problematic, because caches aren't too useful anyway with network processing?
Originally posted by name99 View Post
Seriously? Are you living in the 90s? Are you unaware of Apple's cores, which have been running on top-of-the-line nodes for years and are anything but simple? Hell, even if you hate Apple in your bones, the high-end Android cores are likewise built on top-of-the-line nodes and are hardly trivial. ARM designs more than just M0's, you know.
Originally posted by name99 View Post...
Originally posted by name99 View Post...
I'm not saying that it's a bad design that can't compete with other chips in the HPC market; what I am saying is that it's basically doing something that has more or less been done already, so it's not really anything to get super excited about.
Originally posted by name99 View Post
First the whole POINT of the instructions (which are NOT usefully thought of as an extension of neon) is that they can be implemented in arbitrary width. You can implement the instructions on hardware that's, say, 128 bits wide then TRANSPARENTLY upgrade next year to 256 bit hardware, and the year after that to 512 bit hardware. That's precisely why they are not "Neon-like" (or SSE/AVX-like).
Second, don't be sure you know what Apple wants from vector instructions. Apple ships not two but THREE Neon hardware pipelines in their chips (starting, I think, with the A9; I think the A8 and A7 had two pipelines). Clearly they think this is hardware worth spending transistors on, which means they believe there are a number of workloads that can benefit from substantial vector performance but which ALSO are too small to be shipped over to the GPU.
and yes, that is the maximum, not the minimum
I realise that the new Wisconsin spec is designed to scale, as I said, up to 2048 bits, but I had a few things in mind: 1) if 128 bits is enough (and, you're right, we don't really know if it is) it's cheaper to stick with Neon; 2) we don't know how much silicon one of these units will require, but unless you are able to use them in place of a Neon unit you are starting to look at a pretty large increase to your silicon budget (don't forget this thing might need its own scheduler and it will certainly need a huge issue buffer).
I said they might be interested but that cost would be a factor. Keep in mind that SiPs/PoPs intended for mobile networking often include DSPs which can be multi-purposed.
Btw, it appears that it was Cyclone/A7 where Apple moved to the 3-unit Neon (http://www.anandtech.com/show/7910/a...cture-detailed).
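The width-agnostic execution model described in the quote above can be sketched in plain Python (a toy model only, not real SVE code; the function name, the 32-bit lane size, and the mask logic are all illustrative assumptions — real hardware would use predicated vector instructions rather than an inner loop):

```python
# Sketch of a "vector-length-agnostic" loop: the same source code
# works whether the hardware vector is 128, 256, or 512 bits wide,
# because the code queries the width at run time instead of baking
# it in the way Neon/SSE/AVX code does.

def add_arrays(a, b, vector_bits):
    lanes = vector_bits // 32          # 32-bit lanes per vector
    out = [0.0] * len(a)
    i = 0
    while i < len(a):
        # A predicate masks off lanes past the end of the array,
        # so no scalar tail loop is needed.
        active = min(lanes, len(a) - i)
        for lane in range(active):
            out[i + lane] = a[i + lane] + b[i + lane]
        i += lanes
    return out

data = list(range(10))
# Identical source code, three different "hardware" widths:
assert add_arrays(data, data, 128) == add_arrays(data, data, 256)
assert add_arrays(data, data, 256) == add_arrays(data, data, 512)
```

This is the "TRANSPARENTLY upgrade next year" property: a binary written this way speeds up on wider hardware with no recompile, which fixed-width Neon code cannot do.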
Originally posted by name99 View Post
This may be technically true but it's irrelevant. The future is AArch64. Apple has been sending strong messages for years now to developers that shipping 32-bit iOS code is unacceptable. (iOS 10 now puts up a scary warning when you use a 32-bit app that doesn't exactly say "this app is a PoS and probably malware" but strongly implies it.) I would not be at all surprised if the A11 is the core where Apple drops AArch32 support.
Already some server chips don't even bother shipping AArch32, and for all I know QC is one of them.
Last edited by darkblu; 21 January 2017, 08:22 AM.