Qualcomm Sampling 10nm 48-Core Server SoC


  • liam
    replied
    Originally posted by L_A_G View Post
    Sorry, but I don't really see the point in a 48 core ARM chip.

    The main point of ARM is good performance at low wattage, but with this many cores it's not going to be low wattage, which puts it squarely in the territory of Intel's Xeon and AMD's upcoming Zen-based Opteron chips. Additionally, this number of cores really isn't all that useful for anything except compute workloads, which would put it in the line of fire of Intel's Xeon Phi accelerators along with Nvidia's and AMD's GPGPU products. I'd go as far as to call this thing just a flat-out solution in search of a problem.
    Assuming the bus isn't terribly designed, this lets you pay for the DRAM, NIC(s) and accelerators ONCE per 48 cores. In the best case scenario, all 48 cores will be able to interleave their responses and each is only responsible for 1/48 of the power budget. The worst case is only one core being active (HOPEFULLY the others are either hotplugged off or in a very low C-state) while occasionally servicing requests and paying for all the other hardware that would otherwise be amortized.
    If you want a specific application, Qualcomm mentioned Hadoop and Spark. To me, that suggests rather low IPC (so, relying on stupidly parallel workloads and the new ARM NEON instructions: http://www.eetimes.com/document.asp?doc_id=1330339).
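
    To make the "stupidly parallel" point concrete, here is a minimal PySpark word-count sketch of the kind of Hadoop/Spark job being talked about; the core count, input path and output path below are purely illustrative placeholders, not anything Qualcomm has published.

```python
# Hypothetical example: an embarrassingly parallel map-reduce job in PySpark.
# The "local[48]" master string and the HDFS paths are placeholders.
from pyspark import SparkContext

sc = SparkContext("local[48]", "wordcount-sketch")   # pretend we have 48 cores

lines = sc.textFile("hdfs:///some/large/corpus")     # hypothetical input
counts = (lines.flatMap(lambda line: line.split())   # split lines into words
               .map(lambda word: (word, 1))          # emit (word, 1) pairs
               .reduceByKey(lambda a, b: a + b))     # sum counts per word

counts.saveAsTextFile("hdfs:///some/output")         # hypothetical output
sc.stop()
```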



  • liam
    replied
    Originally posted by Brane215 View Post
    x86 <snip>
    has horrific ISA format
    <snip>
    Could you elaborate, please?
    I know this used to be an issue, but now ARM also decodes to ucode (gosh, since... the Cortex-A9 or so).



  • Happy Heyoka
    replied
    Originally posted by Spacefish View Post
    So to feed 48 cores you will probably need a very large amount of RAM and a very, very fast memory bus. Otherwise they need to do some segmenting where each core has its private memory lanes and the application is aware of which core connects to which memory segment!
    It's a bit hard to be specific about this CPU (Centriq 2400) since Qualcomm haven't really released much detail that I can find - but all designs have some kind of tradeoff (pin count, die size, whatever). Maybe Qualcomm have a specific application category in mind... maybe they're just late to the party... hard to tell without more detail.

    What you're talking about gets complicated rapidly... In your theoretical "ideal" CPU where each core has its own memory, you either need (in this case) 48 memory controllers and 48 memory modules, or 48 lots of on-core memory. And then what happens if one core needs more memory than is attached?
    Have a look at the bus topology of a multiprocessor Xeon system or a Knights Landing (Xeon Phi). Things get more complicated from there (crossbar memory etc).

    This is a good read:
    NUMA (Non-Uniform Memory Access): An Overview
    http://queue.acm.org/detail.cfm?id=2513149
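
    As a rough illustration of the "each core knows which memory segment is local" idea, here is a minimal, Linux-only Python sketch. It only uses the standard sysfs NUMA topology files and os.sched_setaffinity; nothing in it is specific to the Centriq 2400, and the node/CPU numbers are whatever the machine reports.

```python
# Hypothetical sketch of basic NUMA awareness on Linux: read which CPUs belong
# to which memory node from sysfs, then pin this process to one node's CPUs so
# its memory allocations tend to stay on that node.
import glob
import os

def expand_cpulist(cpulist):
    """Expand a sysfs cpulist string like '0-3,8' into a set of CPU ids."""
    cpus = set()
    for part in cpulist.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

nodes = {}
for node_dir in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    with open(os.path.join(node_dir, "cpulist")) as f:
        nodes[os.path.basename(node_dir)] = expand_cpulist(f.read().strip())

print("NUMA topology:", nodes)

# Pin ourselves to node0's CPUs (if present) so accesses stay node-local.
if "node0" in nodes:
    os.sched_setaffinity(0, nodes["node0"])
```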



  • LinuxID10T
    replied
    Originally posted by L_A_G View Post

    I really wouldn't say that any of that is correct... In terms of software there really isn't anything that beats x86 when it comes to ecosystem, and licensing is really only an issue for companies who want to make their own chips, which is something relatively few companies in the server space actually want to do.
    Best NON-X86 ecosystem, sheesh. Also, apparently Qualcomm wants in...



  • Spacefish
    replied
    I think the benchmark of Apache Spark points in the direction where this will be used: very large batch jobs which can run in parallel but don't need high single-thread performance! Running a map-reduce job on hundreds of cores makes sense!
    However, Spark is horrible in performance, as it's a Java application with a lot of overhead. It's a quantum leap away from old Hadoop map-reduce jobs, but it's still slow!
    The memory footprint is horrible as well. So to feed 48 cores you will probably need a very large amount of RAM and a very, very fast memory bus. Otherwise they need to do some segmenting where each core has its private memory lanes and the application is aware of which core connects to which memory segment!



  • Brane215
    replied
    x86 is power hungry and has a horrific ISA format, which means a higher code footprint, a hungrier decoder unit and caches, and extra complications with instruction translation.
    And even when all this is solved on the technical level, you still end up with legal and licensing limitations. Intel's only alternative is AMD and that's it.

    The ARM scene is wide open to new players and by its nature it doesn't even insist on ARM. Whoever decided to recompile his/her code for ARM knows that there isn't much to stop him from doing it again for something completely different.

    Also, now that applications are using multithreading more and more, single-thread performance is not that essential any more, which means operating in an area where ARM is much more comfortable: with a higher count of more power-efficient cores.

    Also, Samsung, Qualcomm and the like aren't that far behind Intel WRT pure CPU muscle or uncore material.

    If a nice, speedy 32- or 64-core ARM/MIPS/POWER chip were available on an xATX board, I wouldn't lose a nanosecond contemplating Zen.




  • DMJC
    replied
    Who cares about X86 if your application is entirely written in HTML and being run on Linux?



  • thelongdivider
    replied
    This makes a lot of sense. Power efficiency is the name of the game for servers, and someone as big as Qualcomm may be able to fight against our Intel overlords.



  • L_A_G
    replied
    Originally posted by LinuxID10T View Post
    Not entirely. Due to X86 licensing, pretty much any new player to the server CPU market would have to use ARM or some alternative architecture. Seeing as ARM has the most robust ecosystem, it isn't surprising that that's what they would choose.
    I really wouldn't say that any of that is correct... In terms of software there really isn't anything that beats x86 when it comes to ecosystem, and licensing is really only an issue for companies who want to make their own chips, which is something relatively few companies in the server space actually want to do.

    Originally posted by droidhacker View Post
    So when the workload is very low, shut off 47 of the 48 cores and downclock the remaining core to 500 MHz. When it's prime time and you have a billion requests every minute, crank them up. Seems to me that it is more useful to be able to run on extremely low power when the demands are low, and yet still be able to crank it when the demands are high.

    And yes, a high number of cores is useful for ALL workloads that can be distributed over a high number of cores. Apache, for example.
    The problem with that is that when all 48 cores are tapped out serving requests, which is going to require a LOT of accesses to RAM and disc, that's going to cause a huge bottleneck in memory, and disc access especially is going to be absolutely swamped. Because of this, a chip like this is simply never going to be fully utilised in a non-compute role.



  • droidhacker
    replied
    Originally posted by L_A_G View Post
    Sorry, but I don't really see the point in a 48 core ARM chip.

    The main point of ARM is good performance at low wattage, but with this many cores it's not going to be low wattage, which puts it squarely in the territory of Intel's Xeon and AMD's upcoming Zen-based Opteron chips. Additionally, this number of cores really isn't all that useful for anything except compute workloads, which would put it in the line of fire of Intel's Xeon Phi accelerators along with Nvidia's and AMD's GPGPU products. I'd go as far as to call this thing just a flat-out solution in search of a problem.
    So when the workload is very low, shut off 47 of the 48 cores and downclock the remaining core to 500 MHz. When it's prime time and you have a billion requests every minute, crank them up. Seems to me that it is more useful to be able to run on extremely low power when the demands are low, and yet still be able to crank it when the demands are high.

    And yes, a high number of cores is useful for ALL workloads that can be distributed over a high number of cores. Apache, for example.
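
    For what it's worth, the "park 47 cores and wake them for prime time" idea maps onto the standard Linux CPU-hotplug and cpufreq sysfs interfaces. A hedged sketch follows; it needs root, cpu0 usually cannot be offlined, the available governors depend on the platform, and the 48-core count and 500 MHz figure are just the numbers from the post above.

```python
# Hypothetical sketch: park all but one core during quiet hours, then bring
# everything back online and ask for maximum performance at peak time.
import glob

def set_core_online(cpu, online):
    # cpu0 typically has no 'online' file and cannot be offlined.
    with open("/sys/devices/system/cpu/cpu%d/online" % cpu, "w") as f:
        f.write("1" if online else "0")

def set_governor(cpu, governor):
    path = "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor" % cpu
    with open(path, "w") as f:
        f.write(governor)

ncpus = len(glob.glob("/sys/devices/system/cpu/cpu[0-9]*"))

# Quiet hours: keep only cpu0 online and let cpufreq scale it down.
for cpu in range(1, ncpus):
    set_core_online(cpu, False)
set_governor(0, "powersave")

# Prime time: wake every core and run the performance governor.
for cpu in range(1, ncpus):
    set_core_online(cpu, True)
for cpu in range(ncpus):
    set_governor(cpu, "performance")
```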

