Qualcomm Sampling 10nm 48-Core Server SoC

sykobee replied

08 December 2016, 08:34 AM
Qualcomm will have designed this processor in tandem with potential customers - Google, Amazon, Baidu, etc - so will be targeting their needs.

If you want in on the massively profitable server market, you work with what you have. Qualcomm has no x86 license, so that's out the door. So you create a presumably high-IPC ARM core design (Falkor), and you make use of other core competencies (of which Qualcomm has many).

Note that one rumoured aspect is on-die or on-package FPGA (Xilinx) as an option with this design. There may also be on-package HBM2 to deal with the memory bandwidth issue (at least for cached assets).

Falkor (the core) is likely to be used in future Windows products, now that MS has announced it's trying again, and doing it properly this time round.
Leave a comment:
Jabberwocky replied

08 December 2016, 08:33 AM
Originally posted by gnufreex

HTML in kernel? Torvalds won't like that.

Application, not operating system... (practiceliteracy thx bai)
Leave a comment:
wizard69 replied

08 December 2016, 08:30 AM
Originally posted by liam

Assuming the bus isn't terribly designed, this lets you pay for the DRAM, NIC(s), and accelerators ONCE per 48 cores. In the best case, all 48 cores can interleave their responses and each is responsible for only 1/48 of the power budget. The worst case is that only 1 core is active (HOPEFULLY the others are either hotplugged or in a very low-power C-state), occasionally servicing requests and paying for all the other hardware that would otherwise be amortized.
If you want a specific application, Qualcomm mentioned Hadoop and Spark. To me, that suggests rather low IPC (so, relying on stupidly parallel workloads and the new ARM NEON instructions: http://www.eetimes.com/document.asp?doc_id=1330339).
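
To put rough numbers on the amortization argument in the quote above, here is a minimal sketch. The wattages are made up purely for illustration (they are not Qualcomm figures), and it assumes idle cores draw nothing, which is the optimistic hotplug/C-state case.

```python
def watts_per_active_core(shared_watts: float, core_watts: float, active_cores: int) -> float:
    """Power attributable to one active core: its own draw plus an equal
    share of the shared hardware (DRAM, NICs, accelerators)."""
    if active_cores < 1:
        raise ValueError("need at least one active core")
    return core_watts + shared_watts / active_cores


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
SHARED_WATTS = 48.0  # everything that is paid for once per SoC
CORE_WATTS = 1.5     # one busy core

best = watts_per_active_core(SHARED_WATTS, CORE_WATTS, 48)  # all cores busy
worst = watts_per_active_core(SHARED_WATTS, CORE_WATTS, 1)  # one core busy

print(f"all 48 cores busy: {best:.1f} W per active core")
print(f"only 1 core busy:  {worst:.1f} W per active core")
```

With these assumed numbers, the shared hardware costs each core 1 W when everything is busy and the full 48 W when a single core carries it alone, which is the gap between the best and worst case described in the quote.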

As for the new vector instructions, I have to think that Apple and probably Qualcomm are also on board. Apple was heavily involved in AltiVec development and would be very interested in bringing such performance to the iOS lineup. Qualcomm of course is already going after the server market and might take a stab at the PC market; both businesses could leverage a high-performance vector capability.

In any event, for us old guys, what amazes me is that we basically have a Cray on a chip many times over. Enhanced vector capability just means even more software will run smoothly on these chips.

As for the limit on cores, that is an interesting discussion because in the end "It Depends". I remember some reported work by Intel indicating that their architecture had problems going past 32 cores. I can't remember the specifics of the workload, but the point is you can optimize a processor for the type of workload you expect to run on it. Beyond that, "cores" aren't really the issue; it is the cache memory and RAM interfaces that bottleneck and get extremely hot (burn power). This is where innovation can still happen. The nice thing with ARM is that there is more free space per core on the die to allocate to cache and other support circuitry.
Leave a comment:
gnufreex replied

08 December 2016, 06:52 AM
Originally posted by DMJC

Who cares about x86 if your application is entirely written in HTML and being run on Linux?

HTML in kernel? Torvalds won't like that.
Leave a comment:
renox replied

08 December 2016, 05:53 AM
Originally posted by Brane215

x86 is power hungry and has horrific ISA format, which means higher code footprint, hungrier decoder unit and caches and extra complications with instruction translations.

Half of what you wrote is false: CISC ISAs tend to have higher code density than RISC ISAs.
Which is why ARM has Thumb/Thumb2 and MIPS has the MIPS16 extension, to get nearly the same code density as x86.
Surprisingly, ARMv8's 64-bit instruction set (AArch64) doesn't have a 16-bit extension.
Leave a comment:
vadix replied

08 December 2016, 03:12 AM
Originally posted by Brane215

x86 is power hungry and has horrific ISA format, which means higher code footprint, hungrier decoder unit and caches and extra complications with instruction translations.
And even when all this is solved on a technical level, you still end up with legal and licensing limitations. Intel's only alternative is AMD and that's it.

The ARM scene is wide open to new players and by its nature it doesn't even insist on ARM. Whoever decided to recompile his/her code for ARM knows that there isn't much to stop them from doing it again for something completely different.

Also, now that applications are using multithreading more and more, single-thread performance is not that essential any more, which means operating in an area where ARM is much more comfortable: with a higher count of more power-efficient cores.

Also, Samsung, Qualcomm and the like aren't that far behind Intel WRT pure CPU muscle or uncore material.

If a nice, speedy 32- or 64-core ARM/MIPS/Power were available on an xATX board, I wouldn't lose a nanosecond contemplating Zen.

Firstly, Xeons are designed to be more power efficient, even if it's not the same as an ARM. Secondly, while an Intel core may have a more complicated pipeline, x86 programs are definitely smaller than ARM ones (even with Thumb) on average. Smaller programs mean fewer cache misses with the same size cache. Don't mix truth and BS together.
Leave a comment:
BillBroadley replied

08 December 2016, 01:14 AM
Originally posted by Brane215

x86 is power hungry and has horrific ISA format, which means higher code footprint, hungrier decoder unit and caches and extra complications with instruction translations.
And even when all this is solved on a technical level, you still end up with legal and licensing limitations. Intel's only alternative is AMD and that's it.

The ARM scene is wide open to new players and by its nature it doesn't even insist on ARM. Whoever decided to recompile his/her code for ARM knows that there isn't much to stop them from doing it again for something completely different.

Also, now that applications are using multithreading more and more, single-thread performance is not that essential any more, which means operating in an area where ARM is much more comfortable: with a higher count of more power-efficient cores.

Also, Samsung, Qualcomm and the like aren't that far behind Intel WRT pure CPU muscle or uncore material.

If a nice, speedy 32- or 64-core ARM/MIPS/Power were available on an xATX board, I wouldn't lose a nanosecond contemplating Zen.

These days the x86 ISA is mostly just a compatible binary format. It's NOT directly executed, or even cached. The decoder breaks it into micro-ops, which are RISC-like (fixed width, simple, etc.). x86 chips are basically out-of-order RISC cores on the inside. The micro-ops are cached, speculatively executed, retired, etc. Sure, there's a bit of extra complexity, but it's very minor. If you look at the transistor budget, the slightly bigger decoder is a very minor issue. That's why no other architecture does significantly better than x86 on complexity.
Leave a comment:
BillBroadley replied

08 December 2016, 12:37 AM
Originally posted by L_A_G

Sorry, but I don't really see the point in a 48 core ARM chip.

The main point of ARM is good performance at low wattage, but with this many cores it's not going to be low wattage, which puts it squarely in the territory of Intel's Xeon and AMD's upcoming Zen-based Opteron chips. Additionally, this number of cores really isn't all that useful for anything except compute workloads, which would put it in the line of fire of Intel's Xeon Phi accelerators along with Nvidia's and AMD's GPGPU products. I'd go as far as to call this thing just a flat-out solution in search of a problem.

ARM's deal is the best price/perf at phone-friendly power. If they can manage the best price/perf at server power levels, all the better. Many embarrassingly parallel workloads at large companies like Google or Facebook couldn't care less about node performance. They want the best performance/(total cost of ownership). That includes things like power, cooling, purchase cost, maintenance cost, error rate, etc.

If a rack + two 30 A, 208 V, 3-phase PDUs + ARM ends up delivering more performance per $, then I can see it being very popular. Intel mostly specializes in maximum performance per core.
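
For anyone who wants to play with the performance/(total cost of ownership) argument, here is a rough sketch. Every figure below is hypothetical (node performance, prices, wattages, electricity rate, cooling overhead), and so are the function and variable names. It assumes both PDUs carry load rather than acting as a redundant pair, a power factor of 1, and the common 80% continuous-load derating. Maintenance cost and error rate are left out to keep it short.

```python
import math


def three_phase_watts(volts_line_to_line: float, amps: float, derate: float = 0.8) -> float:
    """Usable power of a three-phase circuit: sqrt(3) * V * I, scaled by
    a continuous-load derating factor (assumed 80% here)."""
    return math.sqrt(3) * volts_line_to_line * amps * derate


def perf_per_tco_dollar(perf: float, purchase_usd: float, node_watts: float,
                        years: float, usd_per_kwh: float, cooling_factor: float) -> float:
    """Performance per dollar of purchase price plus energy.
    cooling_factor multiplies the energy bill to account for cooling."""
    hours = years * 365 * 24
    energy_usd = node_watts / 1000 * hours * usd_per_kwh * cooling_factor
    return perf / (purchase_usd + energy_usd)


# Rack power budget from two 30 A, 208 V, three-phase PDUs.
rack_watts = 2 * three_phase_watts(208, 30)

# Two hypothetical nodes: one fast and hungry, one slower and frugal.
nodes = {
    "fast_node": {"perf": 100.0, "purchase_usd": 4000.0, "node_watts": 400.0},
    "efficient_node": {"perf": 70.0, "purchase_usd": 2000.0, "node_watts": 150.0},
}

results = {}
for name, n in nodes.items():
    per_dollar = perf_per_tco_dollar(n["perf"], n["purchase_usd"], n["node_watts"],
                                     years=3, usd_per_kwh=0.10, cooling_factor=1.5)
    nodes_per_rack = int(rack_watts // n["node_watts"])  # power-limited count
    results[name] = (per_dollar, nodes_per_rack, nodes_per_rack * n["perf"])
    print(f"{name}: {per_dollar:.4f} perf/$, {nodes_per_rack} nodes/rack, "
          f"{results[name][2]:.0f} perf/rack")
```

With these made-up inputs the slower node loses per node but wins per dollar and per rack, which is the scenario described above. Change the inputs and the conclusion can easily flip.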
Leave a comment:
boxie replied

08 December 2016, 12:06 AM
Michael, pls get friendly with the Qualcomm people to get some hardware to benchmark! After all, this is the largest Linux news site - surely you would have some pull!
Leave a comment:
liam replied

07 December 2016, 11:29 PM
Also....

Last edited by liam; 09 December 2016, 01:06 AM.
Leave a comment:
