NVIDIA Announces Grace CPU For ARM-Based AI/HPC Processor

Jumbotron replied

13 April 2021, 02:54 PM
Originally posted by 1250568

This is the first time, in a while, that I've seen that reference. Does anyone have a link to a clear description of "The Machine"? I'm not looking for marketing BS.

Perhaps this could serve as a start. More to come if I can find it.

Future Systems: How HP Will Adapt The Machine To HPC

https://www.nextplatform.com/2015/08/17/future-systems-how-hp-will-adapt-the-machine-to-hpc/

When Hewlett-Packard launched its moonshot effort to create a new computing architecture centered on non-volatile memory last year, called The Machine,
artivision replied

13 April 2021, 02:19 PM
Originally posted by TemplarGR

No, I am not ignorant; I am just not a fanboi whose only knowledge about chip design comes from pop-tech sites.

1) GHz are not about the nodes alone; they are, as I said, about the design principles. If the chip is very complicated, it can never achieve 100% load at those clocks. Those GHz you mentioned are best-case scenarios.

2) I want to see a link to that benchmark, to see the conditions of the test. Rise of the Tomb Raider is a very lightweight game that can be maxed out on low-end CPUs/GPUs. You need more games to reach a stronger conclusion, especially with a video game that is mostly GPU-bound.

3) The GTX 1650 is a 12nm design, the Apple M1 is a 5nm design. That is a huge difference. You call me ignorant, but I am the only one who is talking objective facts here; you are just an ignorant fanboy, and don't you dare call me ignorant again.

The 2020 Mac Mini Unleashed: Putting Apple Silicon M1 To The Test

https://www.anandtech.com/show/16252/mac-mini-apple-m1-tested/3

Ice Storm 10 watts peak load.
coder replied

13 April 2021, 01:04 PM
Originally posted by Jumbotron

In this respect, we are approaching a time where HP's "The Machine" concept will be the prevailing design paradigm.

This is the first time, in a while, that I've seen that reference. Does anyone have a link to a clear description of "The Machine"? I'm not looking for marketing BS.
coder replied

13 April 2021, 12:57 PM
Originally posted by numacross

The A1100 is not socketable, it's BGA

Thanks. I could swear I remember something about it sharing the same socket as their Opterons of the same era, but maybe it just shared the same chipset?
coder replied

13 April 2021, 12:54 PM
Originally posted by jabl

Now, Intel largely controls the PCI SIG which develops the PCI standard, and they're in no hurry to develop it in a direction which would help GPGPU.

Yes, they did. It's called CXL.

Now that Intel is building datacenter GPUs & AI accelerators, they're highly-motivated to solve those problems.
coder replied

13 April 2021, 12:49 PM
Originally posted by oiaohm

2030, that is quite a way out. You have missed what RISC-V is up to.
https://www.xda-developers.com/android-risc-v-port/

It's true. China is the big wild card, with Russia being a smaller one. I'm sure neither likes ARM's US ownership. They're each building MIPS, RISC-V, and proprietary ISA CPUs.

And guess who makes most appliances and personal electronics? China. If China goes big on RISC-V, then they can single-handedly turn the tide against ARM.
Jumbotron replied

13 April 2021, 12:44 PM
Originally posted by coder

I think that's not what they meant. I imagine they had in mind that one OS kernel should be managing hybrid ISA CPU cores, which share a global pool of RAM. This would be an interesting project, but I'm not sure we really have anything like it, today.

I agree! One of the techniques Apple used to extract a performance uplift was to get M1's RAM on the package and tightly linked to all its various cores, not just CPU and GPU. With the upcoming movement by Intel to start integrating RAM on the wafer itself, and the continued use of HBM by Nvidia and AMD on GPUs, one could see general RAM on the motherboard linked by CXL or Infinity Architecture as a kind of memory pool. This pool would, as a matter of course with CXL and Infinity Architecture, be part of a zero-copy, cache-coherent, heterogeneous compute environment.

I think that's part of what we will see with Nvidia's Grace SoC. Each Grace core could have an NVLink from SiP RAM to each of the Grace CPU's ancillary cores (DSP, NPU, DPU, etc.) and straight on to any and all of Nvidia's integrated or discrete and external GPUs.

In this respect, we are approaching a time where HP's "The Machine" concept will be the prevailing design paradigm.
coder replied

13 April 2021, 12:43 PM
Originally posted by zxy_thf

Actually, this advantage is also not that clear if we take (potential) vendor lock-in into consideration.
We may swap Xeon for Epyc and enjoy improved performance/dollar, but when we switch from one ARM CPU to another, there is no guarantee that they share the same extensions and have similar performance behavior.

ARM is very strict about licensees not adding their own instructions. So, in that respect, it's less susceptible to lock-in than x86. And if you adopt software that relies on a certain ISA level, you should only do so with the knowledge that you're restricting yourself to fewer CPUs.

RISC-V is the worst, though. Basically, a RISC-V CPU can add whatever the heck it wants!
coder replied

13 April 2021, 12:38 PM
Originally posted by TemplarGR

Clock-for-clock performance is not telling the full story. And not running them higher may not be just about efficiency, but also simple stability. You can get great IPC in a processor but be unable to clock it high because of the design.

I think what you're dancing around is the critical path length of the circuitry. If a core is designed for lower clock speed, its critical path can be longer, which enables more complex pipeline stages and in turn benefits IPC. However, you can't then take that exact design and crank up the clock speed. Likewise, if a core is designed to clock higher, this will naturally come at the expense of some IPC.

In general, the argument for lower clock speeds is energy-efficiency, which is why mobile and server cores tend to clock lower than desktop CPUs. However, the designers can also take advantage of that to imbue them with more IPC, depending on the silicon & power budget.
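
As a rough back-of-the-envelope sketch of that trade-off: the longest logic path in a pipeline stage caps the clock (roughly f_max = 1 / critical-path delay), and throughput is roughly IPC times clock. The function names and all the numbers below are made up for illustration and don't describe any real core.

```python
def max_clock_ghz(critical_path_ns: float) -> float:
    """Upper bound on clock speed: one cycle must cover the slowest stage."""
    return 1.0 / critical_path_ns


def giga_instructions_per_second(ipc: float, clock_ghz: float) -> float:
    """Idealized throughput: instructions per cycle times cycles per second."""
    return ipc * clock_ghz


# Hypothetical "fast" core: short critical path, simpler stages, lower IPC.
fast_clock = max_clock_ghz(critical_path_ns=0.25)
fast_core = giga_instructions_per_second(ipc=4.0, clock_ghz=fast_clock)

# Hypothetical "wide" core: longer critical path, more work per stage, higher IPC.
wide_clock = max_clock_ghz(critical_path_ns=0.5)
wide_core = giga_instructions_per_second(ipc=8.0, clock_ghz=wide_clock)

print(fast_clock, fast_core)
print(wide_clock, wide_core)
```

With these invented figures, both cores land on the same throughput, one through clock speed and the other through IPC. The wide core can't simply be clocked at the fast core's speed, because its longer critical path wouldn't settle within the shorter cycle.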

ARM has one inherent benefit over x86, in that you can scale its front end wider, due to its fixed-length instruction encoding. So, if one is willing to devote the silicon necessary, it's not surprising to see ARM cores that exceed x86 in IPC. And this is independent of clock speed, in which case it should even be possible to build an ARM core that clocks comparably to x86 and offers more IPC. There's just not as much incentive for it, given that ARM can beat x86 in single-thread performance with higher IPC at lower clock speeds.
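
To illustrate the fixed-length point, here's a toy sketch of finding instruction boundaries. With a fixed 4-byte encoding (as in AArch64), every start offset is known up front, so many decoders can work in parallel. With a variable-length encoding, each start depends on the length of the instruction before it. The `toy_length` encoding below is invented for the example and is not real x86, and the function names are mine.

```python
from typing import Callable, List


def fixed_length_starts(code: bytes, width: int = 4) -> List[int]:
    """Start offsets are pure arithmetic; no byte of the code is inspected."""
    return list(range(0, len(code), width))


def variable_length_starts(code: bytes,
                           length_of: Callable[[bytes, int], int]) -> List[int]:
    """Each start offset is only known after sizing the previous instruction."""
    starts = []
    offset = 0
    while offset < len(code):
        starts.append(offset)
        length = length_of(code, offset)
        if length < 1:
            raise ValueError("instruction length must be at least 1 byte")
        offset += length
    return starts


def toy_length(code: bytes, offset: int) -> int:
    """Made-up encoding: the first byte of an instruction is its total length."""
    return code[offset]


fixed_code = bytes(16)  # four 4-byte instructions
variable_code = bytes([2, 0, 3, 0, 0, 1, 5, 0, 0, 0, 0])  # lengths 2, 3, 1, 5

print(fixed_length_starts(fixed_code))                    # [0, 4, 8, 12]
print(variable_length_starts(variable_code, toy_length))  # [0, 2, 5, 6]
```

Real x86 front ends work around the serial dependency with tricks like predecoding and speculatively decoding at multiple offsets, but that costs silicon and power, which is the extra expense of going wider.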
Jumbotron replied

13 April 2021, 12:31 PM
If you all want to learn more about ARM v9 and ARM's "Total Compute Vision", this should be a good start. Lots of video chats with ARM partners, including a "fireside" chat with a Microsoft exec.

Arm Vision

https://www.arm.com/campaigns/arm-vision

The latest updates to the Arm architecture are designed to deliver the power of specialized processing with the economics and accessibility of general-purpose computing.