NVIDIA Announces Grace CPU For ARM-Based AI/HPC Processor

  • coder
    replied
    Originally posted by Jumbotron
    The Age of ARM is here. x86 is legacy.
    It's slated to come online in 2023, and we don't even know the details.

    The funny thing is that the core CPU isn't really the impressive part of the announcement. I highly doubt this will even be the fastest ARM server CPU of its day. Its main job is really to act as a partner to the GPU and to scale out its memory. To that end, you could probably swap out the ARM cores for x86, POWER, etc. without much impact on overall performance.

  • jabl
    replied
    The Nvidia Grace CPU isn't really about the x86-64 ISA vs the ARMv8 (or v9) ISA. It's about the business models of the x86 vendors vs. ARM.

    The problem is that many GPGPU workloads are starved for bandwidth when feeding data to the GPUs, and cache-coherent memory access would also improve the programmability of GPU applications. Now, Intel largely controls the PCI-SIG, which develops the PCI standard, and they're in no hurry to develop it in a direction that would help GPGPU. Hence things like NVLink, CAPI, and whatnot. Further, AMD and Intel develop their server CPUs for the general-purpose server market.

    So here's where ARM comes in. NVIDIA can "just" go and buy an off-the-shelf Neoverse core from ARM and develop their own CPU with that core, adding high-bandwidth memory interfaces and high-bandwidth, cache-coherent NVLink for connecting to the GPUs.
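
    To put rough numbers on the bandwidth point, here is a minimal roofline-style sketch in Python. The function name and every figure in it (peak throughput, link bandwidths, FLOPs per byte) are hypothetical placeholders chosen for illustration, not measurements of any real CPU, GPU, or interconnect.

```python
# Roofline-style estimate: attainable throughput is capped either by the
# compute ceiling or by how fast data can be delivered, whichever is lower.
# All numbers below are hypothetical placeholders, not real hardware figures.

def attainable_gflops(peak_gflops: float, bandwidth_gb_s: float,
                      flops_per_byte: float) -> float:
    """Return the lower of the compute ceiling and the bandwidth ceiling."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


if __name__ == "__main__":
    peak = 10_000.0   # hypothetical GPU peak, in GFLOP/s
    intensity = 2.0   # hypothetical workload: 2 FLOPs per byte moved
    for label, bw in [("slow link", 16.0), ("fast link", 500.0)]:
        print(label, attainable_gflops(peak, bw, intensity), "GFLOP/s")
```

    With a low FLOPs-per-byte workload the result tracks the link bandwidth rather than the GPU's peak, which is the sense in which such workloads are "starved" and why a faster CPU-to-GPU link matters more than the CPU cores themselves.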

  • AdrianBc
    replied
    Originally posted by vegabook

    Jetson AGX Xavier is 699 USD with 8 fairly modern Denver cores, 512 CUDA cores, plenty of tensor cores, 32GB of RAM, NVMe SSD capability, and a PCIe expansion slot. Still got a few problems but it's getting mighty close. You can most definitely use this as a full-performance Linux desktop and the only thing you'll really be missing is gaming.

    EDIT: The cores are not Denver (as per TX2), they're its successor "Carmel".

    I have a Jetson AGX Xavier, so it has its uses.

    Nevertheless, for general-purpose applications it cannot compete with an x86 computer of the same price and the same power consumption.

    The NVIDIA Carmel cores are intermediate in speed between Cortex-A73 and Cortex-A75, so they are more than 4 times slower than a modern AMD Zen 3 or Intel Tiger Lake core.


  • TemplarGR
    replied
    Originally posted by artivision

    You really are ignorant. The LPP node offers 3 subnodes: 5GHz at full power, 4GHz at 2.5x less power, and 3.2GHz at a further 2.5x less power. Second, the Apple SoC has been tested running Rise of the Tomb Raider via Rosetta 2, and it gave 70% vs. a many-core x86 and GTX 1650. That was with the GPU at 7W and the CPU at another 7W, with the SoC (including RAM) locked at 16.5W. When the GPU was unleashed at 10W, it matched the GTX 1650. Also, the 512-bit fused FP performance per core is enough, and the Fujitsu one (100% Arm) utilizes the new instruction set with 2048-bit FP per core. The thing goes like this: Arm will fuse the graphics instruction subset with even the heavy cores and bye bye everyone.
    No, I am not ignorant, I am just not a fanboi whose only knowledge about chip design comes from pop-tech sites.

    1) GHz are not about the nodes alone; they are, as I said, about the design principles. If the chip is very complicated, it can never achieve 100% load at those clocks. Those GHz you mentioned are best-case scenarios.

    2) I want to see a link to that benchmark, to see the conditions of the test. Rise of the Tomb Raider is a very lightweight game that can be maxed out on low-end CPUs/GPUs. You need more games to reach a stronger conclusion, especially since a video game like this is mostly GPU-bound.

    3) The GTX 1650 is a 12nm design, the Apple M1 is a 5nm design. That is a huge difference. You call me ignorant, but I am the only one who is talking objective facts here. You are just an ignorant fanboy, and don't you dare call me ignorant again.

  • zxy_thf
    replied
    Originally posted by Qaridarium

    That's not possible, because vendor lock-in only works if you have high market share.

    ARM of any kind does not have this market share right now.
    Apple's Mac also doesn't have high market share, and if you count the number of devices this also applies to iPhones (~10% iirc).

    IMO this is very likely to happen if ARM servers take the custom design business model -- just like game consoles, and we all know the PS5 and XSX work very differently even though both are designed by AMD and use Zen 2 + RDNA-like hardware.
    Last edited by zxy_thf; 12 April 2021, 11:22 PM.

  • artivision
    replied
    Originally posted by TemplarGR

    Clock-for-clock performance is not telling the full story. And not running them higher may not be just about efficiency, but also simple stability. You can get great IPC in a processor but be unable to clock it high because of the design.

    For a clear example to illustrate, Pentium 4 (NetBurst) had lower IPC than AMD's Athlon (and Pentium 3) but could be clocked significantly higher because of its longer pipeline. Athlon had a shorter pipeline, so it was clocked lower. It wasn't just for "efficiency" reasons (after all, both Intel and AMD, with the FX-9xxx series, have proven they don't care about efficiency much), but also for stability reasons. At some point, you just get errors and instability if you clock a design higher than it can go.

    ARM is similar in the sense that it is designed for lower clocks and lower die sizes. The designs are not meant to be clock champions, and I am not even sure they could reach stable clocks that high. And it lacks the SIMD/FP performance too.
    You really are ignorant. The LPP node offers 3 subnodes: 5GHz at full power, 4GHz at 2.5x less power, and 3.2GHz at a further 2.5x less power. Second, the Apple SoC has been tested running Rise of the Tomb Raider via Rosetta 2, and it gave 70% vs. a many-core x86 and GTX 1650. That was with the GPU at 7W and the CPU at another 7W, with the SoC (including RAM) locked at 16.5W. When the GPU was unleashed at 10W, it matched the GTX 1650. Also, the 512-bit fused FP performance per core is enough, and the Fujitsu one (100% Arm) utilizes the new instruction set with 2048-bit FP per core. The thing goes like this: Arm will fuse the graphics instruction subset with even the heavy cores and bye bye everyone.

  • oiaohm
    replied
    Originally posted by Jumbotron
    The Age of ARM is ARRIVING for EVERY OTHER platform not mentioned above, like Supercomputers, HPC, AI, Edge, VR, AR, Auto both augmented and fully Self Driving. Also in Chromebooks, ALL APPLE products, increasingly in Microsoft products.

    Because the x86 Desktop PC is an increasingly marginalized platform in light of ARM based Smart Phones, Tablets and soon every SINGLE Apple Personal Compute product including Desktops, by 2030 more than 50% of all Personal Compute products both desktop and laptops Windows, Apple and Google combined will be ARM based.
    2030 is quite a way out. You have missed what RISC-V is up to.
    https://www.xda-developers.com/android-risc-v-port/

    Yes, the x86 Desktop PC is an increasingly marginalised platform. But we also have to remember that there are big companies with a problem with both x86 and ARM.
    https://www.chinamoneynetwork.com/20...ab-new-markets
    The problem is technology export/import bans.

    It is possible that by 2030 the dominant mobile phone on the market is RISC-V. Every market Arm is targeting, RISC-V is targeting too.

    Nvidia is acquiring Arm at a time when there is quite a battle ahead. Arm really has increasing competition from RISC-V because US government bans are making non-US companies worried about whether they will be able to keep getting updates to the Arm licenses they have. This is forcing Arm/Nvidia to go after more US-based companies going forward, as they lose companies based in China and other countries if nothing changes.

  • qarium
    replied
    Originally posted by zxy_thf
    Actually this advantage is also not that clear, if we take (potential) vendor lock-in into consideration.
    We may switch from Xeon to Epyc and enjoy improved performance/dollar, but when we switch from one ARM chip to another, there is no guarantee that they share the same extensions and have similar performance behavior.
    Considering Apple's success with its walled garden, I don't believe ARM vendors won't want to lock you into their own product line.
    That's not possible, because vendor lock-in only works if you have high market share.

    ARM of any kind does not have this market share right now.

  • Jumbotron
    replied
    Originally posted by schmidtbag
    Ugh enough with this cringy tag line...
    OK...I admit I was wrong. Let me amend.

    The Age of ARM HAS BEEN HERE.

    For PDAs, then Smart Phones and Tablets.
    For IoT, Smart Devices, Watches, Sensors, Networking, etc.

    The Age of ARM is ARRIVING for EVERY OTHER platform not mentioned above, like Supercomputers, HPC, AI, Edge, VR, AR, Auto both augmented and fully Self Driving. Also in Chromebooks, ALL APPLE products, increasingly in Microsoft products.

    Because the x86 Desktop PC is an increasingly marginalized platform in light of ARM based Smart Phones, Tablets and soon every SINGLE Apple Personal Compute product including Desktops, by 2030 more than 50% of all Personal Compute products both desktop and laptops Windows, Apple and Google combined will be ARM based.

  • zxy_thf
    replied
    Originally posted by piotrj3

    ARM doesn't give anything to you, except avoiding the AMD-Intel duopoly.
    Actually this advantage is also not that clear, if we take (potential) vendor lock-in into consideration.
    We may switch from Xeon to Epyc and enjoy improved performance/dollar, but when we switch from one ARM chip to another, there is no guarantee that they share the same extensions and have similar performance behavior.

    Considering Apple's success with its walled garden, I don't believe ARM vendors won't want to lock you into their own product line.
