NVIDIA Announces Grace CPU For ARM-Based AI/HPC Processor


  • #61
    Originally posted by TemplarGR View Post
    Clock-for-clock performance doesn't tell the full story. And not running them higher may not be just about efficiency, but also about simple stability. You can get great IPC in a processor but be unable to clock it high because of the design.
    I think what you're dancing around is the critical path length of the circuitry. If a core is designed for lower clock speed, its critical path can be longer, which enables more complex pipeline stages and in turn benefits IPC. However, you can't then take that exact design and crank up the clock speed. Likewise, if a core is designed to clock higher, this will naturally come at the expense of some IPC.

    In general, the argument for lower clock speeds is energy-efficiency, which is why mobile and server cores tend to clock lower than desktop CPUs. However, the designers can also take advantage of that to imbue them with more IPC, depending on the silicon & power budget.

    ARM has one inherent benefit over x86, in that you can scale its front-end wider, thanks to its fixed-length instruction encoding. So, if one is willing to devote the necessary silicon, it's not surprising to see ARM cores that exceed x86 in IPC. And this is independent of clock speed, so it should even be possible to build an ARM core that clocks comparably to x86 and still offers more IPC. There's just not as much incentive for it, given that ARM can beat x86 in single-thread performance with higher IPC at lower clock speeds.
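
    To put rough numbers on that tradeoff, here's a back-of-the-envelope sketch in Python (all figures are hypothetical, purely to illustrate how performance ≈ IPC × clock and how the slowest pipeline stage caps the clock):

    # Rough model with made-up numbers -- not measurements of any real core.
    def max_clock_ghz(critical_path_ps):
        """Clock ceiling set by the slowest pipeline stage: f_max ~ 1 / delay."""
        return 1000.0 / critical_path_ps  # 1000 ps per ns -> GHz

    def single_thread_perf(ipc, clock_ghz):
        """Throughput in billions of instructions per second."""
        return ipc * clock_ghz

    # Core A: longer critical path (more work per stage), but higher IPC.
    core_a = single_thread_perf(ipc=8.0, clock_ghz=max_clock_ghz(310))  # ~3.2 GHz
    # Core B: shorter stages allow a higher clock, at the cost of some IPC.
    core_b = single_thread_perf(ipc=5.5, clock_ghz=max_clock_ghz(200))  # 5.0 GHz
    print(round(core_a, 1), round(core_b, 1))  # 25.8 vs 27.5 -- much closer than the GHz gap suggests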



    • #62
      Originally posted by zxy_thf View Post
      Actually this advantage is also not that clear, if we take (potential) vendor lock-in into consideration.
      We may swap Xeon for Epyc and enjoy improved performance/dollar, but when we switch from one ARM vendor to another, there is no guarantee that they share the same extensions and have similar performance behavior.
      ARM is very strict about licensees not adding their own instructions. So, in that respect, it's less susceptible to lock-in than x86. And if you adopt software that relies on a certain ISA level, you should only do so with the knowledge that you're restricting yourself to fewer CPUs.

      RISC-V is the worst, though. Basically, a RISC-V CPU can add whatever the heck it wants!
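
      One practical mitigation is to treat optional extensions as runtime capabilities instead of baking a fixed ISA level into the build. A minimal sketch, assuming Linux on AArch64 (which exposes a "Features" line in /proc/cpuinfo); the gated paths here are just examples:

      # Minimal sketch: gate optional fast paths on what the running CPU reports.
      def cpu_features(path="/proc/cpuinfo"):
          feats = set()
          with open(path) as f:
              for line in f:
                  if line.startswith("Features"):
                      feats.update(line.split(":", 1)[1].split())
          return feats

      features = cpu_features()
      use_sve = "sve" in features  # scalable vectors: present on only some cores
      use_aes = "aes" in features  # crypto extensions are optional in the ARM spec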



      • #63
        Originally posted by coder View Post
        I think that's not what they meant. I imagine they had in mind that one OS kernel should be managing hybrid ISA CPU cores, which share a global pool of RAM. This would be an interesting project, but I'm not sure we really have anything like it, today.
        I agree! One of the techniques Apple used to extract a performance uplift was to get the M1's RAM on the package and tightly linked to all of its various cores, not just the CPU and GPU. With the upcoming move by Intel to start integrating RAM on the wafer itself, and the continued use of HBM by Nvidia and AMD on GPUs, one could see general RAM on the motherboard linked by CXL or Infinity Architecture as a kind of memory pool. With CXL and Infinity Architecture, that pool would, as a matter of course, be part of a zero-copy, cache-coherent, heterogeneous compute environment.

        I think that's part of what we will see with Nvidia's Grace SoC. Each Grace core could have an NVLink path from the SiP RAM to each of Grace's ancillary cores (DSP, NPU, DPU, etc.) and straight to any and all of Nvidia's GPUs, whether integrated or discrete and external.
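
        For a feel of what "zero-copy" means here, a host-level analogy in Python using ordinary shared memory (just an illustration of the concept, not CXL or NVLink code):

        # Two endpoints attach to one named region and work on it in place -- no copies.
        from multiprocessing import shared_memory

        # "Producer": create a 1 MiB region and write into it directly.
        pool = shared_memory.SharedMemory(name="demo_pool", create=True, size=1 << 20)
        pool.buf[:5] = b"hello"

        # "Consumer" (normally another process): attach by name and read in place.
        view = shared_memory.SharedMemory(name="demo_pool")
        print(bytes(view.buf[:5]))  # b'hello'

        view.close()
        pool.close()
        pool.unlink()  # release the region once everyone is done

        A coherent fabric like CXL extends that same idea across devices and even racks, with the hardware keeping caches in sync instead of the OS.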

        In this respect, we are approaching a time where HP's "The Machine" concept will be the prevailing design paradigm.



        • #64
          Originally posted by oiaohm View Post
          2030? That is quite a way out. You have missed what RISC-V is up to.
          https://www.xda-developers.com/android-risc-v-port/
          It's true. China is the big wild card, with Russia being a smaller one. I'm sure neither likes ARM's US ownership. They're each building MIPS, RISC-V, and proprietary-ISA CPUs.

          And guess who makes most appliances and personal electronics? China. If China goes big on RISC-V, then they can single-handedly turn the tide against ARM.



          • #65
            Originally posted by jabl View Post
            Now, Intel largely controls the PCI SIG which develops the PCI standard, and they're in no hurry to develop it in a direction which would help GPGPU.
            Yes, they did. It's called CXL.

            Now that Intel is building datacenter GPUs & AI accelerators, they're highly motivated to solve those problems.



            • #66
              Originally posted by numacross View Post
              Thanks. I could swear I remember something about it sharing the same socket as their Opterons of the same era, but maybe it just shared the same chipset?



              • #67
                Originally posted by Jumbotron View Post
                In this respect, we are approaching a time where HP's "The Machine" concept will be the prevailing design paradigm.
                This is the first time, in a while, that I've seen that reference. Does anyone have a link to a clear description of "The Machine"? I'm not looking for marketing BS.



                • #68
                  Originally posted by TemplarGR View Post

                  No, I am not ignorant, I am just not a fanboi whose only knowledge about chip design comes from pop-tech sites.

                  1) GHz is not about the node alone; it is, as I said, about the design principles. If the chip is very complicated, it can never achieve 100% load at those clocks. Those GHz figures you mentioned are best-case scenarios.

                  2) I want to see a link for that benchmark, to see the conditions of the test. Rise of the Tomb Raider is a very lightweight game that can be maxed out on low-end CPUs/GPUs. You need more games to reach a stronger conclusion, especially in a video game which is mostly GPU-bound.

                  3) The GTX 1650 is a 12nm design, the Apple M1 is a 5nm design. That is a huge difference. You call me ignorant, but I am the only one talking objective facts here; you are just an ignorant fanboy, and don't you dare call me ignorant again.
                  https://www.anandtech.com/show/16252...le-m1-tested/3

                  Ice Storm: 10 watts peak load.
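
                  Normalizing for power makes that comparison fairer than raw fps alone; a trivial sketch with placeholder figures (not benchmark results):

                  # Placeholder figures purely to show the normalization, not measurements.
                  def perf_per_watt(fps, watts):
                      return fps / watts

                  m1_igpu = perf_per_watt(fps=60.0, watts=10.0)   # hypothetical
                  gtx1650 = perf_per_watt(fps=90.0, watts=75.0)   # hypothetical
                  print(m1_igpu, gtx1650)  # 6.0 vs 1.2 -- fps alone hides the power (and node) gap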



                  • #69
                    Originally posted by coder View Post
                    This is the first time, in a while, that I've seen that reference. Does anyone have a link to a clear description of "The Machine"? I'm not looking for marketing BS.
                    Perhaps this could serve as a start. More to come if I can find it.

                    https://www.nextplatform.com/2015/08...achine-to-hpc/



                    • #70
                      Originally posted by coder View Post
                      This is the first time, in a while, that I've seen that reference. Does anyone have a link to a clear description of "The Machine"? I'm not looking for marketing BS.
                      Ahhh... here we go. A full rundown of HP's "The Machine", with pix, diagrams, etc. of all the major components: the SoC, memory pools, data connections both copper and fiber optic, data and interface planes, rack sleds. Pretty much an entire teardown of The Machine. Also, and I had forgotten this, The Machine was based on an undisclosed ARM SoC called the "Workload Processor". And the interconnect both inside and outside The Machine was Gen-Z. Yeah... that Gen-Z which will be working with the CXL Consortium on tying up racks of CPUs, GPUs, and external memory pools in the next year or two.

                      https://www.nextplatform.com/2017/01...-architecture/

