
AMD Launches The Ryzen Threadripper 7000 Series: Up To 96 Cores, DDR5 RDIMMs, PRO & HEDT CPUs


  • #21
    Originally posted by Hibbelharry View Post
    For consoles it should take a while before we see new generations, and AMD has won the PlayStation and Xbox contracts multiple times now. The PS5 Pro is on the horizon and will (according to leaks) use a Zen 2 + RDNA 3 APU. Everything else is pure speculation.

    I personally don't see consoles going ARM anytime soon, because I suspect x86 has advantages in non-mobile use cases, and I don't imagine Sony and MS going the Nintendo way.
    There was a leak from earlier planning for the next-gen Xbox, where they were deciding between Zen 6 and ARM64, so at least MS is thinking about it, for the 2028 or 2029 timeframe, IIRC.

    Comment


    • #22
      My Threadripper 2950X has been such a reliable workhorse since 2018. I'm buying a 7970X the minute I am able to do so.

      Comment


      • #23
        Originally posted by stesmi View Post

        There was a leak from earlier planning for the next-gen Xbox, where they were deciding between Zen 6 and ARM64, so at least MS is thinking about it, for the 2028 or 2029 timeframe, IIRC.
        Yeah, and it was also rumored to be made by Ngreedia, and we know very well how happy Apple, Sony, and MS are at the thought of being at their mercy again...

        Comment


        • #24
          Originally posted by bridgman View Post

          I think the 24 core parts have higher clocks... goal is to maximize performance in a given power envelope, not maximize efficiency at a variety of power levels. EPYC parts lean a bit more towards maximizing efficiency.
          That's what I thought at first too, but the middle model has the same specified max boost. And TDP is primarily a heat-dissipation requirement; it's usually specified at base or average load, rarely at max boost.

          Comment


          • #25
            Sure, there's a variety of architectures / options for different loads -- servers with NUMA / many CPU sockets / blades / cores per CPU, GPUs with lots of SIMD cores and lots of VRAM bandwidth, more basic systems with just one CPU socket and an IGPU, etc.

            But it is undeniable that, from ML training / inference data-center use cases to many HPC compute workloads (FEA, image processing, chem / bio modeling, ...), workstation- as well as enterprise-server-class GPUs are heavily used for what amounts to general-purpose computing (GPGPU), whether that's SIMD matrix / vector / tensor work or more general code that may simply be memory-bandwidth-limited and benefits from numerous cores and high RAM bandwidth.

            We've seen the "prosumer" / pro-desktop standard drop from common 2-socket SMP boards to single-socket ones, and we've been stuck with 2-channel, mostly non-ECC DIMMs on mainstream "desktop" chipsets for far too long.

            Consumers with higher-end systems end up with 2-3-slot-wide $1000-$2000+ GPUs that take up effectively the entire available ATX motherboard space, tax the PSU and power / thermals to the point where cables / connectors melt or ignite, and still deliver only a small fraction of the GPGPU capability of even "medium"-level data-center / server ML GPUs.

            On commercial servers we've still got roughly 3 TB/s peak VRAM bandwidth on an H100 vs 0.8 TB/s peak DDR5 bandwidth for a single-socket 4th-gen EPYC with 12 DIMMs (going by the first benchmark numbers found), so at least a ~3x RAM-bandwidth advantage for the GPU wherever code can use an efficient streaming access pattern.
            Running that CPU in a 4-DIMM DDR5 configuration drops peak RAM bandwidth to under 0.26 TB/s.

            Whereas on consumer / prosumer desktop GPUs we've got 1.0 TB/s peak VRAM bandwidth for a 4090, 0.28 TB/s for a 4060 Ti, and 0.56 TB/s for an A770 16GB.

            So, ranging from consumer performance desktops to mainstream server-level hardware, that's at the very least parity in VRAM bandwidth per GPU vs DDR5 bandwidth per socket, and more likely closer to a 3x or 4x advantage in favor of the GPU, without even considering latency / access patterns, etc.
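As a back-of-envelope check, the peak-bandwidth figures quoted above can be compared directly. This is a minimal sketch using the post's own claimed numbers (not independent measurements); the device labels and the `advantage` helper are just illustrative names.

```python
# Peak-bandwidth figures (TB/s) as quoted in the post above.
peak_bw = {
    "H100 VRAM": 3.0,
    "EPYC 12-DIMM DDR5": 0.8,
    "EPYC 4-DIMM DDR5": 0.26,
    "RTX 4090 VRAM": 1.0,
    "RTX 4060 Ti VRAM": 0.28,
    "A770 16GB VRAM": 0.56,
}

def advantage(gpu: str, cpu: str) -> float:
    """GPU-over-CPU peak-bandwidth ratio for the quoted figures."""
    return peak_bw[gpu] / peak_bw[cpu]

print(f"H100 vs 12-DIMM EPYC: {advantage('H100 VRAM', 'EPYC 12-DIMM DDR5'):.2f}x")
print(f"4090 vs 12-DIMM EPYC: {advantage('RTX 4090 VRAM', 'EPYC 12-DIMM DDR5'):.2f}x")
print(f"4090 vs 4-DIMM EPYC:  {advantage('RTX 4090 VRAM', 'EPYC 4-DIMM DDR5'):.2f}x")
```

The 4-DIMM case shows why DIMM population matters: the same GPU goes from rough parity against a fully populated server socket to a ~4x advantage against a sparsely populated one.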

            Looking at compute capability (MIPS / FLOPS, including data-flow bandwidth effects), for many use cases one might argue the GPU is so far superior to what's in the CPU socket "next to" it that the CPU and its attached DDR5 DRAM are hardly even relevant: in many hardware-limited workloads the CPU / RAM does little work over the course of a computation while almost all of it happens in the GPU / VRAM, e.g. GPU-accelerated AI/ML inference / training, FEA/FEM, graphics processing, ...

            So my point is that, in many relevant use cases, one could probably refactor the system architecture so that the "CPU" becomes almost a peripheral of the GPU, if not handled BY the GPU (assuming a future SW / system design that permits that holistically). Moving more of the system's RAM-channel / bandwidth I/O resources toward either literal VRAM or some different compromise between current DDR5 and GDDRx might not be a bad trade to get more use-case-RELEVANT "heavy load" performance than what we often see today, where so many workloads are GPU-bound / GPU-resident while, in the most ridiculously ironic common cases, the CPU / RAM sits lightly loaded (games, GPGPU HPC, ML inference, ...).

            In any case, the 2-channel, not-typically-ECC desktop memory / socket design doesn't serve well when, in practice, even "mid-high range" gamers have to spend roughly equal or MORE on their GPU than on their CPU / motherboard / RAM combined, only to have the latter sit lightly used under application load while game / app / GPGPU / ML-inference performance is almost fully limited by how much GPU they can afford / install.

            If one is "only" going to support, say, 2-4 channel RAM on a desktop or modest WS/HEDT platform, it seems silly to have even that much I/O / layout / cost dictated by interfacing modest amounts of "slow" RAM, which doesn't even matter much for many applications. That's coupled with PCIe x16 slots that are woefully slow / limited compared to even the DDR5 channels, and so few in number / so physically constrained that in reality you get ONE big GPU installed and are lucky to have any other PCIe x16 or x8 slot usable for anything else. And then there's the turtle-slow 1-2.5 Gb/s consumer / small-business LAN "limit", when fiber-optic 10 Gb/s or the like would be more reasonable for a prosumer desktop / modest workstation by this time.
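The PCIe-vs-DDR5 comparison can be made concrete with the standard peak rates. A rough sketch, ignoring protocol overhead and real-world efficiency (the ~2 and ~4 GB/s per-lane figures are the usual approximations for PCIe 4.0 and 5.0; the helper name is illustrative):

```python
def ddr5_channel_gbps(mt_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Peak GB/s for one 64-bit DDR5 channel at a given transfer rate."""
    return mt_per_s * bytes_per_transfer / 1000

# Per-direction peak bandwidth for a full x16 slot.
pcie4_x16 = 16 * 2.0   # ~2 GB/s per lane per direction
pcie5_x16 = 16 * 4.0   # ~4 GB/s per lane per direction

# Typical mainstream desktop config: two DDR5-4800 channels.
dual_channel_ddr5 = 2 * ddr5_channel_gbps(4800)

print(f"dual-channel DDR5-4800: {dual_channel_ddr5:.1f} GB/s")
print(f"PCIe 4.0 x16: {pcie4_x16:.1f} GB/s, PCIe 5.0 x16: {pcie5_x16:.1f} GB/s")
```

Even a desktop's modest dual-channel memory out-runs a PCIe 4.0 x16 slot by more than 2x, which is the gap the comment is pointing at for GPU-to-host transfers.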

            Just a miserable and unscalable physical ATX design overall for power / thermals / slots / RAM channels / CPU socket / PCIe / GPU attachment.
            In what insane world does this mess not get refactored / fixed when we've got "cheap" TFLOP/s / TB/s computing available, but it's NOT sanely integrable / integrated into desktop / HEDT / small-server architectures? It's really driving the cost / power / thermal / space envelope of realistic systems without commensurately driving the ARCHITECTURE and DESIGN of those systems holistically: physically, electrically, in SW, HW, ME, etc.

            Even storage attachment is ridiculous in ATX: witness the M.2 slot designs / limits, then USB-C availability, Thunderbolt / USB4, etc.
            The Moore's-law capabilities installed in ATX desktops are surely evolving, but honestly that's IN SPITE OF the platform's architectural / mechanical design, not BECAUSE of it. Having built many systems from parts, it seems more and more like we're hitting a brick wall of architectural platform insanity, while the sheer potential of the COTS technologies we have available (GPUs, GDDRx, NN-core RISC-V, fiber optics, ECC, high-bandwidth fabrics / interconnects, modern digital power, ...) is amazing yet vastly underappreciated / unfulfilled, because we stick to decades-old platform / socket / market-segmentation / ISA / etc. constraints that really are neither necessary nor good in many ways.

            Originally posted by Hibbelharry View Post

            I'm pretty sure those are not meant for standard consumer use cases. AMD is not trying to sell you a fancy gaming CPU here; Threadripper is an efficient upper-class workhorse.

            That's simply not true. Their workstation-class GPUs and ROCm are pretty solid these days; AMD just doesn't make any ground on consumer-grade GPUs.

            Whatever AMD does, they need to be comparable in efficiency for most use cases. A CPU will never be able to do certain specific tasks as fast as GPU hardware, so they won't try. You're trying to simplify matters in an obviously wrong way.

            Comment


            • #26
              I'm rejoicing that non-Pro now uses registered memory.

              As for the price - I want one of you suckers to buy one so that I can get a cheap, used one later.

              Comment


              • #27
                Originally posted by guglovich View Post
                That's interesting. We buy 24 cores, but the consumption will be the same as 64. Did AMD decide not to cut the tracks?
                TDP is not power consumption.

                Comment


                • #28
                  Originally posted by NeoMorpheus View Post

                  Yeah and it was also rumored to be made by Ngreedia and we know very well how happy Apple, Sony and MS are with the thought of being at their mercy again...
                  Oh yes, I remember well how MS basically ran away from Nvidia after the first Xbox.

                  Comment


                  • #29
                    Originally posted by guglovich View Post

                    That's what I thought at first too, but the middle model has the same specified max boost. And TDP is primarily a heat-dissipation requirement; it's usually specified at base or average load, rarely at max boost.
                    However, Max Boost is not max all-core boost, so if you're idling 22 of 24 or 30 of 32 cores and going full enchilada on 2 of them, the 24- and 32-core versions aren't going to behave any differently; they'll boost the same. I expect that if we could look at the boost tables, they would start to differ as more cores are loaded.

                    Comment


                    • #30
                      Originally posted by guglovich View Post
                      That's what I thought at first too, but the middle model has the same specified max boost. And TDP is primarily a heat-dissipation requirement; it's usually specified at base or average load, rarely at max boost.
                      Sorry, what is "the middle model"? Maybe we are looking at different slides, or maybe I am taking non-boost clocks into consideration while you are not. I don't think TDP is just "base or average"... power limiting factors into boost as well (with the caveat that I'm on the GPU side and not up on the nuances of CPU power management).

                      If you are talking about 32 vs 24 cores, it's probably a mix of "yeah, the 24-core probably draws less power than the 32-core, but not enough of a difference to go through the exercise of qualifying a different TDP" and "the more efficient bins tend to get used in the higher-core-count SKUs".
                      Last edited by bridgman; 19 October 2023, 07:10 PM.

                      Comment
