Intel Enabling Linux Driver Display Support For Upcoming "Battlemage" GPUs


  • #11
    Originally posted by Terr-E View Post

    Isn't the next one called "Celestial"?
    Yes, you are correct. Faulty memory on my part. Perhaps a cosmic ray flipped a bit in my brain pan.



    • #12
      Originally posted by Jumbotron View Post
      In fact, in order to play some AAA games in Linux you have to spoof the game into believing it’s NOT using an Intel Arc GPU. Pathetic.
      They have to do this for Windows games running on Linux via WINE/Proton that see an Intel GPU and try to use XeSS, which doesn't work on Linux. You can still blame Intel, since they make both XeSS and the Linux driver, but it has nothing to do with the design or quality of the GPU.
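      For what it's worth, the workaround is usually a per-game DXVK override rather than anything in the driver itself. A minimal sketch, assuming the game runs under Proton/DXVK and reads the adapter vendor through DXGI; 10de is simply Nvidia's PCI vendor ID, and the device ID shown is an arbitrary placeholder:

          # dxvk.conf, dropped next to the game executable (or pointed at via DXVK_CONFIG_FILE)
          # Report a non-Intel adapter so the game doesn't try to enable XeSS.
          dxgi.customVendorId = 10de
          # Optional matching device ID; the value here is just a placeholder.
          dxgi.customDeviceId = 1234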



      • #13
        Originally posted by Jumbotron View Post
        The vast majority of the world's consumer GPUs are mobile / iGPU. Every Android phone has a mobile GPU. Every Android tablet has a mobile GPU. Every iOS device has a mobile GPU. Every macOS device has an iGPU. The majority of consumer x86 Windows desktops have an iGPU. Every Windows laptop has a mobile GPU. And every console (Microsoft Xbox, Sony PlayStation, Nintendo Switch) has an iGPU. Only a minority of consumer desktops have a discrete GPU of any sort. And PC gamers are an even smaller minority.
        And literally nobody gives a fuck because they do not matter, except when the company can't deliver competitiveness at the HIGH END.

        Nvidia is like the 3rd richest company or something and mostly makes GPUs/TPUs only.

        Stop using majority as an argument. What's that saying again? Ah yeah. Quality over quantity.

        Same reason mass-produced Chinese garbage is irrelevant and still trash no matter how many broke-ass specimens use it (I don't mean the good quality Chinese products, which DO exist, I mean the low-quality garbage). If I were CEO of a high-quality Chinese company I'd feel terrible about people associating me with that trash.



        • #14
          Originally posted by Weasel View Post
          And literally nobody gives a fuck because they do not matter, except when the company can't deliver competitiveness at the HIGH END.

          Nvidia is like the 3rd richest company or something and mostly makes GPUs/TPUs only.

          Stop using majority as an argument. What's that saying again? Ah yeah. Quality over quantity.

          Same reason mass-produced Chinese garbage is irrelevant and still trash no matter how many broke-ass specimens use it (I don't mean the good quality Chinese products, which DO exist, I mean the low-quality garbage). If I were CEO of a high-quality Chinese company I'd feel terrible about people associating me with that trash.
          You are quite literally wrong, and the world computer marketplace proves you wrong. If there were indeed a sustainable and recognizable market for discrete GPUs, there would be sales that drove measurable market share. The data shows otherwise. The entirety of compute platforms used by human beings outside the rarefied market of HPC and hyperscalers demonstrably shows that the market share of compute platforms with discrete GPUs is a decided minority.

          This will only accelerate as component integration intensifies because, among other things, bringing high-speed memory heterogeneously connected into either the SoC package or the core die itself (or both) speeds up operations for all cores, brings performance-per-watt advantages, and saves on the bill of materials (BOM) for OEMs who are already in a margin-constrained environment for PC manufacturing. Sorry, Junior, but a LOT of people give a fuck about such things.

          Gamers who use discrete GPUs are a minority of a minority of global computing. You are nominally a large group, but compared to the overall market share of planetary computing you are a minority. That trend will continue as the years roll along, as integrated components become just as performant for less cost, both in the initial cost of the compute platform and over time, since it costs less to run a performant integrated system than to power a soon-to-be-kilowatt gaming GPU.



          • #15
            Yes, but will it be available in the form of a high-end GPU, something that can compete with the Nvidia xx80 and AMD x900 series? It would be nice to have more choice instead of only the two players, and maybe pricing would even drop.



            • #16
              Originally posted by rob-tech View Post
              Yes, but will it be available in the form of a high-end GPU, something that can compete with the Nvidia xx80 and AMD x900 series? It would be nice to have more choice instead of only the two players, and maybe pricing would even drop.
              I agree it would be nice, but unfortunately that's entirely up to Intel to execute a lot better, in a lot less time, than they ever have since 1982. And even then the marketplace has the last say. The only way Intel has any chance of even being competitive is to produce a discrete GPU in every segment (Budget, Mainstream, Enthusiast, and Fanatic/Cutting Edge/Home AI Builder), where each GPU in each of these segments has three qualities (sketched in code below):

              (1) At least 90% mean performance compared to the leading model in that market segment from both AMD and Nvidia.

              (2) Retail pricing a minimum of 30% below the segment leader from AMD and Nvidia, which should become 40% less once street pricing kicks in.

              (3) Or price parity with the segment-leading AMD or Nvidia GPU, but only when the Intel GPU has double the on-board memory of either competitor's card.

              And then Intel needs to do these two additional things:

              (4) Intel forces AAA game companies to have day-one, bug-free support for oneAPI. Intel needs to seed every AAA studio with in-house Intel engineers working side by side with the game designers, on the highest-end Intel machines with the highest-end CPUs and GPUs.

              (5) And finally, this may seem counterintuitive, but Intel needs to capture one of the two major console companies. Microsoft would be the easiest. Pry a future Xbox away from AMD by building an Intel console motherboard from scratch: a 16- or 32-core all-64-bit CPU with on-die HBM; a Celestial or, better yet, a Druid or newer post-Xe architecture GPU, also with on-die HBM; a huge NPU capable of a 100+ TOPS baseline of AI inference; at least 32GB of LPGDDR 6x RAM in the SoC package in addition to the on-die HBM; and finally a minimum 1TB to 2TB M.2 SSD or its next-gen successor, with the whole system tied together by a heterogeneous CXL zero-copy memory protocol. What does this have to do with a discrete GPU? Simply this: this hypothetical Intel-only Xbox becomes the way thousands of developers come into the oneAPI world from CUDA or the proprietary schemes from Microsoft, AMD and Sony. Porting a game made for this hypothetical Intel Xbox to a PC with a discrete Intel GPU would be beyond trivial.

              If Intel fails at any of this, there will be no discrete Intel GPUs by the end of the decade. It will be a two-GPU show like it is now, or like the choice we have between just Android and iOS for phones and tablets.
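              A rough Python sketch of criteria (1)-(3) above, just to make the test concrete; the function and every number in the example call are hypothetical, purely for illustration:

                  # Rough sketch of criteria (1)-(3); every figure here is hypothetical.
                  def meets_criteria(perf_ratio, intel_price, rival_price,
                                     intel_vram_gb, rival_vram_gb):
                      """perf_ratio = Intel mean performance / segment leader's mean performance."""
                      fast_enough = perf_ratio >= 0.90                       # (1) at least 90% of the leader
                      cheap_enough = intel_price <= 0.70 * rival_price       # (2) at least 30% below retail
                      parity_with_2x_vram = (intel_price <= rival_price and  # (3) price parity allowed only
                                             intel_vram_gb >= 2 * rival_vram_gb)  # with double the memory
                      return fast_enough and (cheap_enough or parity_with_2x_vram)

                  # Hypothetical enthusiast-segment card: 92% of the leader's performance,
                  # $550 vs. $800 at retail, 32 GB vs. 16 GB of VRAM.
                  print(meets_criteria(0.92, 550, 800, 32, 16))  # True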
              Last edited by Jumbotron; 23 April 2024, 08:38 PM.



              • #17
                Originally posted by Jumbotron View Post

                Agreed, but I don't think the question was whether Intel was ever going to release Battlemage, but in what form. Rumor on the street is that Battlemage will not be released in discrete form, at least not initially. In all of Intel's vaporware slideshows and presentations of future GPU releases I haven't seen one mock-up of a discrete card. Contrast that with their slideware of upcoming and future CPU architectures: they show how the CPU will be laid out with different components and configurations, and all of these include an iGPU of some kind of arch. Xe currently, Battlemage next, then Celestial and Druid to supersede those. Each of these is shown as an iGPU only, never with a dGPU counterpart.

                Then, even more curiously, in about the 2026 timeframe Intel is releasing its first clean-sheet CPU design since the original Pentium Pro. Along with this will be an all-new GPU arch superseding Xe/Battlemage/Druid. The entire die, which will have new CPU cores, new iGPU cores, new NPU cores, and quite possibly Habana Gaudi AI inference tech (since Intel is winding that down and not producing any more standalone Gaudi chips after the just-released Gaudi 3), looks REMARKABLY like an Apple Silicon chip, specifically the latest M3. This new Intel SoC has at least two memory chips on package as well, though the capacity is unknown.

                All of this leads me to believe that even if Battlemage is eventually released as a standalone discrete GPU, the days are numbered for Intel discrete GPUs. The world's computer designs are all going integrated. Between the power of modern multi-core CPUs, the performance-per-watt efficiency of modern iGPUs (not to mention that Apple's M3 is desktop-capable and competitive for all but the latest AAA games at 4K), plus the added capabilities of on-package NPUs and FPGAs, the era of discrete components like GPUs is coming to an end for most people and compute use cases.

                This becomes even clearer once you tie all those components together with a truly heterogeneous memory protocol like CXL. Apple has had this since the original M1. It truly helps keep every component on an Apple Silicon chip fed with data efficiently and, when needed, at speed. It's also crucial because every component in the Apple Silicon package feeds off system RAM; there's no discrete memory for anything. Apple's CXL-like memory scheme is truly the first one for consumer products, and it's why they can be desktop-performant in a laptop with no fans whatsoever. That is what I believe Intel is moving to for their consumer reference platforms by 2026. If Battlemage does indeed come out as a discrete GPU card, it may very well be the last from Intel.

                That's all very sensible. And in many ways the architectural benefits of those kinds of changes (unified memory with high bandwidth available to ALL system processing functions: CPU / NPU / DSP / GPU) are clear.

                The discrete GPU as "one ring to rule them all" is a pathetic half-baked architecture anyway. We've got a status quo where GPUs are used for serious industrial HPC, supercomputing, and AI/ML training & inference as the core processing; those aren't even GRAPHICAL applications. Why are they used? Because a GPU has N times the memory bandwidth (TBy/s) of the "motherboard / CPU / RAM" alternative, AND it has O(10k) SIMD DSP/vector-ish ALUs/processors designed for data-flow, high-bandwidth, highly parallel processing. So actually NONE of that high-bandwidth RAM / highly parallel computing stuff has A THING to do with graphics, other than graphics being ANOTHER computing application that can benefit from it. The key take-away is that high-bandwidth memory & highly parallel processing SHOULD be integral and unified with the basic processing and memory platform, because lots of applications besides graphics benefit from it.
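                To put rough numbers on that gap, a quick Python back-of-envelope; the parts and clocks are just representative picks, not a claim about any specific product:

                    # Peak memory bandwidth in GB/s = (bus width in bits / 8) * transfer rate in GT/s.
                    def bandwidth_gbs(bus_bits, gigatransfers_per_s):
                        return bus_bits / 8 * gigatransfers_per_s

                    cpu_ram = bandwidth_gbs(128, 5.6)    # dual-channel DDR5-5600: ~90 GB/s
                    gpu_vram = bandwidth_gbs(384, 21.0)  # 384-bit GDDR6X at 21 GT/s: ~1008 GB/s

                    print(f"CPU RAM ~{cpu_ram:.0f} GB/s, GPU VRAM ~{gpu_vram:.0f} GB/s, "
                          f"ratio ~{gpu_vram / cpu_ram:.0f}x")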

                One might argue it's just the monopolistic inertia of "x86 rules the world forever" that has kept mainstream computing from gradual architectural evolution or innovation -- you get from 1-24 cores or so and that's it; you get 1-2 memory channels on consumer platforms, 4-12 per socket on server platforms, and that's it, and that's the way intel / amd kept things. When that utterly failed to be remotely sufficient for even basic graphical applications, others who WOULD dare scale the architectures along with Moore's law (ironic, since that's named for Intel, who WOULDN'T scale along with it architecturally!) took the baton and created the "peripheral" GPUs that were originally one-trick ponies for 2D/3D graphics, PERIPHERAL to the "main system".

                Fast forward to the 2020s and GPUs have VASTLY more BW & computing OP/s than the "main system" CPU / RAM. So what makes the "main CPU" and "main RAM" "main" anymore, if they're too limited / slow to even satisfy a large part of HPC / DSP / graphics / ML applications?

                In reality the x86 / ARM CPU and DDR4/5 are now more like a minor PERIPHERAL to the computing & BW power of the GPU.

                So then rebalancing the "main" system to have 1000s of SIMD / ALU cores, NPU/TENSOR acceleration, TBy/s RAM access will help restore relevance and equitable sharing of expensive resources (RAM amount & BW access to it) among the various system processing / acceleration functions.

                But what one could end up with is a big SOC/SOP (chiplets, whatever) in a single socket having graphics / video / DSP / CPU / NPU / shader processors / SRAM (cache) / RAM (DDRx, GDDRx, HBM, whatever) all integrated which is probably somewhat good for bandwidth / latency / motherboard cost reduction, but possibly bad for scalability and expansion and competition.

                What happens if you don't just want 64 / 128 / whatever GBy of memory?

                Do you have DIMM / LPCAMM / whatever sockets available so you can scale the RAM population up to 256-512 GBy, and at some reasonably good bandwidth?

                Do they start to just hobble the expandability of external PCIE / CXL / whatever sockets & bandwidths even more, since "the SOP is all you need in one place, you don't need an external GPU / NPU / ...!", so whatever "sockets" you have are few and meager in bandwidth?

                Apple (being apple) has already / always "walled gardened" its systems so there's ~zero user expansion options and ~zero choice to use 3rd party expansions in many cases AFAICT -- you order & buy the system you want with the exact amount of RAM, SSD(?) you want soldered in from the factory and that's it, you're done, no expansion DIMMs, M.2 sockets, GPUs, NPUs, etc.

                Same for basically all smart phones now not even having (usually) any microSD slots at all.

                And for SW development there are the Apple developer lock-ins, where everything costs money to develop and there's a huge laundry list of things you're not even allowed, by policy or by the available documentation / API, to develop.

                So on the one hand Apple's got a great unified SOP architecture for high-BW, low-energy compute / ML; too bad there's nothing open or cost-effective about it, HW or SW.

                Intel / AMD could head the same way. Google / samsung already have with tablets / smartphones.

                So what's the evolution that isn't dystopian / user-hostile / OSS-hostile but could actually "destroy" the painfully obsolete x86 / PC platform architecture and evolve it in a revolutionary / quantum-jump way (decades overdue), BUT keep higher-end desktop / SMB computing "open" wrt. architecture, OSS capability, multi-vendor market ecosystems, expandability, and scalability?

                IMO regardless of how much DRAM / VRAM / SRAM is "in package", there MUST be (for the more fully featured "desktop" platforms, not so much tablet / laptop etc.) an ability to add on open-standard DDR DRAM modules available from a variety of vendors. To me this is just an evolution of the L1 / L2 / L3 / L4 / ... memory hierarchy: latency goes down and BW goes up the lower the level you access, but capacity is limited, so you may opt for slower but more plentiful memory at a higher level. So whether they offer 64G, 128G, ... on-package, we should have something that can scalably expand via N sockets of DRAM over N parallel memory channels, allowing low-cost expansion up to the NNN GBy range.
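                A toy model of that trade-off; the tier latencies and hit fractions below are invented round numbers, purely to show the shape of the argument:

                    # Toy expected-latency model for a tiered memory system.
                    # Each entry: (tier, fraction of accesses served there, latency in ns). All numbers invented.
                    tiers = [
                        ("on-package DRAM/HBM", 0.85,  80),
                        ("socketed DDR DIMMs",  0.14, 120),
                        ("CXL-attached memory", 0.01, 350),
                    ]

                    avg_ns = sum(frac * lat for _, frac, lat in tiers)
                    print(f"average access latency ~{avg_ns:.0f} ns")
                    # Capacity runs the other way: the slow tiers are the cheap, expandable ones,
                    # which is the case for keeping DIMM / CXL expansion even with big on-package RAM.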

                Then there's the system / inter-system level question of scalability. Right now the picture is utterly miserable, with consumer systems having 1-2.5 Gb/s NICs as the only real "commonly available" choice, and SMB / consumer HW sometimes able to take expensive 10-100 Gb/s Ethernet or similar NICs via PCIE. Not very high BW. Not very low latency. Not very low cost.

                What would be nice to see is a way to meaningfully scale the compute of multiple systems / motherboards over some kind of "fabric" (CXL-ish / evolved OcuLink / whatever), so that over short (0.5 meter?) distances one can simply daisy-chain or otherwise link board to board with low-cost, maybe passive, interconnects or low-cost multi-vendor equalizer / driver / transceiver boards and get 10-100+ GBy/s of expansion to more compatible system(s) / peripherals.
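                For a sense of scale between today's "commonly available" links and that kind of fabric, a quick comparison; the lane counts and generations are just representative picks:

                    # Approximate peak bandwidth of a few links, ignoring protocol overhead.
                    links_gbs = {
                        "2.5GbE NIC":                     2.5 / 8,   # ~0.3 GB/s
                        "100GbE NIC":                     100 / 8,   # ~12.5 GB/s
                        "PCIe 4.0 x4 (typical OCuLink)":  4 * 2.0,   # ~8 GB/s per direction
                        "PCIe 5.0 x16 (CXL 1.x/2.0 phy)": 16 * 4.0,  # ~64 GB/s per direction
                    }

                    for name, gbs in links_gbs.items():
                        print(f"{name:32s} ~{gbs:5.1f} GB/s")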

                Using some kind of high-ish performance expansion fabric like CXL et al. would then potentially free us from the severe mechanical / electrical / case form factor restrictions of current PCIE slots, where there's never enough mechanical space for things like GPUs which need substantial power/cooling, never enough slots due to limited SOC / chipset lane counts, and never enough room from trying to cram CPUs / GPUs / coolers into ATX chassis form factors that were designed in a race to the bottom in the mid-1990s and haven't evolved since.

                If you want a 4-node system, great, buy a chassis that can take four motherboards, and a pair of PSUs, link the fabric, and away you go.

                If you want to use some other kind of add-on HBW peripheral, maybe a 16-SSD hot-swap NVME RAID chassis, no problem: put it in whatever kind of external chassis makes sense for your unit, plug it into the CXL / whatever link chain, and there you go.

                But regardless of HOW it is done, we're seeing smartphones / tablets / laptops / workstations (apple) and increasingly even SMB / enthusiast desktop PCs become Tivoized "appliances", fundamentally sacrificing OSS capability (increasingly closed systems), user autonomy (you don't have ROOT & full sysadmin & dev capability & enablement for your samsung / google / apple / ...), and SCALABILITY (PCIE expansion is already a joke; the ATX/PC platform is a joke; consumer multi-socket MBs are almost non-existent; more and more systems ship with soldered-on "peripherals" and no expansion).

                What are we / the industry going to do to preserve the traditional "freedom", multi-vendor openness, and architectural openness of the SMB "workstation" / "pro desktop" space, so everything doesn't end up looking like a set of single-vendor, apple-ish, closed-ecosystem, zero-competition, zero-cost-control walled-off monopolies?

                We need the BW and compute performance that next-generation GPUs should have to be refactored into a holistic system mechanical / electrical / form factor / architectural unity but at the same time we shouldn't sacrifice every aspect of openness the PC platform has provided (as poor as the rewards have been wrt. x86 / intel / microsoft dominance) in the evolutionary process.

                Also wrt. GPUs and ML, it's very unlikely a single-"socket" system is going to be enough for a lot of applications. We need to be able to compose systems limited not just by how many GPUs you can squeeze into a tiny number of PCIE slots on a single motherboard, but with a way to scale N motherboards / chassis as open systems, so we can actually buy HW in 2026, modularly add more in 2027, add another unit in 2028, etc., and have high enough BW / synergy that we can scale even ML / graphics / HPC work at the personal / SMB level without being limited by SOP capacity, motherboard size, enclosure capacity, etc.



                • #18
                  Originally posted by pong View Post


                  Holy Mother of Pearl Jam!! That was a deep run! It's late where I am, so this first read will have to suffice for now, but your post deserves a deeper read, and multiple ones at that.

                  I will say this: I had a short thread discussion with the editor of The Next Platform, which is a sister publication of The Register. It was over an article about a new type of interconnect and interposer tech, but it got me thinking about the chiplet trend and the end of monolithic CPU and APU dies. I half-joked that if the chiplet package keeps growing from all the various cores and memory modules being put on package, we could see an enthusiast PC by around 2030 with a package nearly the size of the original Cerebras wafer-scale chip. He somewhat agreed, but thought it would be more like the size of today's EPYC or Threadripper package. He also thought it could be the case, as I speculated, that at the rate chiplet packages are growing we could see a Raspberry Pi where the entire board is nothing but the SoC, with just enough space around the periphery for external connections.

                  Taking this to the extreme, and keeping in mind a future version of CXL, I can see the emergence (or re-emergence) of the old blade computing paradigm, but instead of servers one could build a personal blade PC. Since everything is integrated and there are no slots for dGPUs or even memory, one could just rack in another small board containing a huge chiplet package, with the boards themselves connected by CXL and every core on both boards connected by CXL. Taken further, one could envision the comeback of Beowulf clusters: say, 4 chassis, each with 4 blades, and each blade carrying a chiplet package with 32 CPU cores spread across 4 dies plus dozens of GPU cores, NPUs, FPGAs and DSPs, all connected by CXL, and now all 4 chassis connected by CXL, so you have in effect 16 blades all talking to one another heterogeneously, with all the various memory and SSD storage seen as one huge pool. That could be the enthusiast PC setup of the future.

