AMD Details The MI300X & MI300A, Announces ROCm 6.0 Software


  • #21
    Originally posted by ms178 View Post
    Let's wait and see what Strix Halo can achieve late next year, but conceptually Apple has been doing something similar with the M1 for multiple generations already.
    Strix Point is set to provide about 45 TOPS of performance (3x Hawk Point). That's respectable, but it merely brings it into the same ballpark as what Qualcomm's Snapdragon X is expected to deliver.

    With the slated improvement, Hawk Point's NPU should be roughly equal to Apple's.

    Originally posted by ms178 View Post
    Samsung proposed some neat new in-memory-processing tech,
    Not sure how much uptake they're getting on that. Might be just a tech demo, in effect. Nvidia decided they'd rather have their own silicon in the stack, and SK Hynix appears ready to accommodate that.

    Should be another datacenter-grade thing, but perhaps hybrid stacks of logic and DRAM will eventually reach consumer products.

    Originally posted by ms178 View Post
    From a strategic perspective, Nvidia has been very successful with a common base architecture design with added features on the HPC/enterprise parts. Intel will follow that model to save development costs after realizing that custom-tailored solutions for each segment are hard and costly. It would also allow AMD more flexibility to address the consumer and enterprise segments with fewer chips.
    Nvidia, AMD, and Intel all have completely different GPU-like architectures for their high-end server parts than for their consumer GPUs. With Falcon Shores, Intel looks set to continue down that path.

    In other words, they're continuing to move away from a unified architecture, not towards it!

    Originally posted by ms178 View Post
    It's no secret that the RDNA/CDNA split has hurt AMD's software side very much, too. RDNA support in ROCm is quite telling, and RDNA 3's driver problems are also something that doesn't fit their wanna-be premium brand status - maybe due to a lack of resources, as they needed to bring up CDNA at the same time?
    That's on AMD. Intel and Nvidia seem to manage it just fine. I've personally run the same Intel software stack on both the little Xe iGPU in an Alder Lake and on a Ponte Vecchio cloud GPU.

    Ultimately, it's better for them to tune each hardware product line to its intended use and just provide a unified software interface across them all.
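    To make that concrete, here's a minimal sketch of the kind of code I mean, assuming a SYCL 2020 toolchain such as Intel's DPC++ (built with icpx -fsycl). The identical source runs on the little Alder Lake iGPU or a Ponte Vecchio card; only the device the runtime selects changes:

    ```cpp
    // Same source for an integrated Xe GPU or a datacenter GPU; the per-device
    // tuning lives below this interface. Assumes a SYCL 2020 compiler (e.g. icpx -fsycl).
    #include <sycl/sycl.hpp>
    #include <iostream>

    int main() {
        sycl::queue q{sycl::gpu_selector_v};              // whichever GPU the runtime exposes
        std::cout << "Running on: "
                  << q.get_device().get_info<sycl::info::device::name>() << "\n";

        constexpr size_t n = 1 << 20;
        float* data = sycl::malloc_shared<float>(n, q);   // USM visible to host and device
        q.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
            data[i] = 2.0f * static_cast<float>(i[0]);
        }).wait();

        std::cout << "data[42] = " << data[42] << "\n";   // expect 84
        sycl::free(data, q);
    }
    ```

    None of this code cares which tier of hardware it lands on; the per-device tuning lives underneath the interface.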

    Originally posted by ms178 View Post
    Nvidia's top-to-bottom approach with CUDA shows how it's done properly
    Yeah, but that doesn't support your above point about hardware unification. It's entirely through the magic of software that you can run the same CUDA apps on all of their hardware.

    Originally posted by ms178 View Post
    and what Intel wants to extend with oneAPI to multiple devices, even from different vendors.
    Again, you seem to be contradicting yourself. You're arguing AMD should re-unify their server and client GPU architectures, while pointing to oneAPI's alleged cross-vendor support?
    Last edited by coder; 09 December 2023, 12:00 AM.

    • #22
      Originally posted by coder View Post

      Nvidia, AMD, and Intel all have completely different GPU-like architectures for their high-end server parts than their consumer GPUs. With Falcon Shores, Intel looks set to continue down that path.

      In other words, they're continuing to move away from a unified architecture, not towards it!
      The product definition of Falcon Shores changed a lot over the last year. At first it was planned as a multi-tile GPU/FPGA hybrid; now it is GPU-only. It looks very much like they will use a Xe-derived architecture for its first generation. I think you've missed the point about the advantages of having tiles/chiplets in the first place: it is all about reducing costs, e.g. by re-using existing IP across different product lines. Eventually both AMD and Intel will scale up the number of GPU tiles to scale performance in a linear fashion. Then they can serve different markets with the same graphics IP and just put more tiles on the enterprise cards. While they could develop different GPU tiles to address the needs of different segments, that approach would diminish the cost savings quite a bit, as they would need to design and verify different tiles with less usage of each across the product stack.

      Originally posted by coder View Post
      Yeah, but that doesn't support your above point about hardware unification. That's entirely through the magic of software that you can run the same CUDA apps on all of their hardware.
      It might be "just" a manpower problem with AMD, but as there are some major differences in hardware capabilities - e.g. RDNA lacked some preemption features that were implemented in Vega-derived products - you might end up with just enough differences in hardware that you cannot provide a single compute stack for each hardware generation across multiple CPU ISAs or operating systems.
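      (For illustration: however it is solved, the stack has to detect and branch on those per-architecture differences somewhere. Below is a minimal, hypothetical sketch of that kind of capability query via HIP's device-property API, assuming a ROCm install; the fields printed are illustrative, not an exhaustive capability model.)

      ```cpp
      // Enumerate GPUs and print the architecture string the runtime reports, e.g.
      // gfx90a (CDNA 2) vs. gfx1100 (RDNA 3), so dispatch code could pick per-family
      // kernels. Assumes ROCm/HIP; compile with hipcc.
      #include <hip/hip_runtime.h>
      #include <cstdio>

      int main() {
          int count = 0;
          hipGetDeviceCount(&count);
          for (int d = 0; d < count; ++d) {
              hipDeviceProp_t prop;
              hipGetDeviceProperties(&prop, d);
              std::printf("device %d: %s (%s), %d CUs, cooperativeLaunch=%d\n",
                          d, prop.name, prop.gcnArchName,
                          prop.multiProcessorCount, prop.cooperativeLaunch);
          }
          return 0;
      }
      ```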

      Originally posted by coder View Post
      Again, you seem to be contradicting yourself. You're arguing AMD should re-unify their server and client GPU architectures, while pointing to oneAPI's alleged cross-vendor support?
      I don't see a contradiction. Lobbying AMD to do something Intel will do in a couple of years seems to be the way forward if you follow a tile/chiplet-based approach. And possibly AMD already has something along that line of thinking in the works (it is known that they have had their chiplet engineers working on the GPU side for years now). Intel already manages to support more than just their own hardware with their compute stack, while ROCm is much more limited in which hardware generations it supports and even dropped Vega support lately - that is not how you build customer confidence in your software stack. AMD might achieve more with the manpower they have if they only needed to support a single GPU architecture serving all markets.
      Last edited by ms178; 09 December 2023, 07:46 AM.

      • #23
        Oooh, the IO die is flipped, facing down, and is back to back against the compute dies! The rising and descending through-silicon vias are of similar ilk where they meet at the bonding line. That's quite tidy.

        • #24
          Originally posted by ms178 View Post
          The product definition of Falcon Shores changed a lot over the last year. At first it was planned as a multi-tile GPU/FPGA hybrid; now it is GPU-only.
          Yes, it's just the GPU portion of their original planned hybrid.

          Originally posted by ms178 View Post
          It looks very much like they will use a Xe-derived architecture for its first generation.
          Xe-derived doesn't mean it's the same as the consumer GPUs. Nvidia does the same thing with its server GPUs, where the microarchitecture of the execution units shares a lot of commonality with their client GPUs, but they're not the same.

          Similarly, Intel cannot use consumer GPU dies in Falcon Shores, even if it wanted to, because their client GPUs don't have any fp64 hardware and HPC users need a lot.

          Originally posted by ms178 View Post
          I think you've missed the point about the advantages of having tiles/chiplets in the first place, it is all about reducing costs, e.g. to re-use existing IP over different product lines.
          You're confused. AMD is the only one doing this, and they only do it for their CPUs, not GPUs. We'll see how long even that keeps up. Note that AMD is not using their Zen 4C tiles outside of EPYC CPUs. Perhaps the next generation will see even more fractures between their client and server CPUs.

          As for Intel, their use of tiles in current and future CPUs is simply to increase yield. Their server cores & tiles are very different from the client ones.

          Originally posted by ms178 View Post
          Eventually both AMD and Intel will scale up the number of GPU tiles to scale performance in a linear fashion.
          So far, the only example we have of anyone distributing rendering across multiple dies is Apple. In spite of their claims, their M-series SoCs cannot match the rendering performance of top-end dGPUs. AMD said they looked at it and found the energy overhead was too big, due to the amount of data movement involved.

          Originally posted by ms178 View Post
          as there are some major differences in hardware capabilities, e.g. RDNA lacked some preemption features that were implemented in Vega-derived products, you might end up with just enough differences in hardware that you cannot provide a single compute stack for each hardware generation across multiple CPU ISAs or operating systems.
          Oh, but Intel's oneAPI and CUDA can? Please.

          Some flat-earth theories make more sense than that. Yes, you can have a mostly-shared software stack for both. They've already demonstrated this.

          The lack of attention to consumer platforms is just because they bit off more than they could chew by deciding they were going to implement a CUDA clone (i.e. HIP). However, CUDA is not a static target. So, that's an ongoing effort, as is porting it to their new hardware & optimizing their domain-specific libraries that mimic various CUDA runtimes. Then, they have to maintain their forks of various user-level libraries and frameworks.
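          For anyone who hasn't looked at HIP: the whole point of the clone is that CUDA-style source carries over nearly verbatim, which is also why chasing CUDA is such a moving target. A minimal sketch, assuming ROCm's hipcc and with error handling omitted:

          ```cpp
          // saxpy in HIP: the kernel syntax and launch model mirror CUDA, and the hip*
          // runtime calls mirror their cuda* counterparts. Sketch only; no error checks.
          #include <hip/hip_runtime.h>
          #include <cstdio>
          #include <vector>

          __global__ void saxpy(int n, float a, const float* x, float* y) {
              int i = blockIdx.x * blockDim.x + threadIdx.x;
              if (i < n) y[i] = a * x[i] + y[i];
          }

          int main() {
              const int n = 1 << 20;
              std::vector<float> hx(n, 1.0f), hy(n, 2.0f);

              float *dx = nullptr, *dy = nullptr;
              hipMalloc((void**)&dx, n * sizeof(float));   // mirrors cudaMalloc
              hipMalloc((void**)&dy, n * sizeof(float));
              hipMemcpy(dx, hx.data(), n * sizeof(float), hipMemcpyHostToDevice);
              hipMemcpy(dy, hy.data(), n * sizeof(float), hipMemcpyHostToDevice);

              saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);   // CUDA-style launch
              hipMemcpy(hy.data(), dy, n * sizeof(float), hipMemcpyDeviceToHost);

              std::printf("y[0] = %f\n", hy[0]);           // expect 4.0
              hipFree(dx);
              hipFree(dy);
          }
          ```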

          Nvidia has a much bigger team and a decade-long head start on this stuff. I'm not making excuses for AMD, but the problem is obvious and there's not much they can do but continue to grind through it. They recently opened a new office in India, where I'd imagine they're already staffing up to help such efforts.

          The one thing they're not going to do is walk back their hardware strategy, because they spent like 15 years trying to address both markets with the same architecture and they know very well the downsides of that approach. You get a more expensive solution that's less adept at either workload.

          Originally posted by ms178 View Post
          I cannot see a contradiction? Lobbying AMD for doing something Intel will do in a couple of years
          You only think that because you haven't been following Intel's GPU strategy closely enough. You're dead wrong, though. They've very clearly telegraphed they intend to continue maintaining separate architectures for datacenter vs. client GPUs.
          Last edited by coder; 09 December 2023, 09:42 AM.

          • #25
            Originally posted by coder View Post

            Similarly, Intel cannot use consumer GPU dies in Falcon Shores, even if it wanted to, because their client GPUs don't have any fp64 hardware and HPC users need a lot.
            There is a simple solution for this problem that has been seen before in the industry - design a common tile with such fp64 capabilities for all segments and nerf it down for the consumer segment via firmware to 1/8th or 1/16th of the performance. I doubt that the cost of the increased silicon area is higher than that of designing and validating two separate GPU tiles, one for each segment (also considering the inflexibility of not being able to use them for both segments if there is any excess inventory). But it's Intel, after all; I do not rule out anything here.

            Originally posted by coder View Post
            You're confused. AMD is the only one doing this, and they only do it for their CPUs, not GPUs.
            Please, I was looking at the future, which has already been the subject of rumors on different channels, and AMD officially revealed years ago that they brought that chiplet expertise to the Radeon group. Even Nvidia has been working on such a chiplet design in-house for some time already. Nvidia hasn't released anything with that tech yet, either because they haven't mastered it or because the trade-offs haven't been worth it for their architecture to date.

            Originally posted by coder View Post
            As for Intel, their use of tiles in current and future CPUs is simply to increase yield. Their server cores & tiles are very different from the client ones.
            On servers, Intel goes with P-core-only and E-core-only CPU tiles. Meteor Lake ships these days with an Alchemist+ GPU tile (which is not just a shrink from the consumer line either). It doesn't need to stop at CPU tiles in the server segment in the future. The end goal is to mix and match different tiles as they please for each segment; it's the same for AMD, just with a different technical implementation.

            Originally posted by coder View Post
            So far, the only example we have of anyone distributing rendering across multiple dies is Apple. In spite of their claims, their M-series SoCs cannot match the rendering performance of top-end dGPUs. AMD said they looked at it and found the energy overhead was too big, due to the amount of data movement involved.
            That might have been true yesterday. It might even be true today, but that doesn't mean it will stay this way forever. My comments were forward-looking. Also, I think Imagination Technologies has mastered such a thing, too, but they are not worth talking about today.

            Originally posted by coder View Post
            Oh, but Intel's oneAPI and CUDA can? Please. Some flat-earth theories make more sense than that.
            Please, keep it civil. I already hinted at a manpower problem, too. I also laid out how their bifurcation strategy hurt them hard. This can of course be overcome by different means. I am neither a GPU engineer nor the CEO of AMD, so there might be better approaches to this problem than I can imagine. But I think it has become crystal clear to everyone that the bifurcation of their GPU architecture might not have been the best decision at a time when their software resources were so thin.

            Originally posted by coder View Post
            The one thing they're not going to do is walk back their hardware strategy, because they spent like 15 years trying to address both markets with the same architecture and they know very well the downsides of that approach. You get a more expensive solution that's less adept at either workload.
            Never say never. On the CPU side, AMD once thought Bulldozer was a good idea, too. That didn't work out that well in practice. The sooner some strategic mistakes are corrected, the better. As mentioned, there might be better options for tackling this than what I proposed.

            Originally posted by coder View Post
            You only think that because you haven't been following Intel's GPU strategy closely enough. You're dead wrong, though. They've very clearly telegraphed they intend to continue maintaining separate architectures for datacenter vs. client GPUs.
            There is no such thing as clearly telegraphed intentions coming from Intel these days. The Falcon Shores redefinition was one prime example I already gave you. There is clearly a lot of confusion about the (now defunct) AXG's plans and what Intel intends to do with GPUs in the long term.

            • #26
              Originally posted by ms178 View Post
              There is a simple solution for this problem that has been seen before in the industry - design a common tile for all segments with such fp64 capabilities and nerf it down for the consumer segment via firmware to 1/8th or 1/16th of the performance.
              First, modern HPC GPUs don't only support fp64 vector arithmetic. They also now support fp32 and fp64 matrix operations, which also use quite a bit of die space.

              Meanwhile, these GPU-compute applications have no need of the rendering hardware (ROPs, texture units, tessellators, ray-tracing engines, etc.) needed for interactive graphics. I once asked a GPU engineer how much die area was consumed by these graphics hardware engines, and he estimated up to 1/2.

              Originally posted by ms178 View Post
              I doubt that the cost of the increased silicon area is higher than that of designing and validating two separate GPU tiles, one for each segment
              Based on what? Your imagination?

              I trust these GPU makers to know their business. They know what the hardware design & fabrication costs are, as well as the software costs. We see that all three have independently reached the same conclusion that a hard fork of their GPU architectures is warranted to optimally address the HPC and client markets.

              But I'm sure you know better.
              /s

              Originally posted by ms178 View Post
              it was officially revealed by AMD years ago that they brought that chiplet expertise to the Radeon group.
              They did that, and what they found made sense was to separate the memory controller + L3 cache into separate dies, but not the compute.

              Originally posted by ms178 View Post
              Even Nvidia has been working on such a chiplet design in-house for some time already. Nvidia hasn't released anything with that tech yet
              That should tell you something.

              Originally posted by ms178 View Post
              On servers, Intel goes with P-core-only and E-core-only CPU tiles. Meteor Lake ships these days with an Alchemist+ GPU tile (which is not just a shrink from the consumer line either). It doesn't need to stop at CPU tiles in the server segment in the future. The end goal is to mix and match different tiles as they please for each segment; it's the same for AMD, just with a different technical implementation.
              They're not doing it to share tiles between client vs. server, however. That's the key difference between what they're doing and what you're saying. It's never going to happen, because the needs of each segment are very specific and distinct from the other.

              Originally posted by ms178 View Post
              I think Imagination Technologies has mastered such a thing, too. But they are not worth talking about today.
              The only example we have of a scaled-up GPU based on Imagination's tech is the disastrous MTT S80. Until we see even a remotely competitive example of their tech scaling up on a monolithic die, I don't care what else they are saying or doing.

              Originally posted by ms178 View Post
              I already hinted at a manpower problem, too.
              They're not going to nerf their hardware stack and hand Nvidia an even bigger lead in both markets just to cover a solvable staffing problem! They just need to staff up to a level comparable to Nvidia, if that's who they want to compete against.

              Originally posted by ms178 View Post
              I also laid out how their bifurcation strategy hurt them hard.
              No, you didn't. You also glossed over the fact that they were getting their ass kicked on client performance before RDNA came along and started to reverse the trend. Then, RDNA2 was the first time in about a decade that we saw AMD chalk up some actual wins! You want them to just throw that all away and go back to the lackluster era of GCN!

              Originally posted by ms178 View Post
              I am neither a GPU engineer nor the CEO of AMD,
              Exactly, yet you seem to think you know better than them.

              Originally posted by ms178 View Post
              Never say never. On the CPU side, AMD once thought Bulldozer was a good idea, too. That didn't work out that well in practice.
              Bulldozer failed hard, so of course they were going to walk it back.

              On the contrary, RDNA has been working well. Judging by the initial claims, MI300 seems to be notching some wins, as well. You don't reverse course when you're finally catching up!

              Originally posted by ms178 View Post
              The sooner some strategic mistakes are corrected, the better.
              Except what you're talking about is reversing strategic wins.

              Originally posted by ms178 View Post
              There is no such thing as clearly telegraphed intentions coming from Intel these days.
              Intel's own published roadmap distinguishes their HPC product line from their GPU Flex series. The GPU Flex series repurposes consumer GPU dies for transcoding and inferencing workloads; the GPU Max series packs the AI-training and fp64 horsepower.

              Originally posted by ms178 View Post
              The Falcon Shores redefinition was one prime example I already gave you.
              Except you don't even understand the example you're citing. They always planned a CPU + GPU hybrid, plus a GPU-only version. All they did was cancel the hybrid version. It's still an HPC/datacenter-oriented GPU in every way.

              Originally posted by ms178 View Post
              There is clearly a lot of confusion about the (now defunct) AXG's plans and what Intel intends to do with GPUs in the long term.
              Not as much confusion as seems to exist in your mind.

              The strength of your opinions is grossly mismatched to your actual knowledge of the GPU industry. Even for an outsider, you're poorly positioned to be second-guessing their core business strategy.

              Please stick to what you know, because it's clearly not this.

              • #27
                Originally posted by coder View Post
                Please stick to what you know, because it's clearly not this.
                It is funny that you always claim to know better, even about the GPU industry, which you are not a part of either. Let's wait and see what the future will bring us.

                • #28
                  Originally posted by ms178 View Post
                  It is funny that you always claim to know better, even about the GPU industry, which you are not a part of either.
                  I obviously follow it a lot more closely than you do. I also know people who work at Nvidia and AMD's graphics group (no, they don't tell me anything non-public). At my job, we use Intel iGPUs and Nvidia dGPUs for compute workloads.

                  Originally posted by ms178 View Post
                  Let's wait and see what the future will bring to us.
                  You can sit back & be surprised when it doesn't go the way you think it should, or you can inform yourself better and try to understand enough to see why they do what they do.

                  What's intolerable is to presume you know their business better than they do, when you clearly don't even read all the information and analysis available to you. Even then, because a lot of the key decisions depend on data that we'll never see, we can never know for sure the rationale behind the things they do or don't do.

                  Seriously, do you think they're idiots? Because, last time I checked, idiots aren't very good at either designing high-performance chips or running successful businesses in a fast-paced and hyper-competitive industry.

                  I can sit here and complain about the state of ROCm support for consumer GPUs, but it's hard for me to second-guess what AMD should do about it. That's the real difference between me and you. I can distinguish between what I want and what makes sense, from their perspective. What makes sense, from my perspective, is to buy an Intel dGPU (if they ever fix their idle power problem).
                  Last edited by coder; 10 December 2023, 07:43 AM.

                  • #29
                    Originally posted by coder View Post
                    I obviously follow it a lot more closely than you do. I also know people who work at Nvidia and AMD's graphics group (no, they don't tell me anything non-public). At my job, we use Intel iGPUs and Nvidia dGPUs for compute workloads.
                    I've worked in the industry for some time myself and have been watching it for over 30 years now. Even though I don't advise tech companies or consult on business matters, watching the tech industry very closely is a hobby for me. I was obviously speculating about future technology trends for the most part, based on the rumors, announcements, company PR, etc. that are available out there.

                    Originally posted by coder View Post
                    You can sit back & be surprised when it doesn't go the way you think it should, or you can inform yourself better and try to understand enough to see why they do what they do. What's intolerable is to presume you know their business better than they do, when you clearly don't even read all the information and analysis available to you. Even then, because a lot of the key decisions depend on data that we'll never see, we can never know for sure the rationale behind things they do or don't.
                    Well, I am thankful for a meaningful discussion with you and for your challenges to my views, even though your counter-arguments have not convinced me so far.

                    Originally posted by coder View Post
                    Seriously, do you think they're idiots? Because, last time I checked, idiots aren't very good either at designing high-performance chips or running successful businesses, in a fast-paced and hyper-competitive industry.
                    Having worked in large (public-sector) organisations myself, I'd say some decisions at the top might support the former view.

                    But seriously, Lisa Su reminded us all the other day that these decisions are great financial and technological bets on having the best solutions at the right time. She might have lost the initial AI bet by playing it too conservatively, whereas Nvidia embraced it from the start. There is still a lot of business to be had over the next couple of years, but Nvidia has gained a lot financially from their AI dominance (as their last financials showed very impressively).

                    Originally posted by coder View Post
                    I can sit here and complain about the state of ROCm support for consumer GPUs, but it's hard for me to second-guess what AMD should do about it. [...] What makes sense, from my perspective, is to buy an Intel dGPU (if they ever fix their idle power problem).
                    It seems we found some common ground here. Another answer to AMD's software weaknesses could be for AMD to consider joining Intel's oneAPI effort - there has even been some Intel-funded work targeting AMD hardware in that area. Company politics might make that unlikely, but it would be another option going forward.
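                    As a rough sketch of what that could look like in practice: with Codeplay's oneAPI plugin for AMD GPUs installed, a stock DPC++/SYCL program can already enumerate ROCm devices next to Intel and Nvidia ones. Plugin and selector names are as currently documented, so treat this as illustrative rather than definitive:

                    ```cpp
                    // List every backend/device the oneAPI runtime can see. With the AMD
                    // plugin installed, HIP (ROCm) platforms appear alongside Level Zero,
                    // OpenCL, and CUDA ones; availability depends on installed plugins.
                    #include <sycl/sycl.hpp>
                    #include <iostream>

                    int main() {
                        for (const auto& p : sycl::platform::get_platforms()) {
                            std::cout << p.get_info<sycl::info::platform::name>() << "\n";
                            for (const auto& d : p.get_devices()) {
                                std::cout << "  "
                                          << d.get_info<sycl::info::device::name>() << "\n";
                            }
                        }
                        // e.g. ONEAPI_DEVICE_SELECTOR=hip:* would then steer work to a
                        // Radeon/Instinct card at run time.
                        return 0;
                    }
                    ```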

                    Feedback from the peanut gallery is free advice. They are also free to ignore it if they know better...

                    • #30
                      Originally posted by ms178 View Post
                      I've worked in the industry for some time myself and have been watching it for over 30 years now, even though I don't advise tech companies or consult on business matters,
                      Then you should be well aware that the math has to add up in order for a decision to make business sense. Without any of the data, how you can believe you know what they should do is beyond me.

                      Originally posted by ms178 View Post
                      watching the tech industry very closely is a hobby for me.
                      Apparently not GPUs, then.

                      Originally posted by ms178 View Post
                      But seriously, Lisa Su reminded us all the other day that these decisions are great financial and technological bets on having the best solutions at the right time.
                      She's probably referring to the way chip design has such a long lead time. From the point of initial development until first shipment to customers, modern CPUs and GPUs can take up to 4 years! You have to make predictions about costs, demand, and competition. That's what makes it a gamble. In order to be reasonably successful, you need to do a really good job of controlling and modelling as many factors as possible, as well as eliminating unnecessary risks.

                      What she's definitely not saying is that they just make a blind guess.

                      Originally posted by ms178 View Post
                      She might have lost the initial AI bet by playing it too conservatively, whereas Nvidia embraced it from the start.
                      AMD certainly made some missteps on their path to GPU compute & AI, but it's hard to say exactly what they should've done and when without knowing what kinds of budgets they had to work with. You must be mindful of the fact that they were barely keeping the lights on in the years just before & after Zen. They had also lost some talent through layoffs and attrition. It takes time to build capacity.

                      I disliked how AMD pivoted towards HIP, just as it seemed they were finally getting ROCm in shape. I'd have much rather seen them continue to stabilize ROCm, get it nicely packaged and integrated into more distros, ported to their entire hardware range, and get caught up on their OpenCL support. Those are my selfish wishes. I can't say that would've best positioned them for their AI or HPC objectives, however.

                      Originally posted by ms178 View Post
                      Another answer to AMD's software weaknesses could be for AMD to consider joining Intel's oneAPI effort.
                      Having just invested so much in HIP, I don't see that happening. In fact, I'm sure AMD would rather see HIP running on Intel hardware than oneAPI running on AMD hardware.
