Originally posted by qarium
View Post
So onto what you said. I looked it up, and it seems Windows 98 and ME had issues using more than 512MB of RAM. People who run either OS on newer hardware usually apply an unofficial patch to get around them. Fair enough; maybe that would lead to issues with the Kyro or Kyro II. However, you said:
"i did buy such a Imagination Technology PowerVR Kyro graphics card i am nur is was the Kyro1 or Kyro 2,,, but this card was total shit and i did give it back after only 1-2 days. the reason was my system at that time had 756MB RAM but this card only supported 512MB ram. to use this card i would had downgrade to 512mb ram so i decided to just give the card bag."
So you're making a judgment on the quality of the card based on the driver assuming a limitation of the OS, one that people could work around with a patch or extra tools. How does that make the card itself shit? That doesn't reflect on its architecture; it's a driver issue.
Originally posted by qarium
View Post
AMD was attempting to do that with Navi 4C, which had up to 9 SEDs (Shader Engine Dies), an active interposer, a MID (Multimedia and I/O die), and 6 GDDR PHY dies, but it's been canceled since AMD isn't looking to produce high-end cards next gen.
Originally posted by qarium
View Post
Regardless, if there are patent issues, that really doesn't prevent it from being a solution for spreading a GPU design across chiplets. It would just mean there are some roadblocks for those who would need to pay to license those patents.
I'm also not suggesting that Nvidia and AMD must go tile-based in order to use chiplets. I was stating why I feel tile-based designs are naturally better suited to spanning across chiplets. That doesn't mean typical IMRs can't also span across chiplets; it's just way more complicated.
Originally posted by qarium
View Post
The reason I mentioned those techniques is to demonstrate the effectiveness of TBDR techniques and to explain why I felt TBRs are uniquely well-suited to working with chiplets. I feel I demonstrated that well.
Originally posted by qarium
View Post
I know 2.5TB/s of bandwidth sounds like a lot, but the cumulative bandwidth that the MCDs provide to the GCD in the 7900XTX is 5.3TB/s. That's just for the compute die to access memory that's off-die. The fact that the GCD itself can't be split up implies that its internal bandwidth is much higher than 5.3TB/s.
The fact that the M2 Ultra connects two TBDRs (compute portions and all) together and has them working as one with less than half of the bandwidth that an IMR graphics compute chiplet needs just to communicate with its last-level cache and external memory shows that I was right: you need far less inter-chiplet bandwidth for two TBDR chiplets to work as one GPU than you would for an IMR.
That also means that the amount of bandwidth the MCDs provide to the 7900XTX's one GCD is more than enough to cover four M2 chips being connected together. But remember that the M2 Maxes aren't GPU chips; they're SoCs. So putting four together gets you one SoC with 48 CPU cores, 152 GPU cores (that's 19,456 ALUs), 64 Neural Engine cores, four Image Signal Processors, four video accelerators, a bunch of USB4, and four PCI storage controllers.
From that we can derive that if the chips were just TBDR GPUs and lacked everything else that makes them SoCs, they could get away with even less bandwidth still.
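The arithmetic behind that comparison can be sketched quickly (the numbers are the approximations quoted above, not vendor-measured values):

```python
# Rough comparison using the figures from the post (approximate values).
imr_offdie_bw = 5.3       # TB/s, cumulative MCD -> GCD bandwidth in the 7900XTX
tbdr_link_bw = 2.5        # TB/s, die-to-die link joining two M2 Max dies

# The IMR compute die needs over 2x the bandwidth just to reach off-die
# cache/memory that two whole TBDR SoCs need to act as one GPU.
ratio = imr_offdie_bw / tbdr_link_bw
print(f"IMR off-die bandwidth is {ratio:.2f}x the TBDR die-to-die link")
```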
Originally posted by qarium
View Post
You must first render the low resolution image before you can upscale it.
In order to get motion vectors for the current frame, you must first compute the previous frame.
In order to interpolate the in-between frame, the GPU must first compute and upscale the last two frames to interpolate a frame between them.
If we assume a four-chiplet GPU, I suppose it could try to render two of the frames at the same time on different chiplets, just like Alternate Frame Rendering. But if frame 1 takes longer to render than frame 2, it needs to wait on frame 1 to be done in order to calculate motion vectors for frame 2. Now the middle frame can be interpolated, but that's a less intensive process than rendering a single low-res image, so it wouldn't make full use of a third chiplet. More importantly, the first two chiplets are done with their work by then, so there's no reason one of them couldn't do it. The only use I can think of for the last two chiplets is to compute two more future frames or have them do split-frame rendering, but this is all really just taking ancient methods of multi-GPU rendering on IMRs, applying them to chiplets, and adding temporal upscaling and interpolation.
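The dependency chain above can be sketched with futures standing in for chiplets; every function name here is hypothetical, just illustrating why the serial tail limits how much extra chiplets can help:

```python
# Minimal sketch of the frame-generation dependency chain, assuming
# hypothetical stand-in functions; futures play the role of chiplets.
from concurrent.futures import ThreadPoolExecutor

def render_low_res(frame_id):      # stand-in for rendering one low-res frame
    return f"frame{frame_id}"

def motion_vectors(prev, curr):    # blocks until BOTH frames are rendered
    return f"mv({prev},{curr})"

def interpolate(prev, curr, mv):   # cheap compared to rendering a frame
    return f"mid({prev},{curr},{mv})"

with ThreadPoolExecutor(max_workers=4) as chiplets:
    # Frames 1 and 2 can render in parallel on two chiplets (AFR-style)...
    f1 = chiplets.submit(render_low_res, 1)
    f2 = chiplets.submit(render_low_res, 2)
    # ...but motion vectors need both results, so the slower frame gates us,
    mv = motion_vectors(f1.result(), f2.result())
    # and only then can the in-between frame be interpolated.
    mid = interpolate(f1.result(), f2.result(), mv)
print(mid)
```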
Look up benchmarks of Quad SLI and Quad Crossfire rendering. You might see a decent boost across two cards, but it's never 2x, and sometimes you get lower frame rates than with one card. As you add more cards, you get more diminishing returns, and the chances of getting a negative benefit over a single card increase.
Now I'll read how you think it will work.
Originally posted by qarium
View Post
FSR3 and DLSS3+ create artifacts. They're interpolating data all over the place, including whole frames where a character might just be missing a leg or something.
Originally posted by qarium
View Post