Imagination Tech Posts Updated PowerVR Linux DRM Driver

  • Imagination Tech Posts Updated PowerVR Linux DRM Driver

    Phoronix: Imagination Tech Posts Updated PowerVR Linux DRM Driver

    Imagination Technologies has posted the fifth iteration of their driver patches for supporting PowerVR Rogue graphics with an open-source Linux kernel Direct Rendering Manager (DRM) driver that will go along with their Mesa PVR Vulkan driver...


  • #2
    I wish that they returned to the desktop GPU market.

    Comment


    • #3
      Originally posted by NeoMorpheus View Post
      I wish that they returned to the desktop GPU market.
      They have a few desktop GPUs in China that use their B-series architecture. One of them even uses two chips with four chiplets per GPU. I've been looking for reviews of them, but the only one I saw was on LTT and it didn't perform well. I imagine (lol) a good part of that was driver related, because it lacked features like tessellation, which Imagination's GPUs definitely support.

      It's unfortunate that they don't partner up with someone like EVGA to release GPUs in the West, because I think they have a lot of potential to shake up the market. It's clear that the tile-based nature of their GPUs is the reason they were able to get their design working across four chiplets while other vendors are struggling to get to two. Being able to split a GPU up into that many parts can be very advantageous to GPU vendors and customers. With just one chip they could have at least 1024, 2048, 3072, and 4096 ALU SKUs. On top of that, they could take advantage of binning and offer an additional 4096 ALU SKU with highly binned chiplets that clock the fastest, and a sub-1024 ALU SKU that uses chiplets with defects. They could sell all of these at pretty competitive prices because the yields on that chip would be very high.
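
      To put rough numbers on that SKU idea, here's a quick sketch; the ALU count per chiplet, the bin names, and the salvage configuration are all my own assumptions for illustration, not anything Imagination has announced.

```python
# Hypothetical SKU lineup built from a single chiplet design (numbers assumed).
CHIPLET_ALUS = 1024  # ALUs per chiplet, illustrative

def build_lineup():
    lineup = []
    # straight 1x/2x/3x/4x chiplet packages
    for n in (1, 2, 3, 4):
        lineup.append((f"{n}-chiplet", n * CHIPLET_ALUS, "standard bin"))
    # binning adds a halo part and a salvage part from the same silicon
    lineup.append(("4-chiplet OC", 4 * CHIPLET_ALUS, "best-clocking chiplets"))
    lineup.append(("1-chiplet salvage", CHIPLET_ALUS - 256, "defective ALUs fused off"))
    return lineup

if __name__ == "__main__":
    for name, alus, note in build_lineup():
        print(f"{name:18} {alus:5} ALUs  ({note})")
```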

      Having multiple chiplets that can work on their own or together has some potential benefits for server customers, too. If they are given the ability to split a 4-chiplet GPU into two 2-chiplet GPUs or four 1-chiplet GPUs, then they can pass whole chiplets through to VMs without GPU virtualization. If one chiplet goes bad for some reason, the GPU could still technically function as a lesser GPU instead of going offline completely. I'm not aware of any other GPU that could do that.

      Of course, having that many identical chiplets would bring some odd redundancy that could still be exploited in interesting ways. Think about hardware video decoders and encoders. If each chiplet is the same, then each chiplet would have one of these decoders and encoders, so a four-chiplet design would have four of them. That's unusual, but if they're utilized kind of like Intel's Hyper Encode, they can work together to get nearly 4x the encode speed. Other aspects, like having more display engines and audio DSPs, would also be welcome. If each chiplet can support two 4K displays and has a single-core audio DSP, then a four-chiplet GPU could support eight 4K displays and have a four-core audio DSP.

      Comment


      • #4
        Originally posted by Myownfriend View Post
        They have a few desktop GPUs in China that use their B-series architecture. One of them even uses two chips with four chiplets per GPU. I've been looking for reviews of them, but the only one I saw was on LTT and it didn't perform well. I imagine (lol) a good part of that was driver related, because it lacked features like tessellation, which Imagination's GPUs definitely support.
        It's unfortunate that they don't partner up with someone like EVGA to release GPUs in the West, because I think they have a lot of potential to shake up the market. It's clear that the tile-based nature of their GPUs is the reason they were able to get their design working across four chiplets while other vendors are struggling to get to two. Being able to split a GPU up into that many parts can be very advantageous to GPU vendors and customers. With just one chip they could have at least 1024, 2048, 3072, and 4096 ALU SKUs. On top of that, they could take advantage of binning and offer an additional 4096 ALU SKU with highly binned chiplets that clock the fastest, and a sub-1024 ALU SKU that uses chiplets with defects. They could sell all of these at pretty competitive prices because the yields on that chip would be very high.
        Having multiple chiplets that can work on their own or together has some potential benefits for server customers, too. If they are given the ability to split a 4-chiplet GPU into two 2-chiplet GPUs or four 1-chiplet GPUs, then they can pass whole chiplets through to VMs without GPU virtualization. If one chiplet goes bad for some reason, the GPU could still technically function as a lesser GPU instead of going offline completely. I'm not aware of any other GPU that could do that.
        Of course, having that many identical chiplets would bring some odd redundancy that could still be exploited in interesting ways. Think about hardware video decoders and encoders. If each chiplet is the same, then each chiplet would have one of these decoders and encoders, so a four-chiplet design would have four of them. That's unusual, but if they're utilized kind of like Intel's Hyper Encode, they can work together to get nearly 4x the encode speed. Other aspects, like having more display engines and audio DSPs, would also be welcome. If each chiplet can support two 4K displays and has a single-core audio DSP, then a four-chiplet GPU could support eight 4K displays and have a four-core audio DSP.
        "potential to shake up the market"

        Looks like you live in an alternative reality... For many years Imagination Technologies/PowerVR had the worst open-source driver support of all the GPU companies. The situation was even worse than Nvidia, because Nvidia at least had (as some people claim) working closed-source drivers, while Imagination Technologies never had well-working GPU drivers on Linux.
        Of course they say they have changed and now do open-source drivers... but they are not known for having "good" open-source drivers.

        The release of the PowerVR cards that LTT covered was a complete disaster, with only bad reviews... there was not a single review that said they were good.

        You say "potential to shake up the market", but that only shows you live in an alternative reality.

        You claim Imagination Technologies/PowerVR has a chiplet design and the others have no chiplet design; that's wrong.

        An AMD 7900 XTX is in fact a chiplet design too; a 7900 XTX has 7 chiplets on the GPU...
        OK, you can say 6 of those chiplets only do caching and only 1 GPU die does the compute.

        A GPU with many GPU dies is not something new; a Voodoo 5 6000 had 4 VSA-100 GPU dies...

        You say they have the technology to put 4 GPU dies on one card, but no product they ever released (see LTT) showed any significant performance uplift.

        Intel also tried to "shake up the market" and... well, I cannot see it.

        Comment


        • #5
          Originally posted by qarium View Post

          "potential to shake up the market"

          Looks like you live in an alternative reality...
          How? I'm talking about potential. I'm not saying what will happen and I'm not lying about how things are.

          Originally posted by qarium View Post
          For many years Imagination Technologies/PowerVR had the worst open-source driver support of all the GPU companies. The situation was even worse than Nvidia, because Nvidia at least had (as some people claim) working closed-source drivers, while Imagination Technologies never had well-working GPU drivers on Linux.
          How would that affect their ability to shake up the market? lol. I love open-source software, but the reality is that the vast majority of consumers use a closed-source operating system with closed-source drivers. Linux makes up only a tiny sliver of the consumer market; we're not the whole world. This is what I'm getting at when I say that you live in an alternate reality. It might hurt your feelings when I say that, but you prove it more and more the longer you keep talking.

          Originally posted by qarium View Post
          Of course they say they have changed and now do open-source drivers... but they are not known for having "good" open-source drivers.
          I'm not sure they're known for much right now. I do know that them releasing open-source drivers was applauded by the sane members of the Linux community.

          Originally posted by qarium View Post
          The release of the PowerVR cards that LTT covered was a complete disaster, with only bad reviews... there was not a single review that said they were good.
          How is that relevant to their potential? lol

          Originally posted by qarium View Post
          You say "potential to shake up the market", but that only shows you live in an alternative reality.
          You said this already lol. Why not spend this time explaining why you don't feel they have the potential to shake up the market, instead of repeating something I said to you that accurately describes you?

          Originally posted by qarium View Post
          You claim Imagination Technologies/PowerVR has a chiplet design and the others have no chiplet design; that's wrong.
          I didn't say that lol. I said others are struggling to scale to two chiplets, and I DO mean compute chiplets.

          Originally posted by qarium View Post
          An AMD 7900 XTX is in fact a chiplet design too; a 7900 XTX has 7 chiplets on the GPU...
          OK, you can say 6 of those chiplets only do caching and only 1 GPU die does the compute.
          Yeah, the fact that only one of the dies is a compute die is pretty relevant to the point I was making lol. My point was that, starting with Imagination's B series, they're able to create chiplets that can work as one large GPU or as four separate ones. Each chiplet can have its own memory controller, display engine, and other accelerators, and the design scales up to four chiplets. Alternatively, those mini-GPUs can be placed on one die, and they don't even need to be near each other. That provides a lot of possibilities for creating a cost-effective line of graphics cards while only having to manufacture one, smaller chiplet.

          I DO think AMD's design is clever, though, and I believe they have a compute-only chip that scales up to two chiplets, but I'm strictly talking about graphics compute chiplets.

          Originally posted by qarium View Post
          A GPU with many GPU dies is not something new; a Voodoo 5 6000 had 4 VSA-100 GPU dies...
          I'm aware. Multi-GPU graphics cards have come out before, but their performance scaling has always been a problem. Because of the way IMRs work, with the exception of alternate frame rendering, each GPU needs to transform all of the vertices, but each only does fragment shading on its portion of the screen. Those portions could be horizontal slices, quadrants, lines, tiles, etc. The Voodoo 5 6000, which never came out, didn't support transform and lighting, so that reduced some redundancy, but it pushed that work onto the CPU. The cost also would have put it well above any of its competition.

          Sega's Naomi 2 had two PowerVR GPUs. They didn't have T&L either, but because the transform and binning processing were done on the CPU, it just needed to pass different tiles to each GPU, so past the T&L stage the scaling was apparently excellent. The Voodoo probably wouldn't have had that benefit, because every triangle is going to cross several scanlines.

          Part of what makes it difficult for IMRs to work together is that they're sent an untransformed triangle, then they transform it into screen space, rasterize it, fragment shade it, and spit it out as pixels while updating the Z-buffer. The order in which the triangles are submitted dictates whether the Z-buffer can be used to prevent unnecessary writes and fragment shading of pixels that won't contribute to the final image. Since they don't know where the triangle is going to be on-screen before it's submitted to a GPU, they can't figure out which GPU is going to end up shading it. That's why IMR GPUs all transform all of the geometry even when they're working on the same image: they transform the geometry, check if they're supposed to fragment shade it, and if they aren't, they discard it. Because they need fast access to render targets, and accessing them needs a lot of bandwidth, they would need a very high-bandwidth interconnect to another chip in order to share memory. That's why multi-GPU scaling methods don't double the usable amount of memory: each GPU has to store copies of mostly the same data in its own memory. Can you see how scaling becomes difficult for IMRs? There are a lot of stages of the render pipeline that they can't distribute well, and that's why using multiple GPUs together can sometimes run worse than one.
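
          Here's a toy model of that duplicated work, just to make the point concrete; the screen height, the fake "transform", and the horizontal-slice split are my own simplifications, not how any real driver divides a frame.

```python
# Toy split-frame rendering on IMRs: every GPU transforms every triangle,
# then throws away the ones outside its slice of the screen, so vertex
# work is duplicated per GPU and only fragment work is actually divided.
SCREEN_H = 1080  # illustrative

def transform(tri):
    # stand-in for vertex shading/projection: returns the screen-space
    # vertical extent (min_y, max_y) of a triangle given as (x, y) vertices
    ys = [v[1] for v in tri]
    return min(ys), max(ys)

def split_frame_render(triangles, num_gpus):
    slice_h = SCREEN_H // num_gpus
    transforms_done = 0
    shaded = [0] * num_gpus
    for gpu in range(num_gpus):
        y0, y1 = gpu * slice_h, (gpu + 1) * slice_h
        for tri in triangles:
            min_y, max_y = transform(tri)   # duplicated on every GPU
            transforms_done += 1
            if max_y < y0 or min_y >= y1:
                continue                    # transformed, then discarded
            shaded[gpu] += 1                # fragment work is the only real split
    return transforms_done, shaded

# e.g. split_frame_render([[(0, 10), (5, 40), (20, 15)]] * 1000, 2)
# performs 2000 transforms for 1000 triangles.
```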

          Imagination's tile-based deferred rendering works better for multi-GPU scaling. When you push an untransformed triangle to the GPU (or in the B series' case, it pulls the triangle), the triangle is transformed into screen space and binned into screen-space tiles in external memory. Once all of the triangles in the scene have been transformed and binned, the GPU then reads back a tile for the rasterization and fragment shading stage. The first thing it does with the tile data is sort its primitives from front to back and build a depth buffer for that tile in on-chip SRAM. Then the G-buffer is created in another small amount of high-bandwidth, low-latency, on-chip memory. It's entirely possible for the G-buffer to exist solely in this memory, with no need to write to external memory until the final pixels need to be sent out for display. The only accesses it would need to make to external memory during the rasterization and fragment shading stage would be for textures or for the tile lists, neither of which needs much memory bandwidth or scales up much with render resolution. One of these GPUs can work on several tiles at a time. Because of this design, a multi-GPU TBDR design needs to share a lot less information between chips than an IMR. In fact, the way that one TBDR GPU renders a frame isn't all that different from how multiple TBDR GPUs would render a frame.
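
          A rough sketch of that two-phase flow, under my own simplified assumptions (the tile size, the data layout, and the round-robin chiplet assignment are all illustrative): bin everything first, then every tile becomes self-contained work that any chiplet can take.

```python
# Toy TBDR flow: phase 1 bins screen-space triangles into tiles, phase 2
# hands whole tiles out to chiplets, each of which can depth-sort and
# shade its tiles against its own on-chip memory independently.
TILE = 32  # pixels per tile edge, illustrative

def bin_triangles(tris, screen_w, screen_h):
    tiles_x = (screen_w + TILE - 1) // TILE
    tiles_y = (screen_h + TILE - 1) // TILE
    bins = {}
    for tri in tris:
        x0, y0, x1, y1 = tri["bbox"]  # integer screen-space bounding box
        for ty in range(max(y0 // TILE, 0), min(y1 // TILE, tiles_y - 1) + 1):
            for tx in range(max(x0 // TILE, 0), min(x1 // TILE, tiles_x - 1) + 1):
                bins.setdefault((tx, ty), []).append(tri)
    return bins

def distribute_tiles(bins, num_chiplets):
    # each tile's primitive list is complete, so any chiplet can take it;
    # depth sorting and shading then happen entirely in that chiplet's
    # on-chip tile memory, with no cross-chiplet framebuffer traffic
    work = [[] for _ in range(num_chiplets)]
    for i, (tile, prims) in enumerate(sorted(bins.items())):
        work[i % num_chiplets].append((tile, sorted(prims, key=lambda t: t["z"])))
    return work

# e.g. distribute_tiles(bin_triangles(
#     [{"bbox": (0, 0, 100, 100), "z": 0.5}], 1920, 1080), 4)
```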

          The last widely released PowerVR desktop GPU was the Kyro II back in 2001, and that was known for punching well above its weight. It had slower, cheaper VRAM than its competitors, it was a smaller chip, and it ran cooler, yet in some benchmarks it outperformed cards that were $200 more expensive. Also, because memory bandwidth wasn't an issue for it, it always rendered internally at 32-bit color.



          Originally posted by qarium View Post
          You say they have the technology to put 4 GPU dies on one card, but no product they ever released (see LTT) showed any significant performance uplift.
          You realize I'm referring to a YouTube channel when I say LTT, right? And what do you mean it showed no significant performance uplift? Where did anybody test one chiplet vs. four?

          Originally posted by qarium View Post
          Intel also tried to "shake up the market" and... well, I cannot see it.
          FAAANBOOOOY get pissed off if someone tries to compete with AMD lol

          Comment


          • #6
            Imagination having an open-source driver is pure music to me, glad to see they are committed.

            Comment


            • #7
              Originally posted by Myownfriend View Post
              The last widely released PowerVR desktop GPU was the Kyro II back in 2001, and that was known for punching well above its weight. It had slower, cheaper VRAM than its competitors, it was a smaller chip, and it ran cooler, yet in some benchmarks it outperformed cards that were $200 more expensive. Also, because memory bandwidth wasn't an issue for it, it always rendered internally at 32-bit color.
              I did buy such an Imagination Technologies PowerVR Kyro graphics card, I'm not sure if it was the Kyro 1 or the Kyro 2, but the card was total crap and I gave it back after only 1-2 days. The reason was that my system at the time had 768 MB of RAM, but this card only supported 512 MB of RAM. To use this card I would have had to downgrade to 512 MB of RAM, so I decided to just give the card back.

              This means Imagination Technologies' PowerVR Kyro cards failed in the market, and tile-based rendering was not the problem.

              Also, I have to ask: this is over-20-year-old tech, and isn't it true that patents run out after 20 years?

              That means anyone could build a tile-based rendering graphics card today. If it is so great, why do AMD, Intel, and Nvidia not do it?

              Comment


              • #8
                Originally posted by qarium View Post
                I did buy such an Imagination Technologies PowerVR Kyro graphics card, I'm not sure if it was the Kyro 1 or the Kyro 2, but the card was total crap and I gave it back after only 1-2 days. The reason was that my system at the time had 768 MB of RAM, but this card only supported 512 MB of RAM. To use this card I would have had to downgrade to 512 MB of RAM, so I decided to just give the card back.
                If that's true, that's not necessarily on Imagination. They're not like AMD or Nvidia: they don't make the actual chips, they just design the GPU IP. Someone else licenses it to make a chip and then either uses it themselves or sells it to another party.

                Originally posted by qarium View Post
                This means Imagination Technologies' PowerVR Kyro cards failed in the market, and tile-based rendering was not the problem.
                Your one story, which I can't find anything similar to on the internet, doesn't show that those cards were failures. Also, why do you want to push this whole "____ is a failure" thing?

                I'm saying that Imagination's modern chiplet-based GPUs have the potential to shake up the market in the future if they work their way back into desktops. I'm saying that because of how they can scale their designs with up to four chiplets. With one chip being manufactured, someone could potentially make a lineup of up to six graphics cards with very high yields. That offended you because I didn't mention AMD's more limited chiplet architecture. Then you mentioned a bunch of legacy cards that used multiple GPUs with far worse scaling than what I'm talking about. I explained why I feel TBDR lends itself well to a chiplet design so I could reinforce my point. I mentioned some legacy PowerVR cards to show that the bandwidth savings they touted could be observed in the real world over 20 years ago. That was relevant when talking about the bandwidth needed between chips.

                At that point in the conversation you had at least stayed on the topic of multi-GPU scaling, but after I explained my position, you decided to drop that topic entirely and focus on the market performance of one PowerVR-based graphics card from 22 years ago. And what do you cite for that? Your own personal experience, which could be made up.

                Why does it hurt you so badly to give credit to active companies other than AMD? Why can't you talk about the technology? Why do you always view things said about other companies as an attack on AMD?

                Originally posted by qarium View Post
                Also, I have to ask: this is over-20-year-old tech, and isn't it true that patents run out after 20 years?
                More or less. I don't know what exact patents ImgTec owned/owns.

                Originally posted by qarium View Post
                That means anyone could build a tile-based rendering graphics card today.
                Yes, they could, and there are other tile-based GPUs. The Adreno GPUs by Qualcomm can switch between TBR and IMR, ARM's Mali GPUs are also tile-based, and obviously Apple's GPUs are TBDR because they borrowed from Imagination's GPU design.

                Originally posted by qarium View Post
                If it is so great, why do AMD, Intel, and Nvidia not do it?
                The Architectures

                Intel's early attempts at a dGPU did use tiled rendering and so did their integrated graphics. They may even use some variation of it in their current dGPUs.

                The Wikipedia article on tiled rendering will actually tell you that Nvidia and AMD have used tiled rendering since Pascal and Vega respectively, but that isn't true. Both use tile-based rasterization/caching, which bins a certain number of consecutive triangles on the GPU and renders them into the L2 cache so the pixels can be written out to VRAM in more efficient, square groups. I think Nvidia just calls it tile-based rasterization and AMD calls it the Draw Stream Binning Rasterizer. The technique has only some of the benefits of tile-based rendering. Because the tile lists they create never grow to the point that they contain all of the scene's primitive data, they can only remove overdraw within batches of consecutively submitted triangles, and they can't be used to, for example, create a whole G-buffer for a tile completely within the cache and only write the final pixel data to VRAM.
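
                To illustrate the distinction I'm drawing (the batch size and the shade callback are made up, and this is only how I understand the technique, not vendor code): tiled caching flushes after a small batch of triangles, so overdraw is only removed within a batch, while a TBDR bins the whole scene before shading anything.

```python
# The key difference is binning scope: per-batch bins vs. whole-scene bins.
BATCH = 512  # triangles binned before a flush, arbitrary illustrative value

def tiled_caching(triangles, shade):
    batch = []
    for tri in triangles:
        batch.append(tri)
        if len(batch) == BATCH:
            shade(batch)    # overdraw between batches is never eliminated
            batch = []
    if batch:
        shade(batch)

def tbdr(triangles, shade):
    shade(list(triangles))  # the entire scene is binned before any fragment shading
```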

                Nvidia's newest multi-GPU rendering method is called CFR (Checkerboard Frame Rendering, I believe), which distributes work between two GPUs based on tiles. Both GPUs still do all the vertex shading, but they essentially use a checkerboard-like mask to distribute fragment shading across GPUs, which does a better job of distributing the work evenly. It also removes the need to constantly guess how to redistribute work based on the last frame, like split-frame rendering requires. This technique is kind of like a dumbed-down version of how tiled rendering GPUs distribute work both internally and externally.
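
                A tiny sketch of that checkerboard-style split as I read the descriptions of CFR (the tile size and the two-GPU assumption are mine): each screen tile's fragment work goes to one of two GPUs by tile parity, which spreads the load evenly without per-frame load-balancing heuristics.

```python
# Assign fragment shading for each screen pixel's tile to GPU 0 or GPU 1
# in a checkerboard pattern (both GPUs still process all of the geometry).
TILE = 64  # illustrative tile size in pixels

def cfr_owner(px, py):
    return ((px // TILE) + (py // TILE)) % 2  # alternates like a checkerboard

# e.g. cfr_owner(0, 0) == 0, cfr_owner(64, 0) == 1, cfr_owner(64, 64) == 0
```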

                AMD's Infinity Cache has the same rough goal as a TBR: keep more work on the GPU and save external memory bandwidth. Using a large cache allows an IMR to passively get a lot of the memory savings of a TBR. The issue is that it can't guarantee that framebuffer writes stay on the GPU, and it obviously requires much more on-chip memory. A single pool of tile memory in a lot of TBRs is measured in tens of kilobytes, while Infinity Cache can exceed 100 MB.

                Besides all this, I imagine that pure compute workloads contribute to why they aren't as interested in adding dedicated tiling hardware to their GPUs. Compute workloads wouldn't be able to benefit from tiling. Imagination does allow the tile memory to be used as high-bandwidth, low-latency scratch memory for compute workloads, but that is going to benefit different compute workloads to different degrees and can't be used passively.

                The Way They're Used

                I imagine that part of the reason that AMD and Nvidia haven't tried going fully tile-based is because they were already invested in IMR architectures and because graphics APIs like OpenGL were built around IMRs for a very long time.

                It's also worth mentioning that there are rendering techniques that games use now that mimic parts of Imagination's TBDR pipeline but they assume they're being run on an IMR.

                For example, Forward+ rendering uses tiled light culling to reduce the number of lights that need to be considered for shading. This requires splitting the screen up into screen-space tiles and binning the lights into them in the same way that TBRs bin geometry, except it's done via a compute shader. When implementing Forward+ on a TBR, the GPU's built-in tiling hardware could be used and the light lists could just be added to the existing bins, but the game would have to be aware that it can take advantage of this. If not, the TBR will just create two sets of bins: one created by its hardware that stores primitives, and one computed through shaders that stores light lists. Because Apple's Metal API is designed around their TBDR GPUs, a game using that API can use a tile shader to add light lists to the tiles created by the hardware tiler.
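
                Here's a minimal CPU-side sketch of that tiled light-culling step (the tile size and the screen-space-circle representation of lights are my simplifications; in a real engine this runs in a compute shader): the screen is divided into tiles and each tile collects the indices of the lights that touch it.

```python
# Bin lights into screen-space tiles, as Forward+ does before shading,
# so each fragment only has to consider the lights listed for its tile.
TILE = 16  # pixels per tile edge, a common but illustrative choice

def cull_lights(lights, screen_w, screen_h):
    tiles_x = (screen_w + TILE - 1) // TILE
    tiles_y = (screen_h + TILE - 1) // TILE
    light_lists = [[[] for _ in range(tiles_x)] for _ in range(tiles_y)]
    for idx, (cx, cy, radius) in enumerate(lights):  # lights as screen-space circles
        x0 = max(int(cx - radius) // TILE, 0)
        x1 = min(int(cx + radius) // TILE, tiles_x - 1)
        y0 = max(int(cy - radius) // TILE, 0)
        y1 = min(int(cy + radius) // TILE, tiles_y - 1)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                light_lists[ty][tx].append(idx)      # light index added to the tile's bin
    return light_lists

# e.g. cull_lights([(100.0, 100.0, 50.0)], 1920, 1080)
# puts light 0 only into the tiles around (100, 100).
```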

                Another technique that's frequently used is the Z-prepass, which is the process of sending all the geometry to the GPU just to create a full depth buffer, then resubmitting the geometry for fragment shading. Because the second pass has access to the fully built depth buffer, the fragment shading stage never spends time shading a pixel that won't be seen in the final image. This is almost exactly what a TBDR does passively, except there's still overdraw when making the depth buffer and it still needs to be written to VRAM and then read back. Writing and reading a 4K depth buffer requires far more bandwidth than writing and reading back bins of primitives. In a TBDR, the depth buffer only needs to exist in on-chip memory unless someone wants to write it to VRAM for something like shadow maps. I believe Unreal Engine disables the Z-prepass on Imagination's GPUs because it would be redundant and only slow down what the GPU does naturally.
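
                A toy model of that two-pass flow, with made-up data structures (fragments as (x, y, depth, color) tuples): the first pass walks all the geometry just to build the depth buffer, and the second pass shades only the surviving fragment per pixel, which is roughly what a TBDR's per-tile on-chip depth test gives you for free.

```python
# Z-prepass on an IMR: pass 1 fills the depth buffer, pass 2 re-submits
# the geometry and shades only fragments that match the stored depth.
def render_with_z_prepass(fragments, width, height):
    depth = [[float("inf")] * width for _ in range(height)]

    # pass 1: depth only (full geometry traffic, depth buffer lives in VRAM)
    for x, y, z, _color in fragments:
        if z < depth[y][x]:
            depth[y][x] = z

    # pass 2: expensive shading runs at most once per visible pixel
    color = [[None] * width for _ in range(height)]
    for x, y, z, frag_color in fragments:
        if z == depth[y][x]:
            color[y][x] = frag_color
    return color

# e.g. render_with_z_prepass([(0, 0, 0.9, "red"), (0, 0, 0.2, "blue")], 1, 1)
# shades only the nearer ("blue") fragment at pixel (0, 0).
```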

                Non-realtime Rendering

                3D animation software like Blender, Cinema 4D, or Maya uses tiles to distribute the rendering of a frame across GPUs, CPU cores, or both. They do that specifically because tiles are an excellent way of dividing up shading work.

                If I had to simplify why a TBR can work across chiplets so well, I would say it's because the way they handle vertex shading and tiling is kind of like a pure compute workload. It can easily be spread across chips, and the process basically prepares a bunch of little "framelets" that distribute work for the fragment shading stage. The fragment shading stage then works like a bunch of mini-GPUs working on their own separate "framelets" with their own memory. In theory, the different mini-GPUs don't even really have to run at the same speed to scale well.
                __________________________________________________
                I look forward to you ignoring this long explanation and finding something else to bitch about while jerking off AMD.
                Last edited by Myownfriend; 17 August 2023, 10:17 PM.

                Comment


                • #9
                  qarium where did you go? Why so silent all of a sudden?

                  Comment


                  • #10
                    Originally posted by Myownfriend View Post
                    qarium where did you go? Why so silent all of a sudden?
                    You wrongly think I am silent. You wrongly think I dropped the topic. ...

                    There was a medical emergency (not me personally) and I had important work to do outside of forums and computers.
                    A person related to me nearly died, and I had to drive 8 hours by car today to help.

                    You, my friend, really have some real problems. I can assure you that I had a Kyro 1/2 card 22 years ago, and I told you why I gave the card back: I was not able to use more than 512 MB of RAM. Just remember, it was the time of Windows 98/Windows ME, and those Windows versions could not swap RAM to the hard drive with more than 512 MB of RAM. You could hack around this by disabling swapping to the hard drive; then 768 MB, or even 3 or 3.5 GB of RAM on 32-bit CPUs, was no problem. More than 3.5 GB was impossible because 512 MB of the 32-bit address space was reserved for fixed purposes, meaning hardware.
                    With a configuration like this you had to use tools that, every second or so, occupied RAM space and then released it as empty, because the DOS-based kernel of Win98/ME had no real RAM management, so I used a tool that did this RAM management. At that time people in eMule forums told me you could not run a server for eMule or other P2P tasks with Win98/ME because of the lack of RAM management, but if you used tools that automatically occupied RAM and freed it again, this worked well. They claimed you needed to use Windows 2000 at that time.
                    That could be the reason why these Kyro 1/2 cards assumed you only ran 512 MB of RAM with Win98/ME, but I can also say these cards were incompatible with Windows 2000 and XP at that time. So you also could not fix it by using Windows 2000; all the gamers who bought this card had Win98 or Windows ME at that time, because the games did not run on Windows 2000 or NT 4.0.

                    About your favored topic, chiplet designs with multiple GPU/compute dies and explicitly not the AMD way with cache dies:

                    I do not believe that tile-based rendering is the solution for chiplet design in GPUs, and the reason I think so is the patent situation.

                    With patents there are only 3-4 outcomes, and it is the same every time: one outcome is that the patent costs too much, so you do not use it.
                    Another outcome is that the patent costs less than the benefit, so you use it. And if the patent costs too much but the patent runs out after 20 years, companies use the technique as soon as possible after that. And so on and so on.

                    There are similar examples of saving memory bandwidth, like S3TC: all the companies used it because the benefit was big enough to outweigh the patent costs, and after the patent ran out, really everyone used it.

                    With tile-based rendering it is different: the patents ran out and, as you say, AMD/Nvidia/Intel still do not use it. Why? You already explained it: they only use similar techniques that mimic the effect without really using it.

                    And as you say, the only big player with relevant market share in the desktop business that uses it is Apple, but Apple does not have a chiplet design yet; instead they use very, very large dies... meaning the complete opposite of what you claim.

                    Instead, I think FSR3 upscaling, with or without temporal data, together with frame generation is the solution for chiplet designs.

                    Frame generation in DLSS 3 and FSR 3 is proof that you can use old data, with deep learning or smart algorithms, to calculate what a frame should look like without really rendering it.

                    Now, if you have FSR and frame generation, and you use multiple GPUs to render some parts of the screen, then the parts that were NOT calculated could be reconstructed from old data by the FSR3 frame-generation engine.

                    Chiplet designs and multi-GPU solutions had the problem that if some parts run on one GPU and some parts on another GPU, then when you put them together they may not fit and you get artifacts or differences that make it look ugly...

                    But with FSR3 frame generation or DLSS 3+, the deep-learning algorithms running on shaders or custom DLSS-like hardware could just fill the gap between what is really there and what is not...

                    And this could be done in a way that humans cannot see a quality loss... Maybe in screenshots you would see something, but these technologies are not made for screenshots; they are made for real motion, and in motion, meaning real gaming, the differences could be so small that the human eye could not see them.





                    Comment
