Intel Begins Sorting Out SR-IOV Support For The Xe Kernel Graphics Driver


  • #31
    Originally posted by pong View Post
    It is far past time for "consumer" GPUs to support SR-IOV and virtualization in LINUX & ms-windows.
    Agree 100%. I think it was about ten years ago that I started arguing for our VDI and terminal server instances to have at least a rudimentary GPU, because the whole user experience was starting to pipe through it. A 'productivity desktop' VM with just a tiny slice of a real GPU is so much better than software graphics.

    Hypervisor vendors and GPU manufacturers know this. Their own power users and developers are running VMs at their desks. I suspect the reason it's been so hard to get to is 'security'. Sharing a GPU between guests on a server, or between guests and the host OS on a desktop, is probably a whole different ballgame for security, and I'll bet that a lot of GPUs and their drivers make assumptions about trust that aren't necessarily true once guests start carving slices out of 'em.

    Comment


    • #32
      Originally posted by pong View Post
      Yes, I think it's increasingly important and desirable to have open hardware architectures / designs.
      Though the major problem with that is that even if it is possible to design a decent open computer / peripheral, the ability to make the chips that implement it will remain very closed, so one is still at the mercy of the IC fabricators for something that has good availability and good value and is also trustable, since one cannot know what may differ between the "open design" and what ends up in the fabricated chip.
      I think with respect to GPUs, though, that it is ridiculous to depend on "toy" GPUs as the foundation of not only our 2D/3D graphics processing capabilities but also our HPC, SIMD, parallel, ray tracing, tensor, AI/ML, and high-RAM-bandwidth general purpose computing.
      Ok there is a market segment that is not totally mainstream (grandparent on their chrome book / cell phone / small laptop / tablet) but is very prominent (gamers, creative consumers, developers, people using graphic / compute intense productivity tools, people using AI/ML, etc.) that routinely willingly will spend $500-$2000 MORE than the cost of a basic "powerful desktop computer" to get the capabilities that a GPU offers.
      Mainly those capabilities are (A) high RAM bandwidth (e.g. ~1TBy/s more or less), and (B) highly parallel integer / FP computations (e.g. 4k SIMD ALU processors more or less), and (C) accelerating architectural elements for things like tensor / matrix / vector / ray tracing / AI-ML operations, and finally (D) just several actual display interfaces that can scan out frame buffer contents (HDMI, displayport, etc.).
      Any ray-tracing cores, along with item (D) (the actual literal frame buffer DMA output to multiple DisplayPort / HDMI interfaces), are the ONLY things that are really specifically / mainly anything to do with actual graphics / display interfacing.
      All of the rest of the GPU functions (in a modern programmable shader pipeline GPU) are actually just either high bandwidth RAM interfaces, parallel programmable processor cores or are COMPUTE specific acceleration cores (tensor / AI-ML / etc.) -- all of which don't have any real "reason" to be associated with a GPU as opposed to being part of the core compute / memory architecture of the "core computer".
      It should be obvious that the *GRAPHICS* specific parts (display interfaces, frame buffer DMA, maybe some ray tracing H/W) of a GPU are pretty insignificant in technology compared to the REST of what's in a modern mid-range or higher GPU, so if one is talking about a $500 GPU surely the "display interface and graphics specific" stuff is more suitably apportioned to be 20% of that cost, probably less.
      So since there's obviously DEMAND (by the many ~millions / year), there's obviously precedent (people want the AI/ML, compute, high RAM bandwidth, and graphics and non-graphics COMPUTE capabilities of a modern GPU), and there's NO END IN SIGHT (people will want more and more AI/ML and graphics processing until there's real-time, fluidly rendered, truly photorealistic VR / AR / synthetic holograms etc.),
      the elephant-in-the-room question which few people seem to be focusing on is *WHY* 80% of the RAM / COMPUTE / math / SIMD / parallel capabilities associated with GPUs are NOT actually made foundational in mid/high range consumer desktop / workstation ARCHITECTURES for CPU / RAM / chipset / motherboard, INSTEAD of being lumped into an add-on "toy" GPU?
      Moore's law has enabled us to have desktop PCs with 0.5/1/2 TBy/s access to many gigabytes of RAM, massively parallel int/FP ALUs (2k-6k of them), and acceleration cores for tensor/matrix/vector/AI-ML operations that deliver from one to many TOPS of performance, all able to be found in a typical well equipped "teenage gamer's" gaming desktop with a $500-$1000 GPU.
      But the form factors are ridiculous (just TRY having / using more than one or maybe two PCIE x16 slots if you've got a 2.5-3+ slot GPU, fans, cables, ...). The mechanical & power architecture is deplorable (melting / igniting cables & connectors when using top of the line modern gear, kilo-Watt+ PSUs and GPUs and cables that don't even FIT right into almost any case). Major lack of PCIE lanes / slots one can actually use.
      Artificial limitations like no SR-IOV, short / bad GPU warranties, GPUs not designed for maintenance / quality / long life (fans, thermal solution issues, easy access to clean / replace parts like fans), vendors that don't even support their "consumer" cards for their compute / ML libraries (e.g. AMD RDNAx vs ROCm), virtualization / sharing that is completely non-existent.
      So instead of anemic 2-channel DDRx interfaces, CPUs with ONLY ~16 cores, motherboards with ONLY 4-DIMM slots which you're lucky if you can even USE all four without problems / trade-offs, SIMD/vector stuff built into the CPUs that looks like a pathetic toy compared to the capability of a small GPU (AVX-512, NEON, ...), why not move some of this high bandwidth RAM and high performance compute / AI-ML / massively parallel SIMD stuff into the holistic core machine architecture where it belongs. Properly integrate virtualization so EVERYTHING virtualizes, shares well. Design the form-factors, sockets, etc. so we get real expandability and scalability back without insane physical / mechanical / electrical compromises.
      Fine keep chromebooks, laptops, entry level ryzen / intel desktops / laptops as they are for that low end "don't need a DGPU anyway" market.
      But anything higher that WOULD have a $500-$2000 DGPU really needs an architectural overhaul for sanity, quality, usability, and holistic "it all works together" sake.
      It is particularly ironic that AMD & Intel, who make almost all the CPUs out there, have for every single one of their processors in the last N generations included MMUs, IOMMUs, and several other virtualization technologies right in their CPUs & system chipsets, even for the lowest-cost entry-level CPUs intended for consumer markets.
      But those SAME COMPANIES make DGPUs with the most virtualization / resource sharing / resource isolation hostile HW / SW stack possible for the same consumer desktop market. It makes no sense to have such a bipolar attitude to what
      should be a uniform "everything should be able to be isolated / secure for multi-process / multi-user / multi-level security use, everything should be able to be virtualized" architecture.
      Even cell phones these days have tensor / vector / AI-ML acceleration cores built right into the CPU, but somehow the desktop architecture is
      not even remotely holistically updated, even once every couple of decades, to follow the "it should have been done since 2010" scaling of what is now
      "GPU" technology into the core form factor / chipset / motherboard / CPU / memory architecture of the desktop / workstation / server.
      Just for your information: with open-source hardware it is impossible to implement HDMI, because it is a closed and patented spec.

      Open-source designs can only implement DisplayPort or DVI.

      Comment


      • #33
        virtio-gpu with native contexts is, IMHO, a better paravirtualization solution than GVT-g. virtio is a generic interface and is supported in Windows to some extent. Basically, you run native UMDs in the VM and proxy an IOCTL like interface across to the KMD running in the hypervisor. It's similar to the way MS does virtualization for WSL. Linux on Linux works today with this interface. For Linux on Windows, you can use WSL. For Windows on Linux, you could either run mesa UMDs on Windows or write a virtio-gpu backend for Windows UMDs.

        SR-IOV is not a magic bullet. SR-IOV virtual functions (VFs) are not the same as physical functions (PFs). You need explicit support for the VF in the KMDs for every OS you want to run in a VM. You can't just use the regular bare metal drivers. Additionally, at least on AMD GPUs, SR-IOV only virtualizes the engines on the GPU, not the display hardware, so you still need to figure out how to handle display surfaces in the VM somehow so they can actually be displayed somewhere. Most of this is in place today for Linux on Linux, but I assume most people are interested in some combination including Windows which would require a bunch of work on the Windows side to support this for consumer cases.
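
        As a purely illustrative aside (not something agd5f described): on hardware and drivers that do expose VFs, they get enabled through the generic PCI SR-IOV sysfs interface, roughly as sketched below. The PCI address is hypothetical, consumer cards today generally advertise zero VFs, and each VF still needs a VF-aware guest KMD as noted above.

        Code:
        from pathlib import Path

        # Hypothetical PCI address of the GPU's physical function (PF).
        PF = Path("/sys/bus/pci/devices/0000:03:00.0")

        def enable_vfs(count: int) -> None:
            total = int((PF / "sriov_totalvfs").read_text())   # how many VFs the PF advertises
            if total == 0:
                raise RuntimeError("this device/driver exposes no SR-IOV VFs")
            (PF / "sriov_numvfs").write_text("0")              # must go back to 0 before changing
            (PF / "sriov_numvfs").write_text(str(min(count, total)))

        # enable_vfs(2)  # run as root; each VF then appears as its own PCI device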

        Comment


        • #34
          Originally posted by agd5f View Post
          I assume most people are interested in some combination including Windows
          Correct in my case. I just want to run a Windows VM with snappy desktop (not games) graphics performance on Qemu+KVM without undue hassle or limitations. Currently, it can't be done IMO, so I use VMware Workstation.

          Comment


          • #35
            Originally posted by agd5f View Post
            virtio-gpu with native contexts is, IMHO, a better paravirtualization solution than GVT-g. virtio is a generic interface and is supported in Windows to some extent. Basically, you run native UMDs in the VM and proxy an IOCTL like interface across to the KMD running in the hypervisor. It's similar to the way MS does virtualization for WSL. Linux on Linux works today with this interface. For Linux on Windows, you can use WSL. For Windows on Linux, you could either run mesa UMDs on Windows or write a virtio-gpu backend for Windows UMDs.

            SR-IOV is not a magic bullet. SR-IOV virtual functions (VFs) are not the same as physical functions (PFs). You need explicit support for the VF in the KMDs for every OS you want to run in a VM. You can't just use the regular bare metal drivers. Additionally, at least on AMD GPUs, SR-IOV only virtualizes the engines on the GPU, not the display hardware, so you still need to figure out how to handle display surfaces in the VM somehow so they can actually be displayed somewhere. Most of this is in place today for Linux on Linux, but I assume most people are interested in some combination including Windows which would require a bunch of work on the Windows side to support this for consumer cases.
            I have Fedora 39 on an AMD Ryzen 7950X3D with 192GB DDR5 RAM and an AMD PRO W7900, and I have Windows 11 installed in a VM in virt-manager. In Windows I have the
            Proxmox VirtIO drivers installed: https://pve.proxmox.com/wiki/Windows_VirtIO_Drivers

            The first problem I encountered was that it only supported 2K or 2.5K resolution; I had to manually edit the domain XML to go from 16MB to 64MB of VRAM for the framebuffer, and then I could go to 4K resolution in Windows.
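
            (A rough sketch of scripting that edit instead of hand-editing with virsh edit; it assumes the domain is called win11 and that the <video><model> element's vram / vgamem attributes, in KiB, are what cap the resolution. The attribute that actually matters can differ by video model, so treat it as illustrative only.)

            Code:
            import xml.etree.ElementTree as ET

            # Domain XML previously dumped with: virsh dumpxml win11 > win11.xml
            tree = ET.parse("win11.xml")               # hypothetical domain / file name
            model = tree.find("./devices/video/model")
            if model is None:
                raise SystemExit("no <video><model> element found")

            model.set("vram", str(64 * 1024))          # 65536 KiB = 64 MiB
            if "vgamem" in model.attrib:               # qxl also has a separate vgamem knob
                model.set("vgamem", str(64 * 1024))
            tree.write("win11.xml")
            # Re-import with: virsh define win11.xml, then power-cycle the VM.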

            The second problem is that only 2D works right now, meaning I have no 3D acceleration in this Windows 11 VM on the Fedora host.

            Originally posted by agd5f View Post
            but I assume most people are interested in some combination including Windows which would require a bunch of work on the Windows side to support this for consumer cases.
            Yes, right, I really vote for this, to make my workstation fully operational. Can you tell me what exactly ordinary people can do to make this happen? Well, I already spent about 8000€ on this workstation, so I am pretty sure that spending more money on AMD hardware was not possible at that time; it was before the release of the Threadripper 7000 series, with which you could maybe spend more money...

            So right now it is sad that we only get 2D in the Windows 11 VM...


            Comment


            • #36
              Originally posted by sharpjs View Post
              Correct in my case. I just want to run a Windows VM with snappy desktop (not games) graphics performance on Qemu+KVM without undue hassle or limitations. Currently, it can't be done IMO, so I use VMware Workstation.
              I do not run VMware; instead I run virt-manager with KVM and the proxmox.com Windows VirtIO drivers installed, and I edited the XML file to use 64MB of VRAM instead of the default 16MB for the 2D framebuffer,

              and the 2D desktop works OK... I use an AMD 7950X3D CPU, which means the desktop is more or less snappy even without 3D GPU support.

              "Currently, it can't be done IMO,"

              What exactly do you mean by that? I do exactly this. Right now you just need two additional steps: you need to edit an XML file to change 16MB to 64MB for the framebuffer, or else you only get 2K or 2.5K resolution support; with 64MB you can go to 4K resolution,

              and you have to install the KVM proxmox.com Windows VirtIO drivers: https://pve.proxmox.com/wiki/Windows_VirtIO_Drivers
              Last edited by qarium; 28 December 2023, 06:38 PM.

              Comment


              • #37
                Originally posted by qarium View Post
                "Currently, it can't be done IMO,"

                What exactly do you mean by that?
                You might find that good enough, but I did not. It was lag city for me at 4K.

                I tried the straightforward graphics methods mentioned in the libvirt XML documentation, but none yielded Windows 2D performance that I found acceptable for daily use. RDP was the fastest but still not good enough. Yes, I added VRAM via the XML as suggested by the documentation. Yes, I used the virtio-win driver package (provided by Fedora), which did not include a virtio video driver.

                I was already comfortable with VMware Workstation and knew it had good emulated 2D performance, so I just went with that. I would switch to Qemu+KVM if there was an easy way to match or exceed VMware's emulated graphics performance.

                Comment


                • #38
                  Originally posted by sharpjs View Post

                  Correct in my case. I just want to run a Windows VM with snappy desktop (not games) graphics performance on Qemu+KVM without undue hassle or limitations. Currently, it can't be done IMO, so I use VMware Workstation.
                  It might be worth following the virgl-on-Windows work; so far performance is pretty bad, but it may get better in the future, especially if they can implement Zink + Vulkan instead of GL.

                  Comment


                  • #39
                    I don't recall the particulars of what I've read on this topic, but I can say I've seen fairly uniform success reports from people using VFIO pass-through techniques
                    for entire GPUs to guests, L->W / L->L. So perhaps that means the GPU OEM KMDs for Windows / Linux do see a detectable difference between the VFIO and bare-metal state (they used to block it but now do not) and accommodate it.
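
                    (Side note, not from those reports: the usual sanity check before whole-GPU VFIO pass-through is to see which devices share the GPU's IOMMU group, since everything in the group has to be handed to the guest together. A minimal sketch with a hypothetical PCI address:)

                    Code:
                    from pathlib import Path

                    gpu = Path("/sys/bus/pci/devices/0000:03:00.0")   # hypothetical GPU address
                    group = (gpu / "iommu_group").resolve()           # -> /sys/kernel/iommu_groups/<n>
                    print("IOMMU group", group.name, "contains:")
                    for dev in sorted((group / "devices").iterdir()):
                        print("   ", dev.name)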

                    You mention a good point about display hardware sharing / security / virtualization. Perhaps I'm wrongly conflating different aspects, but I'm not quite sure where the fault lies that this is not a much simpler and better supported thing. A typical GPU has N different physical display attachments, and each of those N can attach to K different display devices (K > 1 for DP MST or such). Each display attachment streams out what may be configured to be a unique frame buffer in either VRAM or RAM
                    holding the content for that display, and I assume there should be no particular technical difficulty in assigning unique, potentially unshared, segregated access to those frame buffer RAM regions to particular contexts, which can differ among users / processes / guest VMs.

                    But then, speaking of non-virtualized cases, I gather that "multi head" doesn't even quite work in Linux today unless one has a physically distinct GPU card dedicated to each "head" of the system. So even within the context of the host machine there must be something that really doesn't like to "share" a GPU's multiple display outputs between desktop sessions; otherwise it should be possible for, e.g., four different monitors attached to one GPU to be used to build a system where each of those four monitors displays a desktop associated with a different simultaneous desktop session / logged-in user, despite the fact that they're all coming through a single GPU.

                    Though someone obviously did some of the work at the "framebuffer" / "console" level, since you can Ctrl-Alt-Fn switch between parallel running multi-user consoles on one host with one console keyboard/video/mouse, so launching & running in parallel is at least possible there; though whether being bound to a single console, versus having possibly independent monitor associations, is merely a configuration convention or a real restriction, I do not know.

                    Anyway, I believe that if you simply buy some "enterprise" model of GPU, the NVIDIA driver stack does let you run vGPU / VDI setups, so you can have N working VMs with shared GPU resources coming from K different physical GPUs, where K can be 1 or more. So it seems like they've already supported even the VDI / vGPU cases in their MS Windows and, I guess, Linux SW stacks, but by market segmentation it's not exposed at the consumer card level or in their consumer driver builds.

                    Re: virtualizing only the engines on the GPU on AMD: well, even being able to do that would be a step in the right direction. For instance, I might want a guest machine to
                    be able to run some OpenCL / ROCm / whatever compute job so it could make some productivity or rendering application happy in its own background environment, while still having a fully functional host desktop session with display & the remaining shared compute capability available to it as well.

                    And yes, I feel the pain of Linux-only limitations with respect to multiple desktop sessions via RDP/VNC, guests being allocated GPU resources based on individual monitor outputs rather than entire GPUs at once, or guests not having CL/CUDA compute capability.

                    But with Windows guests it is really much worse, since nothing at all works there with even basic GPU acceleration for graphics / compute unless you pass through an entire dedicated GPU just for the VM, and there is no simple way to "reclaim" that GPU into the host's graphics drivers when that VM is not running (so you could use it for host compute / displays after you shut down the pass-through VM).

                    At least the multi-user / multi-head thing would be nice (I think) to "fix" on Linux, since I can imagine a lot of people in the world could still find use cases for a "family computer" where 1-4 different users could use the same computer if they just had their own independent desktop session, monitor, keyboard, mouse, and login session.
                    Whether that's for a gaming / entertainment / media purpose or literally basic network / computer access, it would avoid paying N times the cost for PC / GPU hardware
                    when one actual PC and one actual GPU have more than enough capability to multi-user / multi-task and drive 4 monitors.

                    Originally posted by agd5f View Post
                    virtio-gpu with native contexts is, IMHO, a better paravirtualization solution than GVT-g. virtio is a generic interface and is supported in Windows to some extent. Basically, you run native UMDs in the VM and proxy an IOCTL like interface across to the KMD running in the hypervisor. It's similar to the way MS does virtualization for WSL. Linux on Linux works today with this interface. For Linux on Windows, you can use WSL. For Windows on Linux, you could either run mesa UMDs on Windows or write a virtio-gpu backend for Windows UMDs.

                    SR-IOV is not a magic bullet. SR-IOV virtual functions (VFs) are not the same as physical functions (PFs). You need explicit support for the VF in the KMDs for every OS you want to run in a VM. You can't just use the regular bare metal drivers. Additionally, at least on AMD GPUs, SR-IOV only virtualizes the engines on the GPU, not the display hardware, so you still need to figure out how to handle display surfaces in the VM somehow so they can actually be displayed somewhere. Most of this is in place today for Linux on Linux, but I assume most people are interested in some combination including Windows which would require a bunch of work on the Windows side to support this for consumer cases.

                    Comment


                    • #40
                      Yeah, in my case when I've used QEMU+KVM or VirtualBox I can create a Linux or Windows guest even with no 2D/3D SW or HW acceleration, and that works "ok" for basic uses like running some utility program or even, say, light web applications or similar GUI uses. But it's definitely not full-featured or snappy, and there's no 2D / 3D / video CODEC / CUDA / OpenCL support to any extent beyond the most basic SW SVGA (or whatever) GPU emulation essential to get the desktop to display.

                      For me, besides the annoying limitations above, when I want to run some cross-platform productivity application like an old CAD package that uses 2D / 3D graphics (DX10, DX11, OpenGL, whatever) and whatever moderate GPU compute shaders it may use, it literally displays a blank screen instead of any window contents at all, so the whole legacy application is totally unusable; it isn't even a question of running slowly, it simply doesn't display window content at all when it "needs" some GPU DX10/whatever feature.

                      In most cases I wouldn't even care if the HOST had only basic display output, enough for an X or Wayland desktop without any acceleration, while I'm running the guest, but in that case I'd want the GPU to be able to be "passed through" to the guest so the guest performed as if it had ~full GPU access except for the tiny part needed by the host.
                      But I've never heard that detaching / reattaching GPU devices from the host display drivers (nvidia, amd) works well in practice to "mode switch" a single GPU between "host only" and "guest only" control, so basically one seems to need a physically separate VFIO pass-through GPU dedicated to the guest (whether or not you are actually using the guest); you don't get to easily / dynamically swap it back and forth between host and guest exclusive control as you might a host USB device.
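
                      (For what it's worth, the raw sysfs mechanics for moving a PCI device between drivers do exist; a rough sketch with placeholder address and driver names is below. The hard part is exactly what's described above: getting the host graphics stack to actually release the GPU before the rebind, which is why in practice people keep a separate card just for the guest.)

                      Code:
                      from pathlib import Path

                      DEV = "0000:03:00.0"                               # placeholder GPU address
                      dev = Path("/sys/bus/pci/devices") / DEV

                      def rebind(target_driver: str) -> None:
                          drv = dev / "driver"
                          if drv.exists():
                              (drv / "unbind").write_text(DEV)           # detach from current driver
                          (dev / "driver_override").write_text(target_driver)
                          Path("/sys/bus/pci/drivers_probe").write_text(DEV)

                      # rebind("vfio-pci")   # hand the GPU to a guest via VFIO
                      # rebind("amdgpu")     # and, in theory, back to the host driver afterwards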

                      If I get spare time in the future I'd even work to improve this stuff in the OSS code, but I know nothing about the graphics stack and where the limitations are, so it seems like a big learning curve to contribute in this area. Maybe now that the X stuff is giving way to a new software stack it will be more possible to handle multi-head, dynamic GPU pass-through / host detach & reattach, etc. use cases, and ultimately just multi-user share any single GPU device at the "this is my display channel / frame buffer but everything else we share" level.


                      Originally posted by qarium View Post

                      I do not run VMware; instead I run virt-manager with KVM and the proxmox.com Windows VirtIO drivers installed, and I edited the XML file to use 64MB of VRAM instead of the default 16MB for the 2D framebuffer,

                      and the 2D desktop works OK... I use an AMD 7950X3D CPU, which means the desktop is more or less snappy even without 3D GPU support.

                      "Currently, it can't be done IMO,"

                      What exactly do you mean by that? I do exactly this. Right now you just need two additional steps: you need to edit an XML file to change 16MB to 64MB for the framebuffer, or else you only get 2K or 2.5K resolution support; with 64MB you can go to 4K resolution,

                      and you have to install the KVM proxmox.com Windows VirtIO drivers: https://pve.proxmox.com/wiki/Windows_VirtIO_Drivers

                      Comment
