RADV Vulkan Driver Finally Adds VK_KHR_synchronization2 Support

  • #11
    Originally posted by coder View Post
    Then whatever you read about the original point, you also misinterpreted. I guarantee nobody put "using Vulkan Descriptors to allocate storage on a NVMe SSD" on the original Vulkan roadmap.
    Sorry to say you are wrong. Not all NVMe SSDs are created equal. The roadmap, when you read it, mentions a ramdrive NVMe SSD as an item they have. I wish I knew who made that bugger. The problem here is you presume something would not be on the Vulkan roadmap because of your incorrect presumption that every NVMe SSD is flash-based.

    Originally posted by coder View Post
    No, it doesn't. You don't use NVDIMMs like DRAM, period.
    https://blogs.synopsys.com/vip-centr...y-and-storage/

    I really should have pulled you up better on this. The reality is this is not true. The original NVDIMMs are a hybrid of DRAM and flash where all operations happen in RAM first. NVMe SSDs that are a hybrid of RAM and flash do exist with the same layout. These are devices that can give you absolute maximum IO; they are basically your old-school ramdrives.

    NVDIMM-N is a horrible name in that it has been used twice. The ramdrive NVMe SSD will most likely be superseded by CXL, but you have to remember that when Vulkan first started CXL did not exist yet, while ramdrive devices were already around.

    Originally posted by coder View Post
    So far, BMC's use their own dedicated DRAM.
    Except this has not always been true. Intel attempted at one point to use the ME (Management Engine) as the BMC (baseboard management controller), and that was in fact using a block of memory taken out of your main system RAM. It turned out not to be very power efficient, since all the RAM had to stay powered up. Maybe this will be revisited with DDR5 and its dedicated per-module power control. This also caused other problems: if you had not powered the system off at the wall and went to change RAM, you were doing it on a live power bus.

    Baseboard management controllers are a different problem to a GPU.

    Originally posted by coder View Post
    That makes about as much sense as building a CPU without L1/L2 cache, "... because it would just get swapped out when there's a context-switch". The fact that nobody does it should tell you something. The cost of evicting it after the context-switch turns out to be a lot less than the impact of not having the cache.
    No, VRAM and caches are different problems here. It is the caches you need to look at to understand why VRAM could go away. Take AMD's most recent GPUs: they have an L3 cache, and that L3 holds more RAM than your early GPUs had in total. The reality is that VRAM in most cases is now too slow to keep up with the GPU anyhow.

    Vramless is not cacheless. A GPU that is cacheless is not going to work. Think about it: you can have a GPU with 16 GB of VRAM to context-switch out, or only something like 192 MB of L3. It is possible we will end up with GPUs with 1 GB of RAM in the GPU's L3 cache. These L3 caches on GPUs will be larger than your minimum usable integrated-GPU memory allocation from the host.

    The reason VRAM is a problem for virtual machines is one of the reasons why using the Intel ME (Management Engine) as a BMC did not end up ideal. Being behind the same memory controller, security faults in memory allowed attackers to move from CPU instance memory to ME memory, leading to some fun security issues. A BMC having its own dedicated memory today is a security mitigation. It is the same thing here: you remove VRAM from your GPU so that you don't need to clear all the VRAM every time a virtual machine gets swapped.

    If GPUs were staying with only L1 and L2 caches, going vramless was not really an option. GPUs getting L3 caches changes the ball game a lot. If you need to add an L3 cache because the VRAM is already too slow to keep up with the GPU, what is using memory out of CXL, which is a little slower again, going to do to performance? There are workloads where it will make bugger-all difference to virtual machine performance in the active running state whether you are vramless or have VRAM, because once the GPU's cache is populated there is no major back and forth. Of course it will make a major difference when it comes to suspend and restore of those virtual machines.

    This is a case where more is not always better. If I were talking about a cacheless GPU, that would be stupid. A vramless GPU is not cacheless. A vramless GPU could still end up having 4 GB of on-card storage in L3.

    The High Bandwidth Memory we have seen in GPUs so far has been designed on the idea of stacking next to the GPU and connecting in as VRAM. AMD's RAM stacking that connects to the L3 in their CPUs kind of says you don't have to connect in by the VRAM interface; quite a bit can be connected to the cache instead. So what happens with a vramless GPU is that the caches expand and the VRAM disappears, because the GPU cache is big enough for the lighter workloads, and removing the VRAM removes the need to sync it out when changing instances for security reasons. It then uses something like CXL when the GPU cache is not big enough.



    • #12
      Originally posted by oiaohm View Post
      Sorry to say you are wrong. Not all NVMe SSDs are created equal. The roadmap, when you read it, mentions a ramdrive NVMe SSD as an item they have.
      First, you reference a specific doc. You ought to be able to find it, so we can see exactly what was said.

      Second, a "ramdrive NVMe SSD" implies a device consisting of RAM that holds a filesystem. Binding Vulkan runtime state to filesystem structures just doesn't make sense: there's too much indirection, and it violates all kinds of abstraction boundaries. It wouldn't make enough practical sense to justify adding all that complexity to the Vulkan runtime.

      Originally posted by oiaohm View Post
      https://blogs.synopsys.com/vip-centr...y-and-storage/
      I really should have pulled you up better on this. The reality is this is not true. The original NVDIMMs are a hybrid of DRAM and flash where all operations happen in RAM first.
      That's really twisting the definition of what we were talking about. That's simply a DRAM DIMM that happens to be flash-backed. The only case where you'd use something like that is to be resilient against power failure.

      That's really not the main reason people use NVDIMMs, which is to eliminate PCIe overhead for accessing non-volatile memory. They can be used just like SSDs, but also for things like in-memory databases.
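
      To make the distinction concrete: because an NVDIMM is byte-addressable, software can mmap it and use plain loads and stores instead of issuing block I/O. A minimal Linux sketch, purely illustrative (it assumes a file on a DAX-mounted persistent-memory filesystem at the made-up path /mnt/pmem/db.img):

      Code:
      /* Illustrative only: byte-addressable access to an NVDIMM through
       * a DAX-mounted filesystem (the path below is hypothetical). */
      #include <fcntl.h>
      #include <stdio.h>
      #include <string.h>
      #include <sys/mman.h>
      #include <unistd.h>

      int main(void) {
          int fd = open("/mnt/pmem/db.img", O_RDWR);
          if (fd < 0) { perror("open"); return 1; }

          size_t len = 4096;
          /* With DAX, this maps the NVDIMM media directly: plain CPU
           * loads/stores, no page cache, no PCIe block transfers. */
          char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
          if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

          strcpy(p, "record written with ordinary stores");
          msync(p, len, MS_SYNC);  /* flush the write to persistent media */

          munmap(p, len);
          close(fd);
          return 0;
      }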

      Originally posted by oiaohm View Post
      NVDIMM-N is a horrible name in that it has been used twice. The ramdrive NVMe SSD will most likely be superseded by CXL, but you have to remember that when Vulkan first started CXL did not exist yet, while ramdrive devices were already around.
      Putting a bunch of RAM on a plug-in card is not a new idea, but it only qualifies as an SSD if you're actually putting a filesystem on it. In fact, NVMe itself is overkill for just a PCIe card with a bunch of RAM.

      Originally posted by oiaohm View Post
      Baseboard management controllers are a different problem to a GPU.
      It's almost surprising to see you point that out, because your usual MO is to shift your position until you find something defensible that kinda resembles your original claims. So, if we're not talking about BMCs, then we're back to a point where a GPU without dedicated RAM makes no sense.

      Originally posted by oiaohm View Post
      The reality is that VRAM in most cases is now too slow to keep up with the GPU anyhow.
      Their Infinity Cache only has about the same bandwidth to the compute die as Radeon VII had, even though the biggest RDNA2 compute dies are much more powerful.

      Originally posted by oiaohm View Post
      Vramless is not cacheless. A GPU that is cacheless is not going to work.
      The amount of cache in AMD's RDNA2 GPUs is carefully sized to the bandwidth needs, and the ratio between the compute side and the GDDR6 side is only 2:1. If you make that something like 10:1, you're going to need so much "cache" that we're basically back to putting a bunch of HBM2 in-package.

      And here we go: you make some outrageous claim and then backtrack and twist your way into some position that bears a passing resemblance to your original statement but isn't meaningfully the same.

      Originally posted by oiaohm View Post
      you remove VRAM from your GPU so that you don't need to clear all the VRAM every time a virtual machine gets swapped.
      Except nobody does that. You're inventing a problem to fit your solution, rather than looking at what solutions people actually use.

      Originally posted by oiaohm View Post
      This is a case where more is not always better. If I were talking about a cacheless GPU, that would be stupid. A vramless GPU is not cacheless. A vramless GPU could still end up having 4 GB of on-card storage in L3.

      The High Bandwidth Memory we have seen in GPUs so far has been designed on the idea of stacking next to the GPU and connecting in as VRAM. AMD's RAM stacking that connects to the L3 in their CPUs kind of says you don't have to connect in by the VRAM interface; quite a bit can be connected to the cache instead.
      You're just playing semantic games. Once you put GB of memory in a GPU, it's no longer "vramless".

      And for your information, AMD already had the concept of treating GPU memory as a cache, as far back as VEGA. And, in fact, I already said that instead of context-switching the entire GPU, the obvious solution is just to page-in what you need.

      https://pcper.com/2017/08/amds-hbcc-for-you-and-me/
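
      The CPU-side analogy, if that helps: HBCC's "page in what you need" is just demand paging. A rough Linux sketch of the same idea (plain mmap on a hypothetical file named big_asset.bin, assumed to be at least 64 MiB; this is an analogy, not AMD's HBCC interface):

      Code:
      /* CPU-side analogy for HBCC-style demand paging. Illustrative
       * only: the file name is hypothetical. */
      #define _DEFAULT_SOURCE   /* for mincore() on glibc */
      #include <fcntl.h>
      #include <stdio.h>
      #include <sys/mman.h>
      #include <unistd.h>

      int main(void) {
          int fd = open("big_asset.bin", O_RDONLY);
          if (fd < 0) { perror("open"); return 1; }

          size_t len = 64UL * 1024 * 1024;       /* map 64 MiB of it */
          unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
          if (p == MAP_FAILED) { perror("mmap"); return 1; }

          /* Nothing is resident yet. Touching one byte faults in just
           * that page; the rest stays on backing store until used. */
          volatile unsigned char byte = p[5 * 4096];
          (void)byte;

          unsigned char vec[16];
          mincore(p, 16 * 4096, vec);            /* residency snapshot */
          for (int i = 0; i < 16; i++)
              printf("page %2d resident: %d\n", i, vec[i] & 1);

          munmap(p, len);
          close(fd);
          return 0;
      }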



      • #13
        Originally posted by coder View Post
        Second, a "ramdrive NVMe SSD" implies a device consisting of RAM that holds a filesystem.
        A ramdrive does not have to hold a file system. A file system is something that is made on top of a block device.
        (link: NVIDIA GPUDirect Storage troubleshooting guide, covering install, debug, and performance/functional problem isolation for administrators and developers)

        When you look at GPUDirect with Nvidia and file systems, you will find it only works on filesystems that support direct I/O, as in being able to guarantee that a file's blocks will not be changed by the OS while the GPU(s) are using them.

        (link: eBay listing for the ACARD ANS-9010, a 5.25-inch SATA RAM disk built from DDR2 DIMMs, much faster than traditional hard drives and static SSDs)

        A ramdrive using the NVMe protocol is no more stupid than the historic ramdrive SSDs that used SATA.
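
        To put the block-device point in concrete terms: an NVMe ramdrive shows up as an ordinary block device, and you can read raw blocks off it with no file system anywhere in the picture. A minimal Linux sketch (the device path is illustrative, and O_DIRECT requires aligned buffers):

        Code:
        /* Minimal sketch: reading raw blocks from an NVMe block device
         * with no filesystem on it (device path illustrative). */
        #define _GNU_SOURCE      /* for O_DIRECT */
        #include <fcntl.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>

        int main(void) {
            int fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
            if (fd < 0) { perror("open"); return 1; }

            /* O_DIRECT bypasses the page cache, but buffer, offset and
             * size must be aligned to the logical block size. */
            void *buf;
            if (posix_memalign(&buf, 512, 4096)) { close(fd); return 1; }

            ssize_t n = pread(fd, buf, 4096, 0);  /* LBA 0 onward */
            printf("read %zd bytes straight off the block device\n", n);

            free(buf);
            close(fd);
            return 0;
        }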

        Originally posted by coder View Post
        That's really not the main reason people use NVDIMMs, which is to eliminate PCIe overhead for accessing non-volatile memory. They can be used just like SSDs, but also for things like in-memory databases.
        Except when you are coming from a GPU, where to get to main system memory you have the PCIe overhead anyway, and then you are fighting with the CPU's usage of that memory as well.

        Originally posted by coder View Post
        Putting a bunch of RAM on a plug-in card is not a new idea, but it only qualifies as an SSD if you're actually putting a filesystem on it. In fact, NVMe itself is overkill for just a PCIe card with a bunch of RAM.
        An SSD is a block device. When using something like Oracle DB you will at times use raw device mode, as in no file system, to get performance. There are a lot of high-end use cases where SSDs are in systems with no file system on them.

        Originally posted by coder View Post
        You're just playing semantic games. Once you put GB of memory in a GPU, it's no longer "vramless".

        And for your information, AMD already had the concept of treating GPU memory as a cache, as far back as VEGA. And, in fact, I already said that instead of context-switching the entire GPU, the obvious solution is just to page-in what you need.

        https://pcper.com/2017/08/amds-hbcc-for-you-and-me/
        No, you need to go and read that HBCC material closer. L1, L2 and L3 in the GPU die are not classed as VRAM. VRAM is off-die memory, and HBM2 is off-die memory.
        (link: overview of Infinity Cache, the new feature introduced with the Radeon RX 6000 series graphics cards)


        Please note it is only newer GPUs we are seeing with L3. A lot of your older GPUs had L1 and L2 but had to use the VRAM as L3; the 2017 example you pulled up has no on-die L3. The important thing about an L3 in the GPU die is that it allows information to be transferred between all your GPU compute units faster. This means that in newer GPU designs the VRAM is going to become L4.

        L4-level caching does not need to be anywhere near as high performance as L3, because it is not doing time-sensitive operations between compute units in the GPU die.

        The overhead of going across PCIe to NVMe/CXL RAM storage for L4 may not be a problem for performance at all, as long as the on-die L3 is big enough.

        The real question here is how big the on-die L3 has to be to cover the overhead of crossing PCIe, using the NVMe or CXL protocols, to RAM-based storage. Remember, if what used to be stored in on-card VRAM is stored out on CXL, CXL security comes into play with virtual machines, so you don't need to copy it when you change virtual machines. You only need to worry about the L3, L2 and L1 contents and state.

        Of course, if you are using the CXL/NVMe protocols to do the L4, you can also be using this L4 for clustering GPUs together.

        coder, like it or not, I am not playing semantic games. You are playing semantic games, like claiming NVMe has to have a file system when this is absolutely false.

        The reality is we should expect vramless GPUs at some point. How important having VRAM on a GPU card is happens to be massively reduced by adding an L3 on the GPU die. If you don't need the VRAM/L4 on the GPU card, having it instead over the CXL or NVMe protocols becomes possible, and this provides the advantage that your L4 can now be shared between many GPUs.

        And look at the HBCC caching: exactly how does this not suit a block transfer protocol? Note you said paging in. You page in from swap files and swap disks with CPUs, right? With the 2017 HBCC it was not possible to remove the VRAM because there was no L3, but we are not in 2017 any more.

        The introduction of an on-die L3 to GPUs was kind of forecast. It was more a question of when GPU speed would exceed the speed at which external VRAM could feed it, or when keeping up that speed would become too cost-ineffective; it was a when-not-if question for a very long time, and the answer turned out to be 2020. Once you have GPU dies with L3, that really starts throwing open the door to questions like vramless GPU cards, because one of the reasons VRAM needed to be on the GPU card (sharing data between compute units) goes away with the introduction of L3 on the GPU die.



        • #14
          Originally posted by oiaohm View Post
          A ramdrive does not have to hold a file system. A file system is something that is made on top of a block device.
          But, for your original point to be relevant, we can't avoid the issues posed by a filesystem. Indeed, that's what GPU Direct deals with, and why it's limited to simply avoiding the "host-memory-bounce" in data transfers.

          Originally posted by oiaohm View Post
          A ramdrive using the NVMe protocol is no more stupid than the historic ramdrive SSDs that used SATA.
          It's not comparable. Before NVMe, most SSDs used SATA because that's the only standard there was. And early SSDs actually weren't bottlenecked by it! I bought one of the first SSDs that supported SATA 3, and it couldn't even saturate the link!

          Before NVMe, a few PCIe SSDs used their own proprietary protocol, but they remained fairly niche.

          Now, as for the idea that a RAM card should use NVMe, the reason that's silly is that NVMe is designed for higher-latency, block-based storage. And putting RAM on an add-in-card is something people have been doing for a long, long time. So, there's no particularly good reason to chain yourself to NVMe, especially when most RAM card vendors probably had been using direct-mapped I/O or something else, since long before NVMe came along.
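
          To be clear about what direct-mapped I/O means here: the card's RAM simply shows up as a PCIe BAR that the CPU maps and uses with plain loads and stores, with no command queues or block protocol at all. A hypothetical Linux sketch (the PCI address below is made up for illustration):

          Code:
          /* Hypothetical sketch of direct-mapped I/O: a RAM card's PCIe
           * BAR mapped straight into the process. PCI address made up. */
          #include <fcntl.h>
          #include <stdio.h>
          #include <sys/mman.h>
          #include <unistd.h>

          int main(void) {
              int fd = open("/sys/bus/pci/devices/0000:03:00.0/resource0",
                            O_RDWR);
              if (fd < 0) { perror("open"); return 1; }

              size_t len = 1UL * 1024 * 1024;   /* map 1 MiB of the BAR */
              volatile unsigned int *ram =
                  mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
              if (ram == MAP_FAILED) { perror("mmap"); return 1; }

              ram[0] = 0xdeadbeef;              /* plain store to card RAM */
              printf("readback: 0x%x\n", ram[0]);

              munmap((void *)ram, len);
              close(fd);
              return 0;
          }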

          This whole point is a silly tangent, which I know is your objective. You try to make up with quantity what you lack in quality, and side-track the exchange so we don't focus on the fact that your original claims were nonsense and still yet to be supported by you.

          Originally posted by oiaohm View Post
          No, you need to go and read that HBCC material closer.
          No, HBCC does what I said. It lets you page in what you need, which supports virtualization in a natural way, rather than how you initially claimed it worked.

          Originally posted by oiaohm View Post
          The reality is we should expect vramless GPUs at some point.
          You shifted your position so much, that your "vramless GPU" not only has VRAM, but it already exists today!

          Sadly, this is typical, for you. If you simply wouldn't make ignorant claims that you can't backup, then you wouldn't need to fill page after page with these walls of text to try and keep from admitting you were talking out of your ass. Unfortunately for us, I think you actually enjoy trying to dig yourself out of such metaphorical holes.



          • #15
            Originally posted by coder View Post
            But, for your original point to be relevant, we can't avoid the issues posed by a filesystem. Indeed, that's what GPU Direct deals with, and why it's limited to simply avoiding the "host-memory-bounce" in data transfers.
            Except this is not the only Magnum IO usage of GPUDirect. One of the links I pointed to was using GPUDirect as a way to share sync information between individual GPUs while avoiding the host memory issue. Of course, using it this way has a habit of killing your flash-based NVMe drives dead from too many write cycles. This leads to RAM-based NVMe devices being custom made for different supercomputers. And remember, CXL does not exist yet.

            Originally posted by coder View Post
            It's not comparable. Before NVMe, most SSDs used SATA because that's the only standard there was. And early SSDs actually weren't bottlenecked by it! I bought one of the first SSDs that supported SATA 3, and it couldn't even saturate the link!
            The second sentence here is wrong. Ramdrive-based SSDs are in fact bottlenecked by SATA 3. The type of SSD that could 100% saturate the SATA 3 and SAS links was the ramdrive-based kind. Your point about it being the only standard is important.

            Originally posted by coder View Post
            Now, as for the idea that a RAM card should use NVMe, the reason that's silly is that NVMe is designed for higher-latency, block-based storage. And putting RAM on an add-in-card is something people have been doing for a long, long time. So, there's no particularly good reason to chain yourself to NVMe, especially when most RAM card vendors probably had been using direct-mapped I/O or something else, since long before NVMe came along.
            Please pay attention and notice how many of my links go to Nvidia. If you put RAM on an add-in card and attempt to use it directly with a Nvidia GPU, it is not going to work unless the card speaks the NVMe protocol. It does not matter how silly you think the NVMe protocol is for the task, because Nvidia says so. RAM-based NVMe can also max out NVMe transfer speeds, just like the RAM-based SATA and SAS drives did. At this point in time we have exactly one protocol; with CXL we can end up with exactly two.

            Originally posted by coder View Post
            No, HBCC does what I said. It lets you page in what you need, which supports virtualization in a natural way, rather than how you initially claimed it worked.
            Paging in the memory is not enough to support virtualisation.

            Originally posted by coder View Post
            You shifted your position so much, that your "vramless GPU" not only has VRAM, but it already exists today!
            No I have not. Let's take MxGPU or GVT-g or vGPU: all of those keep extra security data in the VRAM that is used when a request pages something into the L2 of older AMD and Nvidia cards, or the L3 of the newer AMD cards. Notice something: the security is done when you transfer from outside the die to inside the die.

            You have been stupidly tunnel-visioned here. The L1, L2 and L3 caches in the GPU die are not vmem; there are missing features when you get to virtualisation. Consumer cards are sold without MxGPU and vGPU because of (1) market segmentation and (2) the fact that the security on the vmem has a performance cost. So if you are sharing a GPU between multiple virtual machines, do you really want to be processing security every time you transfer from L3 to L2, or would you rather do it between L4 and L3? Remember, early PCs only had L1 and L2 at best; when we added an L3 to PC CPUs we did not go and rename the MMU/main memory, right?

            Vmem is your GPU's form of main memory and is meant to be the full-featured memory.

            The Lx number is the level of cache. Vmem's location makes it act as a particular level of cache, but vmem does not just do caching.

            The reality is an in-die L3 in a modern AMD GPU is not vmem; it does not have the security features. Those security features include controlling what memory is accessible by DMA over PCIe.

            CXL has the security implemented in the CXL protocol itself.

            Your vramless GPUs will look a lot like your current desktop GPUs, just with everything in the die; but current desktop cards are not the ones you use in virtual machine setups anyway, as they are missing MxGPU and vGPU in most cases.

            There is one big difference between an AMD in-die L3 cache and the old VRAM being used as cache/L3: VRAM sits behind an MMU, the in-die L3 does not.

            The coming vramless GPUs could also be called MMU-less GPUs.

            Another thing to remember: the GPU memory size on a specification sheet states the VRAM size, not the on-die cache size. So a vramless GPU with a 4 GB on-die L3 would show a GPU memory size of 0 on the specification sheet. So yes, people who see a specification sheet for a vramless GPU prototype will think there has to be an error, because a GPU cannot work with zero memory, right?

            In reality, VRAM and in-die cache are listed independently on specification sheets anyway.

            Originally posted by coder View Post
            Sadly, this is typical, for you. If you simply wouldn't make ignorant claims that you can't backup, then you wouldn't need to fill page after page with these walls of text to try and keep from admitting you were talking out of your ass. Unfortunately for us, I think you actually enjoy trying to dig yourself out of such metaphorical holes.
            Except this time you have dug yourself into a hole: you did not understand the field I am talking about at all.

