AMDGPU Linux Driver Enabling New "LSDMA" Block

  • AMDGPU Linux Driver Enabling New "LSDMA" Block

    Phoronix: AMDGPU Linux Driver Enabling New "LSDMA" Block

    As part of the increasing flow of AMD Radeon Linux graphics driver patches coming out ahead of the RDNA3 launch later this year, a new hardware block exposed by the latest patches is "LSDMA"...


  • #2
    All the articles I can find on SDMA in Mesa are about Mesa disabling it due to crashes and hangs, from the RX 500 series all the way to RDNA2.

    Has any of this been put back? Is it even alive?



    • #3
      I am guessing this will be linked to the recently announced Smart Access Memory, or to the direct storage access in the latest DirectX that is also used by consoles.



      • #4
        To be honest, it's one of my biggest expectations for AM5: a solid hardware communication pipeline from SSD to GFX.

        MS may sell its DirectStorage and all, but the real innovator is the PS5, which basically advertises the end of load times. And the benefits of a direct SSD-to-GFX pipeline just keep piling up with ever-faster SSDs and io_uring usage. I once suggested to the Godot Engine developers that they consider io_uring for almost-instant load times, arguing that being able to claim zero load times in games would be a great marketing asset, and I was politely told it was a poor idea and that the real game changer would be texture streaming.

        For games, and I imagine a lot of visual media, the texture/data size of modern 3D apps is absurdly big, and the ability to stream it allows for excellent load speeds and progressive worlds. If you want to reach the best potential for data transmission to the GFX, you want direct SSD to GFX, with high SSD speeds, parallelisation (io_uring queues), and the ability to stream the data (see the io_uring sketch at the end of this post).

        If AM5 actually brings in a clean SSD-to-GFX path, the pieces are there: SK Hynix has already promised to saturate PCIe 5.0 (14 GB/s SSD read speed), Jens Axboe can be trusted to make io_uring faster than light, and texture streaming can definitely be added on the software side.

        It's a wet dream, but I want to believe in a world of instant gaming with zero loads. If it actually gets put in place, the world will start marking the difference between the era when you had to wait for apps to load and the era when you just click and everything loads pretty much instantly.
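
        A minimal sketch of the batched submission io_uring enables, using liburing. Hypothetical throughout: the asset file name and fixed-size chunking are made up, and note this still lands data in CPU-side buffers, not VRAM:

        /* Batched asset reads via liburing: queue many reads up front,
         * submit them with one syscall, reap completions in any order.
         * Sketch only; "textures.pak" and the chunking are hypothetical. */
        #include <liburing.h>
        #include <fcntl.h>
        #include <stdio.h>
        #include <stdlib.h>

        #define QUEUE_DEPTH 64
        #define CHUNK_SIZE  (1 << 20)   /* 1 MiB per read */

        int main(void)
        {
            struct io_uring ring;
            if (io_uring_queue_init(QUEUE_DEPTH, &ring, 0) < 0) {
                perror("io_uring_queue_init");
                return 1;
            }

            int fd = open("textures.pak", O_RDONLY);
            if (fd < 0) {
                perror("open");
                return 1;
            }

            /* Queue QUEUE_DEPTH reads; no syscall happens yet. */
            for (int i = 0; i < QUEUE_DEPTH; i++) {
                struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
                void *buf = malloc(CHUNK_SIZE);
                io_uring_prep_read(sqe, fd, buf, CHUNK_SIZE,
                                   (__u64)i * CHUNK_SIZE);
                io_uring_sqe_set_data(sqe, buf);
            }
            io_uring_submit(&ring);   /* one syscall submits them all */

            /* Completions arrive as the SSD finishes them, in any order. */
            for (int i = 0; i < QUEUE_DEPTH; i++) {
                struct io_uring_cqe *cqe;
                io_uring_wait_cqe(&ring, &cqe);
                void *buf = io_uring_cqe_get_data(cqe);
                /* hand buf to the texture-streaming layer here */
                free(buf);
                io_uring_cqe_seen(&ring, cqe);
            }

            io_uring_queue_exit(&ring);
            return 0;
        }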



        • #5
          Originally posted by Mahboi View Post
          To be honest, it's one of my biggest expectations for AM5: a solid hardware communication pipeline from SSD to GFX. …
          There is nothing standing in the way of this today from a hardware perspective. You just need to stream data directly from the NVMe drive to the GPU's VRAM rather than taking a trip through system memory. AMD did this years ago when we built GPUs with NVMe on the GPU board; unfortunately, at the time there was not much interest from the industry. There are 3 major blockers from a Linux perspective:
          1. Lack of an API at the OpenGL/Vulkan level to make it easy for applications to take advantage of this.
          2. General reluctance to use peer-to-peer DMA more widely at the kernel level. Part of this is due to the fact that the PCI spec doesn't address peer-to-peer DMA, make any claims about whether it should work, or provide a way for the platform to determine whether it works, coupled with the fact that it doesn't work on every platform due to hardware limitations. It does work on all AMD Zen CPUs. It generally works on all recent Intel CPUs, except for some cases where devices cross certain root ports. Beyond that, it's less clear.
          3. Lack of a peer-to-peer DMA and fencing API at the kernel level that both NVMe and GPU drivers support.

          The story was not much better on the Windows side until recently, due to point 2.
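
          For context, the kernel does carry an in-tree pci-p2pdma framework (used today mostly for NVMe controller memory buffers), though per point 3 the GPU-driver integration is the missing piece. A rough, hypothetical sketch of a driver probing whether P2P between two devices is even viable with those helpers:

          /* Hypothetical sketch: checking whether peer-to-peer DMA between
           * an NVMe device (P2P memory provider) and a GPU (client) is
           * viable, using the kernel's in-tree pci-p2pdma helpers. A real
           * driver would get its pci_dev pointers from probe(). */
          #include <linux/pci.h>
          #include <linux/pci-p2pdma.h>
          #include <linux/sizes.h>

          static int try_p2p(struct pci_dev *nvme, struct pci_dev *gpu)
          {
              void *buf;

              /* Negative distance: the topology can't route P2P TLPs
               * between these two devices (e.g. across a host bridge
               * that doesn't forward them). */
              if (pci_p2pdma_distance(nvme, &gpu->dev, true) < 0)
                  return -ENXIO;

              /* Carve a buffer out of the memory the provider exposes
               * for P2P use (e.g. an NVMe CMB). */
              buf = pci_alloc_p2pmem(nvme, SZ_1M);
              if (!buf)
                  return -ENOMEM;

              /* Program the GPU's DMA engine against
               * pci_p2pmem_virt_to_bus(nvme, buf) here; note there is
               * no common fencing API for this yet (point 3 above). */

              pci_free_p2pmem(nvme, buf, SZ_1M);
              return 0;
          }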



          • #6
            Originally posted by agd5f View Post
            It does work on all AMD Zen CPUs. It generally works on all recent Intel CPUs, except for some cases where devices cross certain root ports. Beyond that, it's less clear.
            Could you please elaborate on this?

            What exactly is the difference between AMD and Intel CPUs with regard to ReBAR?



            • #7
              Originally posted by agd5f View Post
              There is nothing standing in the way of this today from a hardware perspective. …
              Alright, scratch my question!

              Just realized this isn't about ReBAR at all.

              Sorry, watching a talk show at the same time...



              • #8
                Originally posted by Mahboi View Post
                If you want to reach the best potential for data transmission to the GFX, you want direct SSD to GFX, with high SSD speeds, parallelisation (io_uring queues), and the ability to stream the data.
                Think about what you're saying. io_uring isn't magic. It simply allows multiple requests to be submitted to the kernel without a syscall for each. However, with io_uring doing reads from an SSD and writing into mapped GPU memory, you're still having the CPU do PIO copies. That's a bottleneck at worst and a waste of CPU time at best. Even though the CPU time isn't spent in userspace, it still eats into the same total budget of CPU time; it's just that a kernel thread is doing the blocking I/O instead of a userspace one.

                What you really want is to have the GPU do DMA transfers from the SSD. You don't need io_uring for this, because graphics APIs already have async command queues for submitting operations to the GPU.

                Now, I'm not saying that's the best solution for asset streaming or eliminating load times. I'm just picking on this narrow optimization you've highlighted. I have enough experience optimizing system performance to know that the proper approach would be guided by careful measurement and analysis of the actual bottlenecks.
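
                To illustrate the existing async path: a minimal, hypothetical sketch of a one-time copy recorded on a dedicated Vulkan transfer queue, where the GPU's DMA engine moves the bytes and the CPU returns immediately. The device, buffers, and queue-family index are assumed to have been created elsewhere:

                /* Sketch: asynchronous buffer copy on a Vulkan transfer queue.
                 * Assumes `device`, `srcBuf` (host-visible, filled), `dstBuf`
                 * (device-local), and `transferFamily` were created elsewhere.
                 * Pool/command-buffer cleanup is omitted for brevity. */
                #include <vulkan/vulkan.h>

                void copy_async(VkDevice device, uint32_t transferFamily,
                                VkBuffer srcBuf, VkBuffer dstBuf,
                                VkDeviceSize size, VkFence *outFence)
                {
                    VkQueue queue;
                    vkGetDeviceQueue(device, transferFamily, 0, &queue);

                    VkCommandPoolCreateInfo poolInfo = {
                        .sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,
                        .queueFamilyIndex = transferFamily,
                    };
                    VkCommandPool pool;
                    vkCreateCommandPool(device, &poolInfo, NULL, &pool);

                    VkCommandBufferAllocateInfo allocInfo = {
                        .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO,
                        .commandPool = pool,
                        .level = VK_COMMAND_BUFFER_LEVEL_PRIMARY,
                        .commandBufferCount = 1,
                    };
                    VkCommandBuffer cmd;
                    vkAllocateCommandBuffers(device, &allocInfo, &cmd);

                    VkCommandBufferBeginInfo begin = {
                        .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
                        .flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT,
                    };
                    vkBeginCommandBuffer(cmd, &begin);

                    /* Executed by the GPU's copy/DMA engine (SDMA on AMD),
                     * not by CPU loads and stores. */
                    VkBufferCopy region = { .size = size };
                    vkCmdCopyBuffer(cmd, srcBuf, dstBuf, 1, &region);
                    vkEndCommandBuffer(cmd);

                    VkFenceCreateInfo fenceInfo = {
                        .sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO,
                    };
                    vkCreateFence(device, &fenceInfo, NULL, outFence);

                    /* Submit and return; the fence signals completion, so
                     * the CPU never blocks on the transfer itself. */
                    VkSubmitInfo submit = {
                        .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
                        .commandBufferCount = 1,
                        .pCommandBuffers = &cmd,
                    };
                    vkQueueSubmit(queue, 1, &submit, *outFence);
                }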

                Originally posted by Mahboi View Post
                SK Hynix has already promised to saturate PCIe 5.0 (14 GB/s SSD read speed),
                Did they announce anything for consumers? I've seen enterprise PCIe 5 SSDs, but good consumer offerings could still be a ways off.

                Originally posted by Mahboi View Post
                Jens Axboe can be trusted to make io_uring faster than light,
                Understand what he's doing: optimizing I/O throughput, benchmarked via io_uring. His patches don't only apply to io_uring; in fact, I'd guess only a minority of his recent optimizations are in io_uring itself. It's just that io_uring removed the previous limiting factor on single-threaded IOPS, revealing more optimization opportunities, which he's been targeting.

                Originally posted by Mahboi View Post
                It's a wet dream, but I want to believe in a world of instant gaming with zero loads.
                The best way to solve a problem is through detailed understanding. You don't usually get very far by simply mashing together different buzzwords.



                • #9
                  Originally posted by agd5f View Post
                  Part of this is due to the fact that the PCI spec doesn't address peer-to-peer DMA, make any claims about whether it should work, or provide a way for the platform to determine whether it works, coupled with the fact that it doesn't work on every platform due to hardware limitations.
                  Hopefully, CXL has straightened out this mess. That said, it's a wide-open question if/when CXL will reach consumer platforms.



                  • #10
                    Originally posted by agd5f View Post
                    There is nothing standing in the way of this today from a hardware perspective. … There are 3 major blockers from a Linux perspective: …
                    For point 1, I'm hoping that MS (and/or Valve) will port the DirectStorage API to Mesa/Vulkan, and that points 2 and 3 will then follow for those with supported hardware. From my layman's POV it makes sense to toss it into Vulkan, since nearly every graphics standard has a to-Vulkan conversion layer.
