RDMA Changes For Linux 5.12 Add DMA-BUF Support For Peer-To-Peer Transfers With GPUs

mppix replied

23 February 2021, 02:01 PM
Originally posted by coder

The CPU is still what boots, runs the OS & drivers, manages the memory space, manages security, and basically coordinates things at a high-level. It's still the boss, even if it's not an intermediary of every individual transaction. These devices don't know about UIDs, perms, or userspace processes, but the CPU is what's responsible for extending those constructs so that device-to-device transactions remain private and secure.

If you want to view the inside of a computer more as a network, where there are multiple equal peers, then I think blade servers are probably more along those lines. I believe at least some of them use Ethernet as a backplane.

Yes, of course. The CPU remains the main hardware that runs the system. However, it becomes more like a referee from a computation perspective and doesn't risk becoming the bottleneck.

Blade servers and supercomputers can use either Ethernet or, more commonly, InfiniBand with remote direct memory access. In most cases, you still have one or more central access/management nodes. It is certainly also a distributed computation environment at a higher level, which would seem complementary.
coder replied

23 February 2021, 02:53 AM
Originally posted by mppix

Sounds exciting. This, PCIe 4/5 (and eventually 6), Smart Access Memory, oneAPI and the like may really start a new compute model where the CPU is only one of many compute nodes in a system.

The CPU is still what boots, runs the OS & drivers, manages the memory space, manages security, and basically coordinates things at a high-level. It's still the boss, even if it's not an intermediary of every individual transaction. These devices don't know about UIDs, perms, or userspace processes, but the CPU is what's responsible for extending those constructs so that device-to-device transactions remain private and secure.

If you want to view the inside of a computer more as a network, where there are multiple equal peers, then I think blade servers are probably more along those lines. I believe at least some of them use Ethernet as a backplane.
You- replied

22 February 2021, 08:41 PM
Originally posted by agd5f

FWIW, AMD did the GPU side of this (i.e., added peer-to-peer support to dma-buf in general).

Cool, with cooperation there is less chance that one party wants to take its toys home (or rather, not bring them out to play).

Thanks for correcting me on this.
agd5f replied

22 February 2021, 05:55 PM
Also, HMM doesn't currently handle peer-to-peer access.
mppix replied

22 February 2021, 05:53 PM
Originally posted by You-

This is interesting and just goes to show how much of a behemoth Intel is in relation to open source.
I would suggest this is also useful to AMD and something they could have worked on, but because AMD has a smaller team, a lot of this type of work is done by Intel.

Early reports on AMD's exascale supercomputers suggest that they are doing something like Infinity Fabric over PCIe interfaces. This may come down to a very similar concept.
agd5f replied

22 February 2021, 05:53 PM
Originally posted by You-

This is interesting and just goes to show how much of a behemoth Intel is in relation to open source.

I would suggest this is also useful to AMD and something they could have worked on, but because AMD has a smaller team, a lot of this type of work is done by Intel.

FWIW, AMD did the GPU side of this (i.e., added peer-to-peer support to dma-buf in general).

Last edited by agd5f; 22 February 2021, 06:08 PM.
You- replied

22 February 2021, 05:34 PM
This is interesting and just goes to show how much of a behemoth Intel is in relation to open source.

I would suggest this is also useful to AMD and something they could have worked on, but because AMD has a smaller team, a lot of this type of work is done by Intel.

The sales and marketing side of Intel may sometimes be evil, but its OSS side is amazing.
tildearrow replied

22 February 2021, 05:20 PM
RDMA just makes me think of RDNA...

(even though I know they are separate things)
mppix replied

22 February 2021, 04:20 PM
Sounds exciting. This, PCIe 4/5 (and eventually 6), Smart Access Memory, oneAPI and the like may really start a new compute model where the CPU is only one of many compute nodes in a system.
agd5f replied

22 February 2021, 04:18 PM
Originally posted by pegasus

So this is now the third implementation solving the same problem? As far as I know, Mellanox was the first with GPUDirect. Seems like a good scenario for Linus to share some sharp wisdom with the world.

GPUDirect/PeerDirect is not upstream.