PCI Peer-To-Peer Support Merged For Linux 4.20~5.0


  • PCI Peer-To-Peer Support Merged For Linux 4.20~5.0

    Phoronix: PCI Peer-To-Peer Support Merged For Linux 4.20~5.0

    The recently covered PCI peer-to-peer memory support for the Linux kernel has indeed landed for the 4.20~5.0 kernel cycle. This is about PCI Express devices supporting peer-to-peer DMA that can bypass the system memory and processor via a standardized interface...
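    To give a feel for the new interface, here is a minimal provider-side sketch loosely following the in-kernel p2pdma documentation. The probe function name, BAR number and lack of error unwinding are purely illustrative, and exact function signatures may differ between kernel versions:

        /*
         * Sketch only: a PCIe device driver exposing all of one BAR as
         * peer-to-peer capable memory so other drivers can allocate from it.
         */
        #include <linux/pci.h>
        #include <linux/pci-p2pdma.h>

        static int example_p2p_probe(struct pci_dev *pdev, const struct pci_device_id *id)
        {
            int err;

            err = pcim_enable_device(pdev);
            if (err)
                return err;

            /* Register BAR 4 (hypothetical) as p2p memory, starting at offset 0. */
            err = pci_p2pdma_add_resource(pdev, 4, pci_resource_len(pdev, 4), 0);
            if (err)
                return err;

            /* Publish it so other subsystems (e.g. the NVMe target) may use it. */
            pci_p2pmem_publish(pdev, true);

            return 0;
        }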


  • #2
    Interesting. But usually, half the point is that you want to twiddle the bits on their way from one device to another.
    At least push them into some new form of structure.

    The use case of pushing fast Ethernet data streams straight to NVMe seems... not logical.
    What would you do without the Linux VFS + filesystem? Raw LBA storage only?



    • #3
      Seems like something that could've been especially useful 15 years ago, back when PCIe bandwidth was much more limited and external northbridges added latency.

      I'm not saying it isn't useful now, because of course it is. I'm just surprised it took so long.



      • #4
        Originally posted by milkylainen View Post
        Interesting. But usually, half the point is that you want to twiddle the bits on their way from one device to another.
        At least push them into some new form of structure.

        The use case of pushing fast Ethernet data streams straight to NVMe seems... not logical.
        What would you do without the Linux VFS + filesystem? Raw LBA storage only?
        Maybe for some industrial use cases requiring the best possible performance...

        Once/if a contiguous block has been allocated on the device by the FS layer, it might be possible to pipe data through PCIe to NVMe...
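        As a hypothetical sketch of that staging idea (the provider device, chunk size and logging are made up; real consumers such as the NVMe-oF target go through the block layer rather than hand-rolling this), the allocation helpers merged in 4.20 would be used roughly like:

            /*
             * Sketch only: allocate a chunk of published peer-to-peer memory
             * from a provider device and get the bus address a peer device's
             * DMA engine would target, so payload data never touches RAM.
             */
            #include <linux/pci.h>
            #include <linux/pci-p2pdma.h>
            #include <linux/printk.h>

            static int stage_through_p2pmem(struct pci_dev *provider, size_t chunk_size)
            {
                void *buf;
                pci_bus_addr_t bus_addr;

                /* Carve a buffer out of the provider's exported BAR memory. */
                buf = pci_alloc_p2pmem(provider, chunk_size);
                if (!buf)
                    return -ENOMEM;

                /* Address a peer device would program into its DMA descriptors. */
                bus_addr = pci_p2pmem_virt_to_bus(provider, buf);
                pr_info("p2p buffer at bus address 0x%llx\n",
                        (unsigned long long)bus_addr);

                /*
                 * ... point the NIC's receive DMA and the NVMe write command
                 * at bus_addr here; the transfer then stays on the PCIe fabric ...
                 */

                pci_free_p2pmem(provider, buf, chunk_size);
                return 0;
            }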



        • #5
          Originally posted by milkylainen View Post
          What would you do without the Linux VFS + filesystem? Raw LBA storage only?
          In short, yes. NVMe is, in the end, block storage, although it also wants to become byte-addressable. NVMe over Fabrics is likewise presented as an NVMe device, just a remote one rather than a local one.

          Getting from here to a single-namespace shared filesystem is still a long way off... if that is where you want to go, of course.



          • #6
            Originally posted by schmidtbag View Post
            Seems like something that could've been especially useful 15 years ago, back when PCIe bandwidth was much more limited and external northbridges added latency.

            I'm not saying it isn't useful now, because of course it is. I'm just surprised it took so long.
            It's only relatively recently that PCIe chipsets have supported peer-to-peer transactions reliably. Some still don't. One of the biggest problems is detecting whether peer-to-peer transactions are reliable on a given platform; there are no standard capability bits to figure this out.
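            To illustrate: with no capability bit to query, the merged code has to infer routability from the PCI topology plus a whitelist of known-good root complexes. The following is a simplified, hypothetical restatement of that check, not the actual drivers/pci/p2pdma.c code:

                /*
                 * Sketch only: decide whether P2P TLPs between two devices can
                 * be expected to route. Staying below a common switch is fine;
                 * crossing the host bridge needs a whitelisted root complex.
                 */
                #include <linux/pci.h>

                static bool host_bridge_whitelisted(struct pci_dev *dev)
                {
                    /* Real code checks the root complex against a short list. */
                    return false; /* conservative default for this sketch */
                }

                static bool p2p_path_ok(struct pci_dev *a, struct pci_dev *b)
                {
                    struct pci_dev *up, *bp;

                    /* Common upstream bridge => traffic stays inside a switch. */
                    for (up = pci_upstream_bridge(a); up; up = pci_upstream_bridge(up))
                        for (bp = pci_upstream_bridge(b); bp; bp = pci_upstream_bridge(bp))
                            if (bp == up)
                                return true;

                    /* Otherwise the path crosses the host bridge. */
                    return host_bridge_whitelisted(a);
                }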



            • #7
              Originally posted by wagaf View Post

              Maybe for some industrial use cases requiring the best possible performance...

              Once/if a contiguous block has been allocated on the device by the FS layer, it might be possible to pipe data through PCIe to NVMe...
              Sure. I still think it's a stretch. You'd need another signalling interface to tell the VFS that you are starting new files, new data, etc.
              Then you need the block vector list back so you can target something with the data.

              To me it looks cumbersome. Data does not necessarily have to go to RAM, though.
              Why can't PCI devices I/O-stash it into L2?
              On modern CPUs, L2 has more than enough storage for NVMe transactional data, and it is fast enough.
              That would be a more proper function. A transaction memory-hierarchy target flag?
              Device DMA could then pick it up via virtual L2 addressing over the internal buses?
              A lot of embedded devices do have I/O-stashing functions that place data in closer-than-RAM storage,
              simply because the total latency of their memory backend is substandard compared to x86.

              Intel I/OAT?
              "Direct Cache Access (DCA) allows a capable I/O device, such as a network controller, to place data directly into CPU cache, reducing cache misses and improving application response times."



              • #8
                Originally posted by agd5f View Post
                It's only relatively recently that PCIe chipsets have supported peer-to-peer transactions reliably. Some still don't. One of the biggest problems is detecting whether peer-to-peer transactions are reliable on a given platform; there are no standard capability bits to figure this out.
                Ah, I wasn't aware there was actually a hardware issue holding this back.
                Couldn't a standard capability simply be "PCIe 4.0 and newer"? Maybe the PCIe 4.0 spec could require P2P compatibility (or maybe 4.1).



                • #9
                  Originally posted by schmidtbag View Post
                  Seems like something that could've been especially useful 15 years ago, back when PCIe bandwidth was much more limited and external northbridges added latency.
                  The northbridge latency is nothing compared to the CPU having to take an interrupt, schedule a kernel thread, and then probably even a userspace thread to handle the data.

                  Originally posted by schmidtbag View Post
                  I'm not saying it isn't useful now, because of course it is. I'm just surprised it took so long.
                  20 years ago, I know high-end video editing systems would do things like DMA'ing directly from RAID controllers to DSP boards.

                  But that was really about working around PCI's bandwidth limits. In this case, the only bandwidth saved is to/from memory. Because PCIe isn't a true bus, the amount of PCIe traffic is the same.



                  • #10
                    Originally posted by agd5f View Post
                    It's only relatively recently that PCIe chipsets have supported peer-to-peer transactions reliably. Some still don't.
                    Was the PCIe spec unclear about this, or did nobody care to test it until recently?
