NVMe "Simple Copy" Offloaded Copy Support Being Prepared For The Linux Kernel


  • NVMe "Simple Copy" Offloaded Copy Support Being Prepared For The Linux Kernel

    Phoronix: NVMe "Simple Copy" Offloaded Copy Support Being Prepared For The Linux Kernel

    One of the NVMe specification additions that was ratified this year is the "simple copy" command that allows for copying multiple contiguous ranges to a single destination. That simple copy operation is offloaded to the SSD controller. The Linux kernel support for NVMe simple copy is now being prepared...

    http://www.phoronix.com/scan.php?pag...ple-Copy-Linux
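For a concrete picture of what a host would hand to the drive, here is a sketch of packing the copy's source-range descriptors. The 32-byte format-0 layout (starting LBA at byte 8, a 0-based block count at byte 16, everything else reserved) is one reading of the ratified command set, so treat the exact offsets as assumptions:

```python
import struct

def source_range_entry(slba: int, nlb: int) -> bytes:
    """Pack one Simple Copy source-range descriptor (assumed format 0).

    Assumed layout: 32 bytes total, bytes 8-15 = starting LBA
    (little-endian), bytes 16-17 = number of logical blocks minus one,
    remaining bytes reserved/zero.
    """
    return struct.pack("<8xQH14x", slba, nlb - 1)

# Two contiguous source ranges to be merged at a single destination.
payload = b"".join([
    source_range_entry(0x1000, 8),
    source_range_entry(0x2000, 16),
])
assert len(payload) == 64  # two 32-byte descriptors
```

The command itself then names the destination LBA and the number of descriptors; the point of the feature is that this small payload replaces moving the data itself over the bus.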

  • #2
    Wow, this is huge for my application. Reordering append-only tracts without reading them is a huge win.

    Comment


    • #3
      Originally posted by microcode View Post
      Wow, this is huge for my application. Reordering append-only tracts without reading them is a huge win.
Physically, the blocks are still read and then written, but the reads and writes do not involve the PCI Express bus, main memory, or the CPU.

Unless NVMe simple copy is faster than 7 GB/s (PCIe 4.0 x4), it is mostly pointless, of course, and would be beneficial only if applications running on the CPU are fully saturating the DDR4 memory bandwidth.
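For reference, the raw link rate behind that 7 GB/s figure works out as follows (a back-of-the-envelope sketch; real drives lose a bit more to protocol overhead):

```python
# Raw PCIe 4.0 x4 link bandwidth behind the ~7 GB/s figure.
gt_per_s = 16e9          # PCIe 4.0: 16 GT/s per lane
encoding = 128 / 130     # 128b/130b line coding
lanes = 4
raw_bytes = gt_per_s * encoding * lanes / 8
print(f"{raw_bytes / 1e9:.2f} GB/s")  # ~7.88 GB/s before protocol overhead
```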



      • #4
        Originally posted by atomsymbol View Post

Physically, the blocks are still read and then written, but the reads and writes do not involve the PCI Express bus, main memory, or the CPU.

Unless NVMe simple copy is faster than 7 GB/s (PCIe 4.0 x4), it is mostly pointless, of course, and would be beneficial only if applications running on the CPU are fully saturating the DDR4 memory bandwidth.
What about power consumption? Not having to do the copy on the CPU side should also lead to longer battery life.



        • #5
I had heard that duplicate copies of data are sometimes optimized away to a single physical copy on the same disk (internally, by the controller; the OS still thinks there is more than one copy)?

Is this meant to be similar to cut/paste on the same disk, where some inodes are updated instead of an actual transfer? Or does it actually write new copies internally?

          ---

I remember, long ago, before SSDs were mainstream and when I was on Windows, I used a program called TeraCopy: if I was copying files from various locations to a new destination (same disk or another disk, I can't recall), performance would slow to a crawl without that software. Perhaps it was queue-depth related? It seemed that simultaneous transfers were being attempted instead of being queued up (maybe what was effectively sequential I/O became random I/O without a transfer queue?).

It made me a bit paranoid about running multiple copies like that, but perhaps it's a non-issue on SSDs, especially NVMe disks? If anyone is familiar with that experience, is it something Linux can still hit with HDD storage? And if so, is there an equivalent to TeraCopy? UltraCopier seems to be the equivalent, but it's not as transparent to adopt as TeraCopy was on Windows: e.g. for KDE/Plasma, anything using KIO needs support, and anything not using KIO needs its copy method modified to delegate to UltraCopier (or an equivalent queueing solution): https://bugs.kde.org/show_bug.cgi?id=161017

          The description here suggests it's mostly an HDD issue due to access latency from stacking concurrent transfers:
          https://bugs.kde.org/show_bug.cgi?id=388291#c7

          Originally posted by atomsymbol View Post
Without NVMe-simple-copy being faster than 7 GB/s (PCIe 4.0 x4) it is mostly pointless of course
Embedded devices support NVMe, but sometimes it's PCIe 2.0, or 3.0 if you're lucky, and sometimes the number of lanes is only x2 or x1, IIRC. There are plenty of PCIe 3.0 PCs too, some of which only wire x2 lanes to the M.2 slot. External USB SSDs come to mind as well, since some of those are NVMe drives behind a PCIe 3.0 x2 M.2-to-USB bridge chipset.

Regarding the 7 GB/s sequential I/O peak on PCIe 4.0: that's hitting the limit of x4 PCIe 4.0 lanes, so drives could possibly exceed it if they're already bottlenecked there. I don't know enough about the low-level details of NAND and the controllers in those drives to comment further.



          • #6
            Originally posted by karolherbst View Post

What about power consumption? Not having to do the copy on the CPU side should also lead to longer battery life.
I suspect such power savings would be negligible, if noticeable at all.



            • #7
              Originally posted by atomsymbol View Post

Physically, the blocks are still read and then written, but the reads and writes do not involve the PCI Express bus, main memory, or the CPU.

Unless NVMe simple copy is faster than 7 GB/s (PCIe 4.0 x4), it is mostly pointless, of course, and would be beneficial only if applications running on the CPU are fully saturating the DDR4 memory bandwidth.
Are you sure it can't be done entirely through remapping, at least some of the time? It's not like SSDs expose the flash directly to you to begin with, so I would expect this to affect the indexes/hash tries rather than involve rewriting the data itself.

              On second thought I guess what I'm thinking of is a move and not a copy.



              • #8
                Originally posted by microcode View Post

Are you sure it can't be done entirely through remapping, at least some of the time? It's not like SSDs expose the flash directly to you to begin with, so I would expect this to affect the indexes/hash tries rather than involve rewriting the data itself.

                On second thought I guess what I'm thinking of is a move and not a copy.
According to a random PDF about NVMe simple copy, it seems the granularity of a simple copy command is 512 bytes, which is most likely much smaller than the internal physical granularity of an SSD block. A quick Internet search for "ssd typical block size" yields "between 256 KB and 4 MB".



                • #9
                  Originally posted by atomsymbol View Post

According to a random PDF about NVMe simple copy, it seems the granularity of a simple copy command is 512 bytes, which is most likely much smaller than the internal physical granularity of an SSD block. A quick Internet search for "ssd typical block size" yields "between 256 KB and 4 MB".
FWIW, 512-byte sectors are a legacy matter. When you issue a copy command, even if it is 512-byte aligned rather than aligned to the SSD's page size, there is a good chance that a complete page falls somewhere in the range.
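To illustrate that point with made-up numbers (the 16 KiB page size is an assumption; actual flash geometry is drive-specific), here is the alignment arithmetic for counting whole internal pages inside a 512-byte-granular copy range:

```python
LBA = 512           # logical block size exposed by the drive
PAGE = 16 * 1024    # assumed internal flash page size (drive-specific)

def full_pages(slba: int, nlb: int) -> int:
    """Count whole internal pages entirely inside a copy range;
    those could in principle be remapped instead of rewritten."""
    start = slba * LBA
    end = start + nlb * LBA
    first = -(-start // PAGE) * PAGE   # round start up to a page boundary
    last = (end // PAGE) * PAGE        # round end down to a page boundary
    return max(0, (last - first) // PAGE)

print(full_pages(1, 100))  # → 2 whole pages despite 512 B misalignment
```

Only the partial pages at the edges of the range would actually need a read-modify-write.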



                  • #10
                    Originally posted by microcode View Post
FWIW, 512-byte sectors are a legacy matter. When you issue a copy command, even if it is 512-byte aligned rather than aligned to the SSD's page size, there is a good chance that a complete page falls somewhere in the range.
                    I don't understand the point of your post.
