NVMe "Simple Copy" Offloaded Copy Support Being Prepared For The Linux Kernel


  • microcode
    replied
    Originally posted by atomsymbol View Post

    Just a note: Linux does have, for example, the copy_file_range() syscall, but in practice most file copy operations use the read() and write() syscalls. This means that NVMe-simple-copy won't benefit most kinds of real-world file copy operations. Running "strace /bin/cp file1 file2" yields:

    Code:
    openat() = 3
    fstat(3) = 0
    openat() = 4
    fstat(4) = 0
    fadvise64(3) = 0
    read(3, 131072) = 8477 <<<<
    write(4, 8477) = 8477 <<<<
    read(3, 131072) = 0
    fchmod(4) = 0
    flistxattr(3) = 0
    flistxattr(3) = 0
    fchmod(4, 0400) = 0
    fgetxattr(3) = -1 ENODATA
    fstat(3) = 0
    fsetxattr(4) = 0
    close(4) = 0
    close(3) = 0
    which shows cp using the read() and write() syscalls to copy the file.

    In summary: The real-world impact of NVMe-simple-copy outside of a small number of special cases is (currently) very limited.
    Well yes, I'm talking about my application which has its own on-disk format.

  • atomsymbol
    replied
    Originally posted by microcode View Post
    That's a pretty bold claim. I don't think the breakeven would be that high if the pagetable mechanism were designed with this in mind.
    Just a note: Linux does have, for example, the copy_file_range() syscall, but in practice most file copy operations use the read() and write() syscalls. This means that NVMe-simple-copy won't benefit most kinds of real-world file copy operations. Running "strace /bin/cp file1 file2" yields:

    Code:
    openat() = 3
    fstat(3) = 0
    openat() = 4
    fstat(4) = 0
    fadvise64(3) = 0
    read(3, 131072) = 8477   <<<<
    write(4, 8477) = 8477   <<<<
    read(3, 131072) = 0
    fchmod(4) = 0
    flistxattr(3) = 0
    flistxattr(3) = 0
    fchmod(4, 0400) = 0
    fgetxattr(3) = -1 ENODATA
    fstat(3) = 0
    fsetxattr(4) = 0
    close(4) = 0
    close(3) = 0
    which shows cp using the read() and write() syscalls to copy the file.

    In summary: The real-world impact of NVMe-simple-copy outside of a small number of special cases is (currently) very limited.
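
For contrast with the read()/write() loop in the strace output above, here is a minimal sketch of a copy that tries copy_file_range() first and falls back to read()/write(). The helper names `copy_file` and `_copy_fd` are made up for illustration. Whether a copy_file_range() call is ever turned into a device-level copy depends on the kernel, the filesystem and the drive, and this sketch makes no claim about that.

```python
import os

CHUNK = 128 * 1024  # same request size as the read() calls in the strace output


def _copy_fd(src, dst):
    """Copy from fd src to fd dst; return the method that finished the copy."""
    try:
        while True:
            # The kernel moves the bytes; they never enter a user-space buffer.
            n = os.copy_file_range(src, dst, CHUNK)
            if n == 0:  # end of file
                return "copy_file_range"
    except (AttributeError, OSError):
        # AttributeError: os.copy_file_range is missing (non-Linux, Python < 3.8).
        # OSError: unsupported kernel or filesystem pair. A genuine I/O error
        # will simply be raised again by the fallback loop below.
        pass
    while True:
        buf = os.read(src, CHUNK)
        if not buf:
            return "read/write"
        view = memoryview(buf)
        while view:  # os.write may write fewer bytes than requested
            view = view[os.write(dst, view):]


def copy_file(src_path, dst_path):
    """Hypothetical helper: copy src_path to dst_path, report the method used."""
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            return _copy_fd(src, dst)
        finally:
            os.close(dst)
    finally:
        os.close(src)
```

Calling `copy_file("file1", "file2")` returns either "copy_file_range" or "read/write", which makes it easy to see which path a given system actually takes.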

  • microcode
    replied
    Originally posted by atomsymbol View Post
    Yes. But if the pagetables aren't stored in some special kind of area on the SSD, then any modification to the page table will result in writing a whole SSD block (SSD block size ranges from 256 KB to 4 MB). If NVMe-simple-copy is moving less than one block (256 KB to 4 MB) of data, for example less than 4 MB per second, it does not matter from a performance perspective whether it is implemented (A) just via page table modifications or (B) by additionally copying the data to an unused SSD block. Several gigabytes of data would need to be moved via NVMe-simple-copy in order to see a performance difference between (A) and (B).
    That's a pretty bold claim. I don't think the breakeven would be that high if the pagetable mechanism were designed with this in mind.

  • atomsymbol
    replied
    Originally posted by microcode View Post
    Even if you had to process part of the copy with actual copying/writing, any page-aligned subset of that copy operation can be done entirely through the pagetables.
    Yes. But if the pagetables aren't stored in some special kind of area on the SSD, then any modification to the page table will result in writing a whole SSD block (SSD block size ranges from 256 KB to 4 MB). If NVMe-simple-copy is moving less than one block (256 KB to 4 MB) of data, for example less than 4 MB per second, it does not matter from a performance perspective whether it is implemented (A) just via page table modifications or (B) by additionally copying the data to an unused SSD block. Several gigabytes of data would need to be moved via NVMe-simple-copy in order to see a performance difference between (A) and (B).
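
One rough way to put numbers on options (A) and (B) is a toy cost model. It is only a sketch under assumptions that are not established in the thread: every mapping update costs exactly one whole-block write, a single update covers the whole copy, and the only other cost of (B) is writing the copied bytes once. The function names and the 4 MiB block size are illustrative.

```python
KIB = 1024
MIB = 1024 * KIB


def flash_bytes_written(copy_bytes, block_bytes, remap_only):
    """Bytes written to flash under the simplified model described above."""
    mapping_update = block_bytes  # assumption: one whole block per update
    data_write = 0 if remap_only else copy_bytes
    return mapping_update + data_write


def cost_ratio(copy_bytes, block_bytes):
    """Flash bytes written by option (B) divided by those written by (A)."""
    a = flash_bytes_written(copy_bytes, block_bytes, remap_only=True)
    b = flash_bytes_written(copy_bytes, block_bytes, remap_only=False)
    return b / a


if __name__ == "__main__":
    for size in (4 * KIB, 256 * KIB, 4 * MIB, 1024 * MIB):
        ratio = cost_ratio(size, 4 * MIB)
        print(f"{size // KIB:>8} KiB copied: (B)/(A) = {ratio:.3f}")
```

In this model the two options differ by a factor of two once the copy reaches one block in size. Where the real breakeven lies depends on how a given drive stores and updates its mapping tables, which the model does not capture.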

  • microcode
    replied
    Originally posted by atomsymbol View Post
    I don't understand the point of your post.
    Even if you had to process part of the copy with actual copying/writing, any page-aligned subset of that copy operation can be done entirely through the pagetables.
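
The "page-aligned subset" idea can be sketched as splitting a 512-byte-sector copy range into an unaligned head, a middle made of whole pages, and an unaligned tail. The function name `split_copy_range` and the 16 KiB page size are assumptions for illustration; real drives use their own internal page sizes and do not expose this logic.

```python
SECTOR = 512  # logical sector size in bytes


def split_copy_range(start_lba, num_sectors, page_bytes=16 * 1024):
    """Split a sector range into (head, aligned, tail) half-open byte ranges.

    `aligned` covers only whole pages, which could in principle be handled
    by remapping; `head` and `tail` are the partial pages at either end.
    """
    start = start_lba * SECTOR
    end = start + num_sectors * SECTOR
    first_page = -(-start // page_bytes) * page_bytes  # round start up
    last_page = (end // page_bytes) * page_bytes       # round end down
    if first_page >= last_page:
        # No complete page lies inside the range: everything is "head".
        return (start, end), (end, end), (end, end)
    return (start, first_page), (first_page, last_page), (last_page, end)


if __name__ == "__main__":
    head, aligned, tail = split_copy_range(start_lba=3, num_sectors=100)
    print("head:", head, "aligned:", aligned, "tail:", tail)
```

With these assumed sizes, a copy of 100 sectors starting at LBA 3 contains two complete 16 KiB pages in the middle, while a copy of only a few sectors contains none.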

  • atomsymbol
    replied
    Originally posted by microcode View Post
    FWIW, 512-byte sectors are a legacy matter. When you give a copy command, even if it is 512-byte aligned rather than aligned to the SSD's page size, there is a chance that a complete page lies somewhere in the range.
    I don't understand the point of your post.

  • microcode
    replied
    Originally posted by atomsymbol View Post

    According to a random PDF about NVMe-simple-copy, it seems to me that the granularity of a simple copy command is 512 bytes, which is most likely much smaller than the internal physical granularity of an SSD block. A quick Internet search for the term "ssd typical block size" yields "between 256 KB and 4 MB".
    FWIW, 512-byte sectors are a legacy matter. When you give a copy command, even if it is 512-byte aligned rather than aligned to the SSD's page size, there is a chance that a complete page lies somewhere in the range.

  • atomsymbol
    replied
    Originally posted by microcode View Post

    Are you sure it can't be done entirely through remapping, at least some of the time? It's not like your SSDs are exposing the flash directly to you to begin with, so I would expect this to affect the indexes/hash tries, but not so much involve rewriting the data itself.

    On second thought I guess what I'm thinking of is a move and not a copy.
    According to a random PDF about NVMe-simple-copy, it seems to me that the granularity of a simple copy command is 512 bytes, which is most likely much smaller than the internal physical granularity of an SSD block. A quick Internet search for the term "ssd typical block size" yields "between 256 KB and 4 MB".

  • microcode
    replied
    Originally posted by atomsymbol View Post

    Physically, the blocks are still being read and then written, but the reads and writes do not involve the PCI Express bus, main memory, or the CPU.

    Unless NVMe-simple-copy is faster than 7 GB/s (PCIe 4.0 x4), it is of course mostly pointless, and would be beneficial only when applications running on the CPU are fully saturating the DDR4 memory bandwidth.
    Are you sure it can't be done entirely through remapping, at least some of the time? It's not like your SSDs are exposing the flash directly to you to begin with, so I would expect this to affect the indexes/hash tries, but not so much involve rewriting the data itself.

    On second thought I guess what I'm thinking of is a move and not a copy.

  • dimko
    replied
    Originally posted by karolherbst View Post

    What about power consumption? Not having to deal with it on the CPU side should also lead to longer battery life.
    I suspect such power saving would be negligible, if at all noticeable.
