Corsair MP700 PRO 2TB PCIe Gen5 NVMe SSD


  • #21
    Originally posted by yump View Post
    I have a theory. IIRC, NVMe protocol has multiple queues, one per CPU. One queue with 32 I/Os in flight is not the same as 32 queues with one I/O in flight. Especially considering the large block size of 2 MB. I only have one NVMe SSD to check, but if I look at /sys/block/nvme0n1/queue/max_sectors_kb, mine has a maximum actual I/O size of 512K. My understanding (which fio+iostat agrees with) is that a 2 MiB read gets split by the kernel and sent to the SSD as 4 512 KiB reads.

    Suppose what this enterprise SSD sees is 32 queues each with 4 requests in flight. Suppose also that its scheduler is tuned for peak latency or fairness instead of throughput. If it round-robins between the queues, that could effectively turn a 2 MiB sequential workload into a 512 KiB random workload.
    Very interesting. Thanks for replying.

    I guess the question would then be why the MP700 Pro handles such a workload better, or whether something about it triggers a different max_sectors_kb value. Because there's a factor of 4.8 between them in this sequential benchmark, but less than a factor of 2 difference between those drives on true 4k random reads.
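
    If anyone wants to compare, I'd guess these would be the things to look at on both drives (device names here are just placeholders for whatever the kernel assigns):

    Code:
    # per-request size cap the kernel applies when splitting large reads
    cat /sys/block/nvme0n1/queue/max_sectors_kb
    cat /sys/block/nvme1n1/queue/max_sectors_kb

    # number of hardware submission queues blk-mq has set up for each drive
    ls /sys/block/nvme0n1/mq | wc -l
    ls /sys/block/nvme1n1/mq | wc -l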




    • #22
      Originally posted by coder View Post
      Very interesting. Thanks for replying.

      I guess the question would then be why the MP700 Pro handles such a workload better, or whether something about it triggers a different max_sectors_kb value. Because there's a factor of 4.8 between them in this sequential benchmark, but less than a factor of 2 difference between those drives on true 4k random reads.
      On true 4k random reads, the actual pattern of accesses to the flash chips will be random no matter what the SSD's scheduler does, because the number of requests in flight at any one time (which is also the re-ordering window the scheduler can see) is only a tiny fraction of the working set size. But with sequential accesses, it's possible to pessimize some workloads. Consider:

      Code:
      I/O threads:
      
      T1: 1 2 3 4
      T2: 5 6 7 8
      T3: 9 10 11 12
      T4: 13 14 15 16
      
      Flash chip with unfair scheduling:
      
      1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
      
      Flash chip with fair scheduling:
      
      1 5 9 13 2 6 10 14 3 7 11 15 4 8 12 16
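
      If somebody with these drives wanted to test the theory, I think comparing two fio runs like the ones below would show it: the same 32 reads in flight, but submitted through one queue versus spread across many. The device name and sizes are placeholders, and this assumes a box with at least 32 CPU threads so each job gets its own submission queue.

      Code:
      # one job, one submission queue, 32 x 2 MiB reads in flight
      sudo fio --name=onequeue --filename=/dev/nvme0n1 --readonly --direct=1 \
          --ioengine=libaio --rw=read --bs=2M --iodepth=32 --numjobs=1 \
          --time_based --runtime=30

      # 32 jobs, roughly one submission queue each, one read in flight per job,
      # each job reading sequentially through its own 8 GiB slice of the drive
      sudo fio --name=manyqueues --filename=/dev/nvme0n1 --readonly --direct=1 \
          --ioengine=libaio --rw=read --bs=2M --iodepth=1 --numjobs=32 \
          --size=8g --offset_increment=8g --time_based --runtime=30 --group_reporting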



      • #23
        Originally posted by yump View Post
        if I look at /sys/block/nvme0n1/queue/max_sectors_kb, mine has a maximum actual I/O size of 512K. My understanding (which fio+iostat agrees with) is that a 2 MiB read gets split by the kernel and sent to the SSD as 4 512 KiB reads.
        Can you change this? If you do, is there any way to tell if it's having an effect? I guess one way might be to run iostat -x and see if it shows more IO operations per kB/s.
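
        Something like this is what I have in mind, if it works the way I imagine (device name is a placeholder):

        Code:
        # lower the kernel's split size from 512K to, say, 128K
        echo 128 | sudo tee /sys/block/nvme0n1/queue/max_sectors_kb

        # then, while a big sequential read is running, check whether r/s goes up
        # for the same rkB/s, i.e. more and smaller requests reaching the drive
        iostat -x 1 nvme0n1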



        • #24
          Originally posted by coder View Post
          Can you change this? If you do, is there any way to tell if it's having an effect? I guess one way might be to run iostat -x and see if it shows more IO operations per kB/s.
          I can decrease it, but not increase it. Iostat indeed shows correspondingly smaller rareq-sz.

          There is also a max_hw_sectors_kb, which is also 512.
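
          In other words, roughly this (paths from my system, values will differ per drive):

          Code:
          # hard ceiling reported for the device (512 here)
          cat /sys/block/nvme0n1/queue/max_hw_sectors_kb

          # the soft limit can be lowered below that ceiling...
          echo 256 | sudo tee /sys/block/nvme0n1/queue/max_sectors_kb

          # ...but trying to raise it above max_hw_sectors_kb is rejected by the kernel
          echo 1024 | sudo tee /sys/block/nvme0n1/queue/max_sectors_kb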



          • #25
            Originally posted by yump View Post
            I can decrease it, but not increase it. Iostat indeed shows correspondingly smaller rareq-sz.

            There is also a max_hw_sectors_kb, which is also 512.
            Thanks for the info!

            It sure would be nice to test your theory (or see what's actually going on, if it's not just that). It bugs me not to know, but I don't really have any stake in it.

            My work PC has a Micron-branded M.2 drive that I presume is a rebrand of a Crucial model, but since it probably has very different firmware, I wouldn't expect to see the same effect...
