Axboe Achieves 8M IOPS Per-Core With Newest Linux Optimization Patches

  • #11
    I think the question of MastaG is more like: do any of these optimizations even affect the regular desktop use case? E.g. when loading a game on Steam, IO concurrency (queue depth) is very low, and existing software does not explicitly use the io_uring API. So the question is, will existing desktop applications (like Steam games) see any improvement at all, or are the code paths being optimized already taking negligible time/power (in the desktop use case) before the optimizations?
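    (To make that concrete: "explicitly using the io_uring API" means code along these lines. A minimal sketch using liburing, compiled with -luring; the file path is just a placeholder, not anything from the article.)

        /* Minimal explicit io_uring usage via liburing: one async read. */
        #include <fcntl.h>
        #include <stdio.h>
        #include <liburing.h>

        int main(void)
        {
            struct io_uring ring;
            static char buf[4096];

            if (io_uring_queue_init(8, &ring, 0) < 0)      /* 8-entry queue */
                return 1;

            int fd = open("/etc/hostname", O_RDONLY);      /* placeholder file */
            if (fd < 0) { perror("open"); return 1; }

            struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
            io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);  /* async read at offset 0 */
            io_uring_submit(&ring);

            struct io_uring_cqe *cqe;
            io_uring_wait_cqe(&ring, &cqe);                /* block until completion */
            printf("read returned %d\n", cqe->res);
            io_uring_cqe_seen(&ring, cqe);

            io_uring_queue_exit(&ring);
            return 0;
        }

    Unless a game or its engine issues I/O this way (or goes through a library that does), it stays on the classic read()/pread() path, which is exactly the uncertainty raised above.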

  • #12
    Originally posted by blackshard View Post
    measuring throughput of an algorithm/api/whatever is sensible only when all the variables around stay the same.
    Those guys are all on different hardware and there is no comparison, so these are just pure numbers that actually tell nothing about the optimizations they make to the api...

    Paradoxically, some kind of optimization for one setup may result in a performance loss for another; where are the comparisons?
    It needs to be said that IO is inherently DISK LIMITED. No matter how many IOs per core you can do, you're limited by the speed of the disks you're attached to. Jens Axboe managed to do 8M IOPS on a single core, which is amazing because you would need fewer cores and fewer operations to serve all the disk bandwidth to your customers.

    Also, if you look at the pull request you'll see that the changes made are very general and not architecture-specific, so every architecture would see an improvement based on its capability. Most likely bigger cores and higher frequencies would see more improvement, but that's the case with all optimisations in general.
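    (Rough arithmetic to illustrate the point, assuming a high-end NVMe drive that peaks around 1M random-read IOPS: a single core sustaining 8M IOPS could in principle keep roughly eight such drives saturated, leaving the remaining cores free for actual application work.)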

  • #13
    Originally posted by turboNOMAD View Post
    I think the question of MastaG is more like: do any of these optimizations even affect the regular desktop use case? E.g. when loading a game on Steam, IO concurrency (queue depth) is very low, and existing software does not explicitly use the io_uring API. So the question is, will existing desktop applications (like Steam games) see any improvement at all, or are the code paths being optimized already taking negligible time/power (in the desktop use case) before the optimizations?
    I don't think you would see any improvements. My disk IO is not anywhere near 8M at queue depth 128, and I can't think of any program I use that uses io_uring anyway. Disk IO handling shouldn't use a lot of power either, so lower power states aren't really a factor. But maybe with tomorrow's NVMe drives you'll be able to scale all the way up to 8 million ops on your laptop.
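    (For anyone wondering what "queue depth 128" looks like in code: a hedged liburing sketch keeping 128 reads in flight at once. The depth, block size, and file name are arbitrary, not taken from the benchmarks.)

        /* Sketch: batch 128 reads before a single submit. This is what a
         * high-iodepth benchmark does and what most desktop apps never do. */
        #include <fcntl.h>
        #include <stdio.h>
        #include <liburing.h>

        #define DEPTH 128
        #define BS    4096

        int main(void)
        {
            struct io_uring ring;
            static char bufs[DEPTH][BS];

            if (io_uring_queue_init(DEPTH, &ring, 0) < 0)
                return 1;

            int fd = open("testfile", O_RDONLY);           /* placeholder file */
            if (fd < 0) { perror("open"); return 1; }

            for (int i = 0; i < DEPTH; i++) {              /* queue DEPTH reads */
                struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
                io_uring_prep_read(sqe, fd, bufs[i], BS, (long long)i * BS);
            }
            io_uring_submit(&ring);                        /* one syscall, 128 I/Os */

            for (int i = 0; i < DEPTH; i++) {              /* reap all completions */
                struct io_uring_cqe *cqe;
                io_uring_wait_cqe(&ring, &cqe);
                io_uring_cqe_seen(&ring, cqe);
            }
            io_uring_queue_exit(&ring);
            return 0;
        }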

  • #14
    Originally posted by turboNOMAD View Post
    So the question is, will existing desktop applications (like Steam games) see any improvement at all, or are the code paths being optimized already taking negligible time/power (in the desktop use case) before the optimizations?
    Not likely IOPS-wise, since IIRC many NVMe drives (including PCIe 4.0 devices atm) only manage around 1M IOPS peak. The device being used here specializes in random IO more than sequential, and in these tests more than one is now being used, since a single device maxed out around 5M IOPS; the point has been to max out how many IOPS a single core on a prosumer CPU can handle.

    For most of us that sort of workload isn't realistic, but even without maximizing throughput we would still see the 2-3x perf improvement (referring to the gains between the 2.5M and 8M IOPS articles). AFAIK it's io_uring-only though, and I don't know what actually supports that in typical desktop or gamer usage; perhaps a network share like Samba?

    On the plus side, we know it's considerably better than what preceded io_uring, so any software that would stand to benefit and doesn't yet support io_uring would probably consider the support more worthwhile now.

    TL;DR: If you would benefit, it'd be in latency reduction for I/O. It should be more noticeable with heavy random I/O tasks, provided they leverage io_uring.
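    (On the latency point: IIRC the record runs used polled I/O, where completions are busy-polled instead of interrupt-driven. A hedged sketch of opting into that with liburing; the device path is a placeholder, and polled mode requires O_DIRECT plus driver support.)

        /* Sketch: polled completions (IORING_SETUP_IOPOLL) trade CPU for latency. */
        #define _GNU_SOURCE                      /* for O_DIRECT */
        #include <fcntl.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <liburing.h>

        int main(void)
        {
            struct io_uring ring;
            void *buf;

            if (io_uring_queue_init(64, &ring, IORING_SETUP_IOPOLL) < 0)
                return 1;

            /* Polled I/O requires O_DIRECT; the device is a placeholder. */
            int fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
            if (fd < 0) { perror("open"); return 1; }

            if (posix_memalign(&buf, 4096, 4096))      /* O_DIRECT needs alignment */
                return 1;

            struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
            io_uring_prep_read(sqe, fd, buf, 4096, 0);
            io_uring_submit(&ring);

            struct io_uring_cqe *cqe;
            io_uring_wait_cqe(&ring, &cqe);            /* kernel polls, no interrupt */
            printf("read returned %d\n", cqe->res);
            io_uring_cqe_seen(&ring, cqe);

            io_uring_queue_exit(&ring);
            return 0;
        }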

  • #15
    Originally posted by RedEyed View Post

    Potentially, any kind of optimisation is good for the regular user.

    Even if it is not faster, it will use less power.
    Normal users don't use I/O that much. This is obvious from the way SSD cell lifespan decreases every year (and users still don't wear their drives out). I also don't think e.g. a SATA drive can saturate even a single core in a 5900X or 11900K system.

  • #16
    I looked at the article and all I saw was Schneider Weisse. I usually go for Tap 5, but I haven't seen it in quite a while.

  • #17
    Originally posted by turboNOMAD View Post
    I think the question of MastaG is more like: do any of these optimizations even affect the regular desktop use case? E.g. when loading a game on Steam, IO concurrency (queue depth) is very low, and existing software does not explicitly use the io_uring API. So the question is, will existing desktop applications (like Steam games) see any improvement at all, or are the code paths being optimized already taking negligible time/power (in the desktop use case) before the optimizations?
    Regular desktop users use Windows, so it won't affect them.

  • #18
    These gains are great, but this is just a precursor to a more fundamental shift in design that is coming. Samsung is already developing memory chips that combine DDR with flash technologies, and other manufacturers will follow with their own innovations. It is now only a matter of time until main memory becomes persistent, software no longer has to load and store data explicitly on storage devices, and all data becomes available at all times.

    I know some game designers are already desperately waiting for such a change, where data no longer has to be streamed from a drive into main memory and a game world no longer has to be cut into sections just to fit into main memory.

    Of course, some people will hold on to the classic design because of their worries and "old school" thinking, but when people's workflow changes and it is no longer a "load, work, save" process, when people can jump straight to the "work" part, then it will cause a shift in designs. Old schoolers will still want to load and save their documents, and count files on a drive like these were eggs in a basket.
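    (A rough approximation of that model already exists today: memory-mapped files, and DAX mappings on persistent-memory hardware, let software work on data in place rather than through explicit load/save steps. A minimal sketch; the path is a placeholder and this still has implicit I/O underneath.)

        /* Sketch: the mapping *is* the working data, no read()/write() calls. */
        #include <fcntl.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/mman.h>
        #include <unistd.h>

        int main(void)
        {
            int fd = open("/tmp/state.bin", O_RDWR | O_CREAT, 0644);  /* placeholder */
            if (fd < 0) { perror("open"); return 1; }
            if (ftruncate(fd, 4096) < 0) { perror("ftruncate"); return 1; }

            char *state = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
            if (state == MAP_FAILED) { perror("mmap"); return 1; }

            strcpy(state, "edited in place");   /* the "work" step, no save dialog */
            msync(state, 4096, MS_SYNC);        /* durability point */

            munmap(state, 4096);
            close(fd);
            return 0;
        }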

  • #19
    Originally posted by Yttrium View Post

    It needs to be said that IO is inherently DISK LIMITED. No matter how many IOs per core you can do, you're limited by the speed of the disks you're attached to. Jens Axboe managed to do 8M IOPS on a single core, which is amazing because you would need fewer cores and fewer operations to serve all the disk bandwidth to your customers. Also, if you look at the pull request you'll see that the changes made are very general and not architecture-specific, so every architecture would see an improvement based on its capability. Most likely bigger cores and higher frequencies would see more improvement, but that's the case with all optimisations in general.
    I understand, but it is not the idea of optimizing the API that I'm criticizing, but the numbers!
    As long as there is no serious benchmark with consistent variables, all those numbers (7M, 7.4M, 8M IOPS...) are just trash...
    I mean: I could take a 5900X and do 8M IOPS. Then I overclock the 5900X to some stellar frequency and do 9M IOPS, and so I reach a new record; but what does it matter? The API/algorithm underneath isn't any better; it's just throwing out a bigger, useless number.

  • #20
    Originally posted by turboNOMAD View Post
    I think the question of MastaG is more like: do any of these optimizations even affect the regular desktop use case? E.g. when loading a game on Steam, IO concurrency (queue depth) is very low, and existing software does not explicitly use the io_uring API. So the question is, will existing desktop applications (like Steam games) see any improvement at all, or are the code paths being optimized already taking negligible time/power (in the desktop use case) before the optimizations?
    It looks like there are some changes in the block layer as well; whether that will really help largely sequential I/O is doubtful, though it may help NVMe in general.

    It's definitely more important for workstations or servers that process data streams/tables/trees.
