
Uncached Buffered IO Is Performing Great, Working Now On Btrfs / EXT4 / XFS


  • Uncached Buffered IO Is Performing Great, Working Now On Btrfs / EXT4 / XFS

    Phoronix: Uncached Buffered IO Is Performing Great, Working Now On Btrfs / EXT4 / XFS

As covered last week, Linux I/O expert Jens Axboe has renewed his pursuit of uncached buffered I/O for Linux. This "RWF_UNCACHED" work was originally started back in 2019; the renewed effort is showing ~65% faster read/write performance and has so far been extended to work across EXT4, Btrfs, and XFS file-systems...


  • #2
    Originally posted by Jens Axboe (patch)
    Using it from applications is trivial - just set RWF_UNCACHED for the read or write, using pwritev2(2) or preadv2(2).
    Yeah, um please don't. Unless you have specific knowledge that the data you're reading or writing will have a low reuse rate, developers should just stick with normal cached+buffered I/O.

    Those who don't want cached I/O already know who they are. They're most likely already using either O_DIRECT, or at least posix_fadvise(..., POSIX_FADV_DONTNEED) .
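
For reference, here is roughly what the two approaches look like side by side. This is only a sketch: RWF_UNCACHED comes from Axboe's patch series and may not be in your kernel/libc headers yet, and write_uncached() is a made-up helper name, not anything from the patches.

/* Sketch: buffered write that asks the kernel not to keep the data cached.
 * RWF_UNCACHED is from Axboe's patch series and may not be defined by your
 * headers; write_uncached() is an illustrative helper, not a real API. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

ssize_t write_uncached(int fd, const void *buf, size_t len, off_t off)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
#ifdef RWF_UNCACHED
    /* Proposed path: still buffered, but the pages are dropped from the
     * page cache once they have been written back. */
    return pwritev2(fd, &iov, 1, off, RWF_UNCACHED);
#else
    /* Existing workaround: normal buffered write, then advise the kernel we
     * won't need these pages again (best-effort; only clean pages drop). */
    ssize_t ret = pwritev2(fd, &iov, 1, off, 0);
    if (ret > 0)
        posix_fadvise(fd, off, ret, POSIX_FADV_DONTNEED);
    return ret;
#endif
}

The read side is the same idea with preadv2() and RWF_UNCACHED.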

    It would be interesting if this behavior could be controlled via process flags, extended attributes on directories, or mount options. That would provide control over caching behavior to the user, in those cases where they deem it preferable to avoid cache pollution.

    In general, the problems Jens noted with reclaiming cached pages are better addressed through I/O scheduler parameters or possibly enhancements.



    • #3
      I imagine MySQL/MariaDB could take advantage of that given it can use O_DIRECT?



      • #4
Also, note that Jens' data, quoted in the original article, showed that he needed to sustain writes for more than about 4 seconds before his new method started to offer significant gains over normal cached+buffered I/O.



        This amount of time will obviously vary, based on how much free memory + non-dirty pages you have, but the point is that it's an optimization for sustained I/O and not sporadic reads or writes.



        • #5
          Originally posted by coder View Post
          Yeah, um please don't. Unless you have specific knowledge that the data you're reading or writing will have a low reuse rate, developers should just stick with normal cached+buffered I/O.

          Those who don't want cached I/O already know who they are. They're most likely already using either O_DIRECT, or at least posix_fadvise(..., POSIX_FADV_DONTNEED) .

          It would be interesting if this behavior could be controlled via process flags, extended attributes on directories, or mount options. That would provide control over caching behavior to the user, in those cases where they deem it preferable to avoid cache pollution.

          In general, the problems Jens noted with reclaiming cached pages are better addressed through I/O scheduler parameters or possibly enhancements.
Most files created by applications are never reused, however. Also, many developers are probably not already using O_DIRECT, since it isn't buffered. What's really lacking here is support for opening a file in this mode via fopen(), since that is how the vast majority of developers open their files.

A great use case could be backups (writes of backed-up data would only pollute the cache, and O_DIRECT would be an extreme slowdown unless the backup application performs internal buffering).
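
As a sketch of what that workaround looks like today, roughly what a nocache-style wrapper or a careful backup tool does with posix_fadvise(); copy_nocache() and the 1 MiB chunk size are purely illustrative:

/* Sketch: copy a file without leaving its data in the page cache.
 * copy_nocache() and CHUNK are illustrative names, not a real API. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

#define CHUNK (1 << 20)  /* 1 MiB per read/write */

int copy_nocache(int in_fd, int out_fd)
{
    static char buf[CHUNK];
    off_t off = 0;
    ssize_t n;

    while ((n = read(in_fd, buf, CHUNK)) > 0) {
        if (write(out_fd, buf, (size_t)n) != n)
            return -1;
        /* Kick off writeback for what was just written, then tell the kernel
         * we won't reuse either copy; DONTNEED only drops pages once clean. */
        sync_file_range(out_fd, off, n, SYNC_FILE_RANGE_WRITE);
        posix_fadvise(in_fd,  off, n, POSIX_FADV_DONTNEED);
        posix_fadvise(out_fd, off, n, POSIX_FADV_DONTNEED);
        off += n;
    }
    return n < 0 ? -1 : 0;
}

With RWF_UNCACHED, that sync_file_range()/posix_fadvise() dance on the write side would collapse into a single flag passed to pwritev2().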

          Originally posted by Britoid View Post
          I imagine MySQL/MariaDB could take advantage of that given it can use O_DIRECT?
          Databases tend to want to perform the buffering themselves (hence their use of O_DIRECT) since they often know better how and when to buffer their data than the OS does. Which is not to say that there will not be any benefits from this, just that both MySQL and PostgreSQL will probably not focus on it ASAP.
          Last edited by F.Ultra; 13 November 2024, 09:40 AM.



          • #6
            Originally posted by coder View Post
            Yeah, um please don't. Unless you have specific knowledge that the data you're reading or writing will have a low reuse rate, developers should just stick with normal cached+buffered I/O.

            Those who don't want cached I/O already know who they are. They're most likely already using either O_DIRECT, or at least posix_fadvise(..., POSIX_FADV_DONTNEED) .

            It would be interesting if this behavior could be controlled via process flags, extended attributes on directories, or mount options. That would provide control over caching behavior to the user, in those cases where they deem it preferable to avoid cache pollution.

            In general, the problems Jens noted with reclaiming cached pages are better addressed through I/O scheduler parameters or possibly enhancements.
How about trying it and benchmarking it? It makes sense that as NVMe drives get faster, the caching process is just adding extra steps. Even OpenZFS is now working on bypassing its ARC cache, and that's a filesystem that relies heavily on caching.
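
A crude way to do exactly that, assuming a kernel and headers carrying Axboe's RWF_UNCACHED patches; the file name and sizes below are arbitrary, and for a fair comparison you would want to drop caches (echo 3 > /proc/sys/vm/drop_caches) between runs:

/* Sketch: time sequential 1 MiB buffered writes with and without the
 * proposed RWF_UNCACHED flag. Requires headers that define RWF_UNCACHED. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <time.h>
#include <unistd.h>

#ifndef RWF_UNCACHED
#error "RWF_UNCACHED not defined; build against the patched kernel's headers"
#endif

static double run(int fd, int flags, int mib)
{
    char *buf = malloc(1 << 20);
    struct iovec iov = { .iov_base = buf, .iov_len = 1 << 20 };
    struct timespec t0, t1;

    if (!buf)
        return -1;
    memset(buf, 0xab, 1 << 20);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < mib; i++)
        pwritev2(fd, &iov, 1, (off_t)i << 20, flags);
    fdatasync(fd);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(buf);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    int fd = open("rwf_uncached_test", O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0)
        return 1;
    /* 8192 MiB per pass: arbitrary, but large enough to sustain writes past
     * typical dirty-page limits so any difference has a chance to show up. */
    printf("cached:   %.2f s\n", run(fd, 0, 8192));
    printf("uncached: %.2f s\n", run(fd, RWF_UNCACHED, 8192));
    close(fd);
    return 0;
}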



            • #7
              Originally posted by Chugworth View Post
How about trying it and benchmarking it?
              It's highly dependent on the particulars, but the kernel implements caching because it's usually a win. Often, it's a very big win.

As recently as 2-3 years ago, I was still routinely running builds on an 8-core/16-thread server with a RAID of mechanical hard disks and lots of RAM. It worked fine, with virtually zero io_wait, because once the source files were resident in memory, the compiler wouldn't have to wait for them to get read back in from disk. More importantly, the linker wouldn't have to wait for the freshly-generated .o files and other libraries to get read in from disk, because they were all in cache. If the compiler and linker used this new flag, performance would've gone down the drain and the build would've been heavily I/O-bottlenecked.

              Originally posted by Chugworth View Post
It makes sense that as NVMe drives get faster, the caching process is just adding extra steps.
No, it's not adding extra steps. This method still uses the page cache. The main reason it's faster is that you (usually) don't have to wait for more dirty pages to get flushed before your I/O can proceed, but that only happens once the free memory and non-dirty pages have been used up. Unless you're doing sustained I/O, it's not an issue, because the kernel is continually trying to write out dirty pages, so the initial state will tend to be that lots of pages are available.
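
You can watch that for yourself: the Dirty and Writeback lines in /proc/meminfo climb under sustained writes until the vm.dirty_ratio limit kicks in and writers start stalling, while sporadic I/O barely moves them. A trivial observer, just as a sketch:

/* Sketch: print the kernel's Dirty and Writeback counters once a second so
 * you can watch dirty page-cache data build up (or not) during a test. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char line[256];

    for (;;) {
        FILE *f = fopen("/proc/meminfo", "r");
        if (!f)
            return 1;
        while (fgets(line, sizeof line, f))
            if (!strncmp(line, "Dirty:", 6) || !strncmp(line, "Writeback:", 10))
                fputs(line, stdout);
        fclose(f);
        puts("--");
        sleep(1);
    }
}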

              As for benchmarking, Michael tends to use a fast SSD, but he only uses one to benchmark even 384-core servers. So, some of the benchmarks he runs that aren't currently I/O-bottlenecked would become extremely bottlenecked. Unlike Axboe, he can't afford a 32-drive RAID of fast SSDs for any of his test chassis, let alone all of them.

Unless it's a very specialized app or library that does something like video streaming or a database, you really should leave the decision about whether to circumvent caching up to the user. That's why I said it should be externally controllable and not hard-coded into a bunch of apps and libraries.



              • #8
                The reason O_DIRECT has sucked so bad, for so long, is that bypassing cache usually isn't what you want. Torvalds is on the record as being extremely negative about it.

                This method is a slight compromise and eliminates some of the rough edges of O_DIRECT, but it's still a bad idea to avoid caching unless you have a high degree of certainty that it would benefit the overwhelming proportion of your users in the vast majority of cases.

One case to be made for sticking with cached I/O is that someone may come along and improve the kernel's page reclamation, which is basically Jens' main argument for the flag in the first place. If you adopt his method just because of today's reclamation costs, not only might that reason eventually disappear, but you're also depriving users of the benefits caching can provide when they have lots of RAM and far more compute power than raw I/O bandwidth. So, you're leaving them up a creek without a paddle.



                • #9
                  Originally posted by F.Ultra View Post
                  Great use could be from backups (writes of backed up data would only pollute the cache and O_DIRECT would be an extremely slowdown unless the backup application perform internal buffering).
                  Lots of people use rsync to make backups. However, rsync has other uses, too. It would be a mistake for it to use this flag, because that would harm the other use cases. Instead, you really want the user to control caching behavior, similar to how they can control CPU selection via tools like taskset.



                  • #10
                    Originally posted by coder View Post
                    Lots of people use rsync to make backups. However, rsync has other uses, too. It would be a mistake for it to use this flag, because that would harm the other use cases. Instead, you really want the user to control caching behavior, similar to how they can control CPU selection via tools like taskset.
rsync in particular would be a great candidate to have this as an option. In far too many scripts one has to run it under nocache to keep it from polluting the cache when transferring TiB of data between systems.

