Multi-Grained Timestamps Submitted For Linux 6.6

Written by Michael Larabel in Linux Storage on 28 August 2023 at 05:54 AM EDT.
In addition to the fchmodat2 system call, Microsoft's Christian Brauner submitted another early pull request before the Linux 6.5 kernel was even released: one introducing multi-grained timestamps for Linux 6.6. Multi-grained timestamps are intended to address an NFS caching issue caused by the coarse-grained timestamp handling currently used for validating (and invalidating) caches.

Brauner's pull request implements multi-grained timestamp support within the VFS layer and wires it up for the tmpfs, XFS, EXT4, and Btrfs file-systems. The work addresses a limitation of the current coarse-grained timestamps: the change time (ctime) and modification time (mtime) are updated only about once per jiffy, yet a lot of I/O activity can happen within a single jiffy. NFS relies on those timestamps for validating caches, so the current behavior can be problematic; the multi-grained timestamps code makes cache validation more robust.
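To illustrate why once-per-jiffy timestamps can mislead a cache, here is a minimal Python simulation. It is only a sketch, not kernel or NFS code: the 4 ms jiffy (HZ=250) is an assumption, and the `File` and `CachingClient` classes are hypothetical stand-ins for an exported file and a client that revalidates its cache by comparing mtime alone.

```python
JIFFY_NS = 4_000_000  # assumption: HZ=250, so one jiffy lasts 4 ms


def coarse_timestamp(now_ns: int) -> int:
    """Round a nanosecond clock reading down to the start of its jiffy."""
    return now_ns - (now_ns % JIFFY_NS)


class File:
    """Hypothetical file whose mtime is only as precise as one jiffy."""

    def __init__(self) -> None:
        self.data = b""
        self.mtime_ns = 0

    def write(self, data: bytes, now_ns: int) -> None:
        self.data = data
        self.mtime_ns = coarse_timestamp(now_ns)


class CachingClient:
    """Hypothetical client that trusts its cache while the mtime is unchanged."""

    def __init__(self) -> None:
        self.cached_data = None
        self.cached_mtime_ns = None

    def read(self, f: File) -> bytes:
        if self.cached_mtime_ns != f.mtime_ns:
            # mtime moved, so refetch the contents
            self.cached_data = f.data
            self.cached_mtime_ns = f.mtime_ns
        return self.cached_data


f = File()
client = CachingClient()

f.write(b"version 1", now_ns=10_000_000)  # falls in the jiffy starting at 8 ms
first = client.read(f)

f.write(b"version 2", now_ns=11_500_000)  # 1.5 ms later, still the same jiffy
second = client.read(f)                   # mtime is unchanged, so the cache is reused

print(first, second, f.data)
```

Both writes land in the same jiffy and receive the same mtime, so the simulated client keeps serving the first version even though the file has changed.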

Brauner explained in the pull request:
"This adds VFS support for multi-grain timestamps and converts tmpfs, xfs, ext4, and btrfs to use them. This carries acks from all relevant filesystems.

The VFS always uses coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot of metadata updates, down to around 1 per jiffy, even when a file is under heavy writes.

Unfortunately, this has always been an issue when we're exporting via NFSv3, which relies on timestamps to validate caches. A lot of changes can happen in a jiffy, so timestamps aren't sufficient to help the client decide to invalidate the cache.

Even with NFSv4, a lot of exported filesystems don't properly support a change attribute and are subject to the same problems with timestamp granularity. Other applications have similar issues with timestamps (e.g., backup applications).

If we were to always use fine-grained timestamps, that would improve the situation, but that becomes rather expensive, as the underlying filesystem would have to log a lot more metadata updates.

This introduces fine-grained timestamps that are used when they are actively queried.

This uses the 31st bit of the ctime tv_nsec field to indicate that something has queried the inode for the mtime or ctime. When this flag is set, on the next mtime or ctime update, the kernel will fetch a fine-grained timestamp instead of the usual coarse-grained one.

As POSIX generally mandates that when the mtime changes, the ctime must also change, the kernel always stores normalized ctime values, so only the first 30 bits of the tv_nsec field are ever used.

Filesystems can opt into this behavior by setting the FS_MGTIME flag in the fstype. Filesystems that don't set this flag will continue to use coarse-grained timestamps."
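The mechanism described in the pull request can be sketched roughly as follows. This is a simplified Python model under stated assumptions, not the kernel implementation: the names `QUERIED`, `NSEC_MASK`, and `Inode` are hypothetical, the `multigrain` attribute merely stands in for the FS_MGTIME opt-in, and a 4 ms jiffy is assumed. The point it shows is that a normalized nanosecond value never exceeds 999,999,999, which fits in 30 bits, so the next bit up is free to record that the timestamp was queried.

```python
NSEC_PER_SEC = 1_000_000_000
JIFFY_NS = 4_000_000       # assumption: HZ=250
QUERIED = 1 << 30          # hypothetical name; the "31st bit" when counting from 1
NSEC_MASK = QUERIED - 1    # the low 30 bits hold the real nanoseconds

# A normalized nanosecond value always fits below the flag bit.
assert NSEC_PER_SEC - 1 <= NSEC_MASK


class Inode:
    """Simplified model of an inode's ctime with a 'queried' flag in tv_nsec."""

    def __init__(self, multigrain: bool) -> None:
        self.multigrain = multigrain  # stands in for the FS_MGTIME opt-in
        self.ctime_sec = 0
        self.ctime_nsec_raw = 0       # nanoseconds plus the flag bit

    def getattr(self) -> tuple:
        """Report the ctime and remember that somebody looked at it."""
        nsec = self.ctime_nsec_raw & NSEC_MASK
        if self.multigrain:
            self.ctime_nsec_raw |= QUERIED
        return (self.ctime_sec, nsec)

    def update_time(self, now_ns: int) -> None:
        """Stamp the inode: fine-grained only if the last value was queried."""
        queried = bool(self.ctime_nsec_raw & QUERIED)
        if not (self.multigrain and queried):
            now_ns -= now_ns % JIFFY_NS  # coarse: start of the current jiffy
        # Storing a fresh normalized value also clears the flag bit.
        self.ctime_sec, self.ctime_nsec_raw = divmod(now_ns, NSEC_PER_SEC)


mg = Inode(multigrain=True)
mg.update_time(10_000_000)
seen_a = mg.getattr()        # coarse value; the flag is now set
mg.update_time(11_500_000)   # same jiffy, but queried, so fine-grained
seen_b = mg.getattr()

legacy = Inode(multigrain=False)
legacy.update_time(10_000_000)
legacy_a = legacy.getattr()
legacy.update_time(11_500_000)
legacy_b = legacy.getattr()  # still the coarse value

print(seen_a, seen_b, legacy_a, legacy_b)
```

In this model the opted-in inode pays for a fine-grained timestamp only after something has actually looked at the previous one, while the inode without the opt-in keeps reporting the same coarse value for both changes.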

This code is now awaiting merging by Linus Torvalds for Linux 6.6.
