XFS Metadata Corruption On Linux 6.3 Tracked Down To One Missing One-Line Patch
Last week XFS users began encountering metadata corruption on the latest Linux 6.3 point releases. After investigation by kernel developers and by users testing kernels on affected hardware over the US holiday weekend, the issue is believed to have been tracked down to one missing patch that deletes a single line of code.
XFS developer Dave Chinner at Red Hat suggested on Saturday trying this patch on the Linux 6.3 kernel for those plagued by this XFS metadata corruption problem. Chinner commented, "This is a bug fix that we thought just fixed a livelock on stripe aligned filesystems. I'm guessing that in certain circumstances instead of livelocking on repeated failed allocations, it results in a broken mapping being returned to the writeback code and hence misdirecting the writeback IO."
But it turns out this patch fixes the problem even for those not using XFS stripes. Patching Linux 6.3 with that one-line deletion resolved the XFS issues for two affected individuals. Rune Kleveland, who had been active in dealing with this issue, commented, "[this build] has been stable for 90 minutes on the same type of hardware that all the other 6.3 kernels crashed within a couple of minutes after boot. So this seems to fix the issue for me."
Linux 6.3 builds with this patch included are on their way to the Fedora 37 and 38 testing repositories. This patch should also work its way into a new upstream Linux 6.3 point release in the coming days.