It's Looking Like The EXT4 Corruption Issue On Linux 4.19 Is Caused By BLK-MQ
The saga of EXT4 file-system corruption on Linux 4.19 kernels, with reports increasing in recent weeks, might soon be drawing to a close... This data corruption bug, though, is looking like it doesn't originate from within the EXT4 code at all.
While it's still not 100% settled, it's looking like the EXT4 corruption issues on Linux 4.19 are actually due to a problem within the multi-queue block code "blk-mq" in this current stable series. It's also looking like other file-systems could be affected as well; EXT4 is simply the most common file-system and thus draws the most reports. That's the latest belief, for those anxious for details who haven't been tracking this problem closely.
Multiple users, including upstream Linux kernel developers, have found their data to be more stable once they disabled the MQ code. There are also some tentative patches for addressing the problem now that bisecting has finally turned up the likely problematic commits. Simply bisecting the kernel code was a chore for those involved due to the pesky behavior: it sometimes took several minutes or longer before the data corruption would show itself, and the issue didn't even originate from the EXT4 code.
The multi-queue block I/O code for the Linux kernel allows for much better performance/scaling potential on today's multi-core processors with speedy storage. BLK-MQ allows for handling multiple queues distributed across CPU threads that can then map to the number of available hardware queues for a storage device. Over time more drivers have gained BLK-MQ support, while key drivers like NVMe have supported it for quite some time.
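To make the queue-mapping idea a bit more concrete, here is a toy sketch of spreading per-CPU software queues over a smaller number of hardware queues. The function name `map_cpus_to_hw_queues` and the round-robin (modulo) assignment are assumptions made purely for illustration; this is not the kernel's actual mapping logic, which also takes things like CPU topology into account.

```python
def map_cpus_to_hw_queues(nr_cpus, nr_hw_queues):
    """Toy model: assign each CPU's software queue to a hardware queue.

    Hypothetical illustration only. CPUs are spread round-robin over
    the hardware queues; the real kernel mapping is more involved.
    """
    if nr_cpus < 1 or nr_hw_queues < 1:
        raise ValueError("need at least one CPU and one hardware queue")

    # One entry per hardware queue, holding the CPUs that submit to it.
    mapping = {queue: [] for queue in range(nr_hw_queues)}
    for cpu in range(nr_cpus):
        mapping[cpu % nr_hw_queues].append(cpu)
    return mapping


if __name__ == "__main__":
    # Example: 8 CPU threads sharing a device with 4 hardware queues.
    for queue, cpus in map_cpus_to_hw_queues(8, 4).items():
        print(f"hw queue {queue}: CPUs {cpus}")
```

With a single hardware queue every CPU ends up on queue 0, which is roughly the situation the multi-queue design is meant to improve on for faster devices.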
More code has been switching over to MQ and the few missing features have been getting addressed. For kernels with the blk-mq mode not enabled by default, it can be enabled at boot time with scsi_mod.use_blk_mq=1. Checking if a drive is using the multi-queue I/O code can be done by checking for the presence of the /sys/block/DEVICE/mq directory.
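For those wanting to script the check described above, here is a minimal sketch that looks for the mq directory under each block device in sysfs. The helper names `uses_blk_mq` and `blk_mq_devices` and the `sysfs_root` parameter are hypothetical, introduced only for this example; the one thing taken from the article is the /sys/block/DEVICE/mq path.

```python
import os


def uses_blk_mq(device, sysfs_root="/sys/block"):
    """Return True if <sysfs_root>/<device>/mq exists as a directory."""
    return os.path.isdir(os.path.join(sysfs_root, device, "mq"))


def blk_mq_devices(sysfs_root="/sys/block"):
    """Return a sorted list of block devices that expose an mq directory."""
    if not os.path.isdir(sysfs_root):
        # Not a Linux system with sysfs mounted here, or a bad path.
        return []
    return sorted(
        name for name in os.listdir(sysfs_root) if uses_blk_mq(name, sysfs_root)
    )


if __name__ == "__main__":
    devices = blk_mq_devices()
    if devices:
        print("Devices using blk-mq:", ", ".join(devices))
    else:
        print("No devices with an mq directory were found.")
```

The `sysfs_root` argument is only there so the check can be pointed at a test directory; on a real system the default should be what's wanted.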
At least for now, based upon user bisecting, it's looking like "blk-mq: fail the request in case issue failure" and "blk-mq: issue directly if hw queue isn't busy in case of 'none'" are the commits responsible for making the system prone to the data corruption problem on Linux 4.19+.
Linux block subsystem maintainer Jens Axboe of Facebook has put out a one-line patch that appears to address this problem. The other option would be to not use blk-mq until the regression is resolved. Given the speedy pace of the past few days, with more individuals hitting this problem, it's looking like the bug will soon be solved for good.
Update: The issue is now resolved.