EXT4 Has A Nice One-Line Performance Fix For Cases When Delayed Allocation Is Disabled
When an EXT4 file-system is running low on free space, or when the "nodelalloc" mount option is used, EXT4's delayed allocation mode is disabled. That can result in a significant write performance hit, but a pending patch that should land in Linux 5.20 recovers that performance when delayed allocation is disabled.
EXT4's delayed allocation "delalloc" mode allows for deferring the mapping of new file data blocks to disk blocks until writeback time. Delayed allocation is said to help reduce file-system fragmentation and reduce the CPU cycles spent on block allocation. It is enabled by default, except when the file-system is running low on free disk space or when the mount option is used to disable it. It's in those cases that there is a performance fix coming.
SUSE Linux engineer Jan Kara has contributed a patch to improve the write performance when delalloc is disabled. Jan explains, "When delayed allocation is disabled (either through mount option or because we are running low on free space), ext4_write_begin() allocates blocks with EXT4_GET_BLOCKS_IO_CREATE_EXT flag. With this flag extent merging is disabled and since ext4_write_begin() is called for each page separately, we end up with a *lot* of 1 block extents in the extent tree and following writeback is writing 1 block at a time which results in very poor write throughput (4 MB/s instead of 200 MB/s). These days when ext4_get_block_unwritten() is used only by ext4_write_begin(), ext4_page_mkwrite() and inline data conversion, we can safely allow extent merging to happen from these paths since following writeback will happen on different boundaries anyway. So use EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT instead which restores the performance."
It's a simple one-line change that is now queued up in EXT4's "dev" code. In turn, it should likely appear in Linux 5.20 later in the summer.