I haven't seen any lockups with v5.9.9, but I woke up this morning to find both of my P9 boxes throwing input/output errors on their XFS filesystems. One box only saw it on its /home filesystem, so I managed to get this out of /var/log/syslog (which is also on XFS):
Code:
[160843.255360] XFS (bcache0): xfs_do_force_shutdown(0x8) called from line 461 of file fs/xfs/libxfs/xfs_defer.c. Return address = 0000000065546a41
There was a stack trace in dmesg, but I was too busy trying to recover the box to think of saving it; I do remember it saying something about in-memory corruption being detected.
The other box (which doesn't use bcache at all) has a single root XFS filesystem for everything, so I wasn't able to get any logs off of it.
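If anyone else hits this and loses access to the logs, kernel messages from the previous boot can sometimes be pulled out of the journal instead; a minimal sketch, assuming systemd-journald is configured with persistent storage (it won't help if the journal is volatile):

Code:
# confirm which prior boots the journal still has
journalctl --list-boots

# kernel ring buffer messages from the previous boot, XFS lines only
journalctl -k -b -1 | grep -i xfs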
When initially trying to bring the boxes back up, I let them both boot into v5.9.9, and the affected filesystems refused to mount:
Code:
< bno + len at line 578 of file fs/xfs/libxfs/xfs_rmap.c. Caller xfs_rmap_unmap+0x79c/0xaa0
[ 7.763152] CPU: 37 PID: 3069 Comm: mount Not tainted 5.9.9-64k-pages #182
[ 7.763174] Call Trace:
[ 7.763182] [c000002f9ac1f4c0] [c00000000082c1e0] dump_stack+0xc4/0x114 (unreliable)
[ 7.763200] [c000002f9ac1f500] [c00000000069bfb4] xfs_corruption_error+0xf4/0x100
[ 7.763224] [c000002f9ac1f5a0] [c00000000067b654] xfs_rmap_unmap+0x774/0xaa0
[ 7.763239] [c000002f9ac1f6b0] [c000000000680b5c] xfs_rmap_finish_one+0x32c/0x3a0
[ 7.763274] [c000002f9ac1f7c0] [c0000000006e110c] xfs_rui_item_recover+0x27c/0x380
[ 7.763308] [c000002f9ac1f890] [c0000000006e7054] xlog_recover_process_intents.isra.28+0x204/0x320
[ 7.763334] [c000002f9ac1f910] [c0000000006e7aa0] xlog_recover_finish+0x40/0x120
[ 7.763350] [c000002f9ac1f980] [c0000000006d31dc] xfs_log_mount_finish+0x7c/0x170
[ 7.763383] [c000002f9ac1f9c0] [c0000000006c00fc] xfs_mountfs+0x55c/0x9c0
[ 7.763415] [c000002f9ac1fa70] [c0000000006c70e4] xfs_fc_fill_super+0x394/0x5e0
[ 7.763449] [c000002f9ac1fb10] [c000000000469334] get_tree_bdev+0x234/0x380
[ 7.763481] [c000002f9ac1fbb0] [c0000000006c6358] xfs_fc_get_tree+0x28/0x40
[ 7.763504] [c000002f9ac1fbd0] [c000000000466bdc] vfs_get_tree+0x4c/0x160
[ 7.763526] [c000002f9ac1fc50] [c0000000004a33fc] path_mount+0x4cc/0xd20
[ 7.763549] [c000002f9ac1fd10] [c0000000004a3cd0] do_mount+0x80/0xd0
[ 7.763562] [c000002f9ac1fd70] [c0000000004a42f8] sys_mount+0x158/0x180
[ 7.763585] [c000002f9ac1fdc0] [c0000000000336d0] system_call_exception+0x160/0x280
[ 7.763619] [c000002f9ac1fe20] [c00000000000d940] system_call_common+0xf0/0x27c
[ 7.763668] XFS (bcache0): Corruption detected. Unmount and run xfs_repair
[ 7.763705] XFS (bcache0): Internal error xfs_trans_cancel at line 954 of file fs/xfs/xfs_trans.c. Caller xfs_rui_item_recover+0x2d4/0x380
Interestingly, when booting back into v5.9.8, both filesystems mounted without any issues. On one box, I unmounted the affected filesystem and ran xfs_repair on it, without a single complaint. My PC, which also runs v5.9.9 with XFS, has been completely fine so far.
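For reference, the recovery steps were roughly the following; a sketch from memory, with the device name and mount point from my /home box (the -n dry run first is just my habit, it isn't required):

Code:
# after booting back into v5.9.8, the filesystem mounted fine
umount /home

# no-modify mode: report what would be repaired without touching the disk
xfs_repair -n /dev/bcache0

# actual repair pass -- in my case it came back without a single complaint
xfs_repair /dev/bcache0

mount /home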
It seems a number of XFS changes made it into v5.9.9, so maybe some of them aren't PPC-friendly? Or maybe those changes are a red herring, and some other change related to disk I/O went in that's incompatible with PowerPC?

In any case, I don't see any XFS changes/fixes in v5.9.10, so I'd recommend staying away from both of these kernels for now if you're running POWER+XFS. I'm currently running v5.9.10 on my testing box to see if the issue reproduces within the next 24-48 hours. Is anyone else encountering filesystem corruption issues on POWER with v5.9.9+? If I reproduce this with v5.9.10, I'll file a kernel bug report, and the more info I have, the better.
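If you want to check for yourself which XFS commits landed in each stable release, something like this against a linux-stable checkout should do it (assuming you have the v5.9.x tags fetched):

Code:
# XFS changes that went into v5.9.9
git log --oneline v5.9.8..v5.9.9 -- fs/xfs/

# and whether anything XFS-related landed in v5.9.10
git log --oneline v5.9.9..v5.9.10 -- fs/xfs/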
Annoyingly enough, v5.9.9 finally fixed a longstanding memory allocation issue I had where after 12-13 hours of uptime, one of my P9 boxes could no longer start new KVM instances.