ASRock BIOS breaks Linux


  • sjug
    started a topic ASRock BIOS breaks Linux

    I've got a number of ASRock AMD systems, both on AM4 (2700X on X470) and on TR4 (1950X on X399).

    Within the past several weeks ASRock pushed updated BIOSes for the AMD platform: PinnaclePI-AM4_1.0.0.4 on AM4 and, even more recently, ThreadRipperPI-SP3r2 1.1.0.0 on TR4.

    It seems every motherboard that gets these updates is affected, and the impact is severe. I am of course running Arch on all of these machines, so they have the latest stock 4.17.9 kernel.

    After systemd starts, the system immediately hangs for the first 120 seconds, which seems to be the value set in `/proc/sys/kernel/hung_task_timeout_secs`.
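
    For anyone who wants to confirm that value on their own machine, here is a minimal sketch that reads it. The function name `read_hung_task_timeout` is made up for illustration, and the file only exists on kernels built with hung-task detection, so the sketch returns `None` when it cannot be read.

    ```python
    from pathlib import Path

    # procfs path named in the post; absent on kernels without hung-task detection.
    HUNG_TASK_TIMEOUT = Path("/proc/sys/kernel/hung_task_timeout_secs")


    def read_hung_task_timeout(path=HUNG_TASK_TIMEOUT):
        """Return the hung-task timeout in seconds, or None if it cannot be read."""
        try:
            return int(Path(path).read_text().strip())
        except (OSError, ValueError):
            return None


    if __name__ == "__main__":
        timeout = read_hung_task_timeout()
        if timeout is None:
            print("hung-task timeout not available on this system")
        elif timeout == 0:
            # The kernel message in the log says writing 0 disables the warning.
            print("hung-task warnings are disabled")
        else:
            print(f"tasks blocked for more than {timeout}s are reported")
    ```

    Note that changing this value only affects when the warning is printed; it does not unblock the stuck task.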

    On AM4 in the kernel log we see:

    ```
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2611 '/devices/pci0000:00/0000:00:07.1/0000:27:00.2' killed [49/8561]
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2702 '/devices/system/cpu/cpu9' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2693 '/devices/system/cpu/cpu14' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2697 '/devices/system/cpu/cpu4' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2687 '/devices/system/cpu/cpu0' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2695 '/devices/system/cpu/cpu2' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2701 '/devices/system/cpu/cpu8' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2700 '/devices/system/cpu/cpu7' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2696 '/devices/system/cpu/cpu3' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2694 '/devices/system/cpu/cpu15' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2699 '/devices/system/cpu/cpu6' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2688 '/devices/system/cpu/cpu1' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2692 '/devices/system/cpu/cpu13' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2690 '/devices/system/cpu/cpu11' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2691 '/devices/system/cpu/cpu12' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2698 '/devices/system/cpu/cpu5' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2689 '/devices/system/cpu/cpu10' killed
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [463] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [463] failed while handling '/devices/pci0000:00/0000:00:07.1/0000:27:00.2'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [464] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [464] failed while handling '/devices/system/cpu/cpu9'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [454] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [454] failed while handling '/devices/system/cpu/cpu3'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [451] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [451] failed while handling '/devices/system/cpu/cpu0'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [459] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [459] failed while handling '/devices/system/cpu/cpu1'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [467] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [467] failed while handling '/devices/system/cpu/cpu12'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [457] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [457] failed while handling '/devices/system/cpu/cpu6'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [455] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [455] failed while handling '/devices/system/cpu/cpu5'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [468] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [468] failed while handling '/devices/system/cpu/cpu14'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [469] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [469] failed while handling '/devices/system/cpu/cpu13'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [471] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [471] failed while handling '/devices/system/cpu/cpu10'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [473] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [473] failed while handling '/devices/system/cpu/cpu11'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [479] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [479] failed while handling '/devices/system/cpu/cpu8'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [482] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [482] failed while handling '/devices/system/cpu/cpu15'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [485] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [485] failed while handling '/devices/system/cpu/cpu4'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [487] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [487] failed while handling '/devices/system/cpu/cpu7'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [491] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [491] failed while handling '/devices/system/cpu/cpu2'
    Jul 18 18:32:59 ryzen kernel: INFO: task systemd-udevd:245 blocked for more than 120 seconds.
    Jul 18 18:32:59 ryzen kernel: Tainted: G O 4.17.9-1-ARCH #1
    Jul 18 18:32:59 ryzen kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Jul 18 18:32:59 ryzen kernel: systemd-udevd D 0 245 1 0x80000000
    Jul 18 18:32:59 ryzen kernel: Call Trace:
    Jul 18 18:32:59 ryzen kernel: ? __schedule+0x282/0x890
    Jul 18 18:32:59 ryzen kernel: ? preempt_count_add+0x68/0xa0
    Jul 18 18:32:59 ryzen kernel: schedule+0x32/0x90
    Jul 18 18:32:59 ryzen kernel: __sev_do_cmd_locked+0xd7/0x270 [ccp]
    Jul 18 18:32:59 ryzen kernel: ? wait_woken+0x80/0x80
    Jul 18 18:32:59 ryzen kernel: ? 0xffffffffc0467000
    Jul 18 18:32:59 ryzen kernel: __sev_platform_init_locked+0x2f/0x80 [ccp]
    Jul 18 18:32:59 ryzen kernel: ? _raw_write_unlock_irqrestore+0x1c/0x30
    Jul 18 18:32:59 ryzen kernel: sev_platform_init+0x1d/0x30 [ccp]
    Jul 18 18:32:59 ryzen kernel: psp_pci_init+0x40/0xe0 [ccp]
    Jul 18 18:32:59 ryzen kernel: ? 0xffffffffc0467000
    Jul 18 18:32:59 ryzen kernel: sp_mod_init+0x16/0x1000 [ccp]
    Jul 18 18:32:59 ryzen kernel: do_one_initcall+0x46/0x1f5
    Jul 18 18:32:59 ryzen kernel: ? free_unref_page_commit+0x70/0xf0
    Jul 18 18:32:59 ryzen kernel: ? kmem_cache_alloc_trace+0xbd/0x1d0
    Jul 18 18:32:59 ryzen kernel: ? do_init_module+0x22/0x210
    Jul 18 18:32:59 ryzen kernel: do_init_module+0x5a/0x210
    Jul 18 18:32:59 ryzen kernel: load_module+0x247a/0x29f0
    Jul 18 18:32:59 ryzen kernel: ? __vfs_read+0x124/0x170
    Jul 18 18:32:59 ryzen kernel: ? __se_sys_finit_module+0x97/0xf0
    Jul 18 18:32:59 ryzen kernel: __se_sys_finit_module+0x97/0xf0
    Jul 18 18:32:59 ryzen kernel: do_syscall_64+0x5b/0x170
    Jul 18 18:32:59 ryzen kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
    Jul 18 18:32:59 ryzen kernel: RIP: 0033:0x7f704225e0f9
    Jul 18 18:32:59 ryzen kernel: RSP: 002b:00007ffd2c605f68 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
    Jul 18 18:32:59 ryzen kernel: RAX: ffffffffffffffda RBX: 000055f9953f6ab0 RCX: 00007f704225e0f9
    Jul 18 18:32:59 ryzen kernel: RDX: 0000000000000000 RSI: 00007f7041b00ecd RDI: 000000000000000c
    Jul 18 18:32:59 ryzen kernel: RBP: 00007f7041b00ecd R08: 0000000000000000 R09: 0000000000000000
    Jul 18 18:32:59 ryzen kernel: R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000000
    Jul 18 18:32:59 ryzen kernel: R13: 000055f9954017f0 R14: 0000000000020000 R15: 000055f9953f6ab0
    ```

    These events keep occurring every 120 seconds, so booting takes a very long time: we must wait for at least one of these failures. I don't think shutting down the system through software is possible anymore, as it keeps recording these events. System performance seems to be impacted as well.

    Looking at dmesg on Threadripper, we see a virtually identical error:
    ```
    [ 1351.274639] INFO: task systemd-udevd:377 blocked for more than 120 seconds.
    [ 1351.274644] Tainted: G O 4.17.9-1-ARCH #1
    [ 1351.274646] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 1351.274648] systemd-udevd D 0 377 1 0x80000000
    [ 1351.274651] Call Trace:
    [ 1351.274660] ? __schedule+0x282/0x890
    [ 1351.274664] ? preempt_count_add+0x68/0xa0
    [ 1351.274667] schedule+0x32/0x90
    [ 1351.274674] __sev_do_cmd_locked+0xd7/0x270 [ccp]
    [ 1351.274677] ? wait_woken+0x80/0x80
    [ 1351.274680] ? 0xffffffffc0430000
    [ 1351.274686] __sev_platform_init_locked+0x2f/0x80 [ccp]
    [ 1351.274688] ? _raw_write_unlock_irqrestore+0x1c/0x30
    [ 1351.274693] sev_platform_init+0x1d/0x30 [ccp]
    [ 1351.274698] psp_pci_init+0x40/0xe0 [ccp]
    [ 1351.274699] ? 0xffffffffc0430000
    [ 1351.274704] sp_mod_init+0x16/0x1000 [ccp]
    [ 1351.274707] do_one_initcall+0x46/0x1f5
    [ 1351.274710] ? free_unref_page_commit+0x70/0xf0
    [ 1351.274713] ? kmem_cache_alloc_trace+0xbd/0x1d0
    [ 1351.274716] ? do_init_module+0x22/0x210
    [ 1351.274718] do_init_module+0x5a/0x210
    [ 1351.274721] load_module+0x247a/0x29f0
    [ 1351.274723] ? __vfs_read+0x124/0x170
    [ 1351.274728] ? __se_sys_finit_module+0x97/0xf0
    [ 1351.274730] __se_sys_finit_module+0x97/0xf0
    [ 1351.274734] do_syscall_64+0x5b/0x170
    [ 1351.274736] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 1351.274738] RIP: 0033:0x7f15eb8f40f9
    [ 1351.274739] RSP: 002b:00007ffd4568e288 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
    [ 1351.274742] RAX: ffffffffffffffda RBX: 000055c75bb8af00 RCX: 00007f15eb8f40f9
    [ 1351.274743] RDX: 0000000000000000 RSI: 00007f15eb196ecd RDI: 000000000000000c
    [ 1351.274744] RBP: 00007f15eb196ecd R08: 0000000000000000 R09: 0000000000000000
    [ 1351.274745] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000000
    [ 1351.274746] R13: 000055c75bb998d0 R14: 0000000000020000 R15: 000055c75bb8af00
    ```
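
    One way to check that the two traces really are the same is to strip the timestamps and unreliable frames and compare what is left. The sketch below does that for a few lines copied from each log above; the helper name `reliable_frames` and the regular expression are illustrative assumptions, not part of any kernel tooling.

    ```python
    import re

    # Matches one stack frame such as "__sev_do_cmd_locked+0xd7/0x270 [ccp]".
    # A leading "? " marks a frame the kernel unwinder flags as unreliable.
    FRAME = re.compile(
        r"(?P<unreliable>\?\s+)?"
        r"(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
        r"(?:\s+\[(?P<module>\w+)\])?\s*$"
    )


    def reliable_frames(log_text):
        """Return (symbol, module) pairs for every reliable frame in log_text."""
        frames = []
        for line in log_text.splitlines():
            match = FRAME.search(line)
            if match and not match.group("unreliable"):
                frames.append((match.group("symbol"), match.group("module")))
        return frames


    # Excerpt from the AM4 journal output.
    am4 = """\
    Jul 18 18:32:59 ryzen kernel: ? preempt_count_add+0x68/0xa0
    Jul 18 18:32:59 ryzen kernel: schedule+0x32/0x90
    Jul 18 18:32:59 ryzen kernel: __sev_do_cmd_locked+0xd7/0x270 [ccp]
    Jul 18 18:32:59 ryzen kernel: ? 0xffffffffc0467000
    Jul 18 18:32:59 ryzen kernel: __sev_platform_init_locked+0x2f/0x80 [ccp]
    """

    # Matching excerpt from the Threadripper dmesg output.
    tr4 = """\
    [ 1351.274664] ? preempt_count_add+0x68/0xa0
    [ 1351.274667] schedule+0x32/0x90
    [ 1351.274674] __sev_do_cmd_locked+0xd7/0x270 [ccp]
    [ 1351.274680] ? 0xffffffffc0430000
    [ 1351.274686] __sev_platform_init_locked+0x2f/0x80 [ccp]
    """

    print(reliable_frames(am4) == reliable_frames(tr4))
    ```

    Both machines block in the same place: `__sev_do_cmd_locked` in the `ccp` module, reached from `sev_platform_init` while the module is loading.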

    I've raised this with ASRock, but I would be surprised if they know what to do. I might have to escalate this somewhere more capable like AMD support...
    Has anyone else seen this? Surely other people out here are using a modern kernel and recent BIOS?
    07-25-2018

  • lu_tze
    replied
    4.18.10 fixes it: the CCP driver now has a timeout for SEV to return.

    Finally it works, kvm_amd and all.
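
    A rough way to check a kernel release string against that version is sketched below. The threshold comes only from this post, not from a kernel changelog, and the helper names are made up for illustration. The second sample string is a hypothetical release name.

    ```python
    import re

    # Version reported as fixed in this thread (an assumption taken from the post).
    FIXED_IN = (4, 18, 10)


    def parse_kernel_release(release):
        """Turn a release string like '4.17.9-1-ARCH' into (4, 17, 9)."""
        match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
        if match is None:
            raise ValueError(f"unrecognised kernel release: {release!r}")
        major, minor, patch = match.groups()
        return int(major), int(minor), int(patch or 0)


    def has_reported_fix(release, fixed_in=FIXED_IN):
        """True if the release is at or above the version reported as fixed."""
        return parse_kernel_release(release) >= fixed_in


    print(has_reported_fix("4.17.9-1-ARCH"))         # kernel from the first post
    print(has_reported_fix("4.18.10-arch1-1-ARCH"))  # hypothetical release string
    ```

    A plain version comparison is only a hint. Older LTS kernels are reported in this thread to boot fine as well, so a result of False does not by itself mean a kernel is affected.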

  • dc740
    replied
    If you have a backup for your GA-AX370-Gaming K7 I recommend the BIOS "F23f" (which is the beta previous to the F23).
    This beta BIOS was better than the final release because:
    * It didn't have the PSP bug (or didn't expose it to Linux): it didn't have the timeouts seen in this post
    * There was an option to DISABLE PSP: the option was removed in F23.
    * "Typical Current" option appeared for the first time: This was the first BIOS that exposed the option that made Ryzen systems usable for many people. More info here: https://www.phoronix.com/forums/foru...pu-gcn-1-1-bug

    So... if you have F23f, or a backup, stick to that BIOS.

    Here is the download link
    http://download.gigabyte.us/FileList...ng-k7_f23f.zip

    Get it while you can. It's not shown on the Gigabyte download page, but if you take the download link for the F23 version and add the extra "f" manually, you get the working link I posted above, and the download works fine. They are probably going to remove it at some point, so make sure you make a backup. That BIOS version is gold for this motherboard (especially for the ability to disable PSP, which is no longer available).

    Good luck
    Last edited by dc740; 09-17-2018, 07:51 AM.

  • dfyt
    replied
    Well, mine was 100% the BIOS update F23 ("Update AGESA 1.0.0.4"). Check the Manjaro link I provided. With that BIOS update, shutdown takes up to 10 minutes, but only when ZFS is installed. After rolling back to F22 ("Update AGESA 1.0.0.1a"), no more issues.

    https://www.gigabyte.com/us/Motherbo...upport-dl-bios

  • chithanh
    replied
    There are now reports that 4.19_rc kernels will work fine with PinnaclePI-1.0.0.4

    Originally posted by SwooshyCueb View Post
    I wonder why they updated to AGESA 1.0.0.2 when 1.0.0.6 was released over a year ago https://www.anandtech.com/show/11447...sa-1006-update
    Originally posted by lu_tze View Post
    Because version numbers for TR AGESA and Ryzen/AM4 AGESA are different. The article linked is about Ryzen AGESA.
    Actually, Threadripper is also Ryzen...
    But the AGESA version numbers do differ between chips (Summit, Raven, Pinnacle).

    Originally posted by dfyt View Post
    So I suspect I am having the same issue https://forum.manjaro.org/t/4-17-9-k...s-zfs/57439/21
    I think this issue is unrelated to ZFS.

  • dfyt
    replied
    So I suspect I am having the same issue https://forum.manjaro.org/t/4-17-9-k...s-zfs/57439/21

    Has anyone had any luck with this? I also did a BIOS update, and I can't lose kvm_amd. My board is a Gigabyte X370 K7. Note that booting an LTS kernel fixes the issue.

  • lu_tze
    replied
    Originally posted by SwooshyCueb View Post
    I wonder why they updated to AGESA 1.0.0.2 when 1.0.0.6 was released over a year ago https://www.anandtech.com/show/11447...sa-1006-update
    Because version numbers for TR AGESA and Ryzen/AM4 AGESA are different. The article linked is about Ryzen AGESA.

  • lu_tze
    replied
    Originally posted by djselbeck View Post

    You can add module_blacklist=ccp to your kernel commandline
    While blacklisting ccp at least allows the system to boot, it breaks kvm_amd.

    Back to 2.30 for me.

  • djselbeck
    replied
    Originally posted by brent View Post
    Or maybe the PSP driver is simply buggy? It's a rather new driver. I wonder if you can disable it without recompiling the kernel.
    You can add module_blacklist=ccp to your kernel command line.
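
    To verify after a reboot that the parameter actually reached the kernel, something like the following sketch can parse `/proc/cmdline`. The function name `blacklisted_modules` is made up for illustration, and the parsing is simplified: it splits on whitespace and ignores quoted parameters.

    ```python
    def blacklisted_modules(cmdline):
        """Return the set of modules named by module_blacklist= parameters."""
        modules = set()
        for token in cmdline.split():
            key, sep, value = token.partition("=")
            if key == "module_blacklist" and sep:
                # The parameter takes a comma-separated list of module names.
                modules.update(name for name in value.split(",") if name)
        return modules


    if __name__ == "__main__":
        try:
            with open("/proc/cmdline") as handle:
                current = handle.read()
        except OSError:
            current = ""  # not on Linux, or procfs not mounted
        print("ccp blacklisted:", "ccp" in blacklisted_modules(current))
    ```

    As noted elsewhere in this thread, blacklisting ccp also breaks kvm_amd, so this is a workaround for booting rather than a fix.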

  • SwooshyCueb
    replied
    I wonder why they updated to AGESA 1.0.0.2 when 1.0.0.6 was released over a year ago https://www.anandtech.com/show/11447...sa-1006-update
