ASRock BIOS breaks Linux

  • ASRock BIOS breaks Linux

    I've got a number of ASRock AMD systems, both on AM4 (2700X on X470) and on TR4 (1950X on X399).

    Within the past several weeks, ASRock has pushed BIOS updates for its AMD platforms: AGESA PinnaclePI-AM4_1.0.0.4 on AM4 and, more recently, ThreadRipperPI-SP3r2 1.1.0.0 on TR4.

    Every motherboard with these updates seems to be affected, and the impact is significant. I'm running Arch on all of these machines, so they have the latest stock kernel, 4.17.9.

    After systemd starts, the system immediately hangs for 120 seconds, which appears to match the value set in `/proc/sys/kernel/hung_task_timeout_secs`.
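
The 120-second figure can be checked on a given machine by reading that sysctl. A minimal sketch, assuming Python 3 on Linux; the helper names `parse_timeout` and `read_hung_task_timeout` are hypothetical. Note that this value only sets how long a task must sit in uninterruptible sleep (state `D`) before the kernel's hung-task detector prints a warning like the ones in the logs; it does not cause the stall itself.

```python
from pathlib import Path
from typing import Optional

# Standard location of the hung-task sysctl. It only exists on kernels built
# with CONFIG_DETECT_HUNG_TASK.
HUNG_TASK_SYSCTL = Path("/proc/sys/kernel/hung_task_timeout_secs")


def parse_timeout(text: str) -> Optional[int]:
    """Parse the sysctl's contents; 0 means the warning is disabled."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def read_hung_task_timeout(path: Path = HUNG_TASK_SYSCTL) -> Optional[int]:
    """Return the hung-task timeout in seconds, or None if it cannot be read."""
    try:
        return parse_timeout(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    timeout = read_hung_task_timeout()
    if timeout is None:
        print("hung-task detection is not available on this kernel")
    else:
        print(f"hung_task_timeout_secs = {timeout}")
```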

    On AM4 in the kernel log we see:

    ```
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2611 '/devices/pci0000:00/0000:00:07.1/0000:27:00.2' killed [49/8561]
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2702 '/devices/system/cpu/cpu9' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2693 '/devices/system/cpu/cpu14' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2697 '/devices/system/cpu/cpu4' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2687 '/devices/system/cpu/cpu0' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2695 '/devices/system/cpu/cpu2' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2701 '/devices/system/cpu/cpu8' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2700 '/devices/system/cpu/cpu7' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2696 '/devices/system/cpu/cpu3' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2694 '/devices/system/cpu/cpu15' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2699 '/devices/system/cpu/cpu6' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2688 '/devices/system/cpu/cpu1' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2692 '/devices/system/cpu/cpu13' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2690 '/devices/system/cpu/cpu11' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2691 '/devices/system/cpu/cpu12' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2698 '/devices/system/cpu/cpu5' killed
    Jul 18 18:32:46 ryzen systemd-udevd[450]: seq 2689 '/devices/system/cpu/cpu10' killed
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [463] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [463] failed while handling '/devices/pci0000:00/0000:00:07.1/0000:27:00.2'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [464] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [464] failed while handling '/devices/system/cpu/cpu9'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [454] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [454] failed while handling '/devices/system/cpu/cpu3'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [451] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [451] failed while handling '/devices/system/cpu/cpu0'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [459] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [459] failed while handling '/devices/system/cpu/cpu1'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [467] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [467] failed while handling '/devices/system/cpu/cpu12'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [457] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [457] failed while handling '/devices/system/cpu/cpu6'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [455] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [455] failed while handling '/devices/system/cpu/cpu5'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [468] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [468] failed while handling '/devices/system/cpu/cpu14'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [469] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [469] failed while handling '/devices/system/cpu/cpu13'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [471] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [471] failed while handling '/devices/system/cpu/cpu10'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [473] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [473] failed while handling '/devices/system/cpu/cpu11'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [479] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [479] failed while handling '/devices/system/cpu/cpu8'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [482] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [482] failed while handling '/devices/system/cpu/cpu15'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [485] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [485] failed while handling '/devices/system/cpu/cpu4'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [487] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [487] failed while handling '/devices/system/cpu/cpu7'
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [491] terminated by signal 9 (KILL)
    Jul 18 18:32:47 ryzen systemd-udevd[450]: worker [491] failed while handling '/devices/system/cpu/cpu2'
    Jul 18 18:32:59 ryzen kernel: INFO: task systemd-udevd:245 blocked for more than 120 seconds.
    Jul 18 18:32:59 ryzen kernel: Tainted: G O 4.17.9-1-ARCH #1
    Jul 18 18:32:59 ryzen kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Jul 18 18:32:59 ryzen kernel: systemd-udevd D 0 245 1 0x80000000
    Jul 18 18:32:59 ryzen kernel: Call Trace:
    Jul 18 18:32:59 ryzen kernel: ? __schedule+0x282/0x890
    Jul 18 18:32:59 ryzen kernel: ? preempt_count_add+0x68/0xa0
    Jul 18 18:32:59 ryzen kernel: schedule+0x32/0x90
    Jul 18 18:32:59 ryzen kernel: __sev_do_cmd_locked+0xd7/0x270 [ccp]
    Jul 18 18:32:59 ryzen kernel: ? wait_woken+0x80/0x80
    Jul 18 18:32:59 ryzen kernel: ? 0xffffffffc0467000
    Jul 18 18:32:59 ryzen kernel: __sev_platform_init_locked+0x2f/0x80 [ccp]
    Jul 18 18:32:59 ryzen kernel: ? _raw_write_unlock_irqrestore+0x1c/0x30
    Jul 18 18:32:59 ryzen kernel: sev_platform_init+0x1d/0x30 [ccp]
    Jul 18 18:32:59 ryzen kernel: psp_pci_init+0x40/0xe0 [ccp]
    Jul 18 18:32:59 ryzen kernel: ? 0xffffffffc0467000
    Jul 18 18:32:59 ryzen kernel: sp_mod_init+0x16/0x1000 [ccp]
    Jul 18 18:32:59 ryzen kernel: do_one_initcall+0x46/0x1f5
    Jul 18 18:32:59 ryzen kernel: ? free_unref_page_commit+0x70/0xf0
    Jul 18 18:32:59 ryzen kernel: ? kmem_cache_alloc_trace+0xbd/0x1d0
    Jul 18 18:32:59 ryzen kernel: ? do_init_module+0x22/0x210
    Jul 18 18:32:59 ryzen kernel: do_init_module+0x5a/0x210
    Jul 18 18:32:59 ryzen kernel: load_module+0x247a/0x29f0
    Jul 18 18:32:59 ryzen kernel: ? __vfs_read+0x124/0x170
    Jul 18 18:32:59 ryzen kernel: ? __se_sys_finit_module+0x97/0xf0
    Jul 18 18:32:59 ryzen kernel: __se_sys_finit_module+0x97/0xf0
    Jul 18 18:32:59 ryzen kernel: do_syscall_64+0x5b/0x170
    Jul 18 18:32:59 ryzen kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
    Jul 18 18:32:59 ryzen kernel: RIP: 0033:0x7f704225e0f9
    Jul 18 18:32:59 ryzen kernel: RSP: 002b:00007ffd2c605f68 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
    Jul 18 18:32:59 ryzen kernel: RAX: ffffffffffffffda RBX: 000055f9953f6ab0 RCX: 00007f704225e0f9
    Jul 18 18:32:59 ryzen kernel: RDX: 0000000000000000 RSI: 00007f7041b00ecd RDI: 000000000000000c
    Jul 18 18:32:59 ryzen kernel: RBP: 00007f7041b00ecd R08: 0000000000000000 R09: 0000000000000000
    Jul 18 18:32:59 ryzen kernel: R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000000
    Jul 18 18:32:59 ryzen kernel: R13: 000055f9954017f0 R14: 0000000000020000 R15: 000055f9953f6ab0
    ```
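
Because these warnings recur, it can help to pull just the hung-task lines out of `journalctl -k` or `dmesg` output when comparing machines. A minimal sketch, assuming the message format shown above; `find_hung_tasks` is a hypothetical helper, and the pattern is tuned to these logs rather than guaranteed for every kernel version.

```python
import re
from typing import List, Tuple

# Matches the kernel's hung-task warning as it appears in the logs above,
# with or without a journal/dmesg prefix in front of it.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text: str) -> List[Tuple[str, int, int]]:
    """Return (task name, PID, seconds) for every hung-task warning in log_text."""
    return [
        (m.group("name"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "Jul 18 18:32:59 ryzen kernel: INFO: task systemd-udevd:245 "
        "blocked for more than 120 seconds.\n"
    )
    for name, pid, secs in find_hung_tasks(sample):
        print(f"{name} (pid {pid}) blocked > {secs}s")
```

In the AM4 trace above, the blocked `systemd-udevd` is waiting in `__sev_do_cmd_locked` from the `ccp` module, reached while that module initialises the PSP (`sp_mod_init` → `psp_pci_init` → `sev_platform_init`).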

    These events recur every 120 seconds, so booting takes a very long time because we must wait through at least one of these failure events. I don't think the system can be shut down through software anymore, since it keeps logging these events. System performance seems to be affected as well.

    On Threadripper, dmesg shows a virtually identical error:
    ```
    [ 1351.274639] INFO: task systemd-udevd:377 blocked for more than 120 seconds.
    [ 1351.274644] Tainted: G O 4.17.9-1-ARCH #1
    [ 1351.274646] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 1351.274648] systemd-udevd D 0 377 1 0x80000000
    [ 1351.274651] Call Trace:
    [ 1351.274660] ? __schedule+0x282/0x890
    [ 1351.274664] ? preempt_count_add+0x68/0xa0
    [ 1351.274667] schedule+0x32/0x90
    [ 1351.274674] __sev_do_cmd_locked+0xd7/0x270 [ccp]
    [ 1351.274677] ? wait_woken+0x80/0x80
    [ 1351.274680] ? 0xffffffffc0430000
    [ 1351.274686] __sev_platform_init_locked+0x2f/0x80 [ccp]
    [ 1351.274688] ? _raw_write_unlock_irqrestore+0x1c/0x30
    [ 1351.274693] sev_platform_init+0x1d/0x30 [ccp]
    [ 1351.274698] psp_pci_init+0x40/0xe0 [ccp]
    [ 1351.274699] ? 0xffffffffc0430000
    [ 1351.274704] sp_mod_init+0x16/0x1000 [ccp]
    [ 1351.274707] do_one_initcall+0x46/0x1f5
    [ 1351.274710] ? free_unref_page_commit+0x70/0xf0
    [ 1351.274713] ? kmem_cache_alloc_trace+0xbd/0x1d0
    [ 1351.274716] ? do_init_module+0x22/0x210
    [ 1351.274718] do_init_module+0x5a/0x210
    [ 1351.274721] load_module+0x247a/0x29f0
    [ 1351.274723] ? __vfs_read+0x124/0x170
    [ 1351.274728] ? __se_sys_finit_module+0x97/0xf0
    [ 1351.274730] __se_sys_finit_module+0x97/0xf0
    [ 1351.274734] do_syscall_64+0x5b/0x170
    [ 1351.274736] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 1351.274738] RIP: 0033:0x7f15eb8f40f9
    [ 1351.274739] RSP: 002b:00007ffd4568e288 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
    [ 1351.274742] RAX: ffffffffffffffda RBX: 000055c75bb8af00 RCX: 00007f15eb8f40f9
    [ 1351.274743] RDX: 0000000000000000 RSI: 00007f15eb196ecd RDI: 000000000000000c
    [ 1351.274744] RBP: 00007f15eb196ecd R08: 0000000000000000 R09: 0000000000000000
    [ 1351.274745] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000000
    [ 1351.274746] R13: 000055c75bb998d0 R14: 0000000000020000 R15: 000055c75bb8af00
    ```
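
"Virtually identical" can be checked mechanically by discarding what differs between the two logs (the journal or dmesg prefix and raw module load addresses such as `0xffffffffc0430000`) and comparing only the symbol frames. A rough sketch, assuming frames look like `symbol+0xOFFSET/0xLENGTH [module]` as they do above; `trace_frames` is a hypothetical helper.

```python
import re
from typing import List

# One call-trace frame, e.g. "? __schedule+0x282/0x890" or
# "sp_mod_init+0x16/0x1000 [ccp]". Bare addresses ("? 0xffffffffc0467000"),
# register dumps and the RIP line contain no "symbol+0x.../0x..." and are skipped.
FRAME_RE = re.compile(
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<mod>\w+)\])?"
)


def trace_frames(log_text: str) -> List[str]:
    """Return 'symbol [module]' for each frame, ignoring line prefixes and offsets."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            mod = f" [{m.group('mod')}]" if m.group("mod") else ""
            frames.append(m.group("sym") + mod)
    return frames


if __name__ == "__main__":
    am4 = (
        "Jul 18 18:32:59 ryzen kernel: ? 0xffffffffc0467000\n"
        "Jul 18 18:32:59 ryzen kernel: sev_platform_init+0x1d/0x30 [ccp]\n"
    )
    tr4 = (
        "[ 1351.274680] ? 0xffffffffc0430000\n"
        "[ 1351.274693] sev_platform_init+0x1d/0x30 [ccp]\n"
    )
    print(trace_frames(am4) == trace_frames(tr4))
```

Pasting both full traces into `trace_frames` and comparing the lists shows whether the two machines hang on the same path, independent of where the module happened to be loaded in memory.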

    I've raised this with ASRock, but I would be surprised if they know what to do. I might have to escalate this to someone more capable, like AMD support.
    Has anyone else seen this? Surely other people out here are running a modern kernel with a recent BIOS?
    Starts
    07-25-2018
    Ends
    07-26-2018

  • #2
    I'm seeing the same issue with an ASRock mainboard and the new BIOS. Just so you know that you are not alone.

    • #3
      So I've been trying to follow up on this as more reports are surfacing online.

      I've reached out to Lianbo Jiang, who describes this as a known SEV issue on LKML.

      I've also heard back from ASRock:
      In BIOS 1.11A, we updated AGESA 1.0.0.2 Patch C.
      From BIOS 1.11 to 1.30, we changed few things such as DRAM timing for performance or default option for power setting.
      There are not going to have the impact of Linux OS by changing above setting.
      The major difference is AGESA code. We updated to latest AGESA PinnaclePI-AM4_1.0.0.4.
      So we guess the AGESA code might be the possible reason.
      If customer is going to use Linux, please choose 1.11A.
      Should also claim again, we still recommend for Windows OS, and both BIOS can work for Windows.
      It also seems to be the AGESA, based on some other discussion on the Gigabyte X399 side (here). I will try the kernel config change mentioned there (CONFIG_CRYPTO_DEV_SP_PSP=n) and see if that helps.

      I have also opened a ticket with AMD to get their take.

      • #4
        Credit to tcrider84 on the Gigabyte forums: his kernel config does in fact prevent the errors, at least on Threadripper with kernel 4.17.x. I'll roll the custom kernel config out to the rest of my AMD systems to validate.

        Ensure that you set:
        Code:
        CONFIG_CRYPTO_DEV_SP_PSP=n
        I still haven't heard back from AMD, though this seems to be their fault given the AGESA code change.
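
After rebuilding, it is worth confirming that the running kernel really has the option off. A minimal sketch, assuming Python 3 and that the config is exposed as `/proc/config.gz` (only when the kernel has `CONFIG_IKCONFIG_PROC`) or installed as `/boot/config-<release>`, as some distributions do; the helper names are hypothetical. Kconfig saves a disabled option as `# CONFIG_CRYPTO_DEV_SP_PSP is not set` rather than `=n`, so the sketch treats that form as `n`.

```python
import gzip
import platform
from pathlib import Path
from typing import Iterable, List, Optional

OPTION = "CONFIG_CRYPTO_DEV_SP_PSP"


def kernel_config_value(lines: Iterable[str], option: str) -> Optional[str]:
    """Return the value of option ('y', 'm', 'n', ...) or None if it is absent.

    Kconfig records a disabled option as '# CONFIG_FOO is not set', so that
    form is reported as 'n'.
    """
    for raw in lines:
        line = raw.strip()
        if line == f"# {option} is not set":
            return "n"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


def read_running_config() -> Optional[List[str]]:
    """Read the running kernel's config from /proc/config.gz or /boot, if present."""
    proc = Path("/proc/config.gz")  # needs CONFIG_IKCONFIG_PROC
    if proc.exists():
        with gzip.open(proc, "rt") as f:
            return f.read().splitlines()
    boot = Path("/boot") / f"config-{platform.release()}"
    if boot.exists():
        return boot.read_text().splitlines()
    return None


if __name__ == "__main__":
    config = read_running_config()
    if config is None:
        print("kernel config not found")
    else:
        print(f"{OPTION} = {kernel_config_value(config, OPTION)}")
```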

        • #5
          Or maybe the PSP driver is simply buggy? It's a rather new driver. I wonder if you can disable it without recompiling the kernel.

          • #6
            FWIW it doesn't look like this is an ASRock-specific issue. It's a bug either in the PSP firmware or in the Linux PSP driver:

            http://forum.gigabyte.us/thread/4818...scrollTo=21659

            • #7
              I wonder why they updated to AGESA 1.0.0.2 when 1.0.0.6 was released over a year ago: https://www.anandtech.com/show/11447...sa-1006-update

              • #8
                Originally posted by brent:
                Or maybe the PSP driver is simply buggy? It's a rather new driver. I wonder if you can disable it without recompiling the kernel.
                You can add module_blacklist=ccp to your kernel command line.
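
To confirm the parameter took effect after a reboot, a minimal sketch (assuming Python 3 on Linux; the helper names are hypothetical) checks whether `ccp` is listed in a `module_blacklist=` entry in `/proc/cmdline` and whether `ccp` still appears in `/proc/modules`. Because this keeps the entire `ccp` module from loading, not only its PSP/SEV code, it is a workaround rather than a fix.

```python
from pathlib import Path
from typing import Set


def blacklisted_modules(cmdline: str) -> Set[str]:
    """Collect module names from every module_blacklist= entry on a kernel command line."""
    names: Set[str] = set()
    for arg in cmdline.split():
        if arg.startswith("module_blacklist="):
            # The parameter takes a comma-separated list of module names.
            names.update(n for n in arg.split("=", 1)[1].split(",") if n)
    return names


def module_loaded(name: str, modules_text: str) -> bool:
    """True if name is the first field of a line in /proc/modules-style text."""
    return any(
        line.split()[0] == name for line in modules_text.splitlines() if line.strip()
    )


if __name__ == "__main__":
    cmdline_path, modules_path = Path("/proc/cmdline"), Path("/proc/modules")
    if cmdline_path.exists() and modules_path.exists():
        print("ccp blacklisted:", "ccp" in blacklisted_modules(cmdline_path.read_text()))
        print("ccp loaded:", module_loaded("ccp", modules_path.read_text()))
    else:
        print("/proc is not available; run this on the affected machine")
```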
