4.19.80 amdgpu powerplay crash Linux kernel


  • 4.19.80 amdgpu powerplay crash Linux kernel

    I tried to update the Linux kernel from 4.19.66 to 4.19.80,
    but loading amdgpu always hangs with long backtraces:
    Code:
    [ 629.301429] calling amdgpu_init+0x0/0x78 [amdgpu] @ 9350
    [ 629.301452] [drm] amdgpu kernel modesetting enabled.
    [ 629.301662] [drm] initializing kernel modesetting (VEGA10 0x1002:0x687F 0x1043:0x04C4 0xC1).
    [ 629.301728] [drm] register mmio base: 0xFCD00000
    [ 629.301728] [drm] register mmio size: 524288
    [ 629.301736] [drm] add ip block number 0 <soc15_common>
    [ 629.301737] [drm] add ip block number 1 <gmc_v9_0>
    [ 629.301737] [drm] add ip block number 2 <vega10_ih>
    [ 629.301737] [drm] add ip block number 3 <psp>
    [ 629.301738] [drm] add ip block number 4 <powerplay>
    [ 629.301738] [drm] add ip block number 5 <dm>
    [ 629.301738] [drm] add ip block number 6 <gfx_v9_0>
    [ 629.301739] [drm] add ip block number 7 <sdma_v4_0>
    [ 629.301739] [drm] add ip block number 8 <uvd_v7_0>
    [ 629.301739] [drm] add ip block number 9 <vce_v4_0>
    [ 629.301744] [drm] UVD(0) is enabled in VM mode
    [ 629.301744] [drm] UVD(0) ENC is enabled in VM mode
    [ 629.301744] [drm] VCE enabled in VM mode
    [ 629.301762] amdgpu 0000:0b:00.0: No more image in the PCI ROM
    [ 629.301777] ATOM BIOS: 115-D050PIL-100
    [ 629.301800] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
    [ 629.301805] amdgpu 0000:0b:00.0: VRAM: 8176M 0x000000F400000000 - 0x000000F5FEFFFFFF (8176M used)
    [ 629.301806] amdgpu 0000:0b:00.0: GART: 512M 0x000000F600000000 - 0x000000F61FFFFFFF
    [ 629.301809] [drm] Detected VRAM RAM=8176M, BAR=256M
    [ 629.301809] [drm] RAM width 2048bits HBM
    [ 629.301864] [TTM] Zone kernel: Available graphics memory: 16428114 kiB
    [ 629.301865] [TTM] Zone dma32: Available graphics memory: 2097152 kiB
    [ 629.301865] [TTM] Initializing pool allocator
    [ 629.301868] [TTM] Initializing DMA pool allocator
    [ 629.301886] [drm] amdgpu: 8176M of VRAM memory ready
    [ 629.301886] [drm] amdgpu: 8176M of GTT memory ready.
    [ 629.301893] [drm] GART: num cpu pages 131072, num gpu pages 131072
    [ 629.302017] [drm] PCIE GART of 512M enabled (table at 0x000000F400900000).
    [ 629.302815] [drm] use_doorbell being set to: [true]
    [ 629.302853] [drm] use_doorbell being set to: [true]
    [ 629.302887] [drm] Found UVD firmware Version: 65.29 Family ID: 17
    [ 629.302889] [drm] PSP loading UVD firmware
    [ 629.303429] [drm] Found VCE firmware Version: 57.1 Binary ID: 4
    [ 629.303436] [drm] PSP loading VCE firmware
    [ 629.543039] [drm] Display Core initialized with v3.1.59!
    [ 629.558335] [drm] SADs count is: -2, don't need to read it
    [ 629.583284] [drm] SADs count is: -2, don't need to read it
    [ 629.595588] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
    [ 629.595589] [drm] Driver supports precise vblank timestamp query.
    [ 630.121925] clocksource: timekeeping watchdog on CPU11: Marking clocksource 'tsc' as unstable because the skew is too large:
    [ 630.121932] clocksource: 'hpet' wd_now: 19e25ec4 wd_last: 195b5dd6 mask: ffffffff
    [ 630.121934] clocksource: 'tsc' cs_now: 2214e3d3930 cs_last: 220df4a0464 mask: ffffffffffffffff
    [ 630.121938] tsc: Marking TSC unstable due to clocksource watchdog
    [ 630.121951] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
    [ 630.121953] sched_clock: Marking unstable (630128957938, -7008050)<-(630237120456, -115170579)
    [ 630.525586] clocksource: Switched to clocksource hpet
    [ 631.332453] amdgpu: [powerplay] Failed message: 0x4, input parameter: 0x10, error code: 0xffffffff
    [ 631.332455] ------------[ cut here ]------------
    [ 631.332517] WARNING: CPU: 8 PID: 9350 at drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1390 uvd_v7_0_ring_insert_nop+0x31/0x160 [amdgpu]
    [ 631.332518] Modules linked in: amdgpu(+) mfd_core chash gpu_sched ttm backlight tun af_packet bridge stp ipv6 llc iptable_filter ip_tables x_tables binfmt_misc nls_cp1251 nls_cp866 vfat input_leds led_class joydev ramoops pstore reed_solomon it87_wdt evdev btusb btrtl btbcm btintel hwmon_vid bluetooth msr snd_hda_codec_realtek pci_stub jitterentropy_rng vboxpci(O) snd_hda_codec_generic uvcvideo hmac videobuf2_vmalloc videobuf2_memops vboxnetadp(O) videobuf2_v4l2 videobuf2_common drbg vboxnetflt(O) hid_logitech_hidpp videodev ecdh_generic snd_hda_intel rfkill snd_hda_codec edac_mce_amd snd_hwdep snd_hda_core kvm_amd snd_pcm kvm snd_timer pcspkr irqbypass vboxdrv(O) efivars snd i2c_piix4 k10temp ccp sha1_generic button acpi_cpufreq xts aesni_intel crypto_simd cryptd glue_helper aes_x86_64 cbc sha256_generic
    [ 631.332546] macvlan r8169 libphy igb ptp pps_core msdos fat efivarfs squashfs zstd_decompress xxhash loop fuse nfs lockd grace sunrpc multipath linear raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 md_mod firewire_core crc_itu_t hid_logitech_dj hid_logitech ff_memless usb_storage sr_mod cdrom sg
    [ 631.332563] CPU: 8 PID: 9350 Comm: modprobe Tainted: G O 4.19.80 #4
    [ 631.332563] Hardware name: System manufacturer System Product Name/ROG CROSSHAIR VI EXTREME, BIOS 6903 03/19/2019
    [ 631.332596] RIP: 0010:uvd_v7_0_ring_insert_nop+0x31/0x160 [amdgpu]
    [ 631.332597] Code: 54 55 53 48 89 fb 48 83 ec 08 4c 8b 2f f6 87 e8 01 00 00 01 0f 84 01 01 00 00 48 c7 c7 e0 0d 9f a0 89 74 24 04 e8 d8 b0 7f e0 <0f> 0b 8b 74 24 04 d1 ee 89 f5 0f 84 04 01 00 00 8b 83 00 02 00 00
    [ 631.332598] RSP: 0018:ffffc90000873a00 EFLAGS: 00010282
    [ 631.332598] RAX: 0000000000000024 RBX: ffff888790286f98 RCX: 0000000000000006
    [ 631.332599] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff8887fec15310
    [ 631.332599] RBP: ffff888790280000 R08: 0000000000000991 R09: 0000000000000001
    [ 631.332600] R10: 0000000000000000 R11: 0000000000000001 R12: 00000000000083bd
    [ 631.332600] R13: ffff888790280000 R14: ffff888790280000 R15: ffff888790280000
    [ 631.332601] FS: 00007ffff7d88740(0000) GS:ffff8887fec00000(0000) knlGS:0000000000000000
    [ 631.332602] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 631.332602] CR2: 0000555555812fb0 CR3: 00000007fb70e000 CR4: 00000000003406e0
    [ 631.332602] Call Trace:
    [ 631.332632] amdgpu_ring_commit+0x30/0x80 [amdgpu]
    [ 631.332664] uvd_v7_0_ring_test_ring+0x108/0x200 [amdgpu]
    [ 631.332695] uvd_v7_0_hw_init+0x11bc/0x1b40 [amdgpu]
    [ 631.332732] amdgpu_device_init.cold.15+0xd06/0xe55 [amdgpu]
    [ 631.332759] amdgpu_driver_load_kms+0x75/0x1e0 [amdgpu]
    [ 631.332762] drm_dev_register+0x104/0x140
    [ 631.332787] amdgpu_pci_probe+0x132/0x1b0 [amdgpu]
    [ 631.332789] pci_device_probe+0xd0/0x150
    [ 631.332791] really_probe+0x1f7/0x270
    [ 631.332792] driver_probe_device+0x8d/0xb0
    [ 631.332793] __driver_attach+0xbc/0xc0
    [ 631.332794] ? driver_probe_device+0xb0/0xb0
    [ 631.332795] bus_for_each_dev+0x5e/0x90
    [ 631.332796] bus_add_driver+0x197/0x1e0
    [ 631.332797] ? 0xffffffffa0adb000
    [ 631.332798] driver_register+0x66/0xb0
    [ 631.332799] ? 0xffffffffa0adb000
    [ 631.332800] do_one_initcall+0x3a/0x197
    [ 631.332802] ? __vunmap+0x75/0xb0
    [ 631.332804] ? _cond_resched+0x14/0x30
    [ 631.332806] do_init_module+0x55/0x1e0
    [ 631.332807] load_module+0x2275/0x2440
    [ 631.332809] ? __se_sys_finit_module+0x88/0xa0
    [ 631.332810] __se_sys_finit_module+0x88/0xa0
    [ 631.332811] do_syscall_64+0x43/0x100
    [ 631.332813] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 631.332814] RIP: 0033:0x7ffff7e90039
    [ 631.332815] Code: 00 00 00 75 05 48 83 c4 18 c3 e8 c2 94 01 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 9e 0c 00 f7 d8 64 89 01 48
    [ 631.332815] RSP: 002b:00007fffffffca88 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
    [ 631.332816] RAX: ffffffffffffffda RBX: 0000555555779e50 RCX: 00007ffff7e90039
    [ 631.332816] RDX: 0000000000000000 RSI: 000055555577c700 RDI: 0000000000000008
    [ 631.332817] RBP: 000055555577c700 R08: 0000000000000000 R09: 000055555577a7d0
    [ 631.332817] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000000
    [ 631.332817] R13: 0000555555779f80 R14: 0000000000040000 R15: 0000555555779e50
    [ 631.332818] ---[ end trace b91173db8cd45c68 ]---
    [ 634.762176] sched: RT throttling activated
    [ 636.376950] hrtimer: interrupt took 605155753 ns
    [ 691.332706] rcu: INFO: rcu_sched self-detected stall on CPU
    [ 691.332706] rcu: 8-....: (3048 ticks this GP) idle=bf6/1/0x4000000000000002 softirq=9891/9898 fqs=2890
    [ 691.332706] rcu: (t=60016 jiffies g=24889 q=640)
    [ 691.332706] NMI backtrace for cpu 8
    [ 691.348031] CPU: 8 PID: 9350 Comm: modprobe Tainted: G W O 4.19.80 #4
    [ 691.348035] Hardware name: System manufacturer System Product Name/ROG CROSSHAIR VI EXTREME, BIOS 6903 03/19/2019
    [ 691.348036] Call Trace:
    [ 691.348036] <IRQ>
    [ 691.348044] dump_stack+0x46/0x60
    [ 691.348048] nmi_cpu_backtrace.cold.0+0x13/0x50
    [ 691.348048] ? lapic_can_unplug_cpu.cold.6+0x35/0x35
    [ 691.348048] nmi_trigger_cpumask_backtrace+0x8f/0x91
    [ 691.348048] rcu_dump_cpu_stacks+0x82/0xad
    [ 691.348048] rcu_check_callbacks.cold.62+0x1db/0x335
    [ 691.348048] ? tick_sched_handle.isra.6+0x40/0x40
    [ 691.348048] update_process_times+0x23/0x60
    [ 691.348065] tick_sched_handle.isra.6+0x30/0x40
    [ 691.348065] tick_sched_timer+0x36/0x70
    [ 691.348065] __hrtimer_run_queues+0xfe/0x280
    [ 691.348065] hrtimer_interrupt+0xfb/0x210
    [ 691.348065] smp_apic_timer_interrupt+0x62/0x130
    [ 691.348065] apic_timer_interrupt+0xf/0x20
    [ 691.348065] </IRQ>
    [ 691.348065] RIP: 0010:amdgpu_mm_rreg+0x7b/0xd0 [amdgpu]
    [ 691.348065] Code: 05 00 00 8b 68 04 4c 89 ef e8 f1 e4 dc e0 8b 05 33 33 29 00 85 c0 7f 19 5b 89 e8 5d 41 5c 41 5d c3 48 03 87 d8 05 00 00 8b 28 <eb> e2 e9 6e e5 02 00 48 8b 43 10 44 0f b7 68 3e 65 8b 05 ae 93 81
    [ 691.348065] RSP: 0018:ffffc90000873a28 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
    [ 691.348065] RAX: ffffc900097a0ef4 RBX: ffff888790280000 RCX: 0000000000000002
    [ 691.348065] RDX: 0000000000000000 RSI: 00000000000083bd RDI: ffff888790280000
    [ 691.348065] RBP: 00000000ffffffff R08: 00000000ffffffff R09: 0000000000000000
    [ 691.348065] R10: 0000000000000002 R11: 00000000000000f0 R12: 00000000000083bd
    [ 691.348065] R13: 0000000000000000 R14: 00000000ffffffff R15: ffff888790280000
    [ 691.348065] uvd_v7_0_ring_test_ring+0x149/0x200 [amdgpu]
    [ 691.348065] uvd_v7_0_hw_init+0x11bc/0x1b40 [amdgpu]
    [ 691.348065] amdgpu_device_init.cold.15+0xd06/0xe55 [amdgpu]
    [ 691.348065] amdgpu_driver_load_kms+0x75/0x1e0 [amdgpu]
    [ 691.348065] drm_dev_register+0x104/0x140
    [ 691.348065] amdgpu_pci_probe+0x132/0x1b0 [amdgpu]
    [ 691.348065] pci_device_probe+0xd0/0x150
    [ 691.348065] really_probe+0x1f7/0x270
    [ 691.348065] driver_probe_device+0x8d/0xb0
    [ 691.348065] __driver_attach+0xbc/0xc0
    [ 691.348065] ? driver_probe_device+0xb0/0xb0
    [ 691.348065] bus_for_each_dev+0x5e/0x90
    [ 691.348065] bus_add_driver+0x197/0x1e0
    [ 691.348065] ? 0xffffffffa0adb000
    [ 691.348065] driver_register+0x66/0xb0
    [ 691.348065] ? 0xffffffffa0adb000
    [ 691.348065] do_one_initcall+0x3a/0x197
    [ 691.348065] ? __vunmap+0x75/0xb0
    [ 691.348065] ? _cond_resched+0x14/0x30
    [ 691.348065] do_init_module+0x55/0x1e0
    [ 691.348065] load_module+0x2275/0x2440
    [ 691.348065] ? __se_sys_finit_module+0x88/0xa0
    [ 691.348065] __se_sys_finit_module+0x88/0xa0
    [ 691.348065] do_syscall_64+0x43/0x100
    [ 691.348065] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 691.348065] RIP: 0033:0x7ffff7e90039
    [ 691.348065] Code: 00 00 00 75 05 48 83 c4 18 c3 e8 c2 94 01 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 9e 0c 00 f7 d8 64 89 01 48
    [ 691.348065] RSP: 002b:00007fffffffca88 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
    [ 691.348065] RAX: ffffffffffffffda RBX: 0000555555779e50 RCX: 00007ffff7e90039
    [ 691.348065] RDX: 0000000000000000 RSI: 000055555577c700 RDI: 0000000000000008
    [ 691.348065] RBP: 000055555577c700 R08: 0000000000000000 R09: 000055555577a7d0
    [ 691.348065] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000000
    [ 691.348065] R13: 0000555555779f80 R14: 0000000000040000 R15: 0000555555779e50
    Any ideas on how to solve this?

    PS:
    I want to stay on LTS kernels only.