Kernel 5.12 with amdgpu issues?


  • Kernel 5.12 with amdgpu issues?

    Hi there,
    Since switching to kernel 5.12 (currently 5.12.6), I have been getting freezes with KDE Plasma. dmesg shows the error messages posted below. Are there known severe bugs with kernel 5.12 and amdgpu? Is there a known workaround? I do not use any special options with amdgpu.

    System:
    kernel 5.12.6
    KDE Plasma 5.18.6

    Code:
    [ 1345.093153] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 2470 thread X:cs0 pid 2528)
    [ 1345.093165] amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800101600000 from client 27
    [ 1345.093175] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
    [ 1345.093179] amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
    [ 1345.093182] amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
    [ 1345.093185] amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
    [ 1345.093187] amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
    [ 1345.093190] amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
    [ 1345.093192] amdgpu 0000:05:00.0: amdgpu: RW: 0x0
    [ 1345.093200] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 2470 thread X:cs0 pid 2528)
    [ 1345.093205] amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800101603000 from client 27
    [ 1345.093215] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
    [ 1345.093217] amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
    [ 1345.093220] amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
    [ 1345.093222] amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
    [ 1345.093225] amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
    [ 1345.093227] amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
    [ 1345.093229] amdgpu 0000:05:00.0: amdgpu: RW: 0x0
    [ 1345.093235] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 2470 thread X:cs0 pid 2528)
    [ 1345.093241] amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800101606000 from client 27
    [ 1345.093250] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
    [ 1345.093253] amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
    [ 1345.093256] amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
    [ 1345.093258] amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
    [ 1345.093260] amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
    [ 1345.093262] amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
    [ 1345.093265] amdgpu 0000:05:00.0: amdgpu: RW: 0x0
    [ 1345.093270] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 2470 thread X:cs0 pid 2528)
    [ 1345.093275] amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800101609000 from client 27
    [ 1345.093285] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
    [ 1345.093287] amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
    [ 1345.093290] amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
    [ 1345.093292] amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
    [ 1345.093294] amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
    [ 1345.093297] amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
    [ 1345.093299] amdgpu 0000:05:00.0: amdgpu: RW: 0x0
    [ 1345.093304] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process X pid 2470 thread X:cs0 pid 2528)
    [ 1345.093309] amdgpu 0000:05:00.0: amdgpu: in page starting at address 0x800101601000 from client 27
    [ 1345.093318] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
    [ 1345.093321] amdgpu 0000:05:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
    [ 1345.093324] amdgpu 0000:05:00.0: amdgpu: MORE_FAULTS: 0x1
    [ 1345.093326] amdgpu 0000:05:00.0: amdgpu: WALKER_ERROR: 0x0
    [ 1345.093344] amdgpu 0000:05:00.0: amdgpu: PERMISSION_FAULTS: 0x3
    [ 1345.093347] amdgpu 0000:05:00.0: amdgpu: MAPPING_ERROR: 0x0
    [ 1345.093349] amdgpu 0000:05:00.0: amdgpu: RW: 0x0
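
The individual fields in the log above appear to be slices of the raw `VM_L2_PROTECTION_FAULT_STATUS` value. The following is a minimal Python sketch of that decoding. The bit positions are an assumption (a GFX9-style layout), chosen because they reproduce the values printed in the log; they are not taken from this thread and may differ on other GPU generations.

```python
# Assumed bit layout (shift, width) for VM_L2_PROTECTION_FAULT_STATUS.
# This is a GFX9-style guess that happens to match the dmesg output above;
# verify against the register headers for your hardware before relying on it.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),   # printed in the log as "Faulty UTCL2 client ID"
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status: int) -> dict:
    """Split a raw status word into named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # Raw value copied from the log lines above.
    for name, value in decode_fault_status(0x00101031).items():
        print(f"{name}: {value:#x}")
```

Under this assumed layout, `RW: 0x0` together with a non-zero `PERMISSION_FAULTS` would point at a read access that was refused, which fits a page fault rather than a walker or mapping error.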

  • #2
    After some searching I found out that this is a known issue with amdgpu and AMD hardware. It seems to affect both mobile AMD Ryzen APUs like mine and desktop Ryzen systems with discrete AMD GPU hardware.

    This seems to have appeared in early April 2021 with kernel 5.12 and/or the Linux firmware from that date.

    Some believe it has to do with the amdgpu reset function. Maybe amdgpu.reset_method offers a workaround?
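
Before experimenting with `amdgpu.reset_method`, it may help to check which value the loaded module is currently using. A minimal Python sketch, assuming the parameter is exposed under `/sys/module/amdgpu/parameters/` as module parameters usually are; the helper name `read_module_param` is made up for illustration, and the meaning of each value should be checked against the amdgpu module documentation.

```python
from pathlib import Path
from typing import Optional


def read_module_param(module: str, param: str,
                      sysfs_root: str = "/sys/module") -> Optional[str]:
    """Return the current value of a kernel module parameter, or None.

    None is returned when the module is not loaded, the parameter does
    not exist, or the file is not readable by the current user.
    """
    path = Path(sysfs_root) / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    value = read_module_param("amdgpu", "reset_method")
    print("amdgpu.reset_method =", value if value is not None else "(not available)")
```

A different value would be set on the kernel command line (for example `amdgpu.reset_method=<value>`) and would take effect on the next boot.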

    • #3
      You are getting a GPU page fault (the GPU is accessing memory that is not mapped into its virtual address space). Did you also update Mesa? Can you narrow down which component update caused this?

      • #4
        Originally posted by agd5f:
        You are getting a GPU page fault (the GPU is accessing memory that is not mapped into its virtual address space). Did you also update Mesa? Can you narrow down which component update caused this?

        This issue can't be marked as solved, but it can be explained and a workaround exists.

        This occurred when using a recent ("bleeding edge") GPU firmware for my Ryzen 5 2500U (Raven 1).
        With an older firmware (from before 03-2021) these freezes do not happen, so I am currently sticking with the stable firmware.
        Other Ryzen APU / GPU users could confirm this: freezes with the latest GPU firmware, no freezes with older GPU firmware.

        Note: this is about the GPU firmware, not the CPU microcode.
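
To compare "old" and "new" GPU firmware between boots, the versions the driver actually loaded can be listed. A rough Python sketch, assuming debugfs is mounted and that amdgpu exposes `amdgpu_firmware_info` under `/sys/kernel/debug/dri/0/` with lines of the form `<NAME> feature version: <n>, firmware version: 0x<hex>`; both the path and the line format are assumptions here and usually require root to read.

```python
import re
from typing import Dict, Tuple

# Assumed line shape: "<NAME> feature version: <n>, ... firmware version: 0x<hex>"
_LINE = re.compile(
    r"^(?P<name>.+?) feature version: (?P<feature>\d+),"
    r".*?firmware version: (?P<fw>0x[0-9a-fA-F]+)"
)


def parse_firmware_info(text: str) -> Dict[str, Tuple[int, int]]:
    """Map each firmware block name to (feature_version, firmware_version)."""
    versions = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:  # lines that do not fit the assumed shape are skipped
            versions[match["name"]] = (int(match["feature"]), int(match["fw"], 16))
    return versions


if __name__ == "__main__":
    # Assumed location; the DRI index (0) may differ on multi-GPU systems.
    path = "/sys/kernel/debug/dri/0/amdgpu_firmware_info"
    try:
        with open(path) as handle:
            info = parse_firmware_info(handle.read())
    except OSError as err:
        print(f"could not read {path}: {err}")
    else:
        for name, (feature, fw) in sorted(info.items()):
            print(f"{name}: feature {feature}, firmware {fw:#010x}")
```

Saving this output once with the older firmware and once with the newer one would show which firmware blocks actually changed.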

        • #5
          Can you try the latest firmware from the linux-firmware tree?
