R7 260X GPU lockup on peak temperature and fails to resume dpm.

    Hey guys, I'm getting a GPU lockup while playing XCOM: the screen goes black, then comes back after a while with drastically lower performance.
    My exact environment is as follows:

    I'm on Debian Jessie with the stock kernel 3.14-2-amd64, no xorg.conf present, and a PowerColor Radeon R7 260X.
    It usually works perfectly fine. Then I tried playing XCOM, which is one of the most demanding games available to me right now. Performance is fine and everything works peachy, even on high settings.
    After a while it gets stuck on a black screen; a few seconds later the image returns with a huge drop in performance. Digging through the log files, I found a rise in temperature, then a GPU lockup and reset. After the reset, dpm fails to resume, and that's what causes the drop in performance.

    Some logs of interest, from dmesg | grep radeon:
    Code:
    [ 1104.485875] radeon 0000:01:00.0: GPU lockup CP stall for more than 10020msec
    [ 1104.485884] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000048fc9 last fence id 0x0000000000048fc3 on ring 0)
    [ 1104.486014] radeon 0000:01:00.0: failed to get a new IB (-35)
    [ 1104.520164] radeon 0000:01:00.0: sa_manager is not empty, clearing anyway
    [ 1104.527391] radeon 0000:01:00.0: Saved 1460 dwords of commands on ring 0.
    [ 1104.527412] radeon 0000:01:00.0: GPU softreset: 0x00000009
    [ 1104.527416] radeon 0000:01:00.0:   GRBM_STATUS=0xE5D00028
    [ 1104.527420] radeon 0000:01:00.0:   GRBM_STATUS2=0x50000008
    [ 1104.527424] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0xEC400000
    [ 1104.527428] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0xEC400000
    [ 1104.527431] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
    [ 1104.527435] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
    [ 1104.527438] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
    [ 1104.527442] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
    [ 1104.527446] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
    [ 1104.527449] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEED57
    [ 1104.527453] radeon 0000:01:00.0:   CP_STAT = 0x84038600
    [ 1104.527456] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000c00
    [ 1104.527460] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00010000
    [ 1104.527464] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000400
    [ 1104.527467] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x00000006
    [ 1104.527470] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000003
    [ 1104.527474] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x80000063
    [ 1104.527478] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000000
    [ 1104.527481] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
    [ 1104.527485] radeon 0000:01:00.0:   CP_CPC_STATUS = 0x00000000
    [ 1104.527488] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
    [ 1104.527492] radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
    [ 1104.535760] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00010001
    [ 1104.535816] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
    [ 1104.536967] radeon 0000:01:00.0:   GRBM_STATUS=0x00003028
    [ 1104.536971] radeon 0000:01:00.0:   GRBM_STATUS2=0x00000008
    [ 1104.536974] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000006
    [ 1104.536978] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000006
    [ 1104.536981] radeon 0000:01:00.0:   GRBM_STATUS_SE2=0x00000006
    [ 1104.536984] radeon 0000:01:00.0:   GRBM_STATUS_SE3=0x00000006
    [ 1104.536988] radeon 0000:01:00.0:   SRBM_STATUS=0x20000040
    [ 1104.536991] radeon 0000:01:00.0:   SRBM_STATUS2=0x00000000
    [ 1104.536995] radeon 0000:01:00.0:   SDMA0_STATUS_REG   = 0x46CEE557
    [ 1104.536998] radeon 0000:01:00.0:   SDMA1_STATUS_REG   = 0x46CEED57
    [ 1104.537002] radeon 0000:01:00.0:   CP_STAT = 0x00000000
    [ 1104.537005] radeon 0000:01:00.0:   CP_STALLED_STAT1 = 0x00000000
    [ 1104.537009] radeon 0000:01:00.0:   CP_STALLED_STAT2 = 0x00000000
    [ 1104.537012] radeon 0000:01:00.0:   CP_STALLED_STAT3 = 0x00000000
    [ 1104.537015] radeon 0000:01:00.0:   CP_CPF_BUSY_STAT = 0x00000000
    [ 1104.537019] radeon 0000:01:00.0:   CP_CPF_STALLED_STAT1 = 0x00000000
    [ 1104.537022] radeon 0000:01:00.0:   CP_CPF_STATUS = 0x00000000
    [ 1104.537026] radeon 0000:01:00.0:   CP_CPC_BUSY_STAT = 0x00000000
    [ 1104.537029] radeon 0000:01:00.0:   CP_CPC_STALLED_STAT1 = 0x00000000
    [ 1104.537032] radeon 0000:01:00.0:   CP_CPC_STATUS = 0x00000000
    [ 1104.537056] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
    [ 1104.758290] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
    [ 1104.760697] radeon 0000:01:00.0: WB enabled
    [ 1104.760743] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000080000c00 and cpu addr 0xffff8802355b3c00
    [ 1104.760745] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000080000c04 and cpu addr 0xffff8802355b3c04
    [ 1104.760747] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000080000c08 and cpu addr 0xffff8802355b3c08
    [ 1104.760748] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000080000c0c and cpu addr 0xffff8802355b3c0c
    [ 1104.760750] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000080000c10 and cpu addr 0xffff8802355b3c10
    [ 1104.761135] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xffffc90011336c98
    [ 1104.951746] [drm:cik_ring_test] *ERROR* radeon: ring 1 test failed (scratch(0x3010C)=0xCAFEDEAD)
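For anyone who wants to check their own logs for the same sequence, here is a rough sketch in Python that pulls the lockup and failed dpm resume timestamps out of saved dmesg text. The two patterns are copied from the messages in the log above; the function name `summarize` and the `SAMPLE` string are just for illustration, and other kernel versions may word these messages differently.

```python
import re

# Patterns taken from the two messages of interest in the log above.
LOCKUP_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+radeon \S+: GPU lockup CP stall for more than (\d+)msec"
)
DPM_FAIL_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\[drm:radeon_pm_resume_dpm\] \*ERROR\* radeon: dpm resume failed"
)


def summarize(log_text):
    """Return (lockups, dpm_failures) as lists of kernel timestamps in seconds."""
    lockups, dpm_failures = [], []
    for line in log_text.splitlines():
        m = LOCKUP_RE.search(line)
        if m:
            lockups.append(float(m.group(1)))
            continue
        m = DPM_FAIL_RE.search(line)
        if m:
            dpm_failures.append(float(m.group(1)))
    return lockups, dpm_failures


# Three lines copied from the log above, used as a small self-check.
SAMPLE = """\
[ 1104.485875] radeon 0000:01:00.0: GPU lockup CP stall for more than 10020msec
[ 1104.537056] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[ 1104.758290] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
"""

lockups, dpm_failures = summarize(SAMPLE)
print("lockups at:", lockups)
print("dpm resume failures at:", dpm_failures)
```

To run it on a real log, save the output of dmesg to a file and pass the file's contents to `summarize`.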
    Sensor data:
    Code:
    radeon-pci-0100
    Adapter: PCI adapter
    temp1:        +53.0°C  (crit =  +0.0°C, hyst =  +0.0°C)
    
    fam15h_power-pci-00c4
    Adapter: PCI adapter
    power1:       73.69 W  (crit =  94.99 W)
    
    k10temp-pci-00c3
    Adapter: PCI adapter
    temp1:        +36.6°C  (high = +70.0°C)
                           (crit = +90.0°C, hyst = +87.0°C)
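To catch the temperature right before a lockup instead of guessing afterwards, the sensors output can be parsed and logged periodically. Below is a small sketch that extracts the temp1 reading per chip from text in the format shown above. The names `chip_temps` and `SAMPLE` are made up for this example, and it assumes the block layout seen here (chip name on the first line of each block, blocks separated by blank lines).

```python
import re

# Matches the start of a "temp1:" line and captures the signed reading.
TEMP_RE = re.compile(r"^temp1:\s*([+-]?\d+(?:\.\d+)?)")


def chip_temps(sensors_text):
    """Map each chip name to its temp1 reading in degrees Celsius."""
    temps = {}
    chip = None
    for raw in sensors_text.splitlines():
        line = raw.strip()
        if not line:
            chip = None  # a blank line ends the current chip block
            continue
        if chip is None:
            chip = line  # first line of a block is the chip name
            continue
        m = TEMP_RE.match(line)
        if m:
            temps[chip] = float(m.group(1))
    return temps


# Sensor output copied from the post, used as a self-check.
SAMPLE = """\
radeon-pci-0100
Adapter: PCI adapter
temp1:        +53.0°C  (crit =  +0.0°C, hyst =  +0.0°C)

fam15h_power-pci-00c4
Adapter: PCI adapter
power1:       73.69 W  (crit =  94.99 W)

k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +36.6°C  (high = +70.0°C)
                       (crit = +90.0°C, hyst = +87.0°C)
"""

for name, temp in chip_temps(SAMPLE).items():
    print(name, temp)
```

Chips without a temp1 line, such as the fam15h_power block, are simply left out of the result.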
    I was checking it yesterday and I think the temperature reached around 65°C.
    I think it peaks at about 70°C before the lockup; right now it's sitting at 52°C with close to no load.
    It's the first time I'm experiencing this. Also, I'm in Brazil and the days are getting very hot, with a fairly constant room temperature of about 27°C.
    Could this be happening simply because the PowerColor card's fan isn't enough to keep it cool?

    After the lockup it apparently fails to resume power management and sits at its lowest setting, requiring a reboot to work properly again.
    I'm actually glad the card isn't overheating to the point of permanent damage, but I'd like some suggestions on how to avoid the lockups altogether.
    Later today I'll add another case fan right above the graphics card and check if that helps; on colder days it wasn't really necessary.
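One way to see what power state the driver reports after a reset is to read the radeon dpm files in sysfs. This is only a sketch: it assumes the card is `card0` and that the kernel exposes `power_dpm_state` and `power_dpm_force_performance_level` under the device directory, which may not hold on every setup. The helper name `read_dpm` is made up for this example.

```python
from pathlib import Path

# sysfs files the radeon driver provides when dpm is enabled (assumed present).
DPM_FILES = ("power_dpm_state", "power_dpm_force_performance_level")


def read_dpm(device_dir="/sys/class/drm/card0/device"):
    """Return {file name: contents}, with None for any file that cannot be read."""
    result = {}
    for name in DPM_FILES:
        try:
            result[name] = (Path(device_dir) / name).read_text().strip()
        except OSError:
            result[name] = None  # missing file, wrong card index, or no permission
    return result


for name, value in read_dpm().items():
    print(name, "=", value)
```

Comparing the values before and after a lockup should show whether the reported state changes or whether only the actual clocks stay low.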

    Has anyone else experienced this? Any suggestions?