khnazile My memory clocks aren't locked though. The card can decide when to use which mclk/sclk/socclk DPM state at any time. I've also never experienced this with stock settings. Now I'm curious about why this affects some people, but not others…
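For anyone who wants to check which DPM states their own card is switching between, here is a minimal sketch. It assumes the amdgpu sysfs files `pp_dpm_mclk`, `pp_dpm_sclk` and `pp_dpm_socclk` under `/sys/class/drm/card0/device/`, and assumes each line looks like `1: 1000Mhz *` with `*` marking the active state. The card name `card0` and the helper names are illustrative, not part of any driver API.

```python
import re
from pathlib import Path

# Assumed line format: "<index>: <freq>Mhz", with a trailing "*" on the active state.
_STATE_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*mhz\s*(\*)?\s*$", re.IGNORECASE)


def parse_dpm_states(text):
    """Return a list of (index, mhz, is_active) tuples parsed from a pp_dpm_* file."""
    states = []
    for line in text.splitlines():
        match = _STATE_RE.match(line)
        if match:
            states.append(
                (int(match.group(1)), int(match.group(2)), match.group(3) is not None)
            )
    return states


def read_dpm_states(card="card0", clock="mclk"):
    """Read and parse one clock domain; 'card0' is an assumption about your system."""
    path = Path("/sys/class/drm") / card / "device" / f"pp_dpm_{clock}"
    return parse_dpm_states(path.read_text())
```

Calling `read_dpm_states(clock="mclk")` repeatedly while the desktop is in use should show the active state moving around if the clocks are not locked.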
Annoying AMD Linux Graphics Driver Crashes With "Timed Out Fences" Has A Fix Coming
-
Something that made me think this: according to engineers who repair failed mining equipment, AMD chips end up in weird failure states where the GPU stays partially operational far more often than their Nvidia counterparts. That makes diagnostics much harder, and it's one of the reasons why many of them refuse to repair AMD hardware.
-
I think these patches were the ones from drm-next that tried to improve the situation:
It can happen that we query the sequence value before the callback had a chance to run. Workaround that by grabbing the fence lock and releasing it again. Should be replaced by hw handling soon. Signed-off-by: Christian König CC: [email protected] # 5.19+ Fixes:...
Setting this flag on a scheduler fence prevents pipelining of jobs depending on this fence. In other words we always insert a full CPU round trip before dependent jobs are pushed to the pipeline. Signed-off-by: Christian König
Make sure that we always have a CPU round trip to let the submission code correctly decide if a TLB flush is necessary or not. Signed-off-by: Christian König
They improve the situation somewhat, e.g. plasmashell and kwin, which used to break after a few hours because of the issue, work fine now.
However, Firefox still freezes, waiting for the card to become available I think. Sometimes Firefox responds again, and sometimes I just give up, kill Firefox and start it again.
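To illustrate the workaround described in the first patch ("grabbing the fence lock and releasing it again"), here is a toy sketch in Python. It is not the kernel code: `ToyFence` and all of its members are made up for illustration, and it assumes the callback publishes the sequence value while holding the fence lock. Under that assumption, taking and dropping the lock before reading forces the reader to wait for a callback that is already in flight.

```python
import threading


class ToyFence:
    """Hypothetical stand-in for a fence whose callback runs under a lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.signaled_seq = 0  # last sequence value published by the callback

    def callback(self, seq, started, release):
        with self.lock:
            started.set()     # tell the reader the callback is in flight
            release.wait()    # simulate the callback being slow to finish
            self.signaled_seq = seq

    def query_racy(self):
        # Reads without synchronizing: may miss an in-flight callback.
        return self.signaled_seq

    def query_with_barrier(self):
        # Grab the lock and release it again: this blocks until any
        # callback currently holding the lock has finished.
        with self.lock:
            pass
        return self.signaled_seq


def demo():
    fence = ToyFence()
    started, release = threading.Event(), threading.Event()
    worker = threading.Thread(target=fence.callback, args=(1, started, release))
    worker.start()
    started.wait()
    racy = fence.query_racy()                   # callback has not published yet
    threading.Timer(0.05, release.set).start()  # let the callback finish shortly
    synced = fence.query_with_barrier()         # waits for the callback
    worker.join()
    return racy, synced
```

In this toy, `demo()` returns the stale value from the racy read and the updated value from the read that takes the lock first. It only covers a callback that has already started, which is how I read the patch description.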
-
Originally posted by khnazile
I believe that it's some sort of race condition combined with a chip quality issue. Better chips are faster to leave the 'dangerous' state, so there's a much lower chance that they hit the issue.
Something that made me think this: according to engineers who repair failed mining equipment, AMD chips end up in weird failure states where the GPU stays partially operational far more often than their Nvidia counterparts. That makes diagnostics much harder, and it's one of the reasons why many of them refuse to repair AMD hardware.
However, regarding cases where driver issues show up on Linux but not on Windows, I think this is simply because Windows' WDDM driver infrastructure is superior to and more robust than what Linux has. A classic example is incorrect graphics API usage: on Linux it will cause a GPU hang, but on Windows it will only cause an app crash. Windows is also way better at handling driver resets.
Last edited by user1; 15 November 2022, 10:58 AM.