AMDGPU Reset Recovery To Be Flipped On By Default For Newer Radeon GPUs


  • #31
    Originally posted by dwagner View Post
    I tried amdgpu.gpu_recovery=1 in the past, but for the system crashes I experience when using amdgpu.dc=1 it makes no difference whether amdgpu.gpu_recovery is set to 0 or 1 - the system crashes hard either way.
    It wasn't ready. Apparently now it is.


    • #32
      Originally posted by shmerl View Post
      It wasn't ready. Apparently now it is.
      Not sure if I missed an irony tag, but the same hope was stated when amdgpu.dc=1 became the default.


      • #33
        Originally posted by dwagner View Post
        Not sure if I missed an irony tag, but the same hope was stated when amdgpu.dc=1 became the default.
        I don't think anyone declared recovery ready before.


        • #34
          This'll be handy on my PRIME system. When it overheats, it usually locks up, and a GPU reset might fix that.


          • #35
            Everyone who tries to run Mario Kart 8 under Cemu in Wine on a Vega 64/56 will get a system-wide freeze that can only be recovered from via REISUB or the reset switch.
            They're not making any progress here either: https://bugs.freedesktop.org/show_bug.cgi?id=105251#c17

            I test this every Sunday morning after updating everything. Hopefully this patch will save me from having to reset at least.


            • #36
              Originally posted by wizard69 View Post
              Interesting as my Ryzen Mobile does hang randomly but infrequently. I’m not even sure it is a GPU hang though it certainly feels like it.
              For those well versed in Linux what is the best way to turn this on with a new distro like Fedora 29?
              The current suggestion, from users running the updated beta release, is to do a fresh install to resolve the issue.
              Tested on an HP Envy x360 with a Ryzen 2500U.
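
For anyone wanting to confirm whether the option is actually active after adding `amdgpu.gpu_recovery=1` to the kernel command line (the parameter mentioned earlier in the thread), here is a minimal sketch. It assumes the driver exposes the parameter read-only under `/sys/module/amdgpu/parameters/gpu_recovery` and that the values mean -1 = auto, 0 = disabled, 1 = enabled; newer kernels may define further values. The function names are made up for illustration.

```python
from pathlib import Path
from typing import Optional

# Assumed sysfs location of the module parameter (not taken from the thread).
PARAM_PATH = Path("/sys/module/amdgpu/parameters/gpu_recovery")

# Assumed meanings of the parameter values; other values may exist on newer kernels.
MEANINGS = {-1: "auto (driver decides)", 0: "disabled", 1: "enabled"}


def describe_gpu_recovery(raw: str) -> str:
    """Translate the raw sysfs text (e.g. "1\n") into a readable label."""
    value = int(raw.strip())
    return MEANINGS.get(value, f"unrecognized value {value}")


def cmdline_gpu_recovery(cmdline: str) -> Optional[str]:
    """Return the amdgpu.gpu_recovery value from a kernel command line, if present.

    If the option appears more than once, the last occurrence is returned.
    """
    found = None
    for token in cmdline.split():
        if token.startswith("amdgpu.gpu_recovery="):
            found = token.split("=", 1)[1]
    return found


if __name__ == "__main__":
    try:
        print("driver reports:", describe_gpu_recovery(PARAM_PATH.read_text()))
    except (OSError, ValueError) as exc:
        print("could not read", PARAM_PATH, "->", exc)
    try:
        print("boot option:", cmdline_gpu_recovery(Path("/proc/cmdline").read_text()))
    except OSError as exc:
        print("could not read /proc/cmdline ->", exc)
```

If the boot option prints `None`, the parameter was not passed at boot and the driver is using its built-in default.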


              • #37
                Originally posted by FireBurn View Post
                This'll be handy on my PRIME system. When it overheats, it usually locks up, and a GPU reset might fix that.
                That's not a fix. A fix is to fix the overheating problem. Overheating is not a normal operating condition for a computer, or for pretty much any system, electric or otherwise. That's why it's called OVERheating.


                • #38
                  Originally posted by Brisse View Post

                  That's not a fix. A fix is to fix the overheating problem. Overheating is not a normal operating condition for a computer, or for pretty much any system, electric or otherwise. That's why it's called OVERheating.
                  Either way, the graphics card going titsup shouldn't take out the whole system.


                  • #39
                    Originally posted by tildearrow View Post
                    I can't encode in H.264... (the encoding slice is now present but is useless)
                    Just curious, do you have this error in your dmesg?
                    [drm] dce110_link_encoder_construct: Failed to get encoder_cap_info from VBIOS with error code 4!



                    Those who would give up Essential Liberty to purchase a little Temporary Safety, deserve neither Liberty nor Safety.
                    Ben Franklin 1755


                    • #40
                      Originally posted by Med_ View Post
                      This is good news. I consistently get hangs with games. Typically once every few hours. I do not bother reporting them as I cannot reproduce on demand and the bug tracker is full of them with similar logs ([drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=145101, last emitted seq=145103). I have activated the option, we will see whether that at least prevents the power button treatment.
                      Hi,

                      I get the same issue on my Gigabyte Radeon RX VEGA 64 GAMING OC 8G; see below:

                      28 22:23:41 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2945324, emitted seq=2945326
                      28 22:23:41 kernel: [drm] GPU recovery disabled.
                      --> Had to hold power button
                      -- Reboot --
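
Since the timeout message shows up in two slightly different wordings in this thread ("last signaled seq=..." and "signaled seq=..."), a small parser can help when collecting these lines from saved logs. This is only a sketch: the regular expression is inferred from the two lines quoted here, the names `RingTimeout` and `parse_ring_timeout` are hypothetical, and reading the difference between the two sequence numbers as "jobs still outstanding" is an assumption.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the two log lines quoted in this thread; the optional
# "last " prefix covers both wordings.
TIMEOUT_RE = re.compile(
    r"ring (\S+) timeout, (?:last )?signaled seq=(\d+), (?:last )?emitted seq=(\d+)"
)


class RingTimeout(NamedTuple):
    ring: str
    signaled: int
    emitted: int

    @property
    def outstanding(self) -> int:
        """Emitted minus signaled sequence number (assumed to be unfinished jobs)."""
        return self.emitted - self.signaled


def parse_ring_timeout(line: str) -> Optional[RingTimeout]:
    """Return the parsed timeout, or None if the line is not a ring timeout message."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    ring, signaled, emitted = match.groups()
    return RingTimeout(ring, int(signaled), int(emitted))


if __name__ == "__main__":
    sample = (
        "28 22:23:41 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* "
        "ring gfx timeout, signaled seq=2945324, emitted seq=2945326"
    )
    print(parse_ring_timeout(sample))
```

Feeding it the output of a saved `dmesg` or journal file line by line gives a quick count of how often each ring timed out.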

                      I have been logging the GPU temperature via lm-sensors. The critical threshold is set to 89.0°C and I see at most 72°C while playing DOOM 2016 in 4K, so I don't think the GPU is overheating, as someone suggested. The PC is brand new, so there is no settled dust or anything like that.
                      There is a review at Tom's Hardware; they measured 74-75°C.
                      If there were enough Vega GPUs to go around, Gigabyte's Radeon RX Vega 64 OC 8G would probably be great for high-end performance at a reasonable price. Unfortunately, a lack of availability means you probably won't be able to find one.
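
The temperature check described above can also be done without lm-sensors by reading the hwmon files directly. The sketch below assumes the GPU is `card0` and that the driver exposes `temp1_input` and `temp1_crit` (both in millidegrees Celsius, per the kernel's hwmon sysfs convention) under `/sys/class/drm/card0/device/hwmon/`; the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional, Tuple


def millideg_to_celsius(raw: str) -> float:
    """Convert a hwmon reading such as "72000\n" to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def thermal_headroom(temp_c: float, crit_c: float) -> float:
    """Degrees Celsius left before the critical threshold is reached."""
    return crit_c - temp_c


def read_gpu_temps(card: str = "card0") -> Optional[Tuple[float, float]]:
    """Return (current, critical) in Celsius, or None if the files are missing.

    The card name and the temp1_* file names are assumptions about this system.
    """
    base = Path("/sys/class/drm") / card / "device" / "hwmon"
    try:
        for hwmon in sorted(base.glob("hwmon*")):
            current = hwmon / "temp1_input"
            critical = hwmon / "temp1_crit"
            if current.exists() and critical.exists():
                return (
                    millideg_to_celsius(current.read_text()),
                    millideg_to_celsius(critical.read_text()),
                )
    except (OSError, ValueError):
        return None
    return None


if __name__ == "__main__":
    temps = read_gpu_temps()
    if temps is None:
        print("no GPU temperature sensor found at the assumed path")
    else:
        temp_c, crit_c = temps
        print(f"{temp_c:.1f} C now, {thermal_headroom(temp_c, crit_c):.1f} C below critical")
```

With the figures from the post (72°C peak, 89.0°C critical), the headroom works out to 17°C, which supports the view that the hang is not thermal.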


                      Other interesting kernel messages I get:

                      30 17:54:49 kernel: [drm:amdgpu_ctx_mgr_entity_fini [amdgpu]] *ERROR* ctx 000000003349f739 is still alive
                      30 17:54:49 kernel: [drm:amdgpu_ctx_mgr_fini [amdgpu]] *ERROR* ctx 000000003349f739 is still alive
                      --> freeze while running command "shutdown -h now"

                      I am using Xubuntu 18.04.1 LTS with the mainline 4.19 kernel from kernel.ubuntu.com and the Padoka Stable PPA.

                      Gigabyte released an F2 VGA BIOS for my card; I am going to give it a try.









