Vega issue


  • Vega issue

    I got a Vega FE some time ago, and I'm having major trouble with the open-source driver.

    The issue is that after ~3-30 minutes of uptime under X, the card hangs and the GPU Tach lights up fully. I can't get a console back (not even via SysRq).

    I'm still able to read the hwmon sensors, though.
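Since hwmon still responds after the hang, its readings can be logged from a script. This is a minimal sketch that assumes the standard hwmon sysfs layout under /sys/class/hwmon and that the card registers its hwmon device with the name "amdgpu". The helper name `read_amdgpu_hwmon` is hypothetical. The units follow the hwmon sysfs ABI: temp*_input is in millidegrees Celsius, fan*_input is in RPM and power*_average is in microwatts.

```python
import glob
import os


def read_amdgpu_hwmon(sysfs_root="/sys/class/hwmon"):
    """Return {hwmon_dir: {attribute: raw_value}} for hwmon devices named 'amdgpu'."""
    readings = {}
    for hwmon_dir in sorted(glob.glob(os.path.join(sysfs_root, "hwmon*"))):
        try:
            with open(os.path.join(hwmon_dir, "name")) as f:
                if f.read().strip() != "amdgpu":
                    continue  # some other sensor chip (CPU, board, ...)
        except OSError:
            continue  # no readable 'name' attribute; skip this device
        values = {}
        for pattern in ("temp*_input", "fan*_input", "power*_average"):
            for attr_path in sorted(glob.glob(os.path.join(hwmon_dir, pattern))):
                try:
                    with open(attr_path) as f:
                        values[os.path.basename(attr_path)] = int(f.read().strip())
                except (OSError, ValueError):
                    pass  # attribute missing or unreadable on this kernel
        readings[hwmon_dir] = values
    return readings


if __name__ == "__main__":
    for dev, values in read_amdgpu_hwmon().items():
        for attr, raw in values.items():
            if attr.startswith("temp"):
                print(f"{dev} {attr}: {raw / 1000:.1f} degC")  # millidegrees -> degrees
            else:
                print(f"{dev} {attr}: {raw}")
```

Running it in a loop (for example from cron or a `while sleep` shell loop) would show whether temperatures or fan speed change right before the hang.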

    amdgpu_gpu_reset only prints "[drm] No hardware hang detected. Did some blocks stall?" to dmesg.

    Memory usage has been fine, so it is not a memory leak.

    The card works perfectly under AMDGPU-PRO.

    I'm using the kernel driver from AMDGPU-PRO with the Mesa userspace driver, because I don't want to stop using the awesome RT patches.

    Oh, and the system's specs:

    - Motherboard: MSI Z170A GAMING PRO
    - Processor: Intel® Core™ i7-6700K (stock freqs)
    - Distro: Arch Linux
    - Kernel version: 4.9.20-4-rt16-rt-bfq
    - LLVM/Clang version: 5.0 RC2
    - Mesa version: 17.2-rc4

    I'll try to provide more information if needed. Thanks for almost any help.

    Oh, one more thing: I don't want to use Debian testing with XFCE and Ubuntu packages.

  • #2
    Does removing the RT patches help?
    You can also try these branches:
    https://cgit.freedesktop.org/~agd5f/...aging-drm-next
    or
    https://cgit.freedesktop.org/~agd5f/...d-staging-4.12



    • #3
      Not really. I just tried amd-staging-drm-next and it hung again (although it took much longer to hang).



      • #4
        You are using a really old kernel; no wonder your Vega does not work. Forget your RT patches and use agd5f's WIP kernels. Build a non-debug kernel with a 1000 Hz timer; that is real-time enough. Check that you have enabled Reroute Broken IRQ, Virtualization KVM and the 1000 Hz CPU timer. I also disabled swap, kernel debugging, CPU frequency scaling and CPU handling in ACPI, and used the BIOS to control the CPU and devices.
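The options listed above can be checked against a kernel .config file. This is a sketch under assumptions: the mapping from the names in the post to Kconfig symbols below is my own guess and is not taken from the post. "Cpu handling in Acpi" and the BIOS settings are left out because they have no obvious single symbol. The helpers `parse_kconfig` and `check` are hypothetical names.

```python
import re
import sys

# ASSUMED mapping of the post's option names to Kconfig symbols and the
# values the post recommends ("n" means disabled / not set).
WANTED = {
    "CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS": {"y"},  # "Reroute Broken IRQ"
    "CONFIG_VIRTUALIZATION": {"y"},                    # "Virtualization ..."
    "CONFIG_KVM": {"y", "m"},                          # "... KVM"
    "CONFIG_HZ_1000": {"y"},                           # 1000 Hz timer
    "CONFIG_SWAP": {"n"},                              # swap disabled
    "CONFIG_DEBUG_KERNEL": {"n"},                      # kernel debug disabled
    "CONFIG_CPU_FREQ": {"n"},                          # CPU freq scaling disabled
}

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Parse .config text into {symbol: value}; '# ... is not set' maps to 'n'."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            config[m.group(1)] = m.group(2).strip('"')  # drop quotes on string values
            continue
        m = _UNSET.match(line)
        if m:
            config[m.group(1)] = "n"
    return config


def check(config, wanted=WANTED):
    """Return [(symbol, actual, sorted_allowed)] for every symbol that differs."""
    mismatches = []
    for symbol, allowed in wanted.items():
        actual = config.get(symbol, "n")  # symbols absent from .config are off
        if actual not in allowed:
            mismatches.append((symbol, actual, sorted(allowed)))
    return mismatches


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as f:
            for symbol, actual, allowed in check(parse_kconfig(f.read())):
                print(f"{symbol}: is {actual}, expected one of {allowed}")
    else:
        print("usage: check_config.py /path/to/.config")
```

Treating absent symbols as "n" mirrors how Kconfig treats options whose dependencies are not met, so an option hidden by a disabled parent also shows up as off.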



        • #5
          Originally posted by debianxfce
          You are using a really old kernel; no wonder your Vega does not work. Forget your RT patches and use agd5f's WIP kernels. Build a non-debug kernel with a 1000 Hz timer; that is real-time enough. Check that you have enabled Reroute Broken IRQ, Virtualization KVM and the 1000 Hz CPU timer. I also disabled swap, kernel debugging, CPU frequency scaling and CPU handling in ACPI, and used the BIOS to control the CPU and devices.
          I doubt that's going to help. I have most of these things set.

          Also, did you read that I tested the amd-staging-drm-next kernel?



          • #6
            Originally posted by tildearrow

            I doubt that's going to help. I have most of these things set.

            Also, did you read that I tested the amd-staging-drm-next kernel?
            By tweaking the kernel config you may be able to make it stable. Also try the https://cgit.freedesktop.org/~agd5f/...d-staging-4.12 kernel. Phoronix did not have such problems; maybe you should try Debian testing with a custom kernel and the Padoka PPA.
            Last edited by debianxfce; 08-20-2017, 05:35 AM.



            • #7
              Originally posted by debianxfce

              By tweaking the kernel config you may be able to make it stable. Also try the https://cgit.freedesktop.org/~agd5f/...d-staging-4.12 kernel. Phoronix did not have such problems; maybe you should try Debian testing with a custom kernel and the Padoka PPA.
              I think I'll check later whether this reproduces on a different distro.



              • #8
                Update: it crashes on AMDGPU-PRO as well, although very rarely (~2 times per month). Is my card defective?

                dmesg: https://pastebin.com/jYrVf5Y4
                Last edited by tildearrow; 08-30-2017, 01:02 AM. Reason: add dmesg



                • #9
                  Originally posted by tildearrow
                  Update: it crashes on AMDGPU-PRO as well, although very rarely (~2 times per month). Is my card defective?

                  dmesg: https://pastebin.com/jYrVf5Y4
                  You are using an old and unusual kernel:
                  [12411.895407] Tainted: GF O 4.9.20-4-rt16-rt-bfq #1
                  Try Debian testing with Xfce, a custom ~agd5f/linux/log/?h=amd-staging-4.12 kernel, and the Padoka PPA.
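The "Tainted:" field quoted above lists the kernel's taint flags. Below is a minimal sketch of decoding it. The helper names `taint_letters` and `decode_taint` are hypothetical, and the letter meanings follow the kernel's documented taint flags. The kernel pads unset flags with spaces, and the forum collapsed those spaces, so the sketch decodes by letter rather than by position. For the quoted line it yields G (all loaded modules GPL-compatible), F (a module was force-loaded) and O (an out-of-tree module was loaded). An externally built module such as the AMDGPU-PRO kernel driver mentioned earlier in the thread would plausibly explain O, but that is an inference, not something the thread confirms.

```python
import re

# Taint letters and their meanings as documented for 4.x-era kernels.
TAINT_LETTERS = {
    "P": "proprietary module loaded",
    "G": "all loaded modules are GPL-compatible (not a taint by itself)",
    "F": "module was force-loaded",
    "S": "system running outside its specification",
    "R": "module was force-unloaded",
    "M": "machine check exception occurred",
    "B": "bad page referenced or unexpected page flags seen",
    "U": "taint requested by userspace",
    "D": "kernel died recently (OOPS or BUG)",
    "A": "ACPI table overridden",
    "W": "kernel warning issued",
    "C": "staging driver loaded",
    "I": "workaround for a platform firmware bug applied",
    "O": "externally built (out-of-tree) module loaded",
    "E": "unsigned module loaded",
    "L": "soft lockup occurred",
    "K": "kernel has been live-patched",
}

# Flags (letters plus padding spaces) sit between "Tainted:" and the kernel release.
_TAINT_RE = re.compile(r"Tainted:\s*([A-Z ]*?)\s*\d+\.\d+")


def taint_letters(line):
    """Return the taint letters in a dmesg line; [] if none (e.g. 'Not tainted')."""
    m = _TAINT_RE.search(line)
    if not m:
        return []  # no taint field, or the line is truncated before the release
    return [c for c in m.group(1) if c != " "]


def decode_taint(line):
    """Pair each taint letter in the line with its meaning."""
    return [(c, TAINT_LETTERS.get(c, "unknown flag")) for c in taint_letters(line)]


if __name__ == "__main__":
    sample = "[12411.895407] Tainted: GF O 4.9.20-4-rt16-rt-bfq #1"
    for letter, meaning in decode_taint(sample):
        print(f"{letter}: {meaning}")
```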

