Error initializing 580 as eGPU - hunch that Vega M is causing problems

  • Error initializing 580 as eGPU - hunch that Vega M is causing problems

    Hello All!

    I've been struggling to get my RX 580 working as an eGPU over Thunderbolt 3 - it appears there may be a bug in amdgpu when it initializes the RX 580.

    I've opened a bug report here:


    My hunch is that the Vega M (part of Kaby Lake G) is somehow interfering with the initialization of the eGPU (perhaps something to do with its power management). I've run across a couple of other folks with Vega M laptops who have hit similar problems running AMD GPUs as eGPUs.
    See: https://forum.manjaro.org/t/rx-580-i...gpu-dock/58210

    I'm curious if anybody knows a mechanism for disabling the Vega M using kernel boot parameters? I've pored over the amdgpu documentation and have tried a variety of different boot parameters. I've also tried assigning the Vega M's PCI ID to the "pci-stub" kernel module to prevent amdgpu from binding to it, but this didn't work.

    I've tried blacklisting amdgpu entirely, but then the eGPU doesn't work - as it also wants to use amdgpu. There also doesn't appear to be a way to blacklist a specific device, but perhaps I'm wrong?
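
One per-device option worth noting here is the kernel's generic sysfs `unbind` file, which detaches a driver from a single PCI address while leaving the driver loaded for other devices. Below is a minimal sketch, not something tested on this laptop. The device IDs 0x694C and 0x694E are the Vega M entries from amdgpu's PCI ID table; the helper names and the `sysfs_root` parameter are made up for illustration (the latter only so the code can be pointed at a fake tree). Unbinding happens after amdgpu has already probed the device, so it may not change the eGPU behaviour, and unbinding a GPU that is in use can hang the machine.

```python
import os

AMD_VENDOR_ID = 0x1002
# Vega M device IDs as listed in amdgpu's PCI ID table.
VEGA_M_DEVICE_IDS = {0x694C, 0x694E}


def read_hex(path):
    """Read a sysfs attribute such as 'vendor', which holds text like '0x1002'."""
    with open(path) as f:
        return int(f.read().strip(), 16)


def find_pci_devices(vendor_id, device_ids, sysfs_root="/sys"):
    """Return the PCI addresses whose vendor/device IDs match."""
    devices_dir = os.path.join(sysfs_root, "bus", "pci", "devices")
    if not os.path.isdir(devices_dir):
        return []
    matches = []
    for addr in sorted(os.listdir(devices_dir)):
        dev_dir = os.path.join(devices_dir, addr)
        try:
            vendor = read_hex(os.path.join(dev_dir, "vendor"))
            device = read_hex(os.path.join(dev_dir, "device"))
        except (OSError, ValueError):
            continue  # skip entries without readable ID attributes
        if vendor == vendor_id and device in device_ids:
            matches.append(addr)
    return matches


def unbind(addr, driver="amdgpu", sysfs_root="/sys"):
    """Detach `driver` from one device by writing its address to the unbind file.

    Needs root on a real system.
    """
    path = os.path.join(sysfs_root, "bus", "pci", "drivers", driver, "unbind")
    with open(path, "w") as f:
        f.write(addr)
```

Run as root, `for addr in find_pci_devices(AMD_VENDOR_ID, VEGA_M_DEVICE_IDS): unbind(addr)` would detach amdgpu from the Vega M only; the RX 580 has a different device ID and would be left alone.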

    Unfortunately the BIOS for my Dell 9575 does not let me disable the Vega M, so I need to figure out another way to do this.

    My main goal is to determine if the eGPU works *if* the Vega M is completely disabled. I'm hoping this can help folks pinpoint the bug.

    Thanks for any help and/or assistance you can offer!

  • #2
    Thanks for the suggestion!

    I tried to disable the PCI device using the sysfs approach you linked to - but I was unsuccessful. Removing the Vega M using that approach caused a hard lockup of my system.

    I'm still digging into the issue, and even compiled 4.19 with a hack to give the eGPU more time to initialize (raising the timeout from 5 seconds to 15 seconds) - but I still ran into the same problem. Hopefully some of the amdgpu developers will get a chance to look at this bug, as it's very annoying.

    I did confirm that I could get the eGPU enclosure working with an Nvidia GPU (GTX 1060), so the eGPU enclosure over the TB3 interface appears to be sound. I've been trying really hard to move away from Nvidia hardware over the past couple of years - hence my excitement at getting the RX 580 set up correctly as an eGPU.


    • #3
      For those of you who are interested, an amdgpu developer advised me to comment out the device IDs for Vega M in the kernel source (using 4.19), located here: /drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c

      You can see the full source for this file on the Elixir Cross Referencer.


      These are the lines in question:
      /* VEGAM */
      {0x1002, 0x694C, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_VEGAM},
      {0x1002, 0x694E, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_VEGAM},


      This did indeed cause my Vega M to not be initialized, *but* the problem I'm having with the eGPU remains. So it appears my hunch that the Vega M is interfering with the eGPU initialization was incorrect, and I'm back to square one...
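
For anyone repeating this experiment, one way to double-check that the patched kernel really left the Vega M alone is to look at the `driver` symlink sysfs creates for every bound PCI device. This is a rough sketch only: the function name and the `sysfs_root` parameter are invented for illustration, and the address in the usage note is a placeholder rather than the real one from my machine.

```python
import os


def bound_driver(addr, sysfs_root="/sys"):
    """Return the name of the driver bound to a PCI device, or None if unbound.

    sysfs exposes a 'driver' symlink in the device directory only while a
    driver is attached; its target's last path component is the driver name.
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", addr, "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))
```

With the Vega M IDs commented out, `bound_driver("0000:01:00.0")` (substitute the address that `lspci -D` reports for the Vega M) should return `None`, while the RX 580's address should report `amdgpu` if the driver attached to it at all.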
