AMD GPU PCI BAR / MMIO Address Space

  • AMD GPU PCI BAR / MMIO Address Space

    We are building a GPU compute server platform and trying to squeeze in as many GPUs as we can. We were using a system with 8 AMD 280x GPUs that boots and works fine. We recently switched to 380x GPUs, but the BIOS fails to boot with more than 6 of them, which I'm assuming is because it's running out of MMIO address space. I have included the PCI bridge window ranges for the 280x and the 380x for comparison below.
    280x
    [ 0.210403] pci 0000:05:01.0: PCI bridge to [bus 06]
    [ 0.210405] pci 0000:05:01.0: bridge window [io 0xb000-0xbfff]
    [ 0.210410] pci 0000:05:01.0: bridge window [mem 0xf7b00000-0xf7bfffff]
    [ 0.210413] pci 0000:05:01.0: bridge window [mem 0xc0000000-0xcfffffff 64bit pref]

    380x
    [ 0.189688] pci 0000:02:03.0: PCI bridge to [bus 05]
    [ 0.189689] pci 0000:02:03.0: bridge window [io 0x9000-0x9fff]
    [ 0.189693] pci 0000:02:03.0: bridge window [mem 0xf7800000-0xf78fffff]
    [ 0.189696] pci 0000:02:03.0: bridge window [mem 0x40000000-0x501fffff 64bit pref]

    Looks like the main difference between the two cards is the 64-bit prefetchable window: the 280x's is 0x10000000 bytes (256 MB), while the 380x's is 0x10200000 bytes (258 MB).
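    As a sanity check on that arithmetic, here is a quick sketch in Python (addresses copied from the log lines above; note the end addresses are inclusive):

    # Size of a PCI bridge window from its inclusive start/end addresses.
    def window_size(start, end):
        return end - start + 1

    print(hex(window_size(0xc0000000, 0xcfffffff)))  # 280x: 0x10000000 = 256 MB
    print(hex(window_size(0x40000000, 0x501fffff)))  # 380x: 0x10200000 = 258 MB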



    My question is whether there is a way to edit either the GPU's onboard VBIOS ROM or the system BIOS to reduce the amount of address space assigned to each GPU. We are only running the GPUs at x1 link width, so I'm assuming a lot of the space each GPU requests is unneeded.
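    For context, here is a rough way to see what each card actually requests, reading the standard Linux sysfs resource files (one "start end flags" line per region; the 0x1002 vendor filter is just to pick out the AMD cards, and the output format is my own choice):

    import glob, os

    # Print the size of each assigned BAR for every AMD (vendor 0x1002) PCI
    # device. Lines 0-5 of the sysfs "resource" file are the six BARs; an
    # all-zero line means the BAR is unused or unassigned.
    for dev in sorted(glob.glob("/sys/bus/pci/devices/*")):
        with open(os.path.join(dev, "vendor")) as f:
            if f.read().strip() != "0x1002":
                continue
        print(os.path.basename(dev))
        with open(os.path.join(dev, "resource")) as f:
            for i, line in enumerate(f):
                if i > 5:
                    break  # entries past the six BARs (expansion ROM, etc.)
                start, end, flags = (int(x, 16) for x in line.split())
                if end > start:
                    print("  BAR%d: %#x bytes" % (i, end - start + 1))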

    I know enabling >4GB MMIO decoding in the motherboard BIOS is an option that could work in this case, but that setting is usually only available on higher-end chipsets, which we don't want to use at this point.

    If anyone can think of a workable solution for this issue it would be appreciated!


  • #2
    The MMIO address space has nothing to do with the number of PCIe lanes. It's used for configuring the GPU (registers) and providing CPU access to VRAM.
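    For illustration, those two uses map onto different BAR types, which can be told apart from the IORESOURCE flag bits that sysfs exposes (bit values from the kernel's include/linux/ioport.h; the classification labels are an assumption based on the description above, not a definitive mapping):

    # IORESOURCE flag bits (include/linux/ioport.h) as they appear in the
    # third column of a device's sysfs "resource" file.
    IORESOURCE_IO       = 0x00000100
    IORESOURCE_MEM      = 0x00000200
    IORESOURCE_PREFETCH = 0x00002000
    IORESOURCE_MEM_64   = 0x00100000

    def classify_bar(flags):
        # Assumption: on these GPUs the large prefetchable BAR is the CPU's
        # window into VRAM, and the small non-prefetchable BAR holds registers.
        if flags & IORESOURCE_IO:
            return "I/O ports"
        if not flags & IORESOURCE_MEM:
            return "unused"
        kind = ("prefetchable MMIO (VRAM aperture)"
                if flags & IORESOURCE_PREFETCH
                else "non-prefetchable MMIO (registers)")
        return kind + (" [64-bit]" if flags & IORESOURCE_MEM_64 else "")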



    • #3
      Originally posted by agd5f
      The MMIO address space has nothing to do with the number of PCIe lanes. It's used for configuring the GPU (registers) and providing CPU access to VRAM.
      Right, I understand. The main issue is how we can get the MMIO/BARs configured correctly so all 8 devices work. We have custom BIOSes for H87- and H97-based boards that allow >4GB of address space to be enabled. With that, the BIOS actually POSTs with all 8 GPUs, and fglrx_core properly recognizes all 8, but they don't seem to be configured correctly, since any OpenCL command immediately crashes (including clinfo).

      This occurs even with a single GPU connected and >4GB enabled, so I'm assuming the issue lies with how the kernel does 64-bit BAR decoding. Attached is a pastebin of the kernel messages with all 8 GPUs connected. You can see there are some issues with PnP being disabled because of memory overlaps, and some BARs not being claimed or assigned.
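      For reference, a rough way to pull those failures out of the kernel log (the exact message wording varies between kernel versions, so these patterns are a best guess, not an exhaustive list):

      import re, subprocess

      # Filter the kernel log for the usual PCI BAR/resource complaints.
      pat = re.compile(r"can't assign|failed to assign|no space for|"
                       r"can't claim|address space collision")
      log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
      for line in log.splitlines():
          if "pci" in line and pat.search(line):
              print(line)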

      Since you know how the drivers work inside out, maybe there is a simple fix for this that I have not come across.


