We are building a GPU compute server platform and trying to fit as many GPUs into it as we can. We were using a system with eight AMD 280x GPUs, which boots and works fine. We recently switched to 380x GPUs, but the BIOS fails to boot with more than 6 GPUs installed, which I'm assuming is because it's running out of MMIO address space for them. I have included the PCI bridge windows for the 280x and the 380x below for comparison.
280x
[ 0.210403] pci 0000:05:01.0: PCI bridge to [bus 06]
[ 0.210405] pci 0000:05:01.0: bridge window [io 0xb000-0xbfff]
[ 0.210410] pci 0000:05:01.0: bridge window [mem 0xf7b00000-0xf7bfffff]
[ 0.210413] pci 0000:05:01.0: bridge window [mem 0xc0000000-0xcfffffff 64bit pref]
380x
[ 0.189688] pci 0000:02:03.0: PCI bridge to [bus 05]
[ 0.189689] pci 0000:02:03.0: bridge window [io 0x9000-0x9fff]
[ 0.189693] pci 0000:02:03.0: bridge window [mem 0xf7800000-0xf78fffff]
[ 0.189696] pci 0000:02:03.0: bridge window [mem 0x40000000-0x501fffff 64bit pref]
It looks like the main difference between the two cards is the 64-bit prefetchable window: on the 280x it spans 0x10000000 bytes (256 MiB), while on the 380x it spans 0x10200000 bytes (258 MiB).
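For anyone who wants to check the arithmetic, here is a minimal Python sketch that computes the window sizes straight from the dmesg lines above. The helper name `window_size` is just something I made up for illustration. It assumes the range is inclusive, so the size is end - start + 1.

```python
import re

# Matches the "bridge window [mem 0x...-0x... ...]" part of a dmesg line.
WINDOW_RE = re.compile(r"bridge window \[mem 0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+)")

def window_size(line):
    """Return the size in bytes of the memory window in a dmesg line, or None."""
    match = WINDOW_RE.search(line)
    if match is None:
        return None
    start = int(match.group(1), 16)
    end = int(match.group(2), 16)
    return end - start + 1  # the range is inclusive on both ends

# The two 64-bit prefetchable windows quoted in this post.
pref_windows = {
    "280x": "pci 0000:05:01.0: bridge window [mem 0xc0000000-0xcfffffff 64bit pref]",
    "380x": "pci 0000:02:03.0: bridge window [mem 0x40000000-0x501fffff 64bit pref]",
}

for card, line in pref_windows.items():
    size = window_size(line)
    print(f"{card}: {size:#x} bytes ({size // (1024 * 1024)} MiB)")
```

So each 380x asks for 2 MiB more prefetchable space than a 280x, on top of the 256 MiB both need.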
My question is whether there is a way to edit either the GPU's onboard BIOS ROM or the system BIOS to reduce the amount of address space assigned to each GPU. We are only using x1 links to the GPUs, so I'm assuming a lot of the space each GPU requests is unneeded.
I know that enabling more than 4GB of address space in the motherboard BIOS is an option that could work in this case, but it is usually only available on higher-end chipsets, which we don't want to use at the moment.
If anyone can think of a workable solution for this issue it would be appreciated!