So I've been trying to figure out why some users hit errors while others don't when running an OpenCL 2.0 application from Einstein@home. My hunch is that it's a driver issue, but some users get errors even with what should be the right drivers, which makes me suspect a platform-plus-driver interaction: the drivers are fine, except when used on certain hardware. Several of the users experiencing issues are using Radeon VII (GFX9) GPUs.
From the ROCm hardware support document here: https://github.com/RadeonOpenCompute...ftware-Support
As described in the next section, GFX8 GPUs require PCI Express 3.0 (PCIe 3.0) with support for PCIe atomics. This requires both CPU and motherboard support. GFX9 GPUs require PCIe 3.0 with support for PCIe atomics by default, but they can operate in most cases without this capability.
but then later:

Beginning with ROCm 1.8, GFX9 GPUs (such as Vega 10) no longer require PCIe atomics. We have similarly opened up more options for number of PCIe lanes. GFX9 GPUs can now be run on CPUs without PCIe atomics and on older PCIe generations, such as PCIe 2.0. This is not supported on GPUs below GFX9, e.g. GFX8 cards in the Fiji and Polaris families.
If you are using any PCIe switches in your system, please note that PCIe Atomics are only supported on some switches, such as Broadcom PLX. When you install your GPUs, make sure you install them in a PCIe 3.1.0 x16, x8, x4, or x1 slot attached either directly to the CPU's Root I/O controller or via a PCIe switch directly attached to the CPU's Root I/O controller.
In our experience, many issues stem from trying to use consumer motherboards which provide physical x16 connectors that are electrically connected as e.g. PCIe 2.0 x4, PCIe slots connected via the Southbridge PCIe I/O controller, or PCIe slots connected through a PCIe switch that does not support PCIe atomics.
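One way to check the conditions that quote describes on a given machine is `sudo lspci -vvv`, which on PCIe 3.x devices reports an `AtomicOpsCap:` line under `DevCap2:` and the negotiated link speed under `LnkSta:`. Here is a minimal parsing sketch; the excerpt below is illustrative text written in that format, not captured from a real Radeon VII:

```python
import re

# Illustrative excerpt in the format printed by `sudo lspci -vvv`
# (field names match real lspci output; the values are made up).
LSPCI_EXCERPT = """\
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20
        LnkSta: Speed 8GT/s, Width x16
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ LTR+
                 AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+
                 AtomicOpsCtl: ReqEn+
"""

def atomics_caps(lspci_text):
    """Return the atomic-op widths a device advertises ('+' suffix = supported)."""
    m = re.search(r"AtomicOpsCap:\s*(.*)", lspci_text)
    if not m:
        return set()
    return {flag[:-1] for flag in m.group(1).split() if flag.endswith("+")}

def link_speed(lspci_text):
    """Extract the negotiated link speed (8GT/s = PCIe 3.0, 5GT/s = PCIe 2.0)."""
    m = re.search(r"LnkSta:\s*Speed\s*([\d.]+GT/s)", lspci_text)
    return m.group(1) if m else None

print(sorted(atomics_caps(LSPCI_EXCERPT)))  # ['32bit', '64bit']
print(link_speed(LSPCI_EXCERPT))            # 8GT/s
```

On a real system you would feed it the output of `sudo lspci -vvv -s <bus-id>`, and `lspci -t` shows whether the slot hangs off the CPU's root complex or off a switch/southbridge, which is exactly the topology question the quote raises.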
but then later again:

In the default ROCm configuration, GFX8 and GFX9 GPUs require PCI Express 3.0 with PCIe atomics.
So which is it? What is the "default" configuration, and how would one change away from the default into the state that supports pre-3.0 PCIe without atomics? And do the notes about not requiring atomics still hold if the GPU is attached via a PCIe switch or the southbridge chipset, or do those slot restrictions apply regardless? It's very unclear what the support actually is.
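For what it's worth, one place where the actual behavior shows up is the kernel log: my understanding (an assumption based on reading the amdkfd sources, not verified across kernel versions) is that the kfd driver logs a line containing something like "PCI rejects atomics" when it skips a GPU over missing atomics support. A hedged sketch that scans a dmesg dump for such lines, with a made-up sample log:

```python
def kfd_atomics_rejections(dmesg_text):
    """Return dmesg lines suggesting kfd skipped a device over PCIe atomics."""
    return [line for line in dmesg_text.splitlines()
            if "kfd" in line and "atomics" in line.lower()]

# Illustrative sample, not captured from a real machine; the exact
# message wording is an assumption and may vary by kernel version.
SAMPLE = """\
[    3.2] kfd kfd: skipped device 1002:66af, PCI rejects atomics
[    3.3] amdgpu: [powerplay] dpm has been enabled
"""

for line in kfd_atomics_rejections(SAMPLE):
    print(line)
```

If affected users could post the kfd-related lines from their `dmesg`, that would at least tell us whether ROCm itself is refusing the device or whether the failure happens later, in the OpenCL runtime.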