Does ROCm require PCIe 3.0 with PCIe atomics or not?


  • Does ROCm require PCIe 3.0 with PCIe atomics or not?

    So I've been trying to figure out why some users get errors and others don't when running an OpenCL 2.0 application from Einstein@home. My hunch is that the issue lies with the drivers being used, but some users get errors even with what should be the right drivers. That makes me think it's a platform-plus-driver issue: the drivers are fine, except on certain hardware. Several of the affected users are running Radeon VII (GFX9) GPUs.

    From the ROCm hardware support document here: https://github.com/RadeonOpenCompute...ftware-Support

    As described in the next section, GFX8 GPUs require PCI Express 3.0 (PCIe 3.0) with support for PCIe atomics. This requires both CPU and motherboard support. GFX9 GPUs require PCIe 3.0 with support for PCIe atomics by default, but they can operate in most cases without this capability.
    But then, later in the same document:
    Beginning with ROCm 1.8, GFX9 GPUs (such as Vega 10) no longer require PCIe atomics. We have similarly opened up more options for number of PCIe lanes. GFX9 GPUs can now be run on CPUs without PCIe atomics and on older PCIe generations, such as PCIe 2.0. This is not supported on GPUs below GFX9, e.g. GFX8 cards in the Fiji and Polaris families.

    If you are using any PCIe switches in your system, please note that PCIe Atomics are only supported on some switches, such as Broadcom PLX. When you install your GPUs, make sure you install them in a PCIe 3.1.0 x16, x8, x4, or x1 slot attached either directly to the CPU's Root I/O controller or via a PCIe switch directly attached to the CPU's Root I/O controller.

    In our experience, many issues stem from trying to use consumer motherboards which provide physical x16 connectors that are electrically connected as e.g. PCIe 2.0 x4, PCIe slots connected via the Southbridge PCIe I/O controller, or PCIe slots connected through a PCIe switch that does not support PCIe atomics.
    But then, later again:
    In the default ROCm configuration, GFX8 and GFX9 GPUs require PCI Express 3.0 with PCIe atomics.
    So which is it?

    What is the "default" configuration? How would one move away from the default to a configuration that supports links slower than PCIe 3.0 and platforms without atomics?

    Do the notes about not requiring atomics still apply if the GPU is attached via a switch or the southbridge chipset, or do the slot restrictions quoted above still hold?

    It's all very unclear what the support ACTUALLY is.
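
For the slot-wiring point in the quoted docs (a physical x16 connector that is electrically PCIe 2.0 x4, for example), the negotiated link can be read from sysfs on Linux. Below is a minimal Python sketch, assuming the kernel exposes the usual `current_link_speed`, `current_link_width`, `max_link_speed` and `max_link_width` attributes for the device. The function names and the `sysfs_root` parameter are made up for illustration and are not part of ROCm.

```python
from pathlib import Path

# Per-lane transfer rate in GT/s mapped to the PCIe generation that introduced it.
GEN_BY_RATE = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5}


def pcie_generation(speed_text):
    """Map a sysfs link-speed string such as '8.0 GT/s PCIe' to a generation number.

    Returns None when the string cannot be parsed or the rate is not in the table.
    """
    try:
        rate = float(speed_text.split()[0])
    except (IndexError, ValueError):
        return None
    return GEN_BY_RATE.get(rate)


def link_status(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Read negotiated and maximum link speed/width for one PCI device.

    `bdf` is the full PCI address (domain:bus:device.function).
    Attributes that cannot be read are reported as None.
    """
    dev = Path(sysfs_root) / bdf
    status = {}
    for name in ("current_link_speed", "current_link_width",
                 "max_link_speed", "max_link_width"):
        try:
            status[name] = (dev / name).read_text().strip()
        except OSError:
            status[name] = None
    return status
```

Calling `link_status("0000:03:00.0")` (a hypothetical address; take the real one from `lspci -D`) and passing the `current_link_speed` value to `pcie_generation` gives the generation the link actually negotiated. On some systems the negotiated speed can drop while the GPU is idle because of link power management, so a low reading is worth rechecking under load.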








  • #2
    One user in particular was using the following combination:

    AMD FX-8350 CPU
    Vega 10 GPU
    ROCm 4.3 drivers
    Ubuntu 20.04.3 w/ 5.11.0-27 kernel

    He did not get an error, but the GPU sat essentially idle while attempting to run the application: 0% load and 0% computation progress.

    Yes, the CPU is not on the supported list. But CPU support seems to be centered on PCIe 3.0 and PCIe atomics capability, and this ROCm version and GPU combination claims that neither is required, so does it even matter?
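
When a GPU sits idle like this with no error from the application, the kernel log is one place to look for a hint. The sketch below filters log text for amdgpu/kfd lines that mention atomics. It is only an illustration: the sample line is a made-up placeholder modeled on the kind of message the kfd driver prints when it skips a device, and the exact wording may differ between kernel versions. The function name is hypothetical.

```python
def atomics_log_lines(log_text):
    """Return kernel-log lines from amdgpu/kfd that mention atomics."""
    hits = []
    for line in log_text.splitlines():
        lower = line.lower()
        if "atomics" in lower and ("amdgpu" in lower or "kfd" in lower):
            hits.append(line.strip())
    return hits


# Illustrative sample only, not taken from the affected machine.
# The device ID is deliberately left as a placeholder.
sample = """\
[    4.10] [drm] amdgpu kernel modesetting enabled.
[    4.52] kfd kfd: skipped device 1002:xxxx, PCI rejects atomics
[    4.60] amdgpu 0000:03:00.0: amdgpu: ring gfx uses VM inv eng 0
"""

for hit in atomics_log_lines(sample):
    print(hit)
```

In practice `log_text` would be the captured output of `dmesg` (which may need root). An empty result does not prove atomics are fine, only that the driver logged nothing about them.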



    • #3
      PCIe atomics are not required by the firmware on Vega parts. They are required by the firmware on Navi parts, but we are in the process of fixing that. Beyond the firmware, there are no requirements for PCIe atomics in the greater ROCm stack in general, although they are required for certain features (e.g., atomic shader instructions writing to system memory).
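
Whether a given platform offers PCIe atomics at all can be checked from `lspci -vv` output, which on recent pciutils versions reports an `AtomicOpsCap` line (device capabilities) and an `AtomicOpsCtl` line (whether the device is enabled to issue atomic requests). The following is a rough Python sketch that parses that text. The field layout is assumed from typical pciutils output and may vary by version, the sample text is invented, and `parse_atomic_ops` is a hypothetical helper name.

```python
import re


def parse_atomic_ops(lspci_text):
    """Extract AtomicOpsCap / AtomicOpsCtl flags per device from `lspci -vv` text.

    Returns {bus_address: {"AtomicOpsCap": {flag: bool}, "AtomicOpsCtl": {...}}}.
    """
    devices = {}
    current = None
    for line in lspci_text.splitlines():
        # A device header starts in column 0; detail lines are indented.
        if line and not line[0].isspace():
            current = line.split()[0]
            devices[current] = {}
            continue
        match = re.search(r"(AtomicOpsCap|AtomicOpsCtl):\s*(.*)", line)
        if match and current is not None:
            # Flags look like "32bit+" or "128bitCAS-"; '+' means set.
            flags = re.findall(r"(\w+)([+-])", match.group(2))
            devices[current][match.group(1)] = {name: sign == "+" for name, sign in flags}
    return devices


# Invented sample in the assumed format: one root port and one GPU.
sample_lspci = """\
00:01.1 PCI bridge: Example root port
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+
                 AtomicOpsCap: Routing+ 32bit+ 64bit+ 128bitCAS-
03:00.0 VGA compatible controller: Example GPU
                 AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
                 AtomicOpsCtl: ReqEn-
"""

for address, info in parse_atomic_ops(sample_lspci).items():
    print(address, info)
```

Every bridge or switch between the GPU and the CPU root port has to route atomics for them to work end to end, which is why the quoted docs single out switches and chipset-attached slots. Reading the capability blocks usually requires running `lspci` as root.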



      • #4
        Originally posted by agd5f
        PCIe atomics are not required by the firmware on Vega parts. They are required by the firmware on Navi parts, but we are in the process of fixing that. Beyond the firmware, there are no requirements for PCIe atomics in the greater ROCm stack in general, although they are required for certain features (e.g., atomic shader instructions writing to system memory).
        Do you have any explanation, then, for why the above configuration does not work? Does the CPU's absence from the CPU support list matter if PCIe atomics are not required? From what I can find, this CPU has all of its PCIe lanes routed through the chipset.

        Why are the notes about PCIe atomics support so contradictory? Some sections say it's needed, some say it's not.
        Last edited by gsrcrxsi; 01 September 2021, 03:48 PM.



        • #5
          Originally posted by gsrcrxsi

          Do you have any explanation, then, for why the above configuration does not work? Does the CPU's absence from the CPU support list matter if PCIe atomics are not required? From what I can find, this CPU has all of its PCIe lanes routed through the chipset.
          I'm not sure offhand.

          Originally posted by gsrcrxsi
          Why are the notes about PCIe atomics support so contradictory? Some sections say it's needed, some say it's not.
          It was originally required for all chips, but the requirement was later dropped for Vega. I suspect some of the documentation didn't get updated properly.
