AMD AOMP 17.0-1 Compiler Switches To Its Next-Gen Plugin For Better Performance

Written by Michael Larabel in AMD on 18 April 2023 at 06:30 AM EDT.
AMD has released AOMP 17.0-1, the newest version of its open-source compiler focused on providing the latest OpenMP offloading support for Radeon and Instinct accelerator products.

AOMP is a set of patches carried atop the latest upstream LLVM/Clang state. AMD engineers continue working on upstreaming their various improvements to LLVM, while those wanting leading-edge support can use AOMP to get the best OpenMP device capabilities right now.

AOMP 17.0-1 is based on the state of upstream LLVM/Clang from earlier this month and is built with AMD's ROCm 5.4.4 sources. Notable with this update is that the next-gen plug-in is now used by default, which in turn should yield significant OpenMP performance improvements.
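For readers unfamiliar with what passes through the offload plug-in, here is a minimal sketch of an OpenMP target region in C. The function name `saxpy_offload` is made up for illustration and is not taken from AOMP. It assumes a build with an offloading-capable compiler (such as AOMP's clang with `-fopenmp` and an offload target selected); without OpenMP enabled, the pragma is ignored and the loop simply runs on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: computes y[i] = a * x[i] + y[i].
 * With offloading enabled, the loop is executed on the default OpenMP
 * device; x is copied to the device and y is copied both ways.
 * Without OpenMP, the pragma is ignored and the loop runs on the host. */
void saxpy_offload(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The runtime plug-in is selected when the program runs rather than in the source code, so a kernel like this should not need any changes to benefit from the new default.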

The release announcement mentions the following changes for AOMP 17.0-1:
- Switch to nextgen plugin as default. This has shown significant performance improvements. To revert to the old plugin, set LIBOMPTARGET_NEXTGEN_PLUGINS=OFF.

- Switch from hostrpc to hostexec. hostexec is a significant rewrite of hostrpc. The device hostexec_invoke is now written in OpenMP for portability to other platforms. The names of the wrappers (stubs) used to run a host function have changed to hostexec() and hostexec_(). hostexec also uses a global variable to find the transfer payload buffer instead of AMD implicit kernel args. This will support portability of hostexec, printf, and fprintf to other platforms. The update to this device global is made with global variable services in the nextgen plugin.

- An example of using hostexec to run MPI_Send and MPI_Recv in a target region is given. This example demonstrates how library owners can build a supplemental header file to enable transparent host execution of selected library functions within OpenMP target regions with the same host interface. This eliminates the need for any source changes in the user code when host execution from a target region is desired. Before hostexec, users would typically have to end their target region, execute a host-only function, then start another target region. This feature significantly increases general purpose computing capabilities of OpenMP on GPGPU platforms.

- OMPT target support is incomplete with the nextgen plugin. To use OMPT, set the environment variable LIBOMPTARGET_NEXTGEN_PLUGINS=OFF.

- Set GPU_MAX_HW_QUEUES in gpurun

- Critical regions created via the critical directive are now more efficient: by relaxing the semantics of locks and combining that with acquire and release fences, the GPU caches are flushed only each time the lock is acquired instead of at every lock check.

- When inlining functions called from the kernel, move allocas for their arguments into the kernel entry block instead of leaving them at the launch point.

- Respect environment variable to force synchronous target region executions. Available via OMPX_FORCE_SYNC_REGIONS=1.
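The hostexec items above note that, before this feature, users typically had to end their target region, run the host-only function, and then start another target region. A rough sketch of that older pattern using only standard OpenMP directives is below. The names `host_step_fn`, `host_add_ten`, and `two_phase_with_host_step` are hypothetical stand-ins (the host step plays the role of a library call such as an MPI routine) and are not part of AOMP or hostexec. The hostexec interface itself is not shown, since the release notes quoted here do not give its signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type standing in for a host-only library call. */
typedef void (*host_step_fn)(double *buf, int n);

/* Hypothetical host-only step used for illustration: adds 10 to each element. */
void host_add_ten(double *buf, int n)
{
    for (int i = 0; i < n; ++i) {
        buf[i] += 10.0;
    }
}

/* Pre-hostexec pattern: device phase, host-only step, device phase.
 * Without OpenMP enabled, the pragmas are ignored and everything runs
 * on the host with the same result. */
void two_phase_with_host_step(double *buf, int n, host_step_fn host_step)
{
    assert(buf != NULL && n >= 0 && host_step != NULL);

    /* Keep buf allocated on the device across both target regions. */
    #pragma omp target data map(tofrom: buf[0:n])
    {
        /* Phase 1 on the device: double every element. */
        #pragma omp target teams distribute parallel for map(tofrom: buf[0:n])
        for (int i = 0; i < n; ++i) {
            buf[i] *= 2.0;
        }

        /* The target region has ended: bring the data back, run the
         * host-only step, then push the updated data to the device. */
        #pragma omp target update from(buf[0:n])
        host_step(buf, n);
        #pragma omp target update to(buf[0:n])

        /* Phase 2 on the device: add one to every element. */
        #pragma omp target teams distribute parallel for map(tofrom: buf[0:n])
        for (int i = 0; i < n; ++i) {
            buf[i] += 1.0;
        }
    }
}
```

Calling `two_phase_with_host_step(buf, n, host_add_ten)` walks through all three steps. According to the release notes, hostexec is meant to remove the need for this split by letting selected host functions be invoked from inside the target region itself.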

AOMP 17.0-1 downloads and more details are available on GitHub, including RHEL/Debian binaries to complement the sources.
