Intel oneAPI DPC++ Compiler 2020-08 Released With Explicit SIMD Extension
In the same week that oneAPI Level Zero 1.0 was released, the oneAPI Data Parallel C++ compiler has also seen its newest tagged release.
The Intel oneAPI DPC++ Compiler is the company's LLVM-based compiler for their Data Parallel C++ initiative for oneAPI, built atop Khronos' SYCL single-source programming standard and ISO C++. One of the most significant additions in the oneAPI DPC++ Compiler 2020-08 release is the Intel Explicit SIMD extension for low-level GPU performance tuning. The extension is for developers who want to write their own hand-optimized code rather than hope the compiler will optimize it effectively. Explicit SIMD mode allows manual vectorization of device code that does not depend on the compiler's optimization abilities, and it provides new low-level APIs that map very well to Intel's Gen graphics hardware.
The DPC++ 2020-08 release also adds a new SYCL extension (SYCL_INTEL_usm_address_spaces) that provides two new address spaces, added to give the compiler optimization information. From the tentative spec, "The goal of this division of the global address space is to enable users to explicitly tell the compiler which address space a pointer resides in for the purposes of enabling optimization. While automatic address space inference is often possible for accessors, it is harder for USM pointers as it requires inter-procedural optimization with the host code. This additional information can be particularly beneficial on FPGA targets where knowing that a pointer only ever accesses host or device memory can allow compilers to produce more area efficient memory-accessing hardware."
Another SYCL extension added is INTEL_use_pinned_host_memory, for utilizing pinned host memory. There is also support for other elements of the Khronos SYCL 2020 provisional specification. The oneAPI Data Parallel C++ Compiler 2020-08 release also brings a number of improvements to its NVIDIA CUDA back-end, enables the standard optimization pipeline for device code by default (with a new flag added to disable it), and includes various library improvements and many bug fixes.
More details on this month's LLVM-based DPC++ compiler update are available via GitHub.