Radeon Open Compute 1.3 Platform Brings Polaris & Other Features

  • #21
    Originally posted by Amarildo View Post
    Looking at John's words I don't have hopes for opensource OpenCL on my GCN 1.0 anymore. Total and utter bullshit. I really thought support was coming.

    Thanks for nothing AMD.
    Buy a newer card. You're 4 revisions behind.

    Comment


    • #22
      Originally posted by Amarildo View Post
      Looking at John's words I don't have hopes for opensource OpenCL on my GCN 1.0 anymore. Total and utter bullshit. I really thought support was coming.

      Thanks for nothing AMD.
      I don't mean to minimize this, but exactly what did you plan on using OpenCL for on a GCN 1.0 GPU anyway? That's outdated enough it's hard to imagine it being good for anything except maybe practice or bragging rights on the internet. For practice, you could just use a CPU based OpenCL implementation.
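
      For reference, selecting a CPU device with the standard OpenCL C API is just a matter of asking each platform for CL_DEVICE_TYPE_CPU. A minimal sketch (error handling omitted; assumes some CPU implementation such as pocl or a vendor CPU runtime is installed):

      ```cpp
      // Sketch: enumerate OpenCL platforms and pick the first CPU device.
      #include <CL/cl.h>
      #include <cstdio>
      #include <vector>

      int main() {
          cl_uint num_platforms = 0;
          clGetPlatformIDs(0, nullptr, &num_platforms);
          std::vector<cl_platform_id> platforms(num_platforms);
          clGetPlatformIDs(num_platforms, platforms.data(), nullptr);

          for (cl_platform_id p : platforms) {
              cl_device_id dev;
              cl_uint num_devices = 0;
              // Ask each platform specifically for a CPU device.
              if (clGetDeviceIDs(p, CL_DEVICE_TYPE_CPU, 1, &dev, &num_devices) == CL_SUCCESS
                  && num_devices > 0) {
                  char name[256] = {0};
                  clGetDeviceInfo(dev, CL_DEVICE_NAME, sizeof(name), name, nullptr);
                  std::printf("CPU OpenCL device: %s\n", name);
                  return 0;
              }
          }
          std::printf("No CPU OpenCL device found\n");
          return 1;
      }
      ```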

      Comment


      • #23
        If I understand correctly, AMD ROC is equivalent to NVIDIA CUDA, but with heterogeneous computing capabilities. Why target it directly instead of using SyCL, which is the standard for single source C++ heterogeneous computing? The ROCm stack may be open source, but it is definitely not a standard.

        Comment


        • #24
          Originally posted by Marc Driftmeyer View Post
          Buy a newer card. You're 4 revisions behind.
          Originally posted by smitty3268 View Post
          I don't mean to minimize this, but exactly what did you plan on using OpenCL for on a GCN 1.0 GPU anyway? That's outdated enough it's hard to imagine it being good for anything except maybe practice or bragging rights on the internet. For practice, you could just use a CPU based OpenCL implementation.
          I also don't mean to be disrespectful, but I don't think we can expect AMD to provide support for cards that were bought/produced in 2011, especially since they are working with limited resources to catch up with the OpenGL, OpenCL, and Vulkan specs while stabilizing and improving the performance of the driver.

          I agree with the sentiment that it might be time to spend a couple hundred bucks on a modern generation, allowing AMD to focus their resources on a smaller scope.

          Comment


          • #25
            Originally posted by bridgman View Post
            Ahh, OK... none of my answers will help much if it's not clear what ROC is.

            ROC is Radeon Open Compute, what you think of as "the HSA stack" extended to support non-HSA-compliant dGPUs (hence the need for a different name) and with additional features specific to high performance computing.

            ---------------------

            The ROC stack includes a few different components:

            1. KFD (drivers/gpu/drm/amd/amdkfd)

            The KFD (originally "Kernel Fusion Driver" back when HSA was called FSA) exposes an additional set of IOCTLs to user space, initially designed around usermode queues and GPU access to unpinned memory via ATS/PRI protocol between ATC (Address Translation Cache) in the GPU and IOMMUv2 in the CPU. Kaveri was the first part to provide native support, and Carrizo added context switching aka Compute Wave Save/Restore.

            For anyone not familiar with the ATC/IOMMUv2 combination it allows a dGPU to access unpinned memory, and generates page faults at an IOMMU level which are then handled by the upstream IOMMUv2 driver, while ATC caches translations in the GPU and allows high performance access to system memory.

            For dGPU we are not able to rely on IOMMUv2 since (a) some of our target market uses Intel and other CPUs without IOMMUv2 and (b) dGPUs rely heavily on local VRAM/HBM which is not managed by IOMMUv2 anyways. As a result, the initial ROC implementation relied on pinning memory from userspace, which made it non-upstreamable. We recently finished implementation of an eviction mechanism which allows "pinned from userspace" memory to be evicted anyways (after temporarily disabling the associated userspace queues), which allows dGPU ROC programs to dynamically share physical memory with other DRM drivers, eg amdgpu/radeon and will hopefully allow dGPU support in KFD to go upstream.

            The KFD relies on radeon/amdgpu for HW initialization and most memory management operations, and primarily interacts with HSA/ROC-specific hardware added to CI and above in the form of the MEC (Micro-Engine Compute) blocks within CP. It also talks directly to the IOMMUv2 driver on APUs.
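
            To make the "additional set of IOCTLs" above a bit more concrete, here is a rough sketch of opening the KFD node and querying its interface version. This assumes the upstream uapi header <linux/kfd_ioctl.h> and the /dev/kfd device node; real clients go through libhsakmt rather than raw ioctls:

            ```cpp
            // Sketch: query the KFD interface version through its ioctl interface.
            #include <linux/kfd_ioctl.h>
            #include <sys/ioctl.h>
            #include <fcntl.h>
            #include <unistd.h>
            #include <cstdio>

            int main() {
                int fd = open("/dev/kfd", O_RDWR | O_CLOEXEC);
                if (fd < 0) {
                    perror("open /dev/kfd");
                    return 1;
                }
                struct kfd_ioctl_get_version_args args = {};
                if (ioctl(fd, AMDKFD_IOC_GET_VERSION, &args) == 0)
                    std::printf("KFD interface %u.%u\n",
                                args.major_version, args.minor_version);
                close(fd);
                return 0;
            }
            ```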

            2. Libhsakmt

            The libhsakmt code (sometimes referred to as "thunk" or Radeon Open Compute Thunk) performs the same function as libdrm but for the new IOCTLs - basically a userspace wrapper for the kernel driver functionality.
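
            As a rough sketch of what "userspace wrapper" means in practice, here is the same version query expressed through the thunk (function and type names taken from the ROCT-Thunk-Interface header hsakmt.h as I understand it):

            ```cpp
            // Sketch: open the KFD through libhsakmt and read the interface version.
            #include <hsakmt.h>
            #include <cstdio>

            int main() {
                // Opens /dev/kfd and sets up the thunk's internal state.
                if (hsaKmtOpenKFD() != HSAKMT_STATUS_SUCCESS)
                    return 1;

                HsaVersionInfo v = {};
                if (hsaKmtGetVersion(&v) == HSAKMT_STATUS_SUCCESS)
                    std::printf("KFD interface %u.%u\n",
                                v.KernelInterfaceMajorVersion,
                                v.KernelInterfaceMinorVersion);

                hsaKmtCloseKFD();
                return 0;
            }
            ```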

            3. ROC runtime

            This is the userspace driver that exposes ROC functionality to an application or toolchain (generally the latter). Unlike OpenGL or OpenCL the runtime does not include functions to submit work, just to create and manage userspace queues where the application/toolchain can submit work directly.
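
            To illustrate the "create and manage userspace queues" point, a rough sketch against the standard HSA runtime API (hsa.h), which the ROC runtime implements: the application creates a queue and then submits work by writing AQL packets into the queue's ring buffer itself, not through a runtime call.

            ```cpp
            // Sketch: find a GPU agent and create a userspace queue with the HSA runtime.
            #include <hsa.h>
            #include <cstdint>
            #include <cstdio>

            static hsa_status_t find_gpu(hsa_agent_t agent, void *data) {
                hsa_device_type_t type;
                hsa_agent_get_info(agent, HSA_AGENT_INFO_DEVICE, &type);
                if (type == HSA_DEVICE_TYPE_GPU) {
                    *static_cast<hsa_agent_t *>(data) = agent;
                    return HSA_STATUS_INFO_BREAK;   // stop iterating, we found one
                }
                return HSA_STATUS_SUCCESS;
            }

            int main() {
                hsa_init();

                hsa_agent_t gpu = {};
                hsa_iterate_agents(find_gpu, &gpu);

                hsa_queue_t *queue = nullptr;
                hsa_queue_create(gpu, 4096, HSA_QUEUE_TYPE_MULTI,
                                 nullptr, nullptr, UINT32_MAX, UINT32_MAX, &queue);
                if (queue)
                    std::printf("userspace queue of size %u created\n", queue->size);

                // ... the application/toolchain now writes AQL dispatch packets into
                //     queue->base_address, bumps the write index and rings the doorbell ...

                hsa_queue_destroy(queue);
                hsa_shut_down();
                return 0;
            }
            ```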

            4. HCC

            The HCC compiler grew out of what was initially the Kalmar C++ AMP compiler, extended for C++17 and parallel STL... basically open standards catching up with proprietary ones.
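
            For a flavour of the "open standards" style this is aiming at, here is a plain C++17 parallel STL SAXPY; nothing in it is HCC-specific, it is just the kind of standard single-source code the parallel STL work is about:

            ```cpp
            // Plain C++17: SAXPY expressed with std::transform and a parallel execution policy.
            #include <algorithm>
            #include <execution>
            #include <vector>

            int main() {
                const float a = 2.0f;
                std::vector<float> x(1 << 20, 1.0f), y(1 << 20, 3.0f);

                // y = a * x + y, computed in parallel by whatever backend the toolchain provides.
                std::transform(std::execution::par_unseq,
                               x.begin(), x.end(), y.begin(), y.begin(),
                               [a](float xi, float yi) { return a * xi + yi; });
                return 0;
            }
            ```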

            5. HIP

            HIP is a portability suite (tools + libraries) which does most of the work of porting CUDA programs to a portable C++17 form which can run over NVCC or HCC.
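
            As a rough illustration of what ported code ends up looking like, here is a hypothetical SAXPY using the public HIP runtime calls (error checking omitted); the cudaMalloc/cudaMemcpy/<<<>>> pattern maps onto hipMalloc/hipMemcpy/hipLaunchKernelGGL:

            ```cpp
            // Sketch: HIP-ified CUDA-style SAXPY.
            #include <hip/hip_runtime.h>

            __global__ void saxpy(int n, float a, const float *x, float *y) {
                int i = blockIdx.x * blockDim.x + threadIdx.x;
                if (i < n) y[i] = a * x[i] + y[i];
            }

            int main() {
                const int n = 1 << 20;
                float *x = nullptr, *y = nullptr;
                hipMalloc(reinterpret_cast<void **>(&x), n * sizeof(float));
                hipMalloc(reinterpret_cast<void **>(&y), n * sizeof(float));
                // ... fill x and y, e.g. with hipMemcpy(..., hipMemcpyHostToDevice) ...

                hipLaunchKernelGGL(saxpy, dim3((n + 255) / 256), dim3(256), 0, 0,
                                   n, 2.0f, x, y);
                hipDeviceSynchronize();

                hipFree(x);
                hipFree(y);
                return 0;
            }
            ```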

            ---------------------

            Until all of the KFD code gets upstream we are shipping the ROC stack separately, but what we are doing is:

            - taking a copy of the amdgpu staging tree (the ROCm 1.3 release forked off agd5f's amd-staging-4.6 tree)
            - making some changes to amdgpu and ttm which are not upstreamable until the corresponding kfd code that uses them is upstream
            - adding a much newer version of amdkfd with dGPU support and various HPC features
            - testing and publishing in a set with matching versions of libhsakmt, ROC runtime, HCC and HIP

            Once we are able to get dGPU support in amdkfd upstream the ROC stack will become just another part of the open source and PRO stacks.

            ---------------------

            Not sure what the plans are for device ID switching inside the OpenCL runtime - I expect that specific device IDs will use the ROC back-end while everything else will use the current Orca back-end (we don't want to break existing users), so it may not be possible to run unsupported HW over ROCm until the OpenCL code gets open sourced.
            Thank you Bridgman, extremely informative answer =)
            Perhaps Michael should add your description to the article, allowing more people to understand it on a deeper level.

            Comment


            • #26
              Originally posted by smitty3268 View Post

              I don't mean to minimize this, but exactly what did you plan on using OpenCL for on a GCN 1.0 GPU anyway? That's outdated enough it's hard to imagine it being good for anything except maybe practice or bragging rights on the internet. For practice, you could just use a CPU based OpenCL implementation.
              Here is the catch: what about the hybrid AMD APUs released in 2014 that use GCN 1.0 as a dedicated GPU, notably the R7265DX and below?

              Comment


              • #27
                Originally posted by pal666 View Post
                you already have opensource opencl support in mesa
                But it doesn't really work :/

                Comment


                • #28
                  Originally posted by smitty3268 View Post

                  I don't mean to minimize this, but exactly what did you plan on using OpenCL for on a GCN 1.0 GPU anyway? That's outdated enough it's hard to imagine it being good for anything except maybe practice or bragging rights on the internet. For practice, you could just use a CPU based OpenCL implementation.

                  The 7970 was launched in 2012, has 3GB of RAM, and is still a beast of an OpenCL card. We have one in our research lab, and the computer hosting it is running an old distro with an old kernel just so we can use it. It is not easy to get funding to replace a 4-year-old piece of hardware when we still have Core i7's from 2009 crunching numbers like there's no tomorrow. As for using an OpenCL CPU implementation, for certain workloads that "old" 7970 is like 10 times faster than a brand new 10-core/20-thread Xeon CPU... So there are plenty of reasons to keep using a GCN 1.0 GPU.

                  Comment


                  • #29
                    I also think the HD 7970 cards should get good OpenCL support (like with Catalyst, but open source, not the Mesa one).
                    But I still don't understand whether this is already the plan. There are so many interfaces and names; I'm confused.

                    Comment


                    • #30
                      Originally posted by pal666 View Post
                      but will it use iommu when available (host memory on amd cpu)?
                      Performance seems to be a bit better without it in most cases, although that may be a function of the ATC on the dGPU rather than the IOMMUv2 itself. If using ATC/IOMMUv2 for system memory was enough to get upstream we would be doing that, but IOMMUv2 doesn't help with VRAM, and dGPU performance using only system memory is not high enough to be attractive.

                      Originally posted by newwen View Post
                      If I understand correctly, AMD ROC is equivalent to NVIDIA CUDA, but with heterogeneous computing capabilities. Why target it directly instead of using SyCL, which is the standard for single source C++ heterogeneous computing? The ROCm stack may be open source, but it is definitely not a standard.
                      Not sure I understand the question. We aren't expecting applications to target ROC directly, but rather to target toolchains running over ROC. I am not in the language working groups, but my understanding is that HCC is essentially where we think SyCL should be heading once it moves to C++ 17 standards.

                      Originally posted by Amarildo View Post
                      Looking at John's words I don't have hopes for opensource OpenCL on my GCN 1.0 anymore. Total and utter bullshit. I really thought support was coming. Thanks for nothing AMD.
                      Geez, can't you at least wait for me to respond before blowing up? I do have to sleep sometimes. First, is it actually "open source OpenCL" you are looking for (i.e. you are planning to modify the source code yourself) or "OpenCL running with the open source stack", i.e. dealing with the current IOCTL gap between upstream and hybrid kernel drivers?

                      Or is it just "I decided AMD was going to give me this specific deliverable although they never said they would (we always talked about open source OpenCL running over ROC) but now I decided they aren't going to do it so they suck"?

                      There will be an open source OpenCL runtime supporting SI and higher and an open source shader compiler supporting SI and higher; what we don't have a plan for yet AFAIK (remember we are still working on the underlying SI amdgpu support) is how to plumb the OpenCL runtime and compiler-generated binaries into the amdgpu IOCTLs for compute queues on non-ROC-supported hardware. Worst case you end up with "mostly open source OpenCL working with the open source graphics stack and Vulkan".
                      Last edited by bridgman; 15 November 2016, 09:29 AM.

                      Comment
