Radeon Software 18.20 Preview Offers Early Support For Ubuntu 18.04 LTS & RHEL 7.5

  • #31
Why are there both ROCm and PAL OpenCL implementations? Why would I use one over the other?



    • #32
      Originally posted by lostdistance View Post

      I was able to run clinfo against amdgpu-pro-18.10 on an AMD Radeon HD 8570 Oland (GCN SI):
clinfo works for sure, but no program that actually does reads and writes of memory in an OpenCL kernel does. I tried (among many others) a simple vector addition kernel (ref. https://www.olcf.ornl.gov/tutorials/...ctor-addition/), and do not get the expected output.

      It seems like there is some strangeness related to caching of built kernels and/or some other race condition surrounding clEnqueueNDRangeKernel. Running the exact same kernel twice with the exact same arguments, however, works and returns the correct result in the output buffer! Assuming the kernel args (including input buffer memory contents) are generated deterministically like in the program linked above, running the same program twice (thereby calling clEnqueueNDRangeKernel again) works as well. For non-deterministic arguments (e.g. random numbers as input), a simple copy-paste duplication of the clEnqueueNDRangeKernel call before calling clEnqueueReadBuffer seems to do the trick.

      As a complete beginner in OpenCL programming, I've no idea why this behaviour occurs or why the above workaround resolves it. Assuming it isn't some bizarre oversight like CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE being set by default for clCreateCommandQueue, I can only assume there's something synchronization-related affecting clEnqueueNDRangeKernel.



      • #33
        Originally posted by StillStuckOnSI View Post
        Apologies for all the questions, but just to clarify: Is GCN 1.0/SI completely unsupported (under Orca) until experimental support is flipped on in the PAL, or is it actually supported at present under the "legacy" implementation and any buggy OpenCL behaviour should be considered as such?
        My understanding is that SI is supported today (at least for the families/parts listed in the release notes) and so OpenCL bugs should be considered & reported as bugs. That said, I don't think we are getting as much QA coverage on the early parts as on more recent parts so there may be bugs getting through at the moment. We are ramping up test coverage on the Linux side so that should help.

        Originally posted by Tomin View Post
Why are there both ROCm and PAL OpenCL implementations? Why would I use one over the other?
        Quick answer is that the ROCm stack can run a bit faster (since it makes use of HSA hardware features) but the PAL stack can run on all our hardware, not just parts explicitly designed with full HSA/ROCm hardware support. In general you will see the ROCm paths tested more heavily on ROCm stack releases while testing for the amdgpu/pro packaged releases will focus on PAL paths.
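For reference, the stack is selected at install time via amdgpu-install's `--opencl` switch; a sketch based on the commands quoted later in this thread (the `pal` value appears verbatim below; `rocm` and the ability to combine values are my assumption from the 18.x release notes, so check `./amdgpu-install --help` for your release):

```shell
# PAL-based OpenCL: runs on all supported GCN parts,
# the default focus of the amdgpu/pro packaged releases
sudo ./amdgpu-install --opencl=pal --headless

# ROCm-based OpenCL: HSA-capable parts only, can be a bit faster
sudo ./amdgpu-install --opencl=rocm --headless
```

Drop `--headless` if you also want the accelerated X/graphics driver installed, per the discussion below.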
        Test signature



        • #34
          Originally posted by Qaridarium
I have 2 Threadripper systems with 3 Vega 64 cards per PC. On the one with the ASRock mainboard the "sudo ./amdgpu-install --opencl=pal --headless"
solution works perfectly, but on the other one with the MSI TR4 mainboard "sudo ./amdgpu-install --opencl=pal --headless" results in very loud fan spin.
Even after upgrading to the 4.17-rc3 kernel it is very loud. Sure, I have to check - maybe there is dirt/dust inside the cards that blocks the air... I will check this later.
Maybe try installing without the --headless option? If you are running KDE then I imagine you want accelerated graphics, and having the X driver active may influence power management.

          There were also a few requests to force higher clock and fan speeds on dedicated compute rigs (to get best performance with bursty workloads) - not sure if that got implemented on the AMDGPU/PRO stack releases but if it was then it might be tied to headless installs.
          Last edited by bridgman; 06 May 2018, 03:54 PM.



          • #35
            Originally posted by bridgman View Post
            Quick answer is that the ROCm stack can run a bit faster (since it makes use of HSA hardware features) but the PAL stack can run on all our hardware, not just parts explicitly designed with full HSA/ROCm hardware support. In general you will see the ROCm paths tested more heavily on ROCm stack releases while testing for the amdgpu/pro packaged releases will focus on PAL paths.
            I see. That makes perfect sense. Thank you!

            I think the only problem that I have then is that there is no Tensorflow for PAL, only for ROCm which is quite strict about the platforms that it supports. Anyway, there never was Tensorflow support for AMD before ROCm so it's not worse now than it used to be.



            • #36
              Originally posted by Tomin View Post
              I think the only problem that I have then is that there is no Tensorflow for PAL, only for ROCm which is quite strict about the platforms that it supports. Anyway, there never was Tensorflow support for AMD before ROCm so it's not worse now than it used to be.
Other than Tahiti (HD 79xx), my impression was that we had ROCm support on all of the parts which were sufficiently powerful (and had sufficient memory) to be worth running Tensorflow on - where do you see the gaps?



              • #37
                Originally posted by bridgman View Post
Other than Tahiti (HD 79xx), my impression was that we had ROCm support on all of the parts which were sufficiently powerful (and had sufficient memory) to be worth running Tensorflow on - where do you see the gaps?
                Probably just on your documentation: https://rocm.github.io/hardware.html

                Is it possible to use it with AMD processors like Phenom II paired with some AMD card?
                Last edited by Tomin; 06 May 2018, 04:19 PM. Reason: any -> some, sorry but English is not my first language



                • #38
                  Originally posted by Tomin View Post
                  Probably just on your documentation: https://rocm.github.io/hardware.html

                  Is it possible to use it with AMD processors like Phenom II paired with some AMD card?
                  Whoops, you're right - you said "platform" not "GPU". Even worse, English IS my first language



                  • #39
                    Originally posted by juno View Post
                    Do you know if OpenCL/PAL requires any non-upstreamed patches for kernel, libdrm and LLVM or would it "just work" on an up-to-date (rolling release) system?
                    Sorry for the bump, bridgman



                    • #40
                      Originally posted by bridgman View Post
                      Whoops, you're right - you said "platform" not "GPU". Even worse, English IS my first language
It's alright. I take it that success is not guaranteed with a Phenom II, and is unlikely even with the latest GPUs.

I noticed that OpenMI also has an OpenCL version. Does that work on OpenCL stacks other than just ROCm's? It seems interesting, and I wonder why the people porting Tensorflow to OpenCL haven't mentioned it.

