Radeon Software 18.20 Preview Offers Early Support For Ubuntu 18.04 LTS & RHEL 7.5

  • #31
Why are there both ROCm and PAL OpenCL implementations? Why would I use one over the other?



    • #32
      Originally posted by lostdistance View Post

      I was able to run clinfo against amdgpu-pro-18.10 on an AMD Radeon HD 8570 Oland (GCN SI):
clinfo works for sure, but no program that actually does reads and writes of memory in an OpenCL kernel does. I tried (among many others) a simple vector addition kernel (ref. https://www.olcf.ornl.gov/tutorials/...ctor-addition/), and do not get the expected output.

      It seems like there is some strangeness related to caching of built kernels and/or some other race condition surrounding clEnqueueNDRangeKernel. Running the exact same kernel twice with the exact same arguments, however, works and returns the correct result in the output buffer! Assuming the kernel args (including input buffer memory contents) are generated deterministically like in the program linked above, running the same program twice (thereby calling clEnqueueNDRangeKernel again) works as well. For non-deterministic arguments (e.g. random numbers as input), a simple copy-paste duplication of the clEnqueueNDRangeKernel call before calling clEnqueueReadBuffer seems to do the trick.

      As a complete beginner in OpenCL programming, I've no idea why this behaviour occurs or why the above workaround resolves it. Assuming it isn't some bizarre oversight like CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE being set by default for clCreateCommandQueue, I can only assume there's something synchronization-related affecting clEnqueueNDRangeKernel.



      • #33
        Originally posted by StillStuckOnSI View Post
        Apologies for all the questions, but just to clarify: Is GCN 1.0/SI completely unsupported (under Orca) until experimental support is flipped on in the PAL, or is it actually supported at present under the "legacy" implementation and any buggy OpenCL behaviour should be considered as such?
        My understanding is that SI is supported today (at least for the families/parts listed in the release notes) and so OpenCL bugs should be considered & reported as bugs. That said, I don't think we are getting as much QA coverage on the early parts as on more recent parts so there may be bugs getting through at the moment. We are ramping up test coverage on the Linux side so that should help.

        Originally posted by Tomin View Post
Why are there both ROCm and PAL OpenCL implementations? Why would I use one over the other?
        Quick answer is that the ROCm stack can run a bit faster (since it makes use of HSA hardware features) but the PAL stack can run on all our hardware, not just parts explicitly designed with full HSA/ROCm hardware support. In general you will see the ROCm paths tested more heavily on ROCm stack releases while testing for the amdgpu/pro packaged releases will focus on PAL paths.
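For reference, the stack is selected at install time via amdgpu-install's `--opencl` switch; a sketch based on the commands quoted later in this thread (the `pal` value appears verbatim below; `rocm` and the ability to combine values are my assumption from the 18.x release notes, so check `./amdgpu-install --help` for your release):

```shell
# PAL-based OpenCL: runs on all supported GCN parts,
# the default focus of the amdgpu/pro packaged releases
sudo ./amdgpu-install --opencl=pal --headless

# ROCm-based OpenCL: HSA-capable parts only, can be a bit faster
sudo ./amdgpu-install --opencl=rocm --headless
```

Drop `--headless` if you also want the accelerated X/graphics driver installed, per the discussion below.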
        Test signature



        • #34
          Originally posted by Qaridarium
I have 2 Threadripper systems with 3 Vega 64 cards per PC. On the one with the ASRock mainboard the "sudo ./amdgpu-install --opencl=pal --headless"
solution works perfectly, but on the other one with the MSI TR4 mainboard "sudo ./amdgpu-install --opencl=pal --headless" results in very loud fan spin.
Even after upgrading to the 4.17-rc3 kernel it is very loud. Sure, I have to check - maybe there is dirt/dust inside the cards that blocks the air... I will check this later.
Maybe try installing without the --headless option? If you are running KDE then I imagine you want accelerated graphics, and having the X driver active may influence power management.

          There were also a few requests to force higher clock and fan speeds on dedicated compute rigs (to get best performance with bursty workloads) - not sure if that got implemented on the AMDGPU/PRO stack releases but if it was then it might be tied to headless installs.
          Last edited by bridgman; 06 May 2018, 03:54 PM.



          • #35
            Originally posted by bridgman View Post
            Quick answer is that the ROCm stack can run a bit faster (since it makes use of HSA hardware features) but the PAL stack can run on all our hardware, not just parts explicitly designed with full HSA/ROCm hardware support. In general you will see the ROCm paths tested more heavily on ROCm stack releases while testing for the amdgpu/pro packaged releases will focus on PAL paths.
            I see. That makes perfect sense. Thank you!

            I think the only problem that I have then is that there is no Tensorflow for PAL, only for ROCm which is quite strict about the platforms that it supports. Anyway, there never was Tensorflow support for AMD before ROCm so it's not worse now than it used to be.



            • #36
              Originally posted by Tomin View Post
              I think the only problem that I have then is that there is no Tensorflow for PAL, only for ROCm which is quite strict about the platforms that it supports. Anyway, there never was Tensorflow support for AMD before ROCm so it's not worse now than it used to be.
Other than Tahiti (HD 79xx), my impression was that we had ROCm support on all of the parts which were sufficiently powerful (and had sufficient memory) to be worth running Tensorflow on - where do you see the gaps?



              • #37
                Originally posted by bridgman View Post
Other than Tahiti (HD 79xx), my impression was that we had ROCm support on all of the parts which were sufficiently powerful (and had sufficient memory) to be worth running Tensorflow on - where do you see the gaps?
                Probably just on your documentation: https://rocm.github.io/hardware.html

                Is it possible to use it with AMD processors like Phenom II paired with some AMD card?
                Last edited by Tomin; 06 May 2018, 04:19 PM. Reason: any -> some, sorry but English is not my first language



                • #38
                  Originally posted by Tomin View Post
                  Probably just on your documentation: https://rocm.github.io/hardware.html

                  Is it possible to use it with AMD processors like Phenom II paired with some AMD card?
                  Whoops, you're right - you said "platform" not "GPU". Even worse, English IS my first language



                  • #39
                    Originally posted by juno View Post
                    Do you know if OpenCL/PAL requires any non-upstreamed patches for kernel, libdrm and LLVM or would it "just work" on an up-to-date (rolling release) system?
                    Sorry for the bump, bridgman



                    • #40
                      Originally posted by bridgman View Post
                      Whoops, you're right - you said "platform" not "GPU". Even worse, English IS my first language
It's alright. I take it that success is not guaranteed with a Phenom II, and is unlikely even with the latest GPUs.

I noticed that OpenMI also has an OpenCL version. Does that work on OpenCL stacks other than just ROCm's? It seems interesting, and I wonder why the people porting Tensorflow to OpenCL haven't mentioned it.

