AMDKFD Looking To Be Merged Into AMDGPU Linux DRM Kernel Driver

  • #11
    Originally posted by bridgman View Post

    Do you mean "radeon gcn 1.1, amdgpu gcn 1.2 and higher" ? I'm guessing that your "thanks" is sarcastic but not sure why.

    Note that we have switched to using amdgpu with ROCM rather than radeon, even for GCN 1.1 - that is one of the things that makes the proposed merge possible.
    I for one am thankful that it's not "Vega or better". It's pretty cool to get your feedback on this site. You always have great info bridgman.



    • #12
      Originally posted by guglovich View Post
      Not quite. I'm suggesting that Kaveri is being deprived of functionality in future kernels. After all, as you wrote, Kaveri already uses the radeon DRM driver, and from the news I understood that AMDKFD and its duplicated code were being removed from the radeon DRM. Correct me if I'm wrong.
      Yep, you are wrong

      Kaveri is supported by both radeon and amdgpu kernel drivers today, although radeon is still the upstream default while we work through remaining amdgpu issues on SI/CI. That said, we are using amdgpu in the packaged drivers for all GCN generations right back to SI/GCN 1.0 although the focus there is on dGPU rather than APU at the moment.

      Nothing is being removed from the kernel - we just plumbed amdkfd into amdgpu rather than into radeon for CIK/GCN 1.1. I believe we did that almost a year ago. EDIT - about 9 months ago:

      https://cgit.freedesktop.org/~agd5f/...a3d93d1fe371af

      Anyways, all this means is that you will need to use amdgpu rather than radeon on Kaveri if you want ROCm support. That can be done with boot parms now (one to disable radeon CIK support and another to enable amdgpu CIK support), although if you are using the radeon X driver you will need to tweak X.conf as well IIRC.
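      For anyone wanting to try this, the switch bridgman describes is done with the radeon.cik_support and amdgpu.cik_support module parameters. A minimal sketch of the boot configuration, assuming a GRUB-based distro (paths and bootloader details will vary):

      ```shell
      # Hand the CIK/GCN 1.1 GPU (e.g. Kaveri) to amdgpu instead of radeon.
      # Add the two parameters to the kernel command line, e.g. in /etc/default/grub:
      GRUB_CMDLINE_LINUX="radeon.cik_support=0 amdgpu.cik_support=1"
      # Then regenerate the grub config (distro-specific), e.g.:
      #   sudo grub-mkconfig -o /boot/grub/grub.cfg
      ```

      After a reboot, `lspci -k` should show the card bound to the amdgpu kernel driver.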

      I have received maybe 50 "OMG why are you making me still use radeon instead of making amdgpu the default for SI/CI" posts, but yours is the first "OMG you are going to make me use amdgpu" post so far.

      I should mention for completeness that right now dGPUs (and mostly high end dGPUs) are the main focus area for ROCm, but I think our intent is to keep it running on the APUs we used for initial development as well.

      Originally posted by guglovich View Post
      Also interesting is whether this means that there will finally be compatibility with Mesa OpenCL?
      Mesa OpenCL has pretty much stalled as far as I can see. We worked on it for a few years hoping it would catch on as a community standard, but it didn't happen. There was still a LOT of work to do before it could be a viable alternative to our in-house OpenCL driver, so we focused on open sourcing our in-house OpenCL driver instead:

      https://github.com/RadeonOpenCompute...OpenCL-Runtime

      I'm not saying that Mesa OpenCL (aka the clover state tracker) is dead - there is still occasional work being done on it - but it is not part of our plans at the moment.

      https://cgit.freedesktop.org/mesa/me...rackers/clover
      Last edited by bridgman; 05 July 2018, 04:37 PM.


      • #13
        Originally posted by AndyChow View Post
        It's pretty cool to get your feedback on this site. You always have great info bridgman.
        Thanks !


        • #14
          Originally posted by bridgman View Post
          Mesa OpenCL has pretty much stalled as far as I can see. We worked on it for a few years hoping it would catch on as a community standard, but it didn't happen. There was still a LOT of work to do before it could be a viable alternative to our in-house OpenCL driver, so we focused on open sourcing our in-house OpenCL driver instead:

          https://github.com/RadeonOpenCompute...OpenCL-Runtime

          I'm not saying that Mesa OpenCL (aka the clover state tracker) is dead - there is still occasional work being done on it - but it is not part of our plans at the moment.
          Which isn't very surprising.

          Anyway, now that OpenCL is based on SPIR-V and Khronos is working on Vulkan/OpenCL convergence, maybe radv is going to be the Mesa OpenCL driver one day.



          • #15
            Originally posted by bridgman View Post

            Yep, you are wrong

            Kaveri is supported by both radeon and amdgpu kernel drivers today, although radeon is still the upstream default while we work through remaining amdgpu issues on SI/CI. That said, we are using amdgpu in the packaged drivers for all GCN generations right back to SI/GCN 1.0 although the focus there is on dGPU rather than APU at the moment.

            Nothing is being removed from the kernel - we just plumbed amdkfd into amdgpu rather than into radeon for CIK/GCN 1.1. I believe we did that almost a year ago. EDIT - about 9 months ago:

            https://cgit.freedesktop.org/~agd5f/...a3d93d1fe371af

            Anyways, all this means is that you will need to use amdgpu rather than radeon on Kaveri if you want ROCm support. That can be done with boot parms now (one to disable radeon CIK support and another to enable amdgpu CIK support), although if you are using the radeon X driver you will need to tweak X.conf as well IIRC.

            I have received maybe 50 "OMG why are you making me still use radeon instead of making amdgpu the default for SI/CI" posts, but yours is the first "OMG you are going to make me use amdgpu" post so far.

            I should mention for completeness that right now dGPUs (and mostly high end dGPUs) are the main focus area for ROCm, but I think our intent is to keep it running on the APUs we used for initial development as well.



            Mesa OpenCL has pretty much stalled as far as I can see. We worked on it for a few years hoping it would catch on as a community standard, but it didn't happen. There was still a LOT of work to do before it could be a viable alternative to our in-house OpenCL driver, so we focused on open sourcing our in-house OpenCL driver instead:

            https://github.com/RadeonOpenCompute...OpenCL-Runtime

            I'm not saying that Mesa OpenCL (aka the clover state tracker) is dead - there is still occasional work being done on it - but it is not part of our plans at the moment.

            https://cgit.freedesktop.org/mesa/me...rackers/clover

            Sincerely, thank you. I thought AMDGPU for Kaveri was already at an impasse, which is why I feared for the future of ROCm on Kaveri. In the end, there is still experimental support for it.

            I'm also interested in how a Kaveri + dGPU combination will work. Will the technologies inside HSA, hUMA for example, still have advantages in ROCm? And how does a dGPU manage without an APU?

            Thanks again. I've been following HSA for a long time; I even passed on buying Vishera (FX) in favor of such a promising technology.



            • #16
              Originally posted by bridgman View Post

              Now that the KFD code is substantially upstream the next step is to align our internal trees with the upstream user/kernel interface, which is slightly different from the UKI in our internal trees (and hence in ROCm releases). The goal is to include this in the upcoming ROCm 1.9 release.

              Note that the initial release may not include testing against upstream, just rebasing our internal trees against upstream KFD and then testing against the internal trees we use for the ROCm release. So we're going from saying "it won't work against upstream" to "it should work against upstream, but use the kernel code we release if you want to be sure".

              Being able to say "it will work against upstream" will require at least doubling our current test coverage, which will take some time, but it is part of the plan.
              Great, this looks promising! Thanks a lot for clarifying; this is useful information for current and future users of AMD hardware.

              I wonder why the KFD code was developed separately from the AMDGPU driver instead of being developed within it from the start. Were there reasons for this?



              • #17
                Originally posted by timofonic View Post
                I wonder why the KFD code was developed separately from the AMDGPU driver instead of being developed within it from the start. Were there reasons for this?
                Complex technical reasons... the amdgpu driver didn't exist when we started working on KFD

                I forget the exact timing, but IIRC we started work on HSA (and KFD) before the amdgpu driver was anything more than a proposal. There was also a lot more Windows focus for HSA at the time and so Windows design decisions also influenced Linux decisions.

                I came into the HSA effort after the initial design decisions had been made, and I probably could have convinced everyone to integrate KFD into the graphics driver for Linux, but given the fact that we were just starting (or thinking about starting) a separate graphics driver (amdgpu) it seemed reasonable to keep KFD separate and have it work with both drivers for the first couple of years. I believe we first started hooking it into amdgpu when we started working on Carrizo support in mid-late 2014.

                All of our initial development was done on Tahiti (HD79xx) plus Trinity (since Trinity exposed IOMMUv2 to the PCIE bus but not to the internal GPU) but since SI didn't have any of the HSA features other than ability to work with IOMMUv2 the implementation was pretty ugly. Once the first Kaveri samples came back (in early 2013, I think) we heaved a great sigh of relief and switched to Kaveri.
                Last edited by bridgman; 05 July 2018, 06:08 PM.


                • #18
                  Originally posted by bridgman View Post

                  Complex technical reasons... the amdgpu driver didn't exist when we started working on KFD
                  I see. A chicken-and-egg problem? Thanks for your clarification.



                  • #19
                    Originally posted by guglovich View Post
                    Sincerely, thank you. I thought AMDGPU for Kaveri was already at an impasse, which is why I feared for the future of ROCm on Kaveri. In the end, there is still experimental support for it.
                    Makes sense... sorry for worrying you.

                    Originally posted by guglovich View Post
                    I'm also interested in how a Kaveri + dGPU combination will work. Will the technologies inside HSA, hUMA for example, still have advantages in ROCm? And how does a dGPU manage without an APU?
                    ROCm is based on HSA, just modified a bit to work on dGPUs which do not have recoverable page fault support via GPUVM.

                    Rather than allowing the GPU to access unpinned memory the way we can on an APU (where all the system memory accesses can be made to go through IOMMUv2) we play a delicate dance - having allocated memory pinned all the time from the GPU's perspective, but being able to unpin it temporarily by "evicting" ROCm processes (pre-empting the process's user queues) so we don't block cases where a graphics process needs memory or the OS needs to change userspace mappings underneath a running program. Without eviction it would have been pretty much impossible to get KFD dGPU support upstream.

                    It took a lot of work by Felix and others to get eviction working, but the basic idea is that we use MMU notifiers to intercept OS requests and hook into TTM's data structures to detect graphics requests then temporarily evict and unpin enough processes to let the requests be satisfied.

                    Starting with Vega10 we have a new GPUVM block, with 4 level page tables and the ability to recover from GPUVM page faults as well as IOMMUv2 faults. We are not taking advantage of that in the ROCm stack yet but now that KFD is upstream we can start scheduling some of that work - and once it is done we will no longer need eviction on sufficiently new HW.

                    BTW we could potentially use IOMMUv2 for system memory accesses rather than having to pin that memory, but AFAIK Intel CPUs do not expose IOMMUv2 (aka ATS/PRI services) to the PCIE connector, so our first implementation had to rely on memory pinning in order to work on both Intel and AMD CPUs.
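                    The evict/unpin/restore dance described above can be sketched as a toy model. This is purely illustrative; the class and function names are my own, not the real KFD interfaces:

                    ```python
                    # Toy model of the ROCm dGPU eviction scheme: buffers stay pinned
                    # while a process's user queues are active, but a process can be
                    # temporarily evicted (queues preempted, memory unpinned) when the
                    # OS or a graphics client needs the memory.

                    class RocmProcess:
                        def __init__(self, name, pinned_mb):
                            self.name = name
                            self.pinned_mb = pinned_mb   # memory pinned on its behalf
                            self.queues_active = True    # user queues mapped to hardware

                        def evict(self):
                            # Preempt the user queues first, then unpin: once the queues
                            # are stopped, the GPU can no longer touch this memory.
                            self.queues_active = False
                            freed = self.pinned_mb
                            self.pinned_mb = 0
                            return freed

                        def restore(self, mb):
                            # Re-pin the memory and re-enable the queues once the
                            # memory pressure has passed.
                            self.pinned_mb = mb
                            self.queues_active = True


                    def satisfy_request(procs, needed_mb):
                        """Evict just enough active ROCm processes to free needed_mb."""
                        freed, evicted = 0, []
                        for p in procs:
                            if freed >= needed_mb:
                                break
                            if p.queues_active:
                                freed += p.evict()
                                evicted.append(p)
                        return freed, evicted
                    ```

                    In the real driver, the triggers for this path are MMU notifiers (for OS requests) and TTM hooks (for graphics requests), as bridgman notes, rather than an explicit function call.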
                    Last edited by bridgman; 05 July 2018, 07:48 PM.


                    • #20
                      Originally posted by guglovich View Post

                      Frustrating. I bought Kaveri just for the sake of HSA. Oh, this AMD marketing...
                      I also bought a Kaveri (GCN 1.1) because of that... we have been left in the dark.
