Cache Coherent Device Memory For HMM


  • Cache Coherent Device Memory For HMM

    Phoronix: Cache Coherent Device Memory For HMM

    Jerome Glisse at Red Hat continues working on Heterogeneous Memory Management (HMM) for the mainline Linux kernel, which will hopefully be merged soon. He's now extended HMM with cache-coherent device memory support...

    http://www.phoronix.com/scan.php?pag...CDM-HMM-Memory

  • #2
    I'm still a bit confused. What does HMM do above and beyond AMD's and the ARM chip alliance's HSA (Heterogeneous System Architecture)?

    Quick Wiki explanation of HSA:

    Heterogeneous System Architecture (HSA) is a cross-vendor set of specifications that allow for the integration of central processing units and graphics processors on the same bus, with shared memory and tasks.[1] The HSA is being developed by the HSA Foundation, which includes (among many others) AMD and ARM. The platform's stated aim is to reduce communication latency between CPUs, GPUs and other compute devices, and make these various devices more compatible from a programmer's perspective,[2]:3[3] relieving the programmer of the task of planning the moving of data between devices' disjoint memories (as must currently be done with OpenCL or CUDA).[4]

    So... is this Nvidia's version of HSA? Or, by adding it to the kernel, can it be exposed and used on AMD systems, particularly AMD's APUs such as Carrizo, Bristol Ridge, and the Zen-based Raven Ridge?



    • #3
      Really? A Premium Member, just asking a question about the difference between HSA and HMM, and I get "UNAPPROVED"??



      • #4
        Originally posted by Jumbotron View Post
        Really? A Premium Member, just asking a question about the difference between HSA and HMM, and I get "UNAPPROVED"??
        Don't take it personally - sometimes the anti-spam filter here is weirdly picky.

        Anyway, to my understanding, HSA operates closer to the hardware level and depends on things like the IOMMU. HSA is also intended to work with anything that can use shared memory, not just GPUs. Meanwhile, HMM appears to be implemented entirely in software and seems GPU-centric.

        If that's right, I'd assume HSA is more efficient, but HMM is more portable.



        • #5
          Thanks for the reply. Sounds reasonable. Just hoping, since Nvidia seems to be pushing HMM, that it doesn't evolve into yet another CUDA vs. OpenCL or CUDA vs. Radeon battle.



          • #6
            Originally posted by Jumbotron View Post
            Thanks for the reply. Sounds reasonable. Just hoping, since Nvidia seems to be pushing HMM, that it doesn't evolve into yet another CUDA vs. OpenCL or CUDA vs. Radeon battle.
            The good thing is that it's "open" enough that it doesn't have to. Also, I think the CUDA vs. OpenCL battle will soon be over, since third parties are implementing CUDA on top of OpenCL, and OpenCL support might soon get better on NVIDIA.



            • #7
              Originally posted by Jumbotron View Post
              Really? A Premium Member, just asking a question about the difference between HSA and HMM, and I get "UNAPPROVED"??
              A short explanation of how everything fits together:

              https://lkml.org/lkml/2017/6/12/968



              • #8
                Originally posted by glisse View Post

                A short explanation of how everything fits together:

                https://lkml.org/lkml/2017/6/12/968

                Ahhhhh... THANKS!! VERY helpful in getting my head wrapped around this. So, in brief: HSA is geared more towards SoCs such as AMD's APUs and the ARM-based chips, while HMM can handle those but also card-based GPUs, other card-based accelerators, and even other socket-based devices.



                • #9
                  Marc.2377 explained this when the previous version was published:
                  https://www.phoronix.com/forums/foru...413#post953413



                  • #10
                    Originally posted by Jumbotron View Post
                    Ahhhhh... THANKS!! VERY helpful in getting my head wrapped around this. So, in brief: HSA is geared more towards SoCs such as AMD's APUs and the ARM-based chips, while HMM can handle those but also card-based GPUs, other card-based accelerators, and even other socket-based devices.
                    Not exactly... what you're thinking of as "HSA" has the same capabilities as CAPI; we just implemented it first on devices which did not have device memory. The difference between HMM and other approaches is really the amount of native HW support available and the amount of SW required to complete a solution.

                    Address translation / SVM and cache coherency are largely independent although HSA and HMM touch on both of them.

                    --------------

                    Let's talk about cache coherency first...

                    One part of HSA is the two-way cache coherency logic which interconnects like HyperTransport or OpenCAPI provide. The coherent interconnect happens to be local to the chip in our APUs rather than being exposed on an external bus but it provides the same functionality. Without full cache coherency you need to disable caches on CPU and/or device when accessing shared memory.

                    If you do have two-way cache coherency established, however, then in principle you can access device memory as easily as system memory by pointing CPU page table entries at device memory (assuming you have large BAR support enabled so the CPU can access all of device memory). I don't think the distinction between CAPI and "HSA" in Jerome's list is really important unless you want to say that B is simply the case where no device memory is present.

                    --------------

                    Another part of HSA is sharing the application address space between device (GPU) and CPU.

                    A and B in Jerome's list (CAPI and HSA APUs) use an IOMMU-type model for address translation, where hardware on the device shares page tables with the CPU(s) by requesting translations from the CPU/chipset, and synchronization between the page tables and cached translations (in the device's Address Translation Cache, or ATC) is managed as part of the bus protocol.

                    The question is what you do on CPUs which do not expose sufficient address translation services to PCIe, and that's where HMM comes in.

                    Modern GPUs have their own memory management units built in - we usually refer to ours as the GPUVM block and I will use "GPUVM" generically here. The GPUVM block has its own set of page tables, and all non-kernel accesses to system or device memory use addresses which have been translated through those page tables.

                    Rather than relying on centralized address translation in the CPU using CPU page tables directly, another option is to synchronize CPU and GPU page tables using software so that the GPUVM block can perform a similar role to what the ATC/IOMMUv2 combination does. HMM contains the OS portion of that software, organized as a set of helper functions that drivers can use.

                    This is not "different from HSA" since the same SVM behaviour is achieved but it is "different from the HW implementation of our APUs" and "different from CAPI".

                    Think of HMM as part of a software solution (GPU drivers being the other part) which can work with suitable GPUVM HW to satisfy the "HSA MMU" requirement in the HSA System Architecture Specification or in similar programming models like the GP100 Unified Memory portion of CUDA 8.

                    HSA standards specify the functionality available to an application, not a specific HW implementation.

                    --------------

                    Going back to cache coherency for a minute, on systems which do not provide HW support for two-way cache coherency HMM provides the next best thing by forcing memory pages to be resident on whatever device is currently accessing them, even if that involves flipping pages back and forth between CPU and GPU (or GPU and GPU).

                    AFAIK it's not actually where the page resides that matters so much as getting a page fault notification that lets you flush caches on one device before allowing the other device to start accessing, i.e. you should be able to leave the page in place and still get the same benefit, but I need to get back to memory management long enough to have that discussion with Jerome et al (OK, email finally sent).

                    EDIT - Jerome reminded me that I missed CPU atomics from my list of reasons for migrating back to system memory, so that's something to investigate with the CPU/chipset folks.
                    Last edited by bridgman; 06-16-2017, 08:33 PM.
