AMD Sends Out Latest Patches For HMM-Based SVM Memory Manager

  • AMD Sends Out Latest Patches For HMM-Based SVM Memory Manager

    Phoronix: AMD Sends Out Latest Patches For HMM-Based SVM Memory Manager

    Published back in January was the initial work on a HMM-based SVM memory manager for the open-source Radeon compute stack. A second version of that work is now available as it continues working towards the mainline kernel...


  • #2
    This is what bridgman told me a few months ago, in a very enlightening back-and-forth discussion, would be some of the keys to getting back to the dream and purpose of HSA and AMD's Fusion initiative from their Dozer/Driver-based APUs. Only Kaveri, Carrizo and Bristol Ridge ever achieved full HSA 1.0 certification before AMD turned hard to port and began the Zen initiative under Lisa Su in order to reclaim the server and HPC crown. Even then, only Kaveri was seriously looked at for dense clusters of APU-equipped servers, as Carrizo and Bristol Ridge were only ever used in laptops and a few Bristol Ridge desktops as a transition point to give AMD more time to release Zen.

    Plus, HSA only worked in those above-mentioned APUs because the CPU and GPU cores used the same memory address space which, if memory serves, was 48-bit. That allows easy memory access and management without copying, in other words zero-copy sharing of data. That was lost in all Zen-based APUs, as the Zen CPU cores and Radeon GPU cores do not share the same address width.

    So... am I wrong to think (and I usually am) that SVM in conjunction with HMM and ROCm, eventually and in particular in conjunction with Zen 4/Genoa/Infinity Architecture 3.0/RDNA 3.0/CDNA 3.0, will bring about a return to HSA and Fusion WITHOUT necessarily requiring the CPU cores and the GPU cores to have the same memory address space?



    • #3
      Quick response for now, will try to come back with more later.

      Before Vega/GFX9 we only had 2-level page table support and were limited to 40-bit addressing at most from the GPUs. Beginning with Vega we moved to 4-level (with an optional +1) page tables and made 48-bit addressing standard on all our GPUs... so we did get to the same address space as the CPU on our dGPUs, just two generations later than on APUs (Kaveri was GFX7, Vega is GFX9).
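
      (For reference, on the x86-64 CPU side those familiar numbers fall straight out of the paging arithmetic: each page-table level resolves 9 address bits on top of a 12-bit page offset, so 4 levels cover 4*9+12 = 48 bits and a fifth level extends that to 57.)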

      The next obvious question is "what about 57-bit addressing on CPUs?". The short-term answer is that in cases where CPU/GPU pointer equivalence is required we will need to keep application addresses within 2^47.

      The good news is that the Linux implementation of 57-bit support automatically constrains addresses for userspace allocations to 48 (47) bits unless a larger hint address is provided. Eventually we are all going to need 57-bit GPU addresses, but 48-bit seems to be a workable standard for now.
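
      To make that behaviour concrete, here is a minimal userspace sketch (plain Linux/x86-64 C, nothing from the kernel or ROCm code): with 5-level paging enabled, mmap() keeps anonymous mappings below the 47-bit boundary by default and only places them higher when the caller passes a hint address above it.

      /* Illustrative sketch only -- shows the default-48-bit / opt-in-57-bit
         mmap placement described above, not code from the actual driver stack. */
      #include <stdio.h>
      #include <sys/mman.h>

      int main(void)
      {
          size_t len = 1UL << 20;   /* 1 MiB anonymous mapping */

          /* Default placement: stays below 2^47 even on a 5-level-paging kernel. */
          void *low = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

          /* Hint above 2^47: opts this one mapping into the extended 57-bit
             range (on kernels without LA57 it simply comes back low again). */
          void *high = mmap((void *)(1UL << 48), len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

          printf("default hint: %p\n", low);
          printf("high hint:    %p\n", high);
          return 0;
      }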

      EDIT - just realized I didn't fully answer the question. We were able to make ROCm work on 40-bit-address dGPUs while maintaining CPU/GPU pointer equivalence by carving out a block of address space under 2^40 and sub-allocating from there. It only works for API-allocated memory, of course, but I believe the same approach could work with HMM as long as you stayed with API-allocated memory.
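
      A very loose sketch of that carve-out idea (hypothetical helper names, plain C, nothing from the actual ROCm runtime): reserve one region at a fixed address below 2^40 so the CPU and a 40-bit GPU can agree on pointer values, then hand out pieces of it with a trivial bump sub-allocator.

      /* Illustrative sketch only -- the real runtime's carve-out and
         sub-allocation logic is far more involved than this. */
      #define _GNU_SOURCE
      #include <stdio.h>
      #include <sys/mman.h>

      #define CARVEOUT_BASE ((void *)(1UL << 38))   /* fixed base, well below 2^40 */
      #define CARVEOUT_SIZE (1UL << 30)             /* 1 GiB shared region */

      static char  *pool;
      static size_t pool_used;

      /* Reserve the region once; fails if something else already lives there. */
      static int carveout_init(void)
      {
          pool = mmap(CARVEOUT_BASE, CARVEOUT_SIZE, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
          return pool == MAP_FAILED ? -1 : 0;
      }

      /* Bump sub-allocator: every pointer it returns is below 2^40 by construction. */
      static void *carveout_alloc(size_t size)
      {
          size = (size + 63) & ~63UL;               /* 64-byte alignment */
          if (pool_used + size > CARVEOUT_SIZE)
              return NULL;
          void *p = pool + pool_used;
          pool_used += size;
          return p;
      }

      int main(void)
      {
          if (carveout_init() != 0)
              return 1;
          void *buf = carveout_alloc(4096);
          printf("sub-allocated buffer at %p (fits in 40 bits)\n", buf);
          return 0;
      }
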
      Last edited by bridgman; 23 March 2021, 05:22 PM.


      • #4
        "Navi is still TBD. This patch series focuses on GFXv9 because that's the IP our data center GPUs are on." - that is exactly what I feared: CDNA gets all the fancy new compute features, RDNA does not. At least not anytime soon. I guess there are no fundamental problems with RDNA supporting HMM/SVM, right?! I do remember Felix talking about some limitations in the architecture, in the discussion of the first version of that patch set, which make it harder to implement, though.

        As HMM has been talked about for years, it is thankfully getting closer to seeing use on AMD hardware. Jumbotron Yes, I am also sure that means HSA/Fusion functionality will see a comeback. It took way longer than I hoped for, but apparently it took the industry that much development time to standardize at least some of the needed technologies (e.g. CXL) to have a working ecosystem.



        • #5
          If we reset the OS without a power cycle, is the GPU state persistent?

          Food for thought?



          • #6
            Originally posted by bridgman View Post
            Before Vega/GFX9 we only had 2-level page table support and were limited to 40-bit addressing at most from the GPUs. Beginning with Vega we moved to 4-level (with an optional +1) page tables and made 48-bit addressing standard on all our GPUs...
            This is why bridgman is a treasure !!



            • #7
              Is this already in the latest AMDGPU-pro and will it cure this?

              GPU fault detected: 146 0x08f8770c for process resolve pid 2520 thread resolve pid 2520
              VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0011B71F
              VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0207700C



              • #8
                Originally posted by MadeUpName View Post
                Is this already in the latest AMDGPU-pro and will it cure this?
                GPU fault detected: 146 0x08f8770c for process resolve pid 2520 thread resolve pid 2520
                VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0011B71F
                VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0207700C
                At the moment I think it's fair to say that we see HMM being used primarily for HPC rather than for graphics or casual compute.

                I don't think it's possible to say whether switching to HMM would affect the error you listed without knowing the root cause of the error. If it is a compute app AND the root cause is an application pointing to a buffer it does not own, then HMM would throw a different error, more like a segfault, but the problem would still be there. If the problem is a compute driver bug, then whether HMM helped or not would depend on whether it happened to replace the code where the bug existed.

                If the problem is coming from a graphics app then I would not expect HMM to have any effect.


                • #9
                  It is the Fusion part of DaVinci Resolve, so a graphics app doing compute. But it looks like this is not my solution, so thanks for the feedback.



                  • #10
                    Originally posted by Jumbotron View Post
                    This is what bridgman told me a few months ago, in a very enlightening back-and-forth discussion, would be some of the keys to getting back to the dream and purpose of HSA and AMD's Fusion initiative...
                    All of those pre-Zen CPUs were pretty crappy; no one would've seriously used them for server deployments. The iGPU just isn't fast enough for such use cases, it's only useful on client devices, and once again, crappy CPU performance hampers everything.
