Linux 6.10 Improves AMD ROCm Compute Support For "Small" Ryzen APUs


  • Linux 6.10 Improves AMD ROCm Compute Support For "Small" Ryzen APUs

    Phoronix: Linux 6.10 Improves AMD ROCm Compute Support For "Small" Ryzen APUs

    Sneaking in as a "fix" for the Linux 6.10 kernel is an enhancement to the AMDKFD kernel compute driver used by the ROCm compute stack for better supporting small Ryzen APUs like client and embedded SoCs...


  • #2
    I don't exactly understand. I'm at least 40% sure my 6600HS can use more RAM than what I assigned as GPU RAM with ROCm. Or was that the Vulkan compute version I'm remembering?



    • #3
      Oh boy, so many questions this evening!
      • How will the changes to handle duplicate BOs in reserve_bo_and_cond_vms affect the stability and performance of the system? Memory leak prevention?
      • What are the implications of allowing VRAM allocations to go to the GTT domain on small APUs, and how does it improve memory handling?
      • On the non-AMD piece, what prompted the decision to fix the bo metadata uAPI for vm bind in nouveau now?
      • Is the reset fix for panthor’s heap logical block a temporary workaround or a long-term solution?
      • For AMD: Are there plans to extend these improvements to other APU generations or discrete GPUs?
      • (i.e. retain GFX9 support, perhaps GFX8/Polaris/Fiji even if they have a "community validated" status?)
        • (Bonus points for keeping/re-adding the pieces GFX7 for devices such as the 16GB W8100, 8GB W7100, 8GB R9 390X which stopped building for me around ROCm 5.0)
      Importantly: How can the community contribute to testing the myriad of hardware and validating these fixes? (Is there a centralized location to submit results to?)

      For example: I have everything from an Athlon 3000G to 4600G to 5300G w/64GB for STX-sized builds. I have a laptop with a 7945HX with the 2CU 610 APU. I have the other devices mentioned above. RX 550 is still sold as a budget option from MicroCenter and Newegg and I could buy one for GFX8/Polaris testing. I also have a laptop with an A9-9425. All systems have 32GB RAM except for the 7945HX which has 64GB. So each would be suited to various pytorch-based workloads and tests using max RAM possible.

      I would gladly contribute what I can, as I am sure other enthusiasts and AMD evangelists would, provided the reporting infrastructure is in place. OpenBenchmarking already exists; I'm sure AMD could contract with Michael to build a customized validation/test suite, published either to a dedicated section of openbenchmarking.org (say, a large "AMD User Validation" tab or something similar) or through AMD's own community solution. Whatever is clever, but AMD needs to start actually rallying its most enthusiastic users rather than alienating them by telling them their systems will be dropped from support for local AI experiences and more.

      Of course there is also OpenCompute.org's Test and Validation Enablement Initiative to standardize open-sourced testing. It's meant for open-source hardware but could be applied here as well: https://www.opencompute.org/projects...ent-initiative



      • #4
        Eirikr1848 those sound like excellent questions and a great opening offer to do useful testing for them. I hope you are attempting to contact the players by some more direct channel than the comments forum under a Phoronix article!



        • #5
          Originally posted by filbo View Post
          Eirikr1848 those sound like excellent questions and a great opening offer to do useful testing for them. I hope you are attempting to contact the players by some more direct channel than the comments forum under a Phoronix article!
          It is mostly a “shouting into the void” sort of message, to be honest. I just enjoy putting stuff out there for discussion and idea sharing. If someone finds me somehow and wants to share ideas, or wants someone to coordinate an effort or test something and provide results? Great!
          Last edited by Eirikr1848; 26 May 2024, 04:38 AM.



          • #6
            bridgman, sorry for bothering your retirement, but is there a new AMD rep on Phoronix who can chime in on such occasions?

            ps: many many thanks again for the awesome work and for being so present around these parts! ❤️



            • #7
              Originally posted by marlock View Post
              bridgman, sorry for bothering your retirement but is there a new AMD rep user in phoronix that can chime in on such occasions?
              Good question. On the graphics side agd5f and twriter largely took over well before I retired, but we don't really have someone identified to cover ROCm the same way as far as I know. I'm just in the process of rebuilding contacts but will ask.



              • #8
                Why do APUs still need the VRAM carveout anyway? I remember that for some things like scanout older GPUs needed physically linear memory layout. Is that still true? It doesn't really make sense to have dGPU style VRAM allocations with APUs. Why does it still exist?



                • #9
                  Originally posted by Eirikr1848 View Post
                  • How will the changes to handle duplicate BOs in reserve_bo_and_cond_vms affect the stability and performance of the system? Memory leak prevention?
                  It avoids potential duplicate locking and prevents a locking splat in the kernel log.

                  Originally posted by Eirikr1848 View Post
                  • What are the implications of allowing VRAM allocations to go to the GTT domain on small APUs, and how does it improve memory handling?
                  It allows additional apps to run that look for a certain amount of VRAM. On APUs, VRAM is just system memory so whether you use system memory or VRAM is irrelevant performance-wise. However, a number of applications don't take this into account and just always use VRAM. Since the VRAM carve out is relatively small on APUs, apps that require large amounts of VRAM won't run.
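
                  A toy sketch of the policy described here (all names and sizes are illustrative; the real logic lives in the amdgpu/amdkfd kernel drivers, not in Python):

```python
# Toy model of the small-APU behavior described above: a buffer
# requested in VRAM that does not fit the small carveout is placed
# in GTT (system memory) instead of failing, since on an APU both
# pools are backed by the same DRAM. All names/sizes are illustrative.

VRAM_CARVEOUT_MB = 512      # small BIOS carveout typical of APUs
GTT_LIMIT_MB = 16 * 1024    # system memory usable by the GPU

class ToyApuAllocator:
    def __init__(self):
        self.vram_used_mb = 0
        self.gtt_used_mb = 0

    def alloc(self, size_mb, domain="VRAM"):
        """Place a buffer; return the domain it actually landed in."""
        if domain == "VRAM" and self.vram_used_mb + size_mb <= VRAM_CARVEOUT_MB:
            self.vram_used_mb += size_mb
            return "VRAM"
        # Small-APU fallback: spill the VRAM request into GTT
        # rather than reporting an out-of-VRAM failure to the app.
        if self.gtt_used_mb + size_mb <= GTT_LIMIT_MB:
            self.gtt_used_mb += size_mb
            return "GTT"
        raise MemoryError(f"cannot place {size_mb} MiB buffer")

apu = ToyApuAllocator()
print(apu.alloc(256))    # fits the carveout -> VRAM
print(apu.alloc(2048))   # too big for what's left -> GTT
```

                  An app that blindly requests "VRAM" still gets a working buffer, which is exactly why the change lets VRAM-hungry applications run on APUs with tiny carveouts.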

                  Originally posted by Eirikr1848 View Post
                  • For AMD: Are there plans to extend these improvements to other APU generations or discrete GPUs
                  It's not applicable to dGPUs. On dGPUs, VRAM is significantly more performant than system memory so you can't use the pools interchangeably. It's currently enabled on all APUs.

                  Originally posted by Eirikr1848 View Post
                  • (i.e. retain GFX9 support, perhaps GFX8/Polaris/Fiji even if they have a "community validated" status?)
                    • (Bonus points for keeping/re-adding the pieces GFX7 for devices such as the 16GB W8100, 8GB W7100, 8GB R9 390X which stopped building for me around ROCm 5.0)
                  Importantly: How can the community contribute to testing the myriad of hardware and validating these fixes? (Is there a centralized location to submit results to?)

                  GFX9 is still well supported. All CDNA parts are based on gfx9. Kernel driver issues can be reported here:
                  the amd (amdgpu, amdkfd, radeon) drm project issue tracker.

                  Kernel driver patches should be submitted to:

                  Patches or bug reports for ROCm user mode components should be filed here:
                  the AMD ROCm™ Software repository on GitHub (ROCm/ROCm).




                  • #10
                    Originally posted by brent View Post
                    Why do APUs still need the VRAM carveout anyway? I remember that for some things like scanout older GPUs needed physically linear memory layout. Is that still true? It doesn't really make sense to have dGPU style VRAM allocations with APUs. Why does it still exist?
                    It's mainly for pre-OS display buffers.

