Linux 6.10 Improves AMD ROCm Compute Support For "Small" Ryzen APUs

  • podejib737
    replied
    I installed Linux kernel 6.10-rc1 on my 6800U laptop and ran Stable Diffusion without problems. However, when I tried to play a video in the browser, the system froze and I had to hold the power button to turn off my laptop. Here is the kern.log output.

    Code:
    2024-05-29T09:08:35.105988+08:00 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_dec_0 timeout, signaled seq=9117, emitted seq=9120
    2024-05-29T09:08:35.106015+08:00 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process RDD Process pid 18573 thread firefox-bi:cs0 pid 20093
    2024-05-:35.106018+08:00 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
    2024-05-29T09:08:36.296719+08:00 kernel: [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
    2024-05-29T09:08:36.585714+08:00 kernel: [drm] Register(0) [mmUVD_RBC_RB_RPTR] failed to reach value 0x000002c0 != 0x00000200n
    2024-05-29T09:08:36.639720+08:00 kernel: [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
    2024-05-29T09:08:36.639735+08:00 kernel: amdgpu 0000:03:00.0: amdgpu: MODE2 reset
    2024-05-29T09:08:36.651699+08:00 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
    2024-05-29T09:08:36.652723+08:00 kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
    2024-05-29T09:08:36.652739+08:00 kernel: [drm] VRAM is lost due to GPU reset!
    2024-05-29T09:08:36.652743+08:00 kernel: amdgpu 0000:03:00.0: amdgpu: PSP is resuming...
    2024-05-29T09:08:36.674720+08:00 kernel: amdgpu 0000:03:00.0: amdgpu: reserve 0xa00000 from 0xf41e000000 for PSP TMR
    2024-05-29T09:08:37.003715+08:00 kernel: amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
    2024-05-29T09:08:37.014719+08:00 kernel: amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
    2024-05-29T09:08:37.014734+08:00 kernel: amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
    2024-05-29T09:08:37.014737+08:00 kernel: amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
    2024-05-29T09:08:37.017713+08:00 kernel: amdgpu 0000:03:00.0: amdgpu: SMU is resumed successfully!
    2024-05-29T09:08:37.018712+08:00 kernel: [drm] DMUB hardware initialized: version=0x04000044
    2024-05-29T09:08:37.684726+08:00 kernel: [drm] kiq ring mec 2 pipe 1 q 0
    2024-05-29T09:08:37.974903+08:00 kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec_0 test failed (-110)
    2024-05-29T09:08:37.974925+08:00 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <vcn_v3_0> failed -110
    2024-05-29T09:08:37.974929+08:00 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset(1) failed
    2024-05-29T09:08:37.974937+08:00 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset end with ret = -110
    2024-05-29T09:08:37.975743+08:00 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
    2024-05-29T09:08:39.260728+08:00 kernel: [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
    2024-05-29T09:08:39.545722+08:00 kernel: [drm] Register(0) [mmUVD_RBC_RB_RPTR] failed to reach value 0x00000010 != 0x00000000n
    I am not sure where I should post this, but hopefully someone can forward it to the AMD Linux kernel developers. Thanks.


  • Eirikr1848
    replied
    Originally posted by filbo
    Eirikr1848 those sound like excellent questions and a great opening offer to do useful testing for them. I hope you are attempting to contact the players by some more direct channel than the comments forum under a Phoronix article!
    Looks like the biggest, baddest, coolest playa replied, so there ya have it. 🥳


  • Eirikr1848
    replied
    Originally posted by agd5f

    It avoids potential duplicate locking and prevents a locking splat in the kernel log.



    It allows additional apps to run that look for a certain amount of VRAM. On APUs, VRAM is just system memory so whether you use system memory or VRAM is irrelevant performance-wise. However, a number of applications don't take this into account and just always use VRAM. Since the VRAM carve out is relatively small on APUs, apps that require large amounts of VRAM won't run.



    It's not applicable to dGPUs. On dGPUs, VRAM is significantly more performant than system memory so you can't use the pools interchangeably. It's currently enabled on all APUs.



    GFX9 is still well supported. All CDNA parts are based on gfx9. Kernel driver issues can be reported here:
    amd (amdgpu, amdkfd, radeon) drm project, currently for issues only.

    Kernel driver patches should be submitted to:

    Patches or bug reports for ROCm user mode components should be filed here:
    AMD ROCm™ Software - GitHub Home (ROCm/ROCm)

    You are a beautiful soul; thank you so much for replying! I asked these questions because:

    - A Reddit user in the /r/AMD group stated that Vega APUs are not included in this, and that the MI50 they bought is no longer supported by ROCm.

    Silliness, I suppose.

    - The /r/LocalLLaMA group wants to let their GPUs use system RAM in low-VRAM situations, such as running a 32GB model on a 24GB GPU, and hoped this would also cover their situation. (There was also discussion there of using GFX8 GPUs, such as the RX 580 8GB, in a similar fashion to get more usable VRAM.)

    - There was also discussion about using their GFX8 APUs on some older HP “ROCm certified” mini PCs as well as other APUs of that generation.

    Does
    Code:
    ROC_ENABLE_PRE_VEGA=1
    work for those older APUs together with this latest fix, or is something else needed to enable them?


  • agd5f
    replied
    Originally posted by brent
    Why do APUs still need the VRAM carveout anyway? I remember that for some things like scanout older GPUs needed physically linear memory layout. Is that still true? It doesn't really make sense to have dGPU style VRAM allocations with APUs. Why does it still exist?
    It's mainly for pre-OS display buffers.


  • agd5f
    replied
    Originally posted by Eirikr1848
    • How will the changes to handle duplicate BOs in reserve_bo_and_cond_vms affect the stability and performance of the system? Memory leak prevention?
    It avoids potential duplicate locking and prevents a locking splat in the kernel log.
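    To illustrate the general idea behind that answer, here is a minimal Python sketch, not the amdgpu code: when the same buffer object (BO) appears more than once in the list to be reserved, it is locked only once. `BufferObject`, `reserve_unique` and `unreserve` are hypothetical names invented for this sketch, and a plain `threading.Lock` stands in for the kernel's own locking primitives, which behave differently. In this sketch, acquiring the same non-reentrant lock twice from one thread would simply block forever.

```python
import threading
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing: each BO is a distinct object
class BufferObject:
    """Hypothetical stand-in for a GPU buffer object (BO)."""
    name: str
    lock: threading.Lock = field(default_factory=threading.Lock)


def reserve_unique(bos):
    """Lock every distinct BO exactly once, even if the list contains duplicates."""
    seen = set()
    locked = []
    for bo in bos:
        if id(bo) in seen:
            continue  # duplicate entry: skip it instead of locking it a second time
        seen.add(id(bo))
        bo.lock.acquire()
        locked.append(bo)
    return locked


def unreserve(locked):
    """Release the locks in reverse order of acquisition."""
    for bo in reversed(locked):
        bo.lock.release()


if __name__ == "__main__":
    a, b = BufferObject("a"), BufferObject("b")
    held = reserve_unique([a, b, a])  # "a" is listed twice but locked once
    print([bo.name for bo in held])   # prints ['a', 'b']
    unreserve(held)
```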

    Originally posted by Eirikr1848
    • What are the implications of allowing VRAM allocations to go to the GTT domain on small APUs, and how does it improve memory handling?
    It allows additional apps to run that look for a certain amount of VRAM. On APUs, VRAM is just system memory so whether you use system memory or VRAM is irrelevant performance-wise. However, a number of applications don't take this into account and just always use VRAM. Since the VRAM carve out is relatively small on APUs, apps that require large amounts of VRAM won't run.

    Originally posted by Eirikr1848
    • For AMD: Are there plans to extend these improvements to other APU generations or discrete GPUs
    It's not applicable to dGPUs. On dGPUs, VRAM is significantly more performant than system memory so you can't use the pools interchangeably. It's currently enabled on all APUs.
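    The APU versus dGPU trade-off described above can be sketched as a simple placement policy. This is a hypothetical Python illustration, not the actual amdgpu/amdkfd logic; the real patch's criteria for which APUs qualify and when GTT is used may differ. In the sketch, a request that fits in the free VRAM carveout stays in VRAM; on an APU, a request that does not fit goes to GTT (system memory) instead of failing; on a dGPU it fails, because silently substituting slower system memory would hurt performance. `Domain`, `pick_domain`, its parameters and the 512 MiB carveout size are assumptions made up for this example.

```python
from enum import Enum


class Domain(Enum):
    """Memory pools a buffer could be placed in (simplified)."""
    VRAM = "vram"  # carveout on an APU, on-board memory on a dGPU
    GTT = "gtt"    # system memory made accessible to the GPU


def pick_domain(requested_bytes, vram_free_bytes, is_apu):
    """Hypothetical policy for a buffer the application asked to place in VRAM."""
    if requested_bytes <= vram_free_bytes:
        return Domain.VRAM
    if is_apu:
        # On an APU the "VRAM" carveout is system memory too, so GTT costs nothing extra.
        return Domain.GTT
    # On a dGPU, VRAM is much faster than system memory, so don't substitute silently.
    raise MemoryError(
        f"{requested_bytes} bytes requested, only {vram_free_bytes} bytes of VRAM free"
    )


if __name__ == "__main__":
    MiB = 1024 * 1024
    carveout = 512 * MiB  # hypothetical small carveout size
    print(pick_domain(256 * MiB, carveout, is_apu=True))   # prints Domain.VRAM
    print(pick_domain(2048 * MiB, carveout, is_apu=True))  # prints Domain.GTT
```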

    Originally posted by Eirikr1848
    • (i.e. retain GFX9 support, perhaps GFX8/Polaris/Fiji even if they have a "community validated" status?)
      • (Bonus points for keeping/re-adding the GFX7 pieces for devices such as the 16GB W8100, 8GB W7100 and 8GB R9 390X, which stopped building for me around ROCm 5.0)
    Importantly: How can the community contribute to testing the myriad of hardware and validating these fixes? (Is there a centralized location to submit results to?)

    GFX9 is still well supported. All CDNA parts are based on gfx9. Kernel driver issues can be reported here:
    amd (amdgpu, amdkfd, radeon) drm project, currently for issues only.

    Kernel driver patches should be submitted to:

    Patches or bug reports for ROCm user mode components should be filed here:
    AMD ROCm™ Software - GitHub Home (ROCm/ROCm)



  • brent
    replied
    Why do APUs still need the VRAM carveout anyway? I remember that for some things like scanout older GPUs needed physically linear memory layout. Is that still true? It doesn't really make sense to have dGPU style VRAM allocations with APUs. Why does it still exist?


  • bridgman
    replied
    Originally posted by marlock
    bridgman, sorry for bothering your retirement but is there a new AMD rep user in phoronix that can chime in on such occasions?
    Good question. On the graphics side, agd5f and twriter largely took over well before I retired, but as far as I know we don't really have someone identified to cover ROCm the same way. I'm just in the process of rebuilding contacts, but I will ask.


  • marlock
    replied
    bridgman, sorry for bothering your retirement but is there a new AMD rep user in phoronix that can chime in on such occasions?

    ps: many many thanks again for the awesome work and for being so present around these parts! ❤️


  • Eirikr1848
    replied
    Originally posted by filbo
    Eirikr1848 those sound like excellent questions and a great opening offer to do useful testing for them. I hope you are attempting to contact the players by some more direct channel than the comments forum under a Phoronix article!
    It is mostly a “shouting into the void” sort of message, to be honest. I just enjoy putting stuff out there for discussion and idea sharing. If someone finds me somehow and wants to share ideas, or wants someone to coordinate an effort or test something and provide results? Great!
    Last edited by Eirikr1848; 26 May 2024, 04:38 AM.


  • filbo
    replied
    Eirikr1848 those sound like excellent questions and a great opening offer to do useful testing for them. I hope you are attempting to contact the players by some more direct channel than the comments forum under a Phoronix article!
