AMD Stages Latest Radeon/AMDGPU Changes For Linux 4.21 Kernel

  • #31
    Originally posted by perpetually high
    this is fixed in 4.20 if you want to try it out
    Just don't try to copy any big files from an XFS filesystem to an NFS filesystem while you try it out. 4.20-rc2 has... other problems.



    • #32
      Originally posted by perpetually high
      Ahh, it was the clocks!

      Etherman and aufkrawall, you guys called it. Thank you for the suggestion to lower the clocks.

      I ended up using rocm-smi to manually set sclk to level 5: $ rocm-smi --setsclk 5

      For reference on my card:
      Code:
      GPU[0] : Supported GPU clock frequencies on GPU0
      GPU[0] : 0: 300Mhz
      GPU[0] : 1: 608Mhz
      GPU[0] : 2: 910Mhz
      GPU[0] : 3: 1077Mhz
      GPU[0] : 4: 1145Mhz
      GPU[0] : 5: 1191Mhz *
      GPU[0] : 6: 1236Mhz
      GPU[0] : 7: 1303Mhz
      I've yet to try level 6 (as I wanted to give it a decent downclock to test the theory out) or level 7 with higher voltage than it was being given.

      Annoyed I didn't think to try this sooner. Thank you guys again. This makes sense why it's affecting certain cards and not others. For the record, I have very good system cooling and air flow, and a 700W PSU. I know my system can handle the RX 480 at full load.

      raonlinux, would be great if you could test out this theory, too. Let me know if you need any help with rocm-smi or getting the card to downclock.
      If temperatures are fine, I'd check whether you can feed it a bit more power. When I was messing with overclocks, I had crashes after stressing the card that went away after a very small increase in watts. My specific GPU model has an optional power plug meant for overclocking, though. If I'm not wrong, you can't go above the maximum, so it should be safe.

      By the way, if your card has a dual BIOS with different clocks, switching to the other one might also be a solution if you don't want to change clocks with scripts or manually.
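
For anyone who wants to look at the power headroom before changing anything: as far as I know, amdgpu exposes the current and maximum power cap through hwmon as power1_cap and power1_cap_max, both in microwatts. A minimal sketch, assuming the card is card0; power_cap_report is a made-up helper name, not part of any tool:

```shell
# Print the current and maximum power cap in watts for an amdgpu hwmon directory.
# power1_cap and power1_cap_max hold microwatts, hence the division.
power_cap_report() {
    hwmon_dir=$1
    cap=$(cat "$hwmon_dir/power1_cap")
    cap_max=$(cat "$hwmon_dir/power1_cap_max")
    echo "cap=$((cap / 1000000))W max=$((cap_max / 1000000))W"
}

# Example call (the path is an assumption; the hwmon index varies per system):
# power_cap_report /sys/class/drm/card0/device/hwmon/hwmon0
```

If the two numbers differ, there is room to raise the cap, and the driver should refuse anything above the reported maximum.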



      • #33
        Originally posted by perpetually high
        Ahh, it was the clocks!

        Etherman and aufkrawall, you guys called it. Thank you for the suggestion to lower the clocks.

        I ended up using rocm-smi to manually set sclk to level 5: $ rocm-smi --setsclk 5

        For reference on my card:
        Code:
        GPU[0] : Supported GPU clock frequencies on GPU0
        GPU[0] : 0: 300Mhz
        GPU[0] : 1: 608Mhz
        GPU[0] : 2: 910Mhz
        GPU[0] : 3: 1077Mhz
        GPU[0] : 4: 1145Mhz
        GPU[0] : 5: 1191Mhz *
        GPU[0] : 6: 1236Mhz
        GPU[0] : 7: 1303Mhz
        I've yet to try level 6 (as I wanted to give it a decent downclock to test the theory out) or level 7 with higher voltage than it was being given.



        raonlinux, would be great if you could test out this theory, too. Let me know if you need any help with rocm-smi or getting the card to downclock.
        Code:
        OD_SCLK:
        0:        300MHz        800mV
        1:        608MHz        818mV
        2:        910MHz        824mV
        3:       1077MHz        906mV
        4:       1145MHz        968mV
        5:       1191MHz       1012mV
        6:       1236MHz       1062mV
        7:       1303MHz       1143mV
        OD_MCLK:
        0:        300MHz        800mV
        1:       2000MHz        975mV
        OD_RANGE:
        SCLK:     300MHz       2000MHz
        MCLK:     300MHz       2250MHz
        VDDC:     800mV        1175mV
        I checked my device/pp_od_clk_voltage and it is set like that. How do I set state 6 to test it out? Do I need to install rocm to set the GPU clock, or can I do it with commands? Let me know if you don't have problems with level 6.



        • #34
          Originally posted by raonlinux

          Code:
          OD_SCLK:
          0:        300MHz        800mV
          1:        608MHz        818mV
          2:        910MHz        824mV
          3:       1077MHz        906mV
          4:       1145MHz        968mV
          5:       1191MHz       1012mV
          6:       1236MHz       1062mV
          7:       1303MHz       1143mV
          OD_MCLK:
          0:        300MHz        800mV
          1:       2000MHz        975mV
          OD_RANGE:
          SCLK:     300MHz       2000MHz
          MCLK:     300MHz       2250MHz
          VDDC:     800mV        1175mV
          I checked my device/pp_od_clk_voltage and it is set like that. How do I set state 6 to test it out? Do I need to install rocm to set the GPU clock, or can I do it with commands? Let me know if you don't have problems with level 6.
          You don't need rocm. Follow this guide (it's the Arch wiki, but it's valid for other distros): https://wiki.archlinux.org/index.php/AMDGPU

          Don't forget to enable amdgpu.ppfeaturemask=0xffffffff in the kernel parameters.
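
A quick way to confirm the parameter actually made it onto the running kernel's command line is to look for it in /proc/cmdline. A minimal sketch; has_full_ppfeaturemask is a hypothetical helper name, and it only checks for the exact lowercase value quoted above:

```shell
# Succeed if the given kernel command line contains
# amdgpu.ppfeaturemask=0xffffffff as a separate word.
has_full_ppfeaturemask() {
    case " $1 " in
        *" amdgpu.ppfeaturemask=0xffffffff "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example call on a live system:
# has_full_ppfeaturemask "$(cat /proc/cmdline)" && echo "ppfeaturemask is set"
```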



          • #35
            Man, I just have to say, it's sooo nice to be able to game again without worrying about hangs. I played for hours today and had zero hangs. Passed all the previous checkpoints in BioShock Infinite, Metro 2033 Redux, etc. that I couldn't get to before... perpetually high is back, baby!

            1191 MHz on the core clock is only a 112 MHz compromise from the default 1303 MHz; I can live with that. I'm going to revisit upping the voltage on the 1303 MHz setting at a later time. Will update my post with results as well when I do.

            Originally posted by clapbr
            You don't need rocm. Follow this guide (it's the Arch wiki, but it's valid for other distros): https://wiki.archlinux.org/index.php/AMDGPU

            Don't forget to enable amdgpu.ppfeaturemask=0xffffffff in the kernel parameters.
            Yeah, you could go that route also. As a warning, though: I had issues with setting the amdgpu.ppfeaturemask=0xffffffff flag. Others have too, from what I've seen online. You might not, but then again we have the exact same card, so you likely will.

            rocm-smi is really nice, and doesn't require that flag to be set. I highly recommend it in general.
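
To see which level the card is sitting at after a change, the clock listing quoted earlier in the thread marks the active level with a trailing asterisk. A minimal sketch that pulls that level number out of such a listing on stdin; active_sclk_level is a made-up helper name, and the line format is assumed to match the listing above exactly:

```shell
# Read a rocm-smi style clock listing on stdin and print the level number
# of the line that ends with "*" (the currently active level).
active_sclk_level() {
    sed -n 's/^GPU\[[0-9]*][[:space:]]*:[[:space:]]*\([0-9][0-9]*\):.*\*[[:space:]]*$/\1/p'
}

# Example call (I believe the listing comes from "rocm-smi --showclkfrq";
# treat that flag as an assumption and check "rocm-smi --help"):
# rocm-smi --showclkfrq | active_sclk_level
```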



            • #36
              Thanks for the help, guys. So far I have run Unigine Heaven without a crash, with level 6 set as the maximum state. I should try more games to know whether this works with other games too.
              In the end I set the allowed states to 0 through 6.
              Code:
              echo "0 1 2 3 4 5 6" > pp_dpm_sclk
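
For completeness, the same step can be wrapped so that it also switches the card to manual mode first; as far as I know, the driver only honours writes to pp_dpm_sclk while power_dpm_force_performance_level is set to manual. A minimal sketch that needs root on a real card; limit_sclk_levels is a made-up name and card0 is an assumption:

```shell
# Allow only sclk levels 0..max_level for the given amdgpu device directory.
limit_sclk_levels() {
    dev_dir=$1
    max_level=$2
    levels=""
    i=0
    # Build the space-separated list "0 1 ... max_level".
    while [ "$i" -le "$max_level" ]; do
        levels="${levels:+$levels }$i"
        i=$((i + 1))
    done
    # Switch to manual mode, then write the allowed levels.
    echo manual > "$dev_dir/power_dpm_force_performance_level"
    echo "$levels" > "$dev_dir/pp_dpm_sclk"
}

# Example call:
# limit_sclk_levels /sys/class/drm/card0/device 6
```

The setting does not survive a reboot, so it would have to be reapplied from a startup script.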



              • #37
                Originally posted by perpetually high
                Man, I just have to say, it's sooo nice to be able to game again without worrying about hangs. I played for hours today and had zero hangs. Passed all the previous checkpoints in BioShock Infinite, Metro 2033 Redux, etc. that I couldn't get to before... perpetually high is back, baby!

                1191 MHz on the core clock is only a 112 MHz compromise from the default 1303 MHz; I can live with that. I'm going to revisit upping the voltage on the 1303 MHz setting at a later time. Will update my post with results as well when I do.

                Yeah, you could go that route also. As a warning, though: I had issues with setting the amdgpu.ppfeaturemask=0xffffffff flag. Others have too, from what I've seen online. You might not, but then again we have the exact same card, so you likely will.

                rocm-smi is really nice, and doesn't require that flag to be set. I highly recommend it in general.
                That feeling when fixing these is pretty good.

                Of all the issues I've ever had, I luckily never had a full system crash with AMD drivers except when I messed with overclocking. Good to know about rocm-smi; I will try it.

                I don't know if you can set custom clock states using rocm-smi, but if you can, it might be worth trying a simple trial-and-error method to find the maximum clock that doesn't crash for you.
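
I can't speak for rocm-smi, but with the pp_od_clk_voltage route each trial is, as far as I know, two writes: "s <level> <MHz> <mV>" to stage a state and "c" to commit it. That route needs the ppfeaturemask parameter discussed above. A minimal sketch that only prints the strings and rejects values outside the OD_RANGE posted earlier (SCLK 300 to 2000 MHz, VDDC 800 to 1175 mV); od_sclk_trial is a made-up name and the 1270 MHz in the example is an arbitrary trial value, not a recommendation:

```shell
# Print the two strings to write to pp_od_clk_voltage for one trial sclk state.
# Fails without printing them if the values fall outside the OD_RANGE above.
od_sclk_trial() {
    level=$1
    mhz=$2
    millivolts=$3
    if [ "$mhz" -lt 300 ] || [ "$mhz" -gt 2000 ] ||
       [ "$millivolts" -lt 800 ] || [ "$millivolts" -gt 1175 ]; then
        echo "value outside OD_RANGE" >&2
        return 1
    fi
    echo "s $level $mhz $millivolts"   # stage the new state
    echo "c"                           # commit the staged state
}

# Example call (prints "s 7 1270 1143" and then "c"):
# od_sclk_trial 7 1270 1143
```

Each printed line would go to pp_od_clk_voltage as its own write, as root, followed by a stress run before trying the next value.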



                • #38
                  I use PolarisBiosEditor to set my voltages and clocks.



                  • #39
                    Originally posted by debianxfce
                    The auto setting should use the BIOS of the GPU card. You might have a poor BIOS on your GPU card. I hope you have the latest drivers, Linux amdgpu firmware, and BIOS.
                    I am a little afraid of tinkering with the BIOS. I have done it successfully in the past on an older GPU, but I need this machine for work and I don't have a replacement right now. Still, on Windows the GPU runs pretty well. That is the weird thing. So how is the BIOS bad if it works on Windows?



                    • #40
                      Originally posted by perpetually high
                      Took a photo of a GPU hang occurring in Metro 2033 Redux with the GALLIUM_HUD env var set:

                      - GPU temp: 66°C
                      - GPU load: 99%
                      - CPUs were at 3.6 GHz (Turbo Boost from the 3.4 GHz base, apparently)
                      - CPU loads were 71, 57, 50, 75
                      - FPS was at 163
                      - VRAM usage was reasonable at 1.175GB

                      So the GPU load at 99% is the only thing that sticks out here. You'll also see on the bottom left that the textures became screwed up. Usually when that happens, the hang follows about 1 or 2 seconds later, as was the case here.
                      What monitoring software is this?
