AMD Stages Latest Radeon/AMDGPU Changes For Linux 4.21 Kernel

  • muncrief
    replied
    Originally posted by duby229 View Post

    Well, if you are getting a permission denied error, then it's likely a filesystem problem, something like an execute, read, or write bit. Probably not a driver issue.
    Actually it turned out the Xorg log was from when I originally tried 4.19.0, and unfortunately not even that is produced with subsequent kernels. In any case I did look at /dev/dri/card0 and it has the same permissions that work with all other kernels: crw-rw----+. And by the way, kernels 4.14.81, 4.17.19, and 4.18.19 work perfectly with the system just as it is. I also backed everything up and did a clean reinstall just to be sure it wasn't my setup, and removed my second GPU and a few PCI cards during the clean-install test to eliminate as many variables as possible.

    Just out of curiosity I also tried setting the card0 "Other" permissions to rw as well, but as I expected the permissions are set automatically and don't survive a reboot.
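
    (For reference, the trailing "+" on crw-rw----+ typically means logind has attached an ACL for the active seat at login, which is why a manual chmod doesn't survive a reboot. Below is a rough sketch of how to inspect that, plus an example-only udev rule to force the mode if anyone wants to rule permissions out completely; the rule file name is just a placeholder.)

    Code:
    $ ls -l /dev/dri/card0          # owner, group and mode of the DRM node
    $ getfacl /dev/dri/card0        # shows the per-user ACL logind adds for the active seat
    $ id                            # confirm membership in the video/render group (distro-dependent)
    # Example-only rule: loosen the node to world read/write purely for testing
    $ echo 'KERNEL=="card0", SUBSYSTEM=="drm", MODE="0666"' | sudo tee /etc/udev/rules.d/99-drm-test.rules
    $ sudo udevadm control --reload && sudo udevadm trigger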

    Leave a comment:


  • duby229
    replied
    Originally posted by perpetually high View Post

    Appreciate the info.

    A couple of months ago I bought a GTX 1080 (I'm not proud of this) and didn't experience any GPU hangs or PSU issues, and that card is more power hungry than the RX 480. I believe I was hitting around 180W on the GPU at full load, but I can't remember exactly. So for my particular issue, I can say with high confidence it's not the PSU. I did, however, return that GTX 1080 and get back my $500 (this, I'm very proud of).
    Sorry brother, I don't mean to second-guess you, but just because some prior GPU worked in the same configuration doesn't mean that this GPU isn't pulling too much power. I'm 100% certain that the 1080 is more power efficient than that RX 480, and I'm also 100% certain nVidia's power gating and thermal throttling are more advanced than AMD's. It's entirely likely that you were in fact hitting the same power limit, but nVidia's driver gated and throttled the GPU enough to make it unnoticeable.

    Leave a comment:


  • perpetually high
    replied
    Originally posted by duby229 View Post

    The annoying thing about power supplies, though, is that many of them have dual and even triple 12V rails. If the CPU and GPU are pulling power off the same rail while the others go unused, that could in fact be a power supply problem, and Thermaltake is specifically known for using too many rails with too little power per rail.

    EDIT: Honestly, if you have a single CPU and a single GPU then you really should buy a power supply with a single 12V rail. If you already own a dual-rail supply then you need to make sure the GPU is pulling power off the second rail.
    Appreciate the info.

    A couple of months ago I bought a GTX 1080 (I'm not proud of this) and didn't experience any GPU hangs or PSU issues, and that card is more power hungry than the RX 480. I believe I was hitting around 180W on the GPU at full load, but I can't remember exactly. So for my particular issue, I can say with high confidence it's not the PSU. I did, however, return that GTX 1080 and get back my $500 (this, I'm very proud of).

    Leave a comment:


  • duby229
    replied
    Originally posted by perpetually high View Post

    I'd normally agree, but it's not the PSU. I have a Thermaltake 700W; it can definitely handle my Haswell i5 and RX 480 at max stock clocks.

    I'm almost certain the voltage is too low. In my previous post I showed it was pulling 150W @ 1.09V. I think if I bump that up to 1.175V (1175mV), it'll fix the issue.

    Running prime95 and Intel's power_gadget right after, I see:

    Code:
    $ sudo ./power_gadget -e 1000 -d 1
    System Time,RDTSC,Elapsed Time (sec),IA Frequency_0 (MHz),Processor Power_0 (Watt),Cumulative Processor Energy_0 (Joules),Cumulative Processor Energy_0 (mWh),IA Power_0 (Watt),Cumulative IA Energy_0 (Joules),Cumulative IA Energy_0(mWh),GT Power_0 (Watt),Cumulative GT Energy_0 (Joules),Cumulative GT Energy_0(mWh)
    06:46:45:849,90852126468092,1.0007,477172664,89.1909,89.1983,24.7773,81.4456,81.4524,22.6257,0.0000,0.0000,0.0000,
    
    Total Elapsed Time(sec)=1.0007
    
    Total Processor Energy_0(Joules)=89.1983
    Total Processor Energy_0(mWh)=24.7773
    Average Processor Power_0(Watt)=89.1393
    Let's round up to 100W, so with the GPU at 150W that's roughly 250W for the CPU and GPU combined, leaving about 450W of headroom on the 700W supply. It's not the PSU.

    I'll have to report back sometime with 1303 MHz at higher voltages, but I'll wait until the drivers mature a little more when it comes to overclocking; I've had problems with amdgpu.ppfeaturemask in the past.
    The annoying thing about power supplies, though, is that many of them have dual and even triple 12V rails. If the CPU and GPU are pulling power off the same rail while the others go unused, that could in fact be a power supply problem, and Thermaltake is specifically known for using too many rails with too little power per rail.

    EDIT: Honestly, if you have a single CPU and a single GPU then you really should buy a power supply with a single 12V rail. If you already own a dual-rail supply then you need to make sure the GPU is pulling power off the second rail.
    Last edited by duby229; 16 November 2018, 02:03 PM.

    Leave a comment:


  • perpetually high
    replied
    Originally posted by debianxfce View Post
    I have a feeling that your PSU is too weak.
    I'd normally agree, but it's not the PSU. I have a Thermaltake 700W; it can definitely handle my Haswell i5 and RX 480 at max stock clocks.

    I'm almost certain the voltage is too low. In my previous post I showed it was pulling 150W @ 1.09V. I think if I bump that up to 1.175V (1175mV), it'll fix the issue.

    Running prime95 and Intel's power_gadget right after, I see:

    Code:
    $ sudo ./power_gadget -e 1000 -d 1
    System Time,RDTSC,Elapsed Time (sec),IA Frequency_0 (MHz),Processor Power_0 (Watt),Cumulative Processor Energy_0 (Joules),Cumulative Processor Energy_0 (mWh),IA Power_0 (Watt),Cumulative IA Energy_0 (Joules),Cumulative IA Energy_0(mWh),GT Power_0 (Watt),Cumulative GT Energy_0 (Joules),Cumulative GT Energy_0(mWh)
    06:46:45:849,90852126468092,1.0007,477172664,89.1909,89.1983,24.7773,81.4456,81.4524,22.6257,0.0000,0.0000,0.0000,
    
    Total Elapsed Time(sec)=1.0007
    
    Total Processor Energy_0(Joules)=89.1983
    Total Processor Energy_0(mWh)=24.7773
    Average Processor Power_0(Watt)=89.1393
    Let's round up to 100W, so with the GPU at 150W that's roughly 250W for the CPU and GPU combined, leaving about 450W of headroom on the 700W supply. It's not the PSU.

    I'll have to report back sometime with 1303 MHz at higher voltages, but I'll wait until the drivers mature a little more when it comes to overclocking; I've had problems with amdgpu.ppfeaturemask in the past.
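
    (For anyone wanting to try that voltage bump from Linux: with overdrive enabled via amdgpu.ppfeaturemask, Polaris cards expose pp_od_clk_voltage in sysfs. A rough sketch follows; the state index 7 and the 1303/1175 values are just examples taken from the numbers in this thread, so treat it as illustrative rather than a recipe.)

    Code:
    # Assumes the kernel was booted with amdgpu.ppfeaturemask=0xffffffff (the flag noted above as troublesome)
    $ cat /sys/class/drm/card0/device/pp_od_clk_voltage      # list the current sclk/mclk states and voltages
    # Example only: set the top sclk state (typically index 7 on Polaris) to 1303 MHz @ 1175 mV
    $ echo "s 7 1303 1175" | sudo tee /sys/class/drm/card0/device/pp_od_clk_voltage
    $ echo "c" | sudo tee /sys/class/drm/card0/device/pp_od_clk_voltage            # commit the new table
    # echoing "r" to the same file restores the VBIOS defaults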

    Leave a comment:


  • perpetually high
    replied
    Originally posted by IreMinMon View Post

    What monitoring software is this?
    It's the GALLIUM_HUD env variable; here are some example usages:

    Native games:

    Code:
    $ export GALLIUM_HUD_PERIOD=0.07
    $ export GALLIUM_HUD=".h80.w105cpufreq-cur-cpu0+cpufreq-cur-cpu1+cpufreq-cur-cpu2+cpufreq-cur-cpu3;.h80.x185.w230.c100cpu0+cpu1+cpu2+cpu3;.x445.h80.w75.dGPU-load+cpu+fps;.x565.h80.w875.dfps;.x1470.h80.w190.c100sensors_temp_cu-amdgpu-pci-0100.temp1+GPU-load:100;.x1690.h80.w170requested-VRAM+VRAM-usage"
    $ ./game

    For Steam games:

    Code:
    GALLIUM_HUD_PERIOD=0.07 GALLIUM_HUD=".h80.w105cpufreq-cur-cpu0+cpufreq-cur-cpu1+cpufreq-cur-cpu2+cpufreq-cur-cpu3;.h80.x185.w230.c100cpu0+cpu1+cpu2+cpu3;.x445.h80.w75.dGPU-load+cpu+fps;.x565.h80.w875.dfps;.x1470.h80.w190.c100sensors_temp_cu-amdgpu-pci-0100.temp1+GPU-load:100;.x1690.h80.w170requested-VRAM+VRAM-usage" %command%

    If you have more than 4 cores, you'll just need to update the above to add in cpu4-7, etc.
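
    For example, the first two panels for an 8-thread CPU would become something like this (illustrative only; append the remaining panels from the original string unchanged):

    Code:
    $ export GALLIUM_HUD=".h80.w105cpufreq-cur-cpu0+cpufreq-cur-cpu1+cpufreq-cur-cpu2+cpufreq-cur-cpu3+cpufreq-cur-cpu4+cpufreq-cur-cpu5+cpufreq-cur-cpu6+cpufreq-cur-cpu7;.h80.x185.w230.c100cpu0+cpu1+cpu2+cpu3+cpu4+cpu5+cpu6+cpu7"
    $ GALLIUM_HUD=help glxgears      # prints every counter name available on your system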

    More info here: https://manerosss.wordpress.com/2017...o-gallium-hud/

    Leave a comment:


  • IreMinMon
    replied
    Originally posted by perpetually high View Post
    Took a photo of a GPU hang occurring in Metro 2033 Redux with GALLIUM_HUD env var set:



    - GPU temp: 66°C
    - GPU load: 99%
    - CPUs were at 3.6 GHz (Turbo Boost from the base 3.4 GHz, apparently)
    - CPU loads were 71, 57, 50, 75
    - FPS was at 163
    - VRAM usage was reasonable at 1.175 GB

    So the GPU load at 99% is the only thing that sticks out here. You'll also see on the bottom left that the textures became corrupted; usually when that happens, the hang follows about 1 or 2 seconds later, as was the case here.
    What monitoring software is this?

    Leave a comment:


  • TemplarGR
    replied
    Originally posted by debianxfce View Post

    The auto setting should use the GPU card's BIOS. You might have a poor BIOS on your card. I hope you have the latest drivers, Linux amdgpu firmware, and BIOS.
    I am a little afraid of tinkering with the BIOS. I have done it successfully in the past on an older GPU, but I need the card for work and I don't have a replacement now. Still, on Windows the GPU runs pretty well; that is the weird thing. So how is the BIOS bad if it works on Windows?

    Leave a comment:


  • Etherman
    replied
    I use PolarisBiosEditor for setting my voltages and MHz.

    Leave a comment:


  • clapbr
    replied
    Originally posted by perpetually high View Post
    Man, I just have to say, it's sooo nice to be able to game again worry-free of hangs. I played for hours today and zero hangs. Passed all the previous checkpoints in BioShock Infinite, Metro 2033 Redux, etc that I couldn't get to before... perpetually high is back, baby!

    1191 MHz on the core clock is only a compromise of 112 MHz from the default 1303, and I can live with that. I'm going to revisit upping the voltage on the 1303 MHz setting at a later time, and will update my post with results when I do.

    Yeah, you could go that route also. As a warning though: I had issues with setting the amdgpu.ppfeaturemask=0xffffffff flag, and others have too from what I've seen online. You might not, but then again we have the exact same card, so you likely will.

    rocm-smi is really nice, and doesn't require that flag to be set. I highly recommend it in general.
    That feeling when you finally fix one of these is pretty good.

    Of all the issues I've had, I luckily never got a full system crash with AMD drivers except when I messed with overclocking. Good to know about rocm-smi; I will try it.

    I don't know if you can set custom clock states using rocm-smi, but if you can, it might be worth trying a simple trial-and-error approach to find the maximum clock that doesn't crash for you.
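
    (If rocm-smi can do it on this card, the trial-and-error could look roughly like the sketch below; the flag names are from the rocm-smi builds I'm aware of, and the level numbers are just examples.)

    Code:
    $ rocm-smi --showclocks          # show the current sclk/mclk frequencies
    $ sudo rocm-smi --setsclk 6      # example: pin the core clock to DPM level 6 and test for hangs
    $ sudo rocm-smi --setsclk 7      # step back up one level once it is stable
    $ sudo rocm-smi --resetclocks    # restore the default clock behaviour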

    Leave a comment:
