AMDGPU Linux Driver No Longer Lets You Have Unlimited Control To Lower Your Power Limit


  • #91
    Originally posted by yump

    Amperes are very important in the context of reliability, and amperes are what cause stress on the VRM.
    Thank you, this was an interesting read.

    I honestly didn't think it was something to consider with CPU/GPU design, as I thought that due to the infinitesimal area, the effects on velocity would already negate all of it.

    But as I said, all I know about it is dodgy murky misremembering. The other day after reading you guys' comments I realised I didn't even remember wtf was a S. Just for you to see how little remains in my head about the subject.

    Also I can't even remember exactly what velocity entails. And somehow I thought that with semiconductors being in a lattice structure it would mess up a lot of migration.


    Originally posted by yump
    Almost certainly it's f that varies. The power management controller could theoretically tell the scheduler to idle CUs or parts of CUs, but I'm pretty sure reducing frequency (and voltage, because it's a lookup table or parametric curve) saves more power per unit performance lost unless you've set a really low power limit on a large chip.

    Oh, I was thinking the exact opposite. I guess I'm thinking of intensive workloads and you are talking about idle.

    As in, I thought that due to the longer time that it would take to "fill up" a capacitor with the lower voltage, the C would have to rise at iso f.
    E.g. to game at nnn fps, frame-limited in game: scenario A with a TDP cap of 100 and scenario B with a TDP cap of 50. If the voltage is dropped for scenario B (as it should be, since it has a disproportionate impact on TDP), both could still boost to fmax, but A would show 50% utilisation and B would show 100% (in gpu_busy, I think). None of these numbers relate to reality, not even their relationships, because of course I don't think halving power requires doubling activity. It's just an illustration.


    As per your latest comment in response to American locomotive: a larger area of the die would have to be active at any given time for it to reach the max clock.

    ____---_____

    s_j_newbury I'd think using dpm to create and set a mode in which the gpu simply can't reach a tdp above what the user wants would be the way they "support". Not guessing internal state from userspace. It shouldn't be needed, right?

    Comment


    • #92
      I just remembered there's something something OHMIC something about semiconductors that was useful to model current.

      I can't remember what, so sorry for posting here, but it is a reminder for me to research and see what it is that I half remember.


      I honestly feel like I could get you guys in a call and listen for hours to your responses to my questions. At the same time I know for a fact most of my questions would be so shallow that it would in the end be a waste of everyone's time lol.


      I'll see if my nephew who's currently attending uni can hook me up with a copy of "Principles of CMOS VLSI Design: A Systems Perspective", I bet it will be very enjoyable

      Comment


      • #93
        Originally posted by DumbFsck
        Oh, I was thinking the exact opposite. I guess I'm thinking of intensive workloads and you are talking about idle.

        As in, I thought that due to the longer time that it would take to "fill up" a capacitor with the lower voltage, the C would have to rise at iso f.
        E.g. to game at nnn fps, frame-limited in game: scenario A with a TDP cap of 100 and scenario B with a TDP cap of 50. If the voltage is dropped for scenario B (as it should be, since it has a disproportionate impact on TDP), both could still boost to fmax, but A would show 50% utilisation and B would show 100% (in gpu_busy, I think). None of these numbers relate to reality, not even their relationships, because of course I don't think halving power requires doubling activity. It's just an illustration.

        As per your latest comment in response to American locomotive: a larger area of the die would have to be active at any given time for it to reach the max clock.
        No, we're both talking about intensive workloads.

        If the voltage is dropped the chip cannot boost to fmax. There is a minimum voltage required for the chip to compute correctly at any given frequency, running any given sequence of instructions. The minimum voltage required at every frequency for a worst-case workload defines the voltage/frequency curve. The chip has an embedded controller doing power management, and that controller's firmware has a definition of the voltage/frequency curve (either as a lookup table or a parameterized formula).
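
        A minimal sketch of the lookup-table form, with linear interpolation between points. The curve values and function names are made up for illustration and do not come from any real GPU or firmware.

```python
from bisect import bisect_left, bisect_right

# Hypothetical (frequency in MHz, minimum voltage in mV) points.
# These numbers are invented for illustration only.
VF_CURVE = [(500, 700), (1000, 750), (1500, 850), (2000, 1000), (2500, 1200)]


def min_voltage_mv(freq_mhz):
    """Minimum voltage needed at freq_mhz, interpolated along the curve."""
    freqs = [f for f, _ in VF_CURVE]
    if not freqs[0] <= freq_mhz <= freqs[-1]:
        raise ValueError("frequency outside the curve")
    i = bisect_left(freqs, freq_mhz)
    if freqs[i] == freq_mhz:
        return float(VF_CURVE[i][1])
    (f0, v0), (f1, v1) = VF_CURVE[i - 1], VF_CURVE[i]
    return v0 + (v1 - v0) * (freq_mhz - f0) / (f1 - f0)


def max_frequency_mhz(voltage_mv):
    """Highest frequency the curve permits at voltage_mv (inverse lookup)."""
    volts = [v for _, v in VF_CURVE]
    if voltage_mv < volts[0]:
        raise ValueError("voltage below the lowest point on the curve")
    if voltage_mv >= volts[-1]:
        return float(VF_CURVE[-1][0])
    i = bisect_right(volts, voltage_mv)  # first point needing more voltage
    (f0, v0), (f1, v1) = VF_CURVE[i - 1], VF_CURVE[i]
    return f0 + (f1 - f0) * (voltage_mv - v0) / (v1 - v0)
```

        On this made-up curve, capping the voltage at 1000 mV limits the chip to 2000 MHz rather than the 2500 MHz top point, which is the sense in which a lower voltage rules out fmax.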

        How "heavy" or "light" a workload is depends on how many logic gates it causes to switch at once. That means FurMark is heavy and games are comparatively light; it has nothing to do with how many FPS the game gets. (You will see some versions of the power formula that incorporate this, as α*C*f*v^2. α is the "activity factor".) The reason activity factor matters is that higher-activity workloads draw more current, which causes more voltage drop in the wires between the VRM and the actual logic gates.

        When the chip is allowed to automatically select voltage to match frequency, energy per frame scales something like frequency squared, at least in the part of the voltage/frequency curve where most chips are designed to operate under load. Turning half the shader cores off (and I don't know if that would actually show up in utilization %) saves no energy at all, because to complete the same work the remaining cores have to run twice as long. (I am assuming here that the uncapped frame rate is at least 2x the capped frame rate. Otherwise the half-disabled chip can't meet the FPS target and isn't doing the same work.)
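
        A rough sketch of that argument, using the activity-factor power formula mentioned earlier in this post. It assumes voltage is proportional to frequency (only roughly true in the normal operating range), ignores leakage and other static power, and uses normalised units. All names and defaults are illustrative.

```python
def dynamic_power(alpha, cap, freq, volt, cores=1.0):
    """Dynamic power alpha*C*f*V^2, scaled by the fraction of cores enabled."""
    return cores * alpha * cap * freq * volt ** 2


def energy_per_frame(freq, cores, cycles_per_frame=1.0,
                     alpha=1.0, cap=1.0, volts_per_hz=1.0):
    """Energy to render one frame of fixed work, in normalised units.

    Assumption: V = volts_per_hz * f, i.e. voltage tracks frequency linearly.
    """
    volt = volts_per_hz * freq
    power = dynamic_power(alpha, cap, freq, volt, cores)
    # Fewer cores or a lower clock means the same work takes longer.
    seconds = cycles_per_frame / (cores * freq)
    return power * seconds


baseline = energy_per_frame(freq=2.0, cores=1.0)
half_cores = energy_per_frame(freq=2.0, cores=0.5)
half_clock = energy_per_frame(freq=1.0, cores=1.0)
```

        Under these assumptions `half_cores` equals `baseline`, while `half_clock` is a quarter of it, which is the frequency-squared scaling described above.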

        Comment


        • #94
          This hit me today. I ran my GPU at 200W instead of 303W, now the minimum allowed power cap is 272W.
          I've tried:
          Code:
          amdgpu.ppfeaturemask=0xffffffff
          But this only allowed me to raise power cap max, not lower power cap min.
          That's it? Take it or leave it?

          Comment


          • #95
            Originally posted by RBilettess
            This hit me today. I ran my GPU at 200W instead of 303W, now the minimum allowed power cap is 272W.
            I've tried:
            Code:
            amdgpu.ppfeaturemask=0xffffffff
            But this only allowed me to raise power cap max, not lower power cap min.
            That's it? Take it or leave it?
            10% lower is all they allow? This is insane considering modern cards reach their maximum efficiency at 60-70% of the normal power limit.
            Instead of trying to catch up to Nvidia, AMD keep shooting themselves in the foot. It's infuriating.

            The talk about safety is completely inane, I can run a 4090 24/7 at 33% PL and there is still no damage.
            I would be far more concerned about running the card 24/7 at 100% PL.
            Last edited by david-nk; 18 March 2024, 08:35 AM.

            Comment


            • #96
              Originally posted by david-nk
              The talk about safety is completely inane, I can run a 4090 24/7 at 33% PL and there is still no damage.
              I would be far more concerned about running the card 24/7 at 100% PL.
              The funny thing is, I'm still allowed to set my 303W card to 402W.
              10% down, but 33% up.
              I'm pretty sure the heatsink wouldn't be able to handle this. Other stuff like voltage regulators, who knows?
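
              For anyone who wants to check the range on their own card, here is a hedged sketch. It assumes the amdgpu hwmon files power1_cap, power1_cap_min and power1_cap_max report microwatts, and that the caller passes in the card's hwmon directory. The function names are illustrative, and the percentage helper is shown with the 303 W / 272 W / 402 W figures from this thread.

```python
from pathlib import Path


def parse_microwatts(text):
    """Convert a sysfs microwatt string such as '272000000\\n' to watts."""
    return int(text.strip()) / 1_000_000


def read_power_caps(hwmon_dir):
    """Read the current, minimum and maximum power cap (in watts).

    hwmon_dir is supplied by the caller; its location varies per system.
    """
    caps = {}
    for name in ("power1_cap", "power1_cap_min", "power1_cap_max"):
        caps[name] = parse_microwatts((Path(hwmon_dir) / name).read_text())
    return caps


def cap_range_percent(default_w, min_w, max_w):
    """Return (percent allowed below default, percent allowed above default)."""
    down = 100 * (default_w - min_w) / default_w
    up = 100 * (max_w - default_w) / default_w
    return down, up


down, up = cap_range_percent(303, 272, 402)
```

              Typical use would be something like `read_power_caps("/sys/class/drm/card0/device/hwmon/hwmon0")`, though the card and hwmon numbers differ between systems.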

              Comment


              • #97
                It is a very clear strategy of artificial segmentation. You take away users' control over that part of the hardware, then you can upcharge customers that want any specific power-related behavior.

                Comment


                • #98
                  Originally posted by varikonniemi
                  What the f is going on here? At the very least I would expect an explanation of how this would damage the card. The only thing I could think of is that it would damage their customer service if someone forgot they set it too low... WTF is this in Linux land. "just make it like windows" ... ????
                  AMD (gpus) are getting worse and worse - for productivity, they're pretty useless. The advantages and benefits seem to be evaporating. It seems that all you get is aggravation. Really a shame.

                  Comment


                  • #99
                    For posterity: The fine people at linux-zen already took action:
                    Summary: The Linux 6.7 kernel introduced a change to the AMDGPU driver, enforcing a lower power limit set by the graphics card BIOS, preventing users from setting power limits below this threshold. ...

                    Adding 'amdgpu.ignore_min_pcap=1' as a boot parameter on recent linux-zen kernels restores the old behavior.
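
                    To confirm the parameter actually reached the running kernel, one option is to look for it on the kernel command line. This is a simple sketch: the helper name is made up, it does not handle quoted values containing spaces, and it does not treat dashes and underscores as equivalent the way the kernel does.

```python
def kernel_param(cmdline, name):
    """Return the value of `name` on a kernel command line, or None if absent.

    A bare flag without '=' yields an empty string. If the parameter appears
    more than once, the last occurrence is returned.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


# Example command line, invented for illustration.
example = "BOOT_IMAGE=/vmlinuz-linux-zen root=/dev/sda2 rw amdgpu.ignore_min_pcap=1"
found = kernel_param(example, "amdgpu.ignore_min_pcap")
```

                    On a live system the input would come from reading /proc/cmdline, for example `kernel_param(open("/proc/cmdline").read(), "amdgpu.ignore_min_pcap")`.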

                    Comment
