AMD Prepares PMF Linux Driver For "Smart PC Solutions Builder"


  • #11
    Honestly, laptops already have those "embedded controllers", which run super shitty closed-source firmware tuned only for Windows; I see this as an attempt to pull that into the main SoC. Overall, the level of closed-sourcedness and anti-consumer bullshit is roughly preserved, and there is a chance this will be more stable on Linux. Though it all depends on AMD's qualification process.



    • #12
      Originally posted by stormcrow View Post

      No, I can see why it would be controversial. We're long past the point where it should be obvious that signed firmware doesn't stop any determined hacker from infiltrating a system. There are numerous examples where this is the case, my favorite being the recent Switch hack presentation from 3C. The only thing it stops is legitimate customers exercising control over repairs, including fixing hardware configuration bugs that occur far too often to bother listing.

      In the past, if there was a sub-optimal or even buggy ACPI table for a piece of hardware, you could "simply" swap the problematic entries for good ones. Of course, the opposite can be true as well: you can royally screw up a system by naively altering ACPI values. OpenBSD, for example, has its own ACPI tables that aren't provided by Intel (and that's known to cause problems on some systems - I have an old laptop that immediately shuts down after OpenBSD's kernel boots due to a bad ACPI table, but works fine with Linux). But the point is, it's impossible to fix anything without legal access to the configuration settings, which this system effectively locks away from the supposed owners of the hardware for no good reason other than to enforce OEM secrecy over things that probably shouldn't be secret to begin with.

      The controversy isn't that it's 'necessary to mainstream Linux-based laptops'. The controversy is that the locked-down signing is not necessary at all to have a functional system. It's an unnecessary external limitation of control, especially since the push is towards more open and auditable hardware, firmware, and software stacks, so people can at least be aware of any risks they're taking even when those risks can't easily be fixed. Let's face it, sometimes there are no feasible repairs, only remediation/mitigation/isolation, yet better the devil you know...

      Edit to add: While signed firmware and executables outside of the end user's control won't stop a determined attacker, code signing in general (with user consent and control) may stop less skilled or less determined hackers, so it shouldn't be totally discarded as a security layer. What makes it actually useful is the ability to revoke and reissue signatures in the event of compromise. AMD having the sole signing authority means this isn't about security (because you can't depend on AMD to properly revoke and reissue keys into the indefinite future) but about usurping user control.
      Looking forward to your “Smart PC Solutions Builder Hack” presentation!



      • #13
        Originally posted by edwaleni View Post
        AMD is simply providing exactly what the OEMs want: a way to differentiate themselves without having to rely on a custom SKU just for them. No different than what many of the NVidia OEMs were doing with the reference designs. Unlike Intel, AMD doesn't have the capacity to create and fab several hundred permutations of their reference Zen designs, so they simply let the OEMs control the feature set.
        I wasn't even thinking of this. For anyone unaware: if you have enough $$$, Intel will customize the chip for you.



        • #14
          Originally posted by timofonic View Post
          Lots of binary blobs and other crap. I hate this!

          bridgman Why????
          It's on the CPU side so I'm not sure, but extrapolating from the GPU world I can make some guesses. If you go back 15 years, the peak power usage of GPUs was roughly in line with the ability of the cooling system to dissipate that power while maintaining safe temperatures, so power management could be as simple as "run at full clocks unless the system is only being lightly used".

          These days it's a different story, particularly in mobile products. GPUs (and presumably CPUs) can use much (MUCH) more power than the cooling system can dissipate - even getting heat out of the package is a problem - so getting the best performance while not letting the smoke out is much more complicated and multi-dimensional. Even thermal limiting with dozens of sensors is too slow to avoid local meltdowns, so power management also uses distributed current sensors to anticipate thermal rise and start limiting clocks & voltages.
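The reactive-vs-proactive distinction described above can be sketched in a few lines. This is a toy model, not anything from AMD's actual SMU firmware; all function names, limits, and factors are invented for illustration.

```python
# Toy contrast between reactive thermal throttling and proactive
# current-based power limiting. All numbers here are made up.

def reactive_throttle(temp_c: float, power_w: float,
                      temp_limit_c: float = 100.0) -> float:
    """Cut power only after a temperature sensor trips -- by then a
    local hotspot may already have exceeded the limit."""
    return power_w * 0.5 if temp_c >= temp_limit_c else power_w

def proactive_limit(current_a: float, voltage_v: float,
                    power_budget_w: float = 45.0) -> float:
    """Clamp power as soon as measured current implies the budget is
    exceeded, before the temperature has had time to rise."""
    drawn_w = current_a * voltage_v
    return min(drawn_w, power_budget_w)
```

The point of the current-sensor path is latency: current readings react far faster than die temperature, which lags behind the power that causes it.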

          The key question is who takes responsibility for not letting the smoke out and killing the system. With desktop systems the worst case on the CPU side is usually that you blow up a socketed CPU, but with mobile systems the cost of damage is usually much higher, typically over 50% of the system cost. Right now HW vendors keep fine-grained power management under their control and if something goes wrong they can be pretty sure it was "their fault" rather than something the user did. There are grey areas like dust buildup in cooling solutions but that tends to happen sufficiently slowly that the firmware can adapt to it and still keep the smoke in.

          As edwaleni suggested this is really an alternative to creating custom SKUs or at least custom SMU firmware images.

          I haven't had a chance to go through the patches to see if there is a way to disable the mechanism in the event of problems, but even that is complicated because (for example) an OEM might be using it to support a SKU with a tiny cooling solution that is not able to run at default power levels. I would rather see that specific case handled with a user-visible knob or hard fuse, so maybe that's a bad example, but the key point is that there is a line between "stuff the vendor has to control in order to be able to warranty the product" and "stuff that is pretty safe for users to manage, that might crash the system but not permanently damage it".

          We try to make anything in the second bucket visible to users and developers, but we also try to lock down anything in the first bucket.

          Would you be OK with a mechanism that lets you say "I am taking responsibility for fine-grained power management and do not expect warranty coverage if something goes wrong" and then records the answer permanently in the hardware? My impression from customer support is that the general answer seems to be "no". The problem is not individual developers experimenting with PM software and letting the smoke out, but thousands or millions of users running power management software written by someone they never met.

          All that said, there may be a better idea we (and the other vendors) have missed so far. Nobody likes proliferation of blobs but AFAICS the alternative is having a lot of different SMU firmware images with no way to make sure that firmware images are properly paired with hardware implementations.
          Last edited by bridgman; 24 September 2023, 07:34 PM.
          Test signature



          • #15
            Originally posted by edwaleni View Post
            AMD is simply providing exactly what the OEMs want: a way to differentiate themselves without having to rely on a custom SKU just for them. No different than what many of the NVidia OEMs were doing with the reference designs. Unlike Intel, AMD doesn't have the capacity to create and fab several hundred permutations of their reference Zen designs, so they simply let the OEMs control the feature set.
            But this isn't about controlling features (which AMD already does by blowing "e-fuses") - this is about controlling power. The only noted "output" of this is the ability to update SMU power limits.

            Basically, the whole purpose of this system is to make laptops with mediocre cooling more performant and/or to sell a laptop with a "higher spec" CPU than the cooling system can support. Notice how they used "lid status" as a proposed input? Why would the SMU need to know if the lid is closed? Shouldn't the only thing that matters when you close the lid be whether you have the OS set to turn the display off or put the system to sleep? Well, when your ultra-thin-whatever relies on the whole keyboard surface being a radiator, closing the lid might cost you 6-7 watts worth of power dissipation. Rather than have the whole thing bump up against the 100°C thermal limit and start throttling when you decide to have the system do a render overnight with the lid closed, the SMU will just drastically cut CPU power.
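As a toy illustration of why "lid status" might feed into a power limit: only the 6-7 W figure comes from the discussion above; the function and the other numbers are hypothetical.

```python
# Hypothetical lid-aware sustained power limit for a thin laptop whose
# keyboard deck doubles as a radiator. Only the ~6-7 W dissipation
# figure comes from the post; the rest is invented for illustration.

def sustained_power_limit_w(lid_open: bool,
                            base_limit_w: float = 28.0,
                            lid_radiator_w: float = 6.5) -> float:
    """Drop the sustained limit when a closed lid blocks the deck."""
    return base_limit_w if lid_open else base_limit_w - lid_radiator_w
```

With numbers like these, closing the lid would silently cost roughly a quarter of the sustained power budget, which is exactly the behavior being complained about.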

            Personally, I really don't like where the industry has gone, and is continuing to go, with this whole configurable and dynamic TDP nonsense. It basically makes CPU SKUs completely meaningless. The same CPU model can perform dramatically differently in two different laptops based solely on which one has more thermal headroom.

            I miss the days when TDP was TDP and manufacturers had to design their cooling systems to support a particular TDP, not change a processor's TDP to fit their cooling solution.



            • #16
              Originally posted by bridgman View Post

              All that said, there may be a better idea we (and the other vendors) have missed so far. Nobody likes proliferation of blobs but AFAICS the alternative is having a lot of different SMU firmware images with no way to make sure that firmware images are properly paired with hardware implementations.
              There is always the obvious answer of "build your products with adequate cooling", which goes hand in hand with "quit building super-thin products out of lightweight materials so you can obey the laws of thermodynamics".

              I speak from experience.
              • Version 1.0 of my AMD PC didn't have adequate cooling, so my RX 580 kept overheating. I blamed MSI.
              • Version 2.0 got a Noctua fan placed into every free opening. No more overheating.
              • Version 3.0 has a heatsink for the CPU so large that the internal side fan is now an external one. I forgot to account for the fan when doing the space calculations.



              • #17
                Originally posted by Eirikr1848 View Post

                Looking forward to your “Smart PC Solutions Builder Hack” presentation!
                It's called "drugs, torture, and threats" and has been used for tens of thousands of years by everyone from Roman legions to the Hell's Angels.



                • #18
                  Originally posted by AmericanLocomotive View Post
                  I miss the days when TDP was TDP and manufacturers had to design their cooling systems to support a particular TDP, not change a processor's TDP to fit their cooling solution.
                  Me too... but the question became "if you could get 90% of the perceived performance with half the weight and lower cost, at the expense of more touchy and complex power management, would you take it?" and the answer was invariably "hell yes".

                  The other challenge is simply getting heat out of the package fast enough to avoid thermal runaway even if you have what amounts to an infinite heat sink, e.g. a block of solid copper measuring a parsec on each edge.
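The "infinite heat sink" point comes down to package thermal resistance. A back-of-envelope version, with an illustrative junction-to-case figure that is not from any datasheet:

```python
# Steady-state junction temperature with a perfect heat sink is still
# limited by junction-to-case thermal resistance: T_j = T_sink + P * R_jc.
# The 0.3 degC/W value used below is illustrative only.

def junction_temp_c(power_w: float, r_jc_c_per_w: float,
                    sink_temp_c: float = 25.0) -> float:
    """Ideal-sink junction temperature from power and thermal resistance."""
    return sink_temp_c + power_w * r_jc_c_per_w

# Even with the sink pinned at 25 degC, 250 W through 0.3 degC/W puts the
# junction at 100 degC -- the heat simply can't leave the package faster.
```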
                  Last edited by bridgman; 24 September 2023, 09:12 PM.



                  • #19
                    Blobs blobs blobs
                    Choosing between AMD and Intel feels like choosing between turd sandwich and giant douche. Very sad.



                    • #20
                      If it helps, I don't think the PMF blobs are needed for generic systems, just for OEM-customized systems. My guess is primarily laptops and AIOs, but I don't know that for sure (I'm on the GPU side, so guessing a bit re: CPU support).

                      The other blobs are primarily hardware microcode so more hardware than software.

