Radeon Power Management Still An Incomplete Mess
When discussing the state of open-source Radeon driver features last weekend, one of the first areas criticized in the forums was the poor power management of the open-source AMD Linux graphics driver.
Comments were quick to come in along the lines of:
"It's power management and performance I personally find lacking - in recent tests, the low-end cards were 2x-5x lower perf vs catalyst."
"I dream of the day that bridgman will announce in the forums that they will release the PM or an article by Michael confirming this. :p"
Of course, John Bridgman of AMD was quick to jump in with commentary from his perspective.
Bridgman's initial reaction was that AMD has already released some code and documentation concerning power management, but few people in the community are working on it. In addition, AMD recently released a new ACPI header file, and some improvements are headed to the next kernel release, Linux 3.7.
In response to a reader who asked Bridgman, "Am I supposed to read this as 'We reviewed the code and we can't release it???'," Bridgman unfortunately replied, "Correct." AMD has been working to release improved power management code, but it has yet to clear the company's legal/technical review processes. The Canadian did reiterate, though: "Again, there's a lot of PM info out there now. This is just a couple of additional blocks. PM seems to be an exception to the rest of the driver stack -- everyone seems to want it but hardly anyone seems to be willing to work on it. For most of the other bits it seems that every N'th user is willing to roll up their sleeves and make the code better but it's not happening here."
In another post in the Phoronix Forums thread, Bridgman posed an open question: "If so many people want better power management, and if people are already tweaking the code on their own personal systems, why aren't we seeing improvements in the common code?"
In response to the usual complaint that AMD should just open-source its feature-rich Catalyst driver for Linux, which does have PowerPlay and proper power management support, Bridgman replied: "If we have trouble getting approval to release a specific block of programming info, do you think we would have an easier time releasing the same info mixed with 5 million lines of proprietary source code, particularly when that source code is written to work across multiple OSes and most of those OSes *require* robust DRM as part of the design?"
Bridgman added: "The GPU business is *very* competitive, and small differences in performance & features drive many of the buying decisions. The cost of driver development is the primary entry barrier for new competitors. Why would an established vendor give away their competitive advantage?"
Somewhat surprisingly, David Airlie of Red Hat then jumped into the thread to challenge some of Bridgman's statements: "John can you stop spreading this BS, it really isn't possible to improve the current PM code to anywhere near the degree you think. The problem is the atom tables (for setting engine and memory clocks) aren't used or tested in this way by the fglrx driver, so they have no QE beyond the BIOS using them at startup to set the clocks. The time taken to execute the tables is longer than vblank on a lot of cards, and this would require writing per-card/memory attached specific tables to try and allow the reclock to run in under a vblank time limit. Really you guys know how the cards work, and fglrx works with them, anything else is pointless since its using functionality that hasn't been exercised or QAed. please stop making excuses. you could maybe improve r500 to the level of fglrx but r600 and upwards its a waste of time and it would require years of testing before we could enable it by default, since no other drivers have ever tested these codepaths."
Basically, the AtomBIOS tables used to change the graphics card's core and memory clocks aren't thoroughly tested for this purpose, and the open-source Radeon DRM driver exercises them differently from how the fglrx driver drives the hardware for power management. This is a bit ironic, since the open-source Radeon driver stack has come to depend on AtomBIOS rather than on the hard-coded approach originally taken by the RadeonHD driver, whose developers disliked this abstraction layer and demanded the actual hardware specifications from AMD.
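To put Airlie's vblank argument in perspective: the vertical blanking interval is the short window between frames when the display is not scanning out of video memory, which is why a memory reclock has to fit inside it to avoid visible corruption. The Python sketch below computes that window from a video mode's timing. The function names are hypothetical, the 1080p60 numbers are the standard CEA-861 timing (1,080 active lines out of 1,125 total), and the 1,000 µs reclock time is an arbitrary placeholder, not a measurement of any Radeon card or AtomBIOS table.

```python
def vblank_duration_us(v_active: int, v_total: int, refresh_hz: float) -> float:
    """Length of the vertical blanking interval in microseconds.

    v_active: visible lines per frame; v_total: all lines including blanking.
    """
    if not 0 < v_active < v_total:
        raise ValueError("need 0 < v_active < v_total")
    if refresh_hz <= 0:
        raise ValueError("refresh_hz must be positive")
    frame_time_us = 1_000_000.0 / refresh_hz
    # Every line (visible or blanking) takes the same time to scan.
    return frame_time_us * (v_total - v_active) / v_total


def reclock_fits(reclock_us: float, v_active: int, v_total: int, refresh_hz: float) -> bool:
    """True if a reclock of the given duration would finish within one vblank."""
    return reclock_us <= vblank_duration_us(v_active, v_total, refresh_hz)


if __name__ == "__main__":
    # CEA-861 1920x1080 @ 60 Hz: 1080 active lines out of 1125 total.
    window = vblank_duration_us(1080, 1125, 60.0)
    print(f"vblank window: {window:.1f} us")  # about 666.7 us
    # 1000 us is a placeholder, not a measured AtomBIOS table execution time.
    print("1000 us reclock fits:", reclock_fits(1000.0, 1080, 1125, 60.0))
```

At common refresh rates the window is well under a millisecond, so a table that runs even slightly longer on a given card would miss it; higher refresh rates and tighter blanking timings shrink the window further.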
Bridgman's response to Airlie was "Is it worth trying to match fglrx with the current code ? I don't think so (other than for r600 and below). Is it worth improving the current code enough to give a bunch of current users full use of the profile mechanism (and maybe a few options in between), particularly on middling-old hardware ? I think so..." He followed this with a longer explanation.
Bridgman also made another noteworthy statement in the thread: "Other than power management, which was a whole lot simpler when we kicked this off back in 2007, I imagine they're pretty pleased with the features and performance. Launch-time support (buy new HW, install a recent distro, use the system) was a higher priority than features and performance. The common thread among the customers was that (a) they were building big compute farms with our CPUs, (b) they were running Linux on those farms, (c) they did most of their related SW development on Linux, and (d) they wanted in-box support for the systems used for SW development and related activities."
There is also a need for this power management support to be dynamic rather than static. Most open-source Radeon driver users would likely have no idea how to change the power management profile unless they are frequent Phoronix readers and Linux enthusiasts. From Airlie: "Hmm most the feedback I see is for dynamic PM not better static PM, static PM is crap, no use in laptops at all, we also want to be able to use the Fusion GPUs up to their package limits, like we can't upclock the APU because it'll overheat, so even if the BIOS has the table we can't use it because we have no decent thermal protection. We should be able to use AMD APU like we use Intel CPUs and we can't. There are so many problems we just can't solve and the ones we can solve there is little demand from people I talk to. static profiles might work for people on phoronix, but they don't save power for the 99% of people who install RHEL and never read the online docs (i.e. my customers)."
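For readers who do want to try the static profiles: the radeon kernel driver of this era exposes them through sysfs. Writing `profile` to the `power_method` file enables the profile mechanism, and writing one of `default`, `auto`, `low`, `mid` or `high` to `power_profile` selects a level (writing `dynpm` to `power_method` instead selects the driver's dynamic reclocking mode). The Python sketch below assumes the Radeon is DRM device `card0` and that it runs as root; the helper names are hypothetical. It is exactly the kind of manual step Airlie argues most users will never take.

```python
from pathlib import Path

# Profile levels accepted by the radeon driver's power_profile sysfs file.
PM_PROFILES = ("default", "auto", "low", "mid", "high")

# Assumption: the Radeon GPU is the first DRM device.
DEFAULT_DEVICE = Path("/sys/class/drm/card0/device")


def set_radeon_power_profile(profile: str, device: Path = DEFAULT_DEVICE) -> None:
    """Switch to the static profile method and select `profile` (hypothetical helper)."""
    if profile not in PM_PROFILES:
        raise ValueError(f"unknown profile {profile!r}; expected one of {PM_PROFILES}")
    # The chosen profile only takes effect while power_method is "profile".
    (device / "power_method").write_text("profile\n")
    (device / "power_profile").write_text(profile + "\n")


def read_radeon_pm_state(device: Path = DEFAULT_DEVICE) -> tuple[str, str]:
    """Return the (power_method, power_profile) strings currently reported."""
    method = (device / "power_method").read_text().strip()
    profile = (device / "power_profile").read_text().strip()
    return method, profile
```

For example, running `set_radeon_power_profile("low")` as root on such a system should lower clocks for the whole session, but the setting is static: it does not react to load or temperature, which is the gap Airlie describes for laptops and APUs.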
Airlie grew increasingly frustrated with Bridgman: "If the users make it this far to ask then they are in the possibly 5% of users who care, I can't provide a proper OS on top of that, its crap. In servers we care a bit, but on laptops they need to [do] the right thing without specific configuration, the shit we did 5 years ago doesn't cut it any more, and its not like this stuff is getting simpler, so doing the dumb thing for evermore is just pointless. By the time we get to doing smart stuff for any of the current GPUs they'll find another reason for blocking doing smart stuff for newer chips. This requires someone in AMD with some power to overrule the idiocy that is blocking PM code. You guys own the hardware, you guys are doing the same things as nvidia and intel, if you have some innovation in r600-trinity that nobody knows about at this stage I'd be shocked and amazed that anyone gives a shit. Just release the code already, and arrange for anyone who tries to block it to find a new position where being an idiot is acceptable."
"There is a lot of room for improvement in the open source graphics drivers," was another statement by Bridgman.
Long story short, as with performance, power management shows a large gap between the open-source Radeon driver and Catalyst, and the open-source side remains mostly a mess. Last year's testing shows the state of Radeon power management at that time, and I'll have new open-source vs. closed-source GPU driver tests in the coming weeks. The open-source AMD developers want better power management support in the Linux driver, but it's not an easy process and will take some time.
Continue the Radeon feature thread here or see the other interesting and vibrant discussions taking place within the Phoronix Forums.