With any bit of luck, "that part was done by the guy who left five years ago"
Well, the code is still supposed to be there. Unless it's obfuscated. Or a needle in a haystack. Or a "why is this here to begin with" type of code. All of the cases being a result of poor code commenting/management.
Wait, so if AMD doesn't know what's wrong there, then how does fglrx manage to function right? There must have been some code in it that manages those cases...
I explain in my blog post why there might very well be no errata or special code in the closed source driver for the issue I am hitting. To paraphrase myself: fglrx does things differently from the open source driver, and any given difference can have several causes:
- intentional, to work around some issue
- an outcome of the driver stack architecture
- the human who wrote the code wrote register A before register B with no special intention to do so
...
In the end, because of those differences, the closed source driver might never trigger the issue the open source driver is facing. This is a possibility. If that is the case, no one at AMD/ATI ever faced the hang we are having, and thus no errata for it was ever written.
So, no, there is not necessarily any code in fglrx that works around the issue we are having. Furthermore, as I said, if there is such code, it might very well not be documented as a workaround or errata. It might be that some engineer fixed it and forgot to file an errata, for all the reasons humans forget things.
Well, the code is still supposed to be there. Unless it's obfuscated. Or a needle in a haystack. Or a "why is this here to begin with" type of code. All of the cases being a result of poor code commenting/management.
Have to disagree with you there. If we were talking about "do this thing and HyperZ works" code then I would agree, but that's not the case here. It's more like "if you do *everything* (i.e. all the operations related to running a GL app, not just setting up a single HW block) in exactly the same sequence as these 30 million lines of code then it works fine"... and as Jerome said, doing things in exactly that sequence pretty much means rewriting the open source driver from scratch.
If it was a repeatable failure in a specific section of code then it would be easier to isolate a code sequence and experiment with it (e.g. try a sequence halfway between the one that works and the one that doesn't, and repeat until the light comes on), but intermittent failures in a highly pipelined system make it hard to do anything but "change something in the driver code, run the driver for a while, hey maybe it's working, oh crap it's not, reboot, rinse, repeat".
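To make the "halfway between" idea concrete, here is a minimal sketch of that kind of bisection. Everything in it is hypothetical and not taken from either driver: `first_bad_change`, the `hangs` callback and the `runs` parameter are illustrative names. It assumes the working and failing sequences differ by an ordered list of changes, that applying none of them works, that applying all of them hangs, and that once a prefix hangs every longer prefix hangs too. Repeating each test `runs` times is only a crude guard against intermittent failures, which is exactly why this approach is so weak for the hang discussed here.

```python
from typing import Callable


def first_bad_change(num_changes: int,
                     hangs: Callable[[int], bool],
                     runs: int = 1) -> int:
    """Return the smallest k such that applying the first k changes hangs.

    Assumes (hypothetically) that hangs(0) is False, hangs(num_changes)
    is True, and that the failure is monotone in k.
    """
    known_good, known_bad = 0, num_changes
    while known_bad - known_good > 1:
        mid = (known_good + known_bad) // 2
        # An intermittent hang may not show up in a single run, so try
        # several times and treat any hang as a failure at this point.
        if any(hangs(mid) for _ in range(runs)):
            known_bad = mid
        else:
            known_good = mid
    return known_bad
```

With a perfectly repeatable failure this needs only about log2(num_changes) test cycles. With an intermittent one, a "good" result at any step may simply be a hang that did not happen yet, and each test cycle on real hardware can end in a reboot.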
I'm impressed that everyone stuck with it as long as they did before parking the code and working on something that might give some instant gratification for a change.
Or maybe the planets were in the right order when AMD wrote the Catalyst command stream emission code, so the bug was never revealed. Unfortunately, Mesa developers weren't that lucky.
Reading the comments here, I obviously failed at making my point. AMD contributed a lot of engineering time helping me on HyperZ. My point was that it's stupid to blame AMD for this, because they might very well not have the information we need, or if they do have it, they might not know they have it because someone forgot to document it.
It's just life, maybe I should make a HyperZ bumper sticker for my car ...
Or maybe the planets were in the right order when AMD wrote the Catalyst command stream emission code, so the bug was never revealed. Unfortunately, Mesa developers weren't that lucky.
That's probably a fair statement.
There are all kinds of advantages that come from designing the HW and SW together, but the downside is that you end up with some unspoken, undocumented assumptions.
Typical example -- during early design HW dev explains a particular function to SW dev on a whiteboard, explanation shows the HW getting inputs in a certain sequence. SW dev now thinks about the HW that way. When it comes time to write the code, they organize operations in the same sequence the HW dev talked about a few months earlier... not because it's the only way to do it, not because it's even the "official" way to do it, but it's what was in the HW dev's head at the time they designed the block. SW dev writes the code, it works and gets integrated into the rest of the driver stack, everyone is happy.
The only real solution for that is to write open source driver software at the same time as proprietary driver software, run it on the same simulators & emulators, and be able to talk to the HW folks about odd issues like this while everything is still fresh in their heads AND we can try to reproduce on the emulators to see what is happening inside the chip.
I know people complained a lot about us working on support for new hardware when features were still missing from older hardware, but realistically the only way for this to work is if open source drivers are written and tested alongside the rest of the engineering effort. Getting there took 5 years of really hard work from the developers (basically supporting 10 years of hardware in 5 years) and I'll take a bit of credit for getting open source drivers at least partially integrated into the top level planning efforts, but I'm hoping that in the future you'll see the benefits and understand why we did this.
We were a bit too late for this to give much benefit for SI since the HW focus had already moved to the next gen even though we started months before SI launch, but (crosses fingers) hopefully that will be the last time.