Announcement

**profoundWHALE** · 06 November 2014, 12:02 PM

Originally posted by zanny

I have a 7870, I've used 6000 series cards, and I've built and run 3 r600 / radeonsi APUs. The only time radeonsi ever crashes is in extremely new games, and even then very rarely: for example, I got one crash in ~80 hours of Borderlands 2, and it was due to a firmware bug on the card that got fixed a week later.

Catalyst, on the other hand, is lucky to last an afternoon.

Can confirm: I played Borderlands for about an hour and it crashed on me with fglrx. I'll be trying it on RadeonSI tonight, but every game I've tried with RadeonSI runs much smoother and is more responsive than with fglrx.

**darkbasic** · 06 November 2014, 12:42 PM

Originally posted by chrisb

Compile your packages with debugging symbols. You posted a backtrace without debugging symbols, which is almost useless. A backtrace is much more useful with line numbers.

Those backtraces are useless anyway, because 90% of the time they are caused by the lockup recovery, which tries to reset the GPU.

Originally posted by chrisb

Come up with a reliable test. It is very good if this step can be automated, like running the Phoronix Test Suite or some other benchmarks. But you need to have an actual test that is reliable, even if that test is "use as my desktop for 7 days without hanging". Automation or overnight runs are preferable - can you run PTS or some demo mode overnight to reproduce? Can you loop mplayer playing fullscreen video files overnight? There must be some way of testing, for some period of time, that causes the fault to appear if it is present.

I'll say it again: I didn't manage to find a reproducible pattern. No matter how long we tried, nobody managed.
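A minimal Python sketch of the kind of unattended overnight run suggested in the quoted advice: it re-runs a command (for example a fullscreen video player) back to back until a run fails, hangs past a timeout, or the time budget runs out, and appends each result to a log file with `fsync` so the last line survives a hard lockup and reboot. The function name `soak`, its parameters, and the example command are illustrative assumptions. It cannot notice a freeze that leaves processes running, and it is no substitute for a serial console or remote logs.

```python
import os
import subprocess
import time


def soak(cmd, budget_s, log_path=None, per_run_timeout_s=None):
    """Re-run cmd back to back until it fails, hangs, or budget_s seconds pass.

    Returns (clean_runs, rc): rc is 0 if the budget ran out with no failure,
    the non-zero exit status of the failing run, or None if a run exceeded
    per_run_timeout_s (treated as a hang).
    """
    deadline = time.monotonic() + budget_s
    clean_runs = 0
    while time.monotonic() < deadline:
        try:
            rc = subprocess.run(cmd, timeout=per_run_timeout_s).returncode
        except subprocess.TimeoutExpired:
            rc = None  # subprocess.run() has already killed the stuck child
        if log_path is not None:
            # Append and fsync each result so the log survives a hard lockup.
            with open(log_path, "a") as log:
                log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} run={clean_runs} rc={rc}\n")
                log.flush()
                os.fsync(log.fileno())
        if rc != 0:
            return clean_runs, rc
        clean_runs += 1
    return clean_runs, 0
```

For example, `soak(["mplayer", "-fs", "clip.mkv"], budget_s=8 * 3600, log_path="soak.log")` would replay a hypothetical `clip.mkv` for about eight hours. If the whole machine freezes, the script dies with it, but the last line of the (equally hypothetical) `soak.log` still shows the last run that completed.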

Originally posted by chrisb

Test methodically. If it is not clear where the bug lies, then make a test matrix, covering combinations of versions of different packages, and then test one after the other. Yes, it takes time, but sometimes it is the only way to figure out where the fault lies.

The bug is in the kernel. Period.

Originally posted by chrisb

Remember that each system is unique due to different hardware, cpu, gpu, mainboard, software, compiler, user usage patterns etc. If you often hit a bug, you might be the only person who can repeatedly reproduce that bug on their system. If one of the developers could reliably reproduce the issue, it would probably be fixed already.

BULLSHIT. More than once I've already offered remote access, and I'm even willing to send my GPU to a developer for a couple of weeks to fix this nasty thing. Any more excuses?

Originally posted by chrisb

It helps to have a remote machine logged in via ssh (or even better, a serial port console) so you can get logs when the gui hangs. See Reporting GPU lockup Bugs and Backtrace with gdb / Debugging Hangs / Freezes / Lockups for how to gather information about the crash, either from logs or by attaching to the hung process with gdb and getting a backtrace.

Originally posted by chrisb

If the kernel is crashing and non-responsive, try using kdump or crashdump to capture the crash details - see Ubuntu Kernel Crash Dump and How to use kdump to debug kernel crashes.

I always debug with my laptop attached and I even have a serial console, but if a developer doesn't tell you what to do, it's just a waste of time. I managed to find a 100% reproducible pattern, and Christian helped me by telling me which logs he needed, which patches to apply to narrow it down, etc. It may or may not be related to the "random radeonsi crashes", but hey, it's a 100% reproducible lockup and still no one except Christian seems to be interested.

Hey AMD: I do have a 100% reliable 3D engine lockup. Is anyone interested before I sell my card to buy a GTX 970?

Originally posted by chrisb

Don't forget that instability can also be caused by hardware incompatibilities, overheating, PSU glitches, etc. It might be worth swapping out your graphics card for another that is known to have good support, and running that for a while to see if it is stable. Put the "bad" card into another PC and see if you can still reproduce the problem, etc.

Hardware incompatibilities that I don't have with kernels <= 3.14 or with fglrx?

In the meantime I even changed the PSU, and it's not an overheating problem, because it happens even a few minutes after I start the PC.

Originally posted by chrisb

Patience is a virtue. Professional testers can spend months identifying, characterising, and creating test cases to reproduce a single bug. I once tracked down an intermittent bug that occurred, on average, once every six weeks... try to look upon it as a fun project where you will get to do some cool stuff and learn some new things, rather than the laborious repetitive task it actually is.

I have been patient for 6 months. HALF A YEAR. I'm tired of being patient; time to buy a GTX 970.

**pal666** · 06 November 2014, 01:29 PM

Originally posted by TheAaronB123

Hard to bisect something that takes 5 minutes to 3 days to arise.

no, it is easy. it is the same number of steps regardless. if the bug doesn't occur for a long time, then you are happy because you have no crashes. if it occurs sooner, then you are happy because you are one step closer to the end.
so stop complaining and bisect
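pal666's point can be illustrated with a plain binary search over the candidate commits, a minimal Python sketch of the idea behind `git bisect` rather than its actual implementation. The number of tests depends only on how many commits are in range (about log2 of that count), not on how long each test takes. What the test duration does affect is reliability: with an intermittent bug, a test that ends too early can give a false "good" and send the search into the wrong half, so the sketch assumes every verdict is correct. `bisect_first_bad` is a hypothetical helper name.

```python
def bisect_first_bad(n, is_bad):
    """Find the first bad commit among candidates 0..n-1.

    Assumes the commit before candidate 0 is known good, candidate n-1 is
    known bad, and is_bad(i) always returns the correct verdict.
    Returns (index_of_first_bad, tests_run).
    """
    lo, hi = 0, n - 1
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(mid):
            hi = mid  # first bad commit is mid or earlier
        else:
            lo = mid + 1  # first bad commit is after mid
    return lo, tests


# 1024 candidate commits: every culprit position takes exactly 10 tests,
# whether each test lasts 5 minutes or 3 days.
for culprit in (0, 1, 511, 1023):
    found, tests = bisect_first_bad(1024, lambda i, c=culprit: i >= c)
    print(culprit, found, tests)
```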

**asdfblah** · 06 November 2014, 01:36 PM

Originally posted by darkbasic

I'll say it again: I didn't manage to find a reproducible pattern. No matter how long we tried, nobody managed.
...
I always debug with my laptop attached and I even have a serial console, but if a developer doesn't tell you what to do, it's just a waste of time. I managed to find a 100% reproducible pattern, and Christian helped me by telling me which logs he needed, which patches to apply to narrow it down, etc. It may or may not be related to the "random radeonsi crashes", but hey, it's a 100% reproducible lockup and still no one except Christian seems to be interested.

Hey AMD: I do have a 100% reliable 3D engine lockup. Is anyone interested before I sell my card to buy a GTX 970?

???

Anyway... as I posted in another thread, this is the kind of thing the radeon devs have to deal with: https://bugs.freedesktop.org/show_bug.cgi?id=60389#c28 https://bugs.freedesktop.org/show_bug.cgi?id=60389#c64
Do you think they have an easy job? They are a small group, and have managed to make an awesome driver... I doubt they don't want to see their own product improved. It's just that the hardware seems particularly picky, and they don't really have a lot of time or resources...

**asdfblah** · 06 November 2014, 01:42 PM

Oh, I forgot I came to say something in this thread...
Please, LunarG guys, work on the radeon driver :P

**chrisb** · 06 November 2014, 02:00 PM

Originally posted by darkbasic

Those backtraces are useless anyway, because 90% of the time they are caused by the lockup recovery, which tries to reset the GPU.

Backtraces are usually useful - you won't know if your backtrace is useless or not until you get it.

Originally posted by darkbasic

I'll say it again: I didn't manage to find a reproducible pattern. No matter how long we tried, nobody managed.

Originally posted by darkbasic

I managed to find a 100% reproducible pattern, and Christian helped me by telling me which logs he needed, which patches to apply to narrow it down, etc. It may or may not be related to the "random radeonsi crashes", but hey, it's a 100% reproducible lockup and still no one except Christian seems to be interested.

Hey AMD: I do have a 100% reliable 3D engine lockup. Is anyone interested before I sell my card to buy a GTX 970?

Are you talking about two separate issues here? You say that nobody managed to find a reproducible pattern, but then say that you managed to find a 100% reproducible pattern? Anyway, when I was talking about testing I didn't mean that your test must necessarily be something short and easy - your test could be manually using the system for a week - whatever it takes to discriminate between a working system and a failed system.

Originally posted by darkbasic

The bug is in the kernel. Period.

Then bisect the kernel. Even if it takes, say, 5 days of manual use to test each kernel, you could find the bad commit in 30-40 days.
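The arithmetic behind this estimate, as a small Python sketch: bisecting N candidate commits takes at most ceil(log2 N) tests, so calendar time is roughly that count times the length of one test. At 5 days per kernel, 30-40 days corresponds to 6-8 tests, i.e. a range of up to roughly 64-256 candidate commits. The helper names and the commit counts below are illustrative assumptions, not measurements of any real kernel range. If the culprit is assumed to live under one directory, `git bisect start <bad> <good> -- <path>` only considers commits touching that path, which can shrink N considerably.

```python
def bisect_days(candidate_commits, days_per_test):
    """Worst-case (tests, days) for bisecting candidate_commits commits."""
    # (N - 1).bit_length() == ceil(log2(N)) for N >= 1, computed without floats.
    tests = (candidate_commits - 1).bit_length()
    return tests, tests * days_per_test


def max_commits_within(budget_days, days_per_test):
    """Largest candidate range whose worst-case bisect fits in budget_days."""
    return 2 ** (budget_days // days_per_test)


# Illustrative range sizes only; prints 64 (6, 30), 256 (8, 40), 12000 (14, 70).
for n in (64, 256, 12000):
    print(n, bisect_days(n, days_per_test=5))
```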

Originally posted by darkbasic

BULLSHIT. More than once I've already offered remote access, and I'm even willing to send my GPU to a developer for a couple of weeks to fix this nasty thing. Any more excuses?

And how do you think that having remote access to a system, without a way to reproduce the problem, is going to help? You would probably need to send the whole system, including the software you are running, and have a way for the developer to reproduce the bug. If you are willing to do that, then the developers might be more interested. But right now your bug report is basically, "I am running random bits of software that I compiled off the internet, and my computer hangs every so often, and I have no idea why, and have no crash logs or backtraces or bisect results or reproducible test case, or anything else for you to look at". What would you do if you were a developer reading a bug report like that?

Originally posted by darkbasic

I always debug with my laptop attached and I even have a serial console, but if a developer doesn't tell you what to do, it's just a waste of time.

You have been given lots of advice by several people. What more do you need? Compile with debug symbols, get a proper backtrace, kernel crash log, bisect the kernel, etc. If you can't be bothered to do any of that, then that is fine, but don't blame the developers when your bug report is insufficient to identify the problem.

Originally posted by darkbasic

I have been patient for 6 months. HALF A YEAR. I'm tired of being patient; time to buy a GTX 970.

I suggest you do that and move on with your life.

**TheAaronB123** · 06 November 2014, 02:16 PM

Originally posted by pal666

no, it is easy. it is the same number of steps regardless. if the bug doesn't occur for a long time, then you are happy because you have no crashes. if it occurs sooner, then you are happy because you are one step closer to the end.
so stop complaining and bisect

You are wrong, and that's all I'm going to say about it. I've bisected a few issues related to my card for them, so I'm not exactly new; I even had an automated script to advance and set up bisects and installs. When a regression isn't caused by one specific commit in one piece of software but maybe by several, in the kernel and elsewhere, you can't bisect just one of them and get a good result, period. You narrow it down by jumping versions, but you also have to know which Mesa is good. If the kernel is bad, they're ALL bad! Does it make sense yet?

Originally posted by darkbasic

Hey AMD: I do have a 100% reliable 3D engine lockup. Is anyone interested before I sell my card to buy a GTX 970?

I did just that too. My GTX 970-2978 Superclocked will be in the mail in 2 days, and this AMD-possessed card will be gone.

Please, buy my card, bisect it, then come back and tell me I'm an idiot; I promise you I'm not. I mean seriously, what a joke it is to call people out on this. Also, to note: I've had about 20 crashes in the last 4 days, about 5 a day, for no reason. I'm at about 45 minutes stable right now, but we'll see how long it takes to crash. The 3.18 kernel also affects this crash: it used to be able to GPU-reset back into usability for another few seconds/minutes and let you prepare/reboot, but now it doesn't even do that.

Also, an R9 270X is a 7850, basically. Since everyone with a 77XX series is stable, maybe... you don't have the instability because your card doesn't have the same hardware? Because it doesn't. So good for you that you picked a card old enough to be stable. New hardware is still unstable.

**chrisb** · 06 November 2014, 04:50 PM

Originally posted by TheAaronB123

You are wrong, and that's all I'm going to say about it. I've bisected a few issues related to my card for them, so I'm not exactly new; I even had an automated script to advance and set up bisects and installs. When a regression isn't caused by one specific commit in one piece of software but maybe by several, in the kernel and elsewhere, you can't bisect just one of them and get a good result, period. You narrow it down by jumping versions, but you also have to know which Mesa is good. If the kernel is bad, they're ALL bad! Does it make sense yet?

What you describe is the problem of sub-project dependencies in testing. It is possible to bisect these kinds of problems, but you have to plan a bit more, since git doesn't do it automatically (even for git submodules). Remember that git-bisect is just a tool that does a checkout by choosing the midpoint between known good and known bad. You can do this yourself by creating a test matrix of your sub-projects, and if versions of one depend on particular versions of another, then you can incorporate this into your git bisect script by having the script check out and build the particular dependency of the project for each tested version.

Obviously this is a bit harder to set up than a git-bisect on a single project, but once you get the idea it isn't that difficult. At the end of the testing, there will be some set of commits, across one or more projects, where the good/bad transition happened (considering all projects as a whole).
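A minimal Python sketch of the two ideas above, under heavy assumptions. `run_matrix` runs the test for every kernel/Mesa combination (assuming every combination can actually be built), `kernel_bad_with_every_mesa` picks out a kernel that fails no matter which Mesa is paired with it, and `bisect_with_dependency` bisects the kernel history while a hypothetical `dependency_for` lookup chooses the Mesa version to build alongside each kernel, the role the dependency checkout would play inside a real bisect script. All function names and the toy versions `k1`..`k8`, `m1`/`m2` are made up; `is_bad` stands in for building, installing, and running the actual test.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# (kernel_version, mesa_version); real use would hold git commits or tags.
Config = Tuple[str, str]


def run_matrix(kernels: Sequence[str], mesas: Sequence[str],
               is_bad: Callable[[str, str], bool]) -> Dict[Config, bool]:
    """Run the test once for every kernel/Mesa combination."""
    return {(k, m): is_bad(k, m) for k in kernels for m in mesas}


def kernel_bad_with_every_mesa(kernels: Sequence[str], mesas: Sequence[str],
                               results: Dict[Config, bool]) -> Optional[str]:
    """Return the oldest kernel that fails with every Mesa version, if any."""
    for k in kernels:
        if all(results[(k, m)] for m in mesas):
            return k
    return None


def bisect_with_dependency(kernels: Sequence[str],
                           dependency_for: Callable[[str], str],
                           is_bad: Callable[[str, str], bool]
                           ) -> Tuple[Config, List[Config]]:
    """Bisect kernels, pairing each tested kernel with the Mesa it requires.

    Assumes the kernel before kernels[0] is good and kernels[-1] is bad.
    Returns (first_bad_config, configs_tested_in_order).
    """
    lo, hi = 0, len(kernels) - 1
    tested: List[Config] = []
    while lo < hi:
        mid = (lo + hi) // 2
        # A real bisect script would check out and build both commits here.
        config = (kernels[mid], dependency_for(kernels[mid]))
        tested.append(config)
        if is_bad(*config):
            hi = mid
        else:
            lo = mid + 1
    return (kernels[lo], dependency_for(kernels[lo])), tested


# Toy scenario, entirely made up: kernels k1..k8 and Mesa m1/m2.
def needs(kernel: str) -> str:
    """Hypothetical constraint: kernels from k5 onwards need Mesa m2."""
    return "m1" if int(kernel[1:]) < 5 else "m2"


def fault(kernel: str, mesa: str) -> bool:
    """Hypothetical fault: present from kernel k6 onwards, whatever the Mesa."""
    return int(kernel[1:]) >= 6


kernels = [f"k{i}" for i in range(1, 9)]
mesas = ["m1", "m2"]
matrix = run_matrix(kernels, mesas, fault)
print(kernel_bad_with_every_mesa(kernels, mesas, matrix))  # k6
print(bisect_with_dependency(kernels, needs, fault)[0])    # ('k6', 'm2')
```

If the fault instead followed the Mesa version, no kernel would fail with every Mesa, and the matrix would point at the userspace side.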

I'm not saying that you should do this testing, or remarking on whether or not AMD should be doing more testing, merely pointing out that testing across multiple projects with version dependencies is possible.

**darkbasic** · 06 November 2014, 09:19 PM

Originally posted by chrisb

Are you talking about two separate issues here?

I can't say for sure, but yes, I bet they are two separate issues.

Sorry for not answering all the other points, but you would have to deal with a completely unusable system for six months, with a bug this impossible to track down, to understand how frustrating it can be.
I don't have much time to spend on my PC, and usually when I use it, it's because I have to work on it. That means I need stable drivers (Catalyst), so there's no way I can do a 3-month bisect.

**V10lator** · 07 November 2014, 04:14 AM

Originally posted by grndzro

AMD has only FGLRX now. They are merging FGLRX and the open-source driver into a binary driver. The FirePro driver is pretty much FGLRX.

Not correct:
On the kernel side AMD has FGLRX, FGLRX Legacy and radeon for now.
On the userspace side AMD has FGLRX, FGLRX Legacy, r600g and radeonSI for now.
Planned: on the kernel side, a new driver called amdgpu, and on the userspace side, a new FGLRX (will we have FGLRX Legacy v1, FGLRX Legacy v2 and FGLRX then?) and a new open-source driver.

That makes 4 drivers on the kernel side and 5 on the userspace side. I'm not sure whether we should talk about 5 or 4 drivers now: are we talking about userspace or kernel space?
