Need help debugging possible graphics-related crash


  • Need help debugging possible graphics-related crash

    Recently my system has become more and more unstable.
    I update Mesa from git fairly often, and apart from my sound card, my video driver is the only possible offender I can think of.

    Symptoms:
    Occasionally, the system just locks up hard. I get a black screen, the monitor turns off, no more hard disc or sound activity.
    Reboot is the only option.

    When does this happen:
    Randomly, but it seems to happen more often when I run more demanding graphics operations, like xrandr or 3D.
    Just once it may have happened in Windows 7 as well, but I am not sure about that.

    Yesterday, I found a sure way to reproduce this error: I use a fairly recent wine and try to start Gothic 1.

    I tried to get a message by monitoring system messages via RS232 on my laptop, but once the hangup occurs, I do not get any messages.

    I would like to debug this further to make a proper bug report, but I need suggestions.
    The Mesa git checkout is just two days old, and I compiled it with Gallium enabled. I use kernel 2.6.36 on a Debian system. My graphics card is a Radeon 4350.

    Thanks!

  • #2
    I upgraded to 2.6.38-rc6.
    It still crashes reliably, still without any messages.



    • #3
      I will not install a new system because it would take over a week to do properly and apart from some crashes now and then my current Debian system works just fine.

      Also, if the crashes stopped that way, I could no longer provide a bug report. With a report, the bug (if there is one) would be corrected and not ignored.
      I was glad to be able to reproduce it reliably but sadly have no idea where that bug could be located or how to reproduce it under more clearly defined conditions.



      • #4
        Originally posted by Dard
        I upgraded to 2.6.38-rc6.
        It still crashes reliably, still without any messages.
        I'm on 2.6.38-rc6 and it runs fine.
        I sent some tests to Phoronix Global today.
        My rig is Core2DUO [email protected] 4gb DDR 1066 , and RV770(Ati HD4870 512mb old model) with Gallium 0.4 OGL 2.1 Mesa 7.11 devel

        No crashes, no black screens.
        Only a few games do not run (ETQW, UT2004 and some Unigine tests).



        • #5
          Most games run fine here, too.
          Sauerbraten once caused a crash, but not repeatable.
          Neverball, Chromium, Kobo Deluxe and several others ran too.
          Gothic 1 in Wine is the only one that crashes the system every time.



          • #6
            I have made some progress, but not enough.
            As it turned out, the computer isn't completely dead once the crash happens.
            When I connected from another computer by ssh, I got an NMI error, so I enabled the NMI watchdog. Maybe it is the watchdog that keeps things running, I don't know.
            But now I can still access the computer by ssh after a crash. However, it is so slow that I can only get one command executed every 2 minutes. top shows varying tasks running at several hundred percent CPU time, among them the watchdogs.

            By now the crashes are quite frequent. If I can't debug this further, I will have to try to get back to a stable old version.
            I also have another reliable crasher: Minecraft usually crashes within five minutes of starting.

            Now that I know I can still access the computer after the crash, I could debug, but I don't know how.
            I can compile Mesa with debug options, but I don't know how to start it with gdb. After all, I only know how to start X by startx->xinit->X. And that isn't even Mesa.
            Besides that, I have no idea how to debug a live-lock like this.
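Since the machine still answers over ssh but only runs about one command every two minutes, one option is to pack the kernel-side diagnostics into a single command. The following is only a sketch, assuming the kernel was built with Magic SysRq support and that you are logged in as root. The function names and the optional second argument are made up for illustration. Writing `l` to the trigger asks for a backtrace of all active CPUs and `w` lists blocked tasks; both results land in the kernel log.

```shell
#!/bin/sh
# Hypothetical helper: write one Magic SysRq command key to the trigger file.
# The second argument exists only so the function can be dry-run against a
# scratch file; on a real system it defaults to /proc/sysrq-trigger (root only).
request_sysrq() {
    key=$1
    trigger=${2:-/proc/sysrq-trigger}
    printf '%s\n' "$key" > "$trigger"
}

# Hypothetical helper: request backtraces of all active CPUs ('l') and a list
# of blocked tasks ('w'), then print the tail of the kernel log so the whole
# result comes back from one invocation.
dump_lockup_state() {
    trigger=${1:-/proc/sysrq-trigger}
    request_sysrq l "$trigger" &&
    request_sysrq w "$trigger" &&
    dmesg | tail -n 200
}
```

Run from the second computer, the same idea fits in one ssh call, for example `ssh root@crashbox 'echo l > /proc/sysrq-trigger; echo w > /proc/sysrq-trigger; dmesg | tail -n 200' > lockup.txt` (the host name `crashbox` is a placeholder). For the user-space side, gdb can attach to an already running X server with `gdb -p PID`, although a lockup inside the kernel or the GPU will probably not show up there.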

            Any ideas?

            BTW, after seeing the NMI-message, I thought about a hardware problem, which seems to be the most likely explanation. But I don't think I have experienced a Windows crash so far.



            • #7
              I am using Debian testing.
              I hoped this would be a new enough system.
              xserver-xorg 1:7.5+8
              xserver-xorg-core 2:1.7.7-11



              • #8
                Okay, thank you so far, but my system seems to be seriously screwed up.
                First I removed the git drivers as far as I could by manually removing them from /usr/local. (there wasn't a "make uninstall" option)
                Xorg seems to successfully fall back on the Debian drivers. Games like Chromium still work but with small glitches. And Xorg still crashes.

                Then I removed Section "DRI" in xorg.conf.
                To my surprise, the games still run as fast as if they had direct rendering, although glxinfo says no.
                And the crashes still happen.

                I have no way out, I guess I have to live with it for the time being.
                At least it has become unlikely that the current Mesa driver is responsible.



                • #9
                  Okay, so I tried again.
                  I deleted my whole /usr/local directory. Finally all traces of the mesa drivers are gone and no longer loaded. My system is now rock solid, as far as I can see. But only 2D.
                  Apart from the crashes, this also fixed another bug that caused graphics glitches when areas were moved, for example when a tooltip was open inside Firefox and I scrolled with the mouse wheel. I decided to investigate further; maybe it was my window manager.

                  My previous mesa installation was ages old, so I tried to reinstall the current mesa, this time in /usr/local/xorg, so that I can remove it quickly without destroying other applications I have installed in /usr/local.

                  My problem is probably the following messages in Xorg.0.log:
                  Code:
                  (EE) AIGLX error: dlopen of /usr/lib/dri/r600_dri.so failed (/usr/lib/dri/r600_dri.so: cannot open shared object file: No such file or directory)
                  (EE) AIGLX: reverting to software rendering
                  (II) AIGLX: Screen 0 is not DRI capable
                  (EE) AIGLX error: dlopen of /usr/lib/dri/swrast_dri.so failed (/usr/lib/dri/swrast_dri.so: cannot open shared object file: No such file or directory)
                  (EE) GLX: could not load software renderer
                  I tried to follow the instructions at http://www.x.org/wiki/radeonBuildHowTo.
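The AIGLX errors above say the X server looked for r600_dri.so and swrast_dri.so in /usr/lib/dri and found neither, so a first check is where those files actually ended up. A small sketch for that; the function name `find_dri_driver` is made up for illustration, and the directories to pass are simply the ones mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: for each directory given, report whether it contains
# the named DRI driver file. A trailing slash on a directory is tolerated.
find_dri_driver() {
    driver=$1
    shift
    for dir in "$@"; do
        dir=${dir%/}
        if [ -f "$dir/$driver" ]; then
            printf 'found    %s/%s\n' "$dir" "$driver"
        else
            printf 'missing  %s/%s\n' "$dir" "$driver"
        fi
    done
}
```

For example, `find_dri_driver r600_dri.so /usr/lib/dri /usr/local/xorg/lib/dri/` would show whether the driver sits in the directory the server searched or only in the custom prefix.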

                  These are basically my commands when I installed:
                  Code:
                  git clone git://anongit.freedesktop.org/xorg/driver/xf86-video-ati
                  git clone git://anongit.freedesktop.org/mesa/drm
                  git clone git://anongit.freedesktop.org/mesa/mesa
                  
                  cd drm/
                  ./autogen.sh --prefix=/usr/local/xorg
                  make -j 3 ; make install ; cd ..
                  
                  export PKG_CONFIG_PATH="/usr/local/xorg/lib/pkgconfig"
                  cd xf86-video-ati/
                  ./autogen.sh --prefix=/usr/local/xorg
                  make -j 3 ; make install ; cd ..
                  
                  cd mesa/
                  ./autogen.sh --prefix=/usr/local/xorg --enable-gallium-r600 --with-dri-driverdir=/usr/local/xorg/lib/dri/
                  make -j 3 ; make install ; cd ..
                  Then I added the line "/usr/local/xorg/lib" to /etc/ld.so.conf and executed "ldconfig".
                  Then I added "/usr/local/xorg/lib/modules" to the ModulePath entry in xorg.conf, although this seems useless since there isn't a modules folder.
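After the ld.so.conf change it may be worth confirming which copies of libGL the linker cache now knows about. This is a sketch, assuming glibc's `ldconfig -p` output format (library name first, path last); the function name `cached_lib_paths` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `ldconfig -p` output on stdin and print the path
# of every cache entry whose library name matches the first argument exactly.
cached_lib_paths() {
    awk -v lib="$1" '$1 == lib { print $NF }'
}
```

Usage would be `ldconfig -p | cached_lib_paths libGL.so.1`. If more than one path is printed, two copies of libGL are installed, and `ldd "$(command -v glxinfo)"` shows which one glxinfo actually loads.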
                  I set this variable:
                  LIBGL_DRIVERS_PATH=/usr/local/xorg/lib/dri/
                  In xorg.conf I still have this section:
                  Code:
                  Section "DRI"
                      Group      "dard"
                      Mode       0666
                  EndSection
                  I thought this was everything I needed, but glxinfo gives me:
                  Code:
                  name of display: :0.0
                  Error: couldn't find RGB GLX visual or fbconfig
                  What am I missing?
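One way to get more detail out of the client side is to run glxinfo with Mesa's loader debugging enabled and the driver path set explicitly for that one command. A minimal sketch; the function name `run_with_dri_path` is made up for illustration, while `LIBGL_DEBUG` and `LIBGL_DRIVERS_PATH` are the Mesa environment variables.

```shell
#!/bin/sh
# Hypothetical helper: run a command with Mesa's loader debugging switched on
# and the DRI driver search path pointed at the given directory. The variables
# are set for that one command only, not for the calling shell.
run_with_dri_path() {
    dri_dir=$1
    shift
    LIBGL_DEBUG=verbose LIBGL_DRIVERS_PATH="$dri_dir" "$@"
}
```

For example: `run_with_dri_path /usr/local/xorg/lib/dri glxinfo 2>&1 | head -n 40`. Note that these variables are read by the client-side libGL. The AIGLX lines in Xorg.0.log come from the X server itself, which as far as I know uses a driver directory fixed when the server was built, so setting the variable in a shell may not change what the server tries to load.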

