
Crash hunting in Radeon KMS

  • Crash hunting in Radeon KMS

    Hi,

    Some of my fellow Archers and I are plagued by crashes when running the latest mesa, libdrm, xf86-video and kernel from Git while using radeon KMS; UMS is fine. The video driver crashes (this is not a kernel oops; the monitor shows "no signal"), and the machine stops responding to both the keyboard and SSH.

    We think that we've found a way to reproduce it in KDE SC 4.4:
    http://bugzilla.kernel.org/show_bug.cgi?id=15276#c19

    Could you please take a look at the bug report and see whether you can reproduce the crash? Please state your kernel, mesa and libdrm versions and your card.

    Please help us find the bug responsible for the crashes.

    Cheers

  • #2
    So I'm not the only one!
    I have a lot of crashes too (latest mesa, libdrm, xf86-video-ati, drm-radeon-testing).
    ## VGA ##
    AMD: X1950XTX, HD3870, HD5870
    Intel: GMA45, HD3000 (Core i5 2500K)

    • #3
      HD3870 here.

      • #4
        Does anyone else experience crashes in KMS where there is no video signal, the same audio plays over and over again (as if from a small buffer), there are no blinking LEDs (so it is not a kernel oops) and the machine is unresponsive?

        Please help: the more information we can gather in the bug report, the better. It has already been assigned to the devs.

        • #5
          Originally posted by Neuro
          Does anyone else experience crashes in KMS where there is no video signal, the same audio plays over and over again (as if from a small buffer), there are no blinking LEDs (so it is not a kernel oops) and the machine is unresponsive?

          Please help: the more information we can gather in the bug report, the better. It has already been assigned to the devs.
          Yes, I have a thread on the Ubuntu Forums about that freeze (ATI Radeon HD3650, 512 MB). I found nothing useful in the logs I've looked at.

          • #6
            Have you tried connecting a serial console or a net console?

            Originally posted by Neuro
            Please help: the more information we can gather in the bug report, the better. It has already been assigned to the devs.
            This assumes you have a second machine available to host the console, of course, but that way you'd be able to read all the dmesg output even when the local console stops responding.
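
            A minimal sketch of the receiving side, for anyone who wants to try the net console route. On the crashing machine, the kernel's netconsole module is loaded with a parameter of the form netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr] (see the kernel's netconsole documentation; the target port defaults to 6666), and raising the console log level with dmesg -n 8 helps make sure all messages are sent. The second machine then only has to listen for UDP datagrams and write them to disk before the first machine locks up. open_receiver and receive_messages are hypothetical helper names, not part of any existing tool; a plain netcat UDP listener would do the same job.

```python
import socket

# Assumption: netconsole's default target port (used when none is given).
NETCONSOLE_PORT = 6666


def open_receiver(host="0.0.0.0", port=NETCONSOLE_PORT):
    """Bind a UDP socket that will receive netconsole datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    return sock


def receive_messages(sock, count=None, logfile=None):
    """Read kernel messages from sock; stop after `count` datagrams if given.

    Each datagram is decoded leniently (kernel output is not guaranteed to
    be valid UTF-8), echoed to stdout, and optionally appended to `logfile`
    and flushed immediately, so nothing is lost if the receiver is stopped.
    """
    received = []
    while count is None or len(received) < count:
        data, _addr = sock.recvfrom(4096)
        text = data.decode("utf-8", errors="replace")
        received.append(text)
        print(text, end="", flush=True)
        if logfile is not None:
            logfile.write(text)
            logfile.flush()
    return received


# Usage on the second machine (blocks forever, so it is left commented out):
# with open("netconsole.log", "a") as log:
#     receive_messages(open_receiver(), logfile=log)
```

            Leaving this running on the second machine while reproducing the freeze should capture whatever the kernel manages to print before the lockup.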

            • #7
              Cross-posted from the Archlinux BBS:

              A question to you guys with mysteriously crashing systems: are you all running a recent build of KDE/KWin?

              • #8
                Originally posted by korpenkraxar
                Cross-posted from the Archlinux BBS:

                A question to you guys with mysteriously crashing systems: are you all running a recent build of KDE/KWin?
                It seems that most of us are running KDE SC 4.4.

                • #9
                  Same here; it looks like I am not the only one (Mobility X1400). I tried upgrading to 2.6.33-rc7 but got the same crashes (by the way, I couldn't reproduce them with the test case above). I get a system lockup if radeon is compiled as a module and a kernel panic (blinking LEDs) if it is built into the kernel.

                  The only good news is that one of these crashes didn't take down the whole system, and I managed to get a backtrace:

                  Code:
                  Feb 12 11:53:36 codemobile kernel: [50278.131689] ------------[ cut here ]------------
                  Feb 12 11:53:36 codemobile kernel: [50278.131716] WARNING: at /usr/src/linux-2.6.32-gentoo-r4/lib/kref.c:43 kref_get+0x20/0x30()
                  Feb 12 11:53:36 codemobile kernel: [50278.131721] Hardware name: MM061
                  Feb 12 11:53:36 codemobile kernel: [50278.131725] Modules linked in: rfcomm sco bnep l2cap aufs squashfs i8k vboxnetadp vboxnetflt vboxdrv loop radeon btusb ttm b44 drm_kms_helper b43 mac80211 led_class bluetooth ssb cfbcopyarea cfbimgblt intel_agp rtc_cmos cfbfillrect
                  Feb 12 11:53:36 codemobile kernel: [50278.131788] Pid: 5, comm: events/0 Not tainted 2.6.32-gentoo-r4-bfs313 #4
                  Feb 12 11:53:36 codemobile kernel: [50278.131792] Call Trace:
                  Feb 12 11:53:36 codemobile kernel: [50278.131813]  [<ffffffff81040323>] ? warn_slowpath_common+0x73/0xb0
                  Feb 12 11:53:36 codemobile kernel: [50278.131820]  [<ffffffff8121ec20>] ? kref_get+0x20/0x30
                  Feb 12 11:53:36 codemobile kernel: [50278.131843]  [<ffffffffa00c9633>] ? ttm_bo_delayed_delete+0x73/0x1b0 [ttm]
                  Feb 12 11:53:36 codemobile kernel: [50278.131852]  [<ffffffffa00c9770>] ? ttm_bo_delayed_workqueue+0x0/0x30 [ttm]
                  Feb 12 11:53:36 codemobile kernel: [50278.131878]  [<ffffffffa00c9782>] ? ttm_bo_delayed_workqueue+0x12/0x30 [ttm]
                  Feb 12 11:53:36 codemobile kernel: [50278.131885]  [<ffffffff81057be5>] ? worker_thread+0x195/0x310
                  Feb 12 11:53:36 codemobile kernel: [50278.131904]  [<ffffffff8105c620>] ? autoremove_wake_function+0x0/0x30
                  Feb 12 11:53:36 codemobile kernel: [50278.131910]  [<ffffffff81057a50>] ? worker_thread+0x0/0x310
                  Feb 12 11:53:36 codemobile kernel: [50278.131916]  [<ffffffff8105c26e>] ? kthread+0x8e/0xa0
                  Feb 12 11:53:36 codemobile kernel: [50278.131935]  [<ffffffff8103981c>] ? schedule_tail+0x3c/0xe0
                  Feb 12 11:53:36 codemobile kernel: [50278.131942]  [<ffffffff8100ceea>] ? child_rip+0xa/0x20
                  Feb 12 11:53:36 codemobile kernel: [50278.131948]  [<ffffffff8105c1e0>] ? kthread+0x0/0xa0
                  Feb 12 11:53:36 codemobile kernel: [50278.131964]  [<ffffffff8100cee0>] ? child_rip+0x0/0x20
                  Feb 12 11:53:36 codemobile kernel: [50278.131968] ---[ end trace b06f625ae300867a ]---
                  Feb 12 11:53:55 codemobile kernel: [50297.131806] CPU 0
                  Feb 12 11:53:55 codemobile kernel: [50297.131829] Modules linked in: rfcomm sco bnep l2cap aufs squashfs i8k vboxnetadp vboxnetflt vboxdrv loop radeon btusb ttm b44 drm_kms_helper b$
                  Feb 12 11:53:55 codemobile kernel: [50297.131926] Pid: 5, comm: events/0 Tainted: G        W  2.6.32-gentoo-r4-bfs313 #4 MM061
                  Feb 12 11:53:55 codemobile kernel: [50297.131953] RIP: 0010:[<ffffffffa00c8412>]  [<ffffffffa00c8412>] ttm_bo_release_list+0xc2/0xd0 [ttm]
                  Feb 12 11:53:55 codemobile kernel: [50297.131990] RSP: 0000:ffff88007f883d90  EFLAGS: 00010202
                  Feb 12 11:53:55 codemobile kernel: [50297.132014] RAX: 0000000000000002 RBX: ffff88003a159e00 RCX: ffff880069970e00
                  Feb 12 11:53:55 codemobile kernel: [50297.132030] RDX: 0000000000000001 RSI: ffffffffa00c8350 RDI: ffff88003a159e44
                  Feb 12 11:53:55 codemobile kernel: [50297.132055] RBP: ffff88003a159e44 R08: ffff88007f882000 R09: 0000000000000000
                  Feb 12 11:53:55 codemobile kernel: [50297.132080] R10: 000000000000c940 R11: 00000000ffffffff R12: ffff88007c963400
                  Feb 12 11:53:55 codemobile kernel: [50297.132095] R13: ffff88003a159e44 R14: ffff88003a159e00 R15: ffff88003a159e44
                  Feb 12 11:53:55 codemobile kernel: [50297.132121] FS:  0000000000000000(0000) GS:ffff880001a00000(0000) knlGS:0000000000000000
                  Feb 12 11:53:55 codemobile kernel: [50297.132147] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
                  Feb 12 11:53:55 codemobile kernel: [50297.132161] CR2: 00007f9a32c5b000 CR3: 0000000069a6c000 CR4: 00000000000006f0
                  Feb 12 11:53:55 codemobile kernel: [50297.132186] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
                  Feb 12 11:53:55 codemobile kernel: [50297.132211] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
                  Feb 12 11:53:55 codemobile kernel: [50297.132237] Process events/0 (pid: 5, threadinfo ffff88007f882000, task ffff88007f841ae0)
                  Feb 12 11:53:55 codemobile kernel: [50297.132274]  ffff88003a159e44 ffffffffa00c8350 ffff88007c963868 ffffffff8121ebc3
                  Feb 12 11:53:55 codemobile kernel: [50297.132286] <0> ffff88003a159e44 ffff88003a159eb8 ffff88007d611c88 ffffffffa00c966d
                  Feb 12 11:53:55 codemobile kernel: [50297.132316] <0> ffff88007f841cf8 ffff880000000000 000000007f882000 ffff88007c963400
                  Feb 12 11:53:55 codemobile kernel: [50297.132383]  [<ffffffffa00c8350>] ? ttm_bo_release_list+0x0/0xd0 [ttm]
                  Feb 12 11:53:55 codemobile kernel: [50297.132414]  [<ffffffff8121ebc3>] ? kref_put+0x33/0x70
                  Feb 12 11:53:55 codemobile kernel: [50297.132442]  [<ffffffffa00c966d>] ? ttm_bo_delayed_delete+0xad/0x1b0 [ttm]
                  Feb 12 11:53:55 codemobile kernel: [50297.132471]  [<ffffffffa00c9770>] ? ttm_bo_delayed_workqueue+0x0/0x30 [ttm]
                  Feb 12 11:53:55 codemobile kernel: [50297.132501]  [<ffffffffa00c9782>] ? ttm_bo_delayed_workqueue+0x12/0x30 [ttm]
                  Feb 12 11:53:55 codemobile kernel: [50297.132530]  [<ffffffff81057be5>] ? worker_thread+0x195/0x310
                  Feb 12 11:53:55 codemobile kernel: [50297.132557]  [<ffffffff8105c620>] ? autoremove_wake_function+0x0/0x30
                  Feb 12 11:53:55 codemobile kernel: [50297.132573]  [<ffffffff81057a50>] ? worker_thread+0x0/0x310
                  Feb 12 11:53:55 codemobile kernel: [50297.132599]  [<ffffffff8105c26e>] ? kthread+0x8e/0xa0
                  Feb 12 11:53:55 codemobile kernel: [50297.132627]  [<ffffffff8103981c>] ? schedule_tail+0x3c/0xe0
                  Feb 12 11:53:55 codemobile kernel: [50297.132655]  [<ffffffff8100ceea>] ? child_rip+0xa/0x20
                  Feb 12 11:53:55 codemobile kernel: [50297.132671]  [<ffffffff8105c1e0>] ? kthread+0x0/0xa0
                  Feb 12 11:53:55 codemobile kernel: [50297.132696]  [<ffffffff8100cee0>] ? child_rip+0x0/0x20
                  Feb 12 11:53:55 codemobile kernel: [50297.132889]  RSP <ffff88007f883d90>
                  Feb 12 11:53:55 codemobile kernel: [50297.132914] ---[ end trace b06f625ae300867b ]---
                  Feb 12 11:55:22 codemobile kernel: [50384.032617] ------------[ cut here ]------------
                  Feb 12 11:55:22 codemobile kernel: [50384.032657] WARNING: at /usr/src/linux-2.6.32-gentoo-r4/lib/kref.c:43 kref_get+0x20/0x30()
                  Feb 12 11:55:22 codemobile kernel: [50384.032682] Hardware name: MM061
                  Feb 12 11:55:22 codemobile kernel: [50384.032706] Modules linked in: rfcomm sco bnep l2cap aufs squashfs i8k vboxnetadp vboxnetflt vboxdrv loop radeon btusb ttm b44 drm_kms_helper b$
                  Feb 12 11:55:22 codemobile kernel: [50384.032807] Pid: 2263, comm: X Tainted: G      D W  2.6.32-gentoo-r4-bfs313 #4
                  Feb 12 11:55:22 codemobile kernel: [50384.032832] Call Trace:
                  Feb 12 11:55:22 codemobile kernel: [50384.032849]  [<ffffffff81040323>] ? warn_slowpath_common+0x73/0xb0
                  Feb 12 11:55:22 codemobile kernel: [50384.032877]  [<ffffffff8121ec20>] ? kref_get+0x20/0x30
                  Feb 12 11:55:22 codemobile kernel: [50384.032908]  [<ffffffffa00c823d>] ? ttm_bo_unreserve+0x8d/0x100 [ttm]
                  Feb 12 11:55:22 codemobile kernel: [50384.032961]  [<ffffffffa0109063>] ? radeon_object_list_unreserve+0x33/0x50 [radeon]
                  Feb 12 11:55:22 codemobile kernel: [50384.033002]  [<ffffffffa0117c69>] ? radeon_cs_parser_fini+0x109/0x110 [radeon]
                  Feb 12 11:55:22 codemobile kernel: [50384.033053]  [<ffffffffa011842a>] ? radeon_cs_ioctl+0x11a/0x1e0 [radeon]
                  Feb 12 11:55:22 codemobile kernel: [50384.033072]  [<ffffffff812b580a>] ? drm_ioctl+0x18a/0x3b0
                  Feb 12 11:55:22 codemobile kernel: [50384.033122]  [<ffffffffa0118310>] ? radeon_cs_ioctl+0x0/0x1e0 [radeon]
                  Feb 12 11:55:22 codemobile kernel: [50384.033152]  [<ffffffff810d7662>] ? do_sync_read+0xe2/0x120
                  Feb 12 11:55:22 codemobile kernel: [50384.033179]  [<ffffffff810e6ac2>] ? vfs_ioctl+0x82/0xb0
                  Feb 12 11:55:22 codemobile kernel: [50384.033195]  [<ffffffff810e6c18>] ? do_vfs_ioctl+0x88/0x570
                  Feb 12 11:55:22 codemobile kernel: [50384.033222]  [<ffffffff81044973>] ? do_setitimer+0x1c3/0x240
                  Feb 12 11:55:22 codemobile kernel: [50384.033248]  [<ffffffff810e7149>] ? sys_ioctl+0x49/0x80
                  Feb 12 11:55:22 codemobile kernel: [50384.033264]  [<ffffffff8100bfab>] ? system_call_fastpath+0x16/0x1b
                  Feb 12 11:55:22 codemobile kernel: [50384.033289] ---[ end trace b06f625ae300867c ]---
                  Feb 12 11:56:37 codemobile kernel: [50458.938300] SysRq : Keyboard mode set to system default
                  Feb 12 11:56:39 codemobile kernel: [50461.262824] SysRq : Terminate All Tasks
                  I don't know how to debug this properly because it is very hard to reproduce on my laptop: sometimes I work 20+ hours and nothing happens, and other times I get the freeze/panic within 2 hours.
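
                  Each call-trace entry above has the form [<address>] ? function+offset/size [module], where a leading "?" means the unwinder could not confirm the entry, so it may be stale. When comparing traces from several reporters, it helps to reduce them to the functions and modules involved; here the frames come from ttm (ttm_bo_delayed_delete, ttm_bo_release_list, ttm_bo_unreserve) and radeon (radeon_cs_ioctl). A small sketch of that reduction follows, using a regular expression written only for the x86-64 format shown in this log; parse_trace and modules_involved are hypothetical helper names, not existing kernel tooling.

```python
import re
from typing import List, NamedTuple, Optional

# Matches call-trace entries in the format printed in this thread, e.g.
#   [<ffffffffa00c9633>] ? ttm_bo_delayed_delete+0x73/0x1b0 [ttm]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"                      # return address
    r"(?P<unreliable>\?\s+)?"                            # '?' = unconfirmed entry
    r"(?P<func>[\w.]+)"                                  # symbol name
    r"\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"   # offset/function size
    r"(?:\s+\[(?P<module>\w+)\])?"                       # module; absent for core kernel
)


class Frame(NamedTuple):
    address: int
    function: str
    offset: int
    size: int
    module: Optional[str]
    reliable: bool


def parse_trace(log_text: str) -> List[Frame]:
    """Extract every call-trace frame from a block of kernel log text."""
    frames = []
    for match in FRAME_RE.finditer(log_text):
        frames.append(Frame(
            address=int(match["addr"], 16),
            function=match["func"],
            offset=int(match["offset"], 16),
            size=int(match["size"], 16),
            module=match["module"],
            reliable=match["unreliable"] is None,
        ))
    return frames


def modules_involved(frames: List[Frame]) -> List[str]:
    """Return the distinct modules in the trace, in order of first appearance."""
    seen: List[str] = []
    for frame in frames:
        if frame.module and frame.module not in seen:
            seen.append(frame.module)
    return seen
```

                  Pasting a trace from dmesg or syslog into parse_trace gives a list of Frame records; modules_involved then shows at a glance which drivers appear in it.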

                  • #10
                    codestation, nice one with the trace.

                    Are you getting that under KDE 4.4? Are you losing the monitor signal? It's a bit awkward, because I stopped getting kernel logs in syslog after I moved to kernel 2.6.33- in February.

                    Cheers,
                    Michal
