More... freaking... hangs.


  • More... freaking... hangs.

    Come on, seriously. It was so stable and smooth until I upgraded to Mesa 19.1.0. Since then, encoding through VA-API on my card with a custom program (source here: https://github.com/tildearrow/darmstadt) while something else is rendering may cause the card to hang. Yes, I said that. Hang.

    Back in 2017 this was MUCH more severe: the card hung even if you just tried to render something, which is why I began using AMDGPU-PRO (it hung there too!!! Only once, but still enough to mention). Then at some point in 2018 it became so stable that it never hung, except that a later upgrade made the card clog up at times, which a kernel upgrade then fixed.

    However, now in 2019, I upgraded to Mesa 19.1 and got a nasty surprise: my card hung again. The freaking same hang that clogs up the X server, and you can't even close it because it's in D state (uninterruptible sleep), so your only solution is to reboot. Even Windows has a proper card reset system, which I appreciate.
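For anyone trying to confirm the same symptom over SSH or from a text console, here is a minimal Python sketch that lists processes currently in D state by reading `/proc/<pid>/stat`. It is not part of darmstadt, and the function names `parse_stat_line` and `find_d_state` are made up for this illustration. A process showing up once is normal (short I/O waits also count as D); the hang described here would look like the X server staying in the list indefinitely.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' instead of splitting on whitespace.
    left, right = line.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    state = right.split()[0]
    return int(pid_str), comm, state


def find_d_state(proc_root="/proc"):
    """List (pid, comm) for every process currently in uninterruptible sleep."""
    stuck = []
    if not os.path.isdir(proc_root):
        return stuck  # not Linux, or procfs is not mounted
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or the line was malformed
        if state == "D":
            stuck.append((pid, comm))
    return stuck
```

Calling `find_d_state()` a few seconds apart and comparing the results is enough to tell a passing I/O wait from a process that never leaves D.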

    Steps to reproduce:
    1. Run sudo ./darmstadt /dev/null (yes, I will provide the source to my program if needed although see bottom)
    1. OR (MAY ALSO REPRODUCE): Run sudo env LIBVA_DRIVER_NAME=radeonsi ffmpeg -f kmsgrab -device /dev/dri/card0 -framerate 60 -i - -vf 'hwmap=derive_device=vaapi,scale_vaapi=format=nv12 ' -c:v hevc_vaapi -rc_init_occupancy 512M -bf 0 -qp 24 -f matroska -y /dev/null
    2. Wait anywhere between 5 seconds and 10 hours while rendering something (I'm pretty sure even glxgears will help reproduce it). Yeah, I know that's a long time, but that's how long it can take.
    3. Done. The tachometer should light up completely (all 7-8 LEDs; sorry, I'm not sure how many LEDs the card has, but I know they all turn on) and the X session should hang.
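Since the wait in step 2 can be anything from seconds to hours, it may help to log how long each attempt ran. The following is a rough Python sketch under stated assumptions, not a tested reproduction tool: `heartbeat_while_running` and `proc_state` are hypothetical names, and the idea that the encoder process itself ends up stuck in D state is an assumption (the report above only says the X server does). Each log line holds the elapsed seconds and the encoder's state letter, and is fsync'ed so it survives a forced reboot.

```python
import os
import subprocess
import time


def proc_state(pid):
    """Return the one-letter state of a process from /proc, or '?' if unavailable."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            # The state field follows the last ')' that closes the command name.
            return f.read().rsplit(")", 1)[1].split()[0]
    except (OSError, IndexError):
        return "?"


def heartbeat_while_running(encoder_argv, load_argv, log_path, interval=1.0):
    """Start a render load and an encoder, then log until the encoder exits.

    Appends "<elapsed seconds> <encoder state>" to log_path once per interval
    and returns the number of lines written.
    """
    quiet = {"stdout": subprocess.DEVNULL, "stderr": subprocess.DEVNULL}
    load = subprocess.Popen(load_argv, **quiet)
    encoder = subprocess.Popen(encoder_argv, **quiet)
    start = time.monotonic()
    beats = 0
    try:
        with open(log_path, "a") as log:
            while encoder.poll() is None:
                elapsed = time.monotonic() - start
                log.write(f"{elapsed:.1f} {proc_state(encoder.pid)}\n")
                log.flush()
                os.fsync(log.fileno())  # make sure the line reaches the disk
                beats += 1
                time.sleep(interval)
    finally:
        for proc in (encoder, load):
            if proc.poll() is None:
                proc.terminate()
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                pass  # a process stuck in D state cannot be reaped; give up
    return beats
```

To use it for the steps above, pass the command from step 1 as `encoder_argv` (for example split with `shlex.split`) and `["glxgears"]` as `load_argv`, and run the script as root from a text console or SSH session so it does not depend on X. After the reboot, the tail of the log shows roughly how long the run lasted, and a long unbroken run of `D` at the end would hint at when the encoder got stuck.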

    System specs (the card, kernel version, Mesa version, LLVM version and DRM version are most important):

    Code:
    --INXI--
    System:    Host: linux Kernel: 5.0.2-arch1-1-ARCH x86_64 bits: 64 Desktop: KDE Plasma 5.16.2  
              Distro: Arch Linux  
    Machine:   Type: Desktop Mobo: MSI model: Z170A GAMING PRO (MS-7984) v: 1.0 serial: <root required>  
              UEFI: American Megatrends v: 1.D0 date: 12/22/2016  
    CPU:       Quad Core: Intel Core i7-6700K type: MT MCP speed: 4035 MHz min/max: 800/4001 MHz  
    Graphics:  Device-1: Intel HD Graphics 530 driver: i915 v: kernel  
              Device-2: Advanced Micro Devices [AMD/ATI] Vega 10 XTX [Radeon Vega Frontier Edition]  
              driver: amdgpu v: kernel  
              Display: x11 server: X.Org 1.20.5 driver: amdgpu resolution: 3840x2160~60Hz  
              OpenGL: renderer: Radeon Vega Frontier Edition (VEGA10 DRM 3.27.0 5.0.2-arch1-1-ARCH LLVM 8.0.0)  
              v: 4.5 Mesa 19.1.1  
    Network:   Device-1: Intel Ethernet I219-V driver: e1000e  
    Drives:    Local Storage: total: 6.37 TiB used: 1.49 TiB (23.3%)  
    Info:      Processes: 291 Uptime: 17m Memory: 15.58 GiB used: 1.62 GiB (10.4%) Shell: bash inxi: 3.0.26
    --OTHER--
    Card was at max memory clock when it hung (needed to get the recorder to work full-speed).
    Yes, you do need the air-cooled Vega FE to reproduce.
    The display was connected to DP-3. I'm not sure whether this is important but adding it in case it's needed.
    (No, I didn't upgrade the kernel because I know the later kernels introduce a random data corruption bug that affected me once)

    What could it be:

    Maybe some sort of race condition...
    Last edited by tildearrow; 07-09-2019, 02:45 AM.

  • #2
    Originally posted by smitty3268
    I guarantee you no one from AMD cares.
    I agree completely. This means that if it hangs once more, I will have to downgrade to Mesa 19.0, and if it still hangs, I will be forced to install Gentoo built with full debug symbols and finally find out what the hell is causing this crap.

    This is also more proof that Polaris gets more active attention than Vega.
