H.264 VA-API Encode Comes To Gallium3D

  • #11
    Oh cool, just saw this. Here's the patchset for easy download: https://patchwork.freedesktop.org/series/9839/

    I never got OpenMAX encoding to work, I think precisely because of the bellagio stuff.

    This is the test pipeline I most recently tried:
    $ gst-launch-1.0 videotestsrc ! omxh264enc ! "video/x-h264,profile=high" ! h264parse ! matroskamux ! filesink location=output.avi
    And that was the error message:
    OMX-could not load /usr/lib/bellagio/libomxvideosched.so ==> OMX.st.video.scheduler ==> OMX.st.video.scheduler: ==> 2 1,456192 1,304128: /usr/lib/bellagio/libomxvideosched.so ==> OMX.st.video.scheduler ==> OMX.st.video.scheduler: ==> 2 1,456192 1,304128: cannot open shared object file: No such file or directory

    Sounds confusing? With strace:

    open("/usr/lib/bellagio/libomxvideosched.so ==> OMX.st.video.scheduler ==> OMX.st.video.scheduler: ==> 2 1,456192 1,304128", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
    you can see that there is some weird stuff appended to the library filename.

    I think omxregister does that, because in ~/.omxregister I had
    Code:
    /usr/lib/bellagio/libomxvideosched.so ==> OMX.st.video.scheduler ==> OMX.st.video.scheduler: ==> 2 1,456192 1,304128
    /usr/lib/bellagio/libomx_mesa.so
    BELLAGIO_SEARCH_PATHOMX-Failed to write %zu bytes to fd %d
    OMX-Component % ==> OMX.mesa.video_decoder ==> OMX.mesa.video_decoder.mpeg2:OMX.mesa.video_decoder.avc:
     ==> OMX.mesa.video_encoder ==> OMX.mesa.video_encoder.avc:
    /usr/lib/bellagio/libomxaudio_effects.so
    BELLAGIO_SEARCH_PATHOMX-Failed to write %zu bytes to fd ==> OMX.st.volume.component ==> OMX.st.volume.component: ==> 2 1,65536 1,32768
     ==> OMX.st.audio.mixer ==> OMX.st.audio.mixer: ==> 1 50,60000
    /usr/lib/bellagio/libomxclocksrc.so
    BELLAGIO_SEARCH_PATHOMX-Failed to write %zu bytes to fd %d ==> OMX.st.clocksrc ==> OMX.st.clocksrc:
    and removing that weird stuff in the first line made it go a little bit further but fail with
    Code:
    OMX-Component not found with current ST static component loader.
    I had a go at looking at the omxregister source code, but: https://sourceforge.net/p/omxil/omxi...egister.c#l187
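A minimal sketch for spotting the kind of damage shown in the ~/.omxregister listing above. It assumes (based only on that listing, not on the omxregister source) that a library entry is a line starting with "/" that should hold nothing but the path, with component descriptions on the following " ==> " lines. The function name and the sample text are made up for illustration.

```python
def find_corrupt_library_lines(register_text: str) -> list[tuple[int, str]]:
    """Return (line_number, library_path) for library lines with trailing data.

    Assumption: a library line starts with "/" and should contain only the
    path; anything after " ==> " on such a line is treated as junk.
    """
    suspicious = []
    for number, line in enumerate(register_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("/") and " ==> " in stripped:
            # Keep only the part before the first separator as the real path.
            path = stripped.split(" ==> ", 1)[0]
            suspicious.append((number, path))
    return suspicious


# Shortened sample modelled on the listing in the post.
sample = """\
/usr/lib/bellagio/libomxvideosched.so ==> OMX.st.video.scheduler ==> OMX.st.video.scheduler: ==> 2 1,456192 1,304128
/usr/lib/bellagio/libomx_mesa.so
 ==> OMX.mesa.video_encoder ==> OMX.mesa.video_encoder.avc:
/usr/lib/bellagio/libomxclocksrc.so
"""

for number, path in find_corrupt_library_lines(sample):
    print(f"line {number}: {path} has trailing data")
```

On the sample this flags only the first line, which matches the path that strace showed being passed to open() with the extra text attached.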


    So, trying this patchset on my Intel + Radeon laptop (I heard the H.264 quality of VCE is better than Intel's Quick Sync):
    Code:
    DRI_PRIME=1 LIBVA_DRIVER_NAME=radeonsi vainfo
    libva info: VA-API version 0.39.2
    libva info: va_getDriverName() returns 0
    libva info: User requested driver 'radeonsi'
    libva info: Trying to open /usr/lib/dri/radeonsi_drv_video.so
    libva info: Found init function __vaDriverInit_0_39
    libva info: va_openDriver() returns 0
    vainfo: VA-API version: 0.39 (libva 1.7.1)
    vainfo: Driver version: mesa gallium vaapi
    vainfo: Supported profile and entrypoints
         VAProfileMPEG2Simple            : VAEntrypointVLD
         VAProfileMPEG2Main              : VAEntrypointVLD
         VAProfileVC1Simple              : VAEntrypointVLD
         VAProfileVC1Main                : VAEntrypointVLD
         VAProfileVC1Advanced            : VAEntrypointVLD
         VAProfileH264Baseline           : VAEntrypointVLD
         VAProfileH264Baseline           : VAEntrypointEncSlice
         VAProfileH264Baseline           : VAEntrypointEncPicture
         VAProfileH264Main               : VAEntrypointVLD
         VAProfileH264High               : VAEntrypointVLD
         VAProfileNone                   : VAEntrypointVideoProc
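The vainfo listing above can be checked mechanically for encode support. This is only a sketch: it assumes the "VAProfile... : VAEntrypoint..." line layout shown in the output, and the function name is invented for this example.

```python
def encode_profiles(vainfo_output: str) -> set[str]:
    """Collect the profiles that list an encode entrypoint (VAEntrypointEnc*)."""
    profiles = set()
    for line in vainfo_output.splitlines():
        # Split at the first colon; non-profile lines are filtered out below.
        profile, _, entrypoint = line.partition(":")
        profile, entrypoint = profile.strip(), entrypoint.strip()
        if profile.startswith("VAProfile") and entrypoint.startswith("VAEntrypointEnc"):
            profiles.add(profile)
    return profiles


# Excerpt of the output quoted in the post.
sample = """\
libva info: VA-API version 0.39.2
vainfo: Supported profile and entrypoints
     VAProfileH264Baseline           : VAEntrypointVLD
     VAProfileH264Baseline           : VAEntrypointEncSlice
     VAProfileH264Baseline           : VAEntrypointEncPicture
     VAProfileH264Main               : VAEntrypointVLD
     VAProfileH264High               : VAEntrypointVLD
"""

print(sorted(encode_profiles(sample)))
```

For this excerpt only VAProfileH264Baseline carries encode entrypoints; Main and High are decode-only (VAEntrypointVLD).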
    The test pipeline
    Code:
    DRI_PRIME=1 LIBVA_DRIVER_NAME=radeonsi gst-launch-1.0 videotestsrc ! vaapih264enc ! "video/x-h264,profile=high" ! h264parse ! matroskamux ! filesink location=output.avi
    Actually does output something that looks like a test video.

    gstreamer-screenrecording by Pontostroy does not work though.
    Code:
    ERROR: from element /GstPipeline:pipeline0/GstXImageSrc:ximagesrc0: Internal data flow error.
    Additional debug info:
    gstbasesrc.c(2948): gst_base_src_loop (): /GstPipeline:pipeline0/GstXImageSrc:ximagesrc0:
    streaming task paused, reason not-negotiated (-4)
    Probably just needs minor adjustments for the usage with VCE


    • #12
      Originally posted by schmidtbag
      Makes me wonder - for people who do screen recording on a discrete GPU, could you use VA-API encoding accelerated by an IGP? Seems like it'd be a great use to hardware that most people don't care about, when screen recording can usually be pretty demanding.
      On both intel and radeon GPUs the encoding is done on completely separate hardware. Unless there is a bug relating to power management there should be basically no difference in the rest of the GPU performance. Only the resource impact of grabbing the image 30 or 60 times a second is really relevant and it's actually relatively small.


      • #13
        Originally posted by puleglot

        If you have mesa <12.0.0, then you probably only have /usr/lib/x86_64-linux-gnu/dri/gallium_drv_video.so, so you should export LIBVA_DRIVER_NAME=gallium. With mesa-12.0.0 and newer there are proper hardlinks, e. g. on gentoo:
        Code:
        $ ls -l /usr/lib64/va/drivers/
        total 5608
        -rwxr-xr-x 2 root root 2867296 Jul 9 23:18 r600_drv_video.so
        -rwxr-xr-x 2 root root 2867296 Jul 9 23:18 radeonsi_drv_video.so
        I use the current Mesa drivers on Kubuntu 16.04, which are 11.2 I suppose; perhaps some driver library is missing?


        • #14
          Originally posted by Azrael5

          I use the current Mesa drivers on Kubuntu 16.04, which are 11.2 I suppose; perhaps some driver library is missing?
          According to [1] all you need is "mesa-va-drivers" package as its latest version should install all necessary symlinks. But I'm not sure.
          [1] https://bugs.launchpad.net/ubuntu/+s...a/+bug/1481832
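A rough sketch of how one might decide which value to export as LIBVA_DRIVER_NAME from the files present in the VA driver directory. It assumes the "<name>_drv_video.so" naming seen in this thread; the preference order and function names are made up for illustration and are not taken from libva.

```python
SUFFIX = "_drv_video.so"


def va_driver_names(filenames):
    """Strip the _drv_video.so suffix from matching files to get driver names."""
    return sorted(name[: -len(SUFFIX)] for name in filenames if name.endswith(SUFFIX))


def pick_driver(filenames, preferred=("radeonsi", "r600", "gallium")):
    """Return the first preferred driver that is installed, or None."""
    names = va_driver_names(filenames)
    for candidate in preferred:
        if candidate in names:
            return candidate
    return None


# Directory contents as described in the thread (hypothetical listings).
mesa_12 = ["r600_drv_video.so", "radeonsi_drv_video.so"]
older_mesa = ["gallium_drv_video.so"]

print(pick_driver(mesa_12))
print(pick_driver(older_mesa))
```

In practice the list would come from something like os.listdir() on the driver directory that libva reports in its "Trying to open" line.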


          • #15
            Originally posted by puleglot

            According to [1] all you need is "mesa-va-drivers" package as its latest version should install all necessary symlinks. But I'm not sure.
            [1] https://bugs.launchpad.net/ubuntu/+s...a/+bug/1481832
            Thanks for the suggestion.
            Could the fact that my current GPU does its post-processing on the shader units be the reason hardware decoding support is absent?
            Last edited by Azrael5; 14 July 2016, 10:40 AM.


            • #16
              Finally found a pipeline that does not segfault gst-launch. Apparently you need to convert the video format before feeding it to vaapih264enc. I tried format=I420, as is hardcoded in gstreamer-screenrecording for Intel VA-API encoding, then I tried format=YV12 because that's a format the commits are talking about ("st/va: add conversion for yv12 to nv12 in putimage"), but with both the image is badly garbled.

              Then I tried the other format that the commit talks about, NV12, and this seems to be the only one of those three that is actually supported by the driver.

              So this is the pipeline:
              LIBVA_DRIVER_NAME=radeonsi gst-launch-1.0 ximagesrc display-name=:0 use-damage=0 startx=0 starty=0 endx=1919 endy=1079 ! videoconvert ! video/x-raw,framerate=25/1,format=NV12 ! vaapih264enc ! h264parse ! matroskamux ! filesink location=output.avi

              And this is the video it produced:
              LIBVA_DRIVER_NAME=radeonsi gst-launch-1.0 ximagesrc display-name=:0 use-damage=0 startx=0 starty=0 endx=1919 endy=1079 ! videoconvert ! video/x-raw,framerate...

              The colors in htop still look pretty washed out, as on intel's encoding, but much more problematic is the performance. That looks more like 3 fps than 25.

              Edit: Tried the exact same pipeline (added queues to test whether that was the problem):

              1 on radeonsi: LIBVA_DRIVER_NAME=radeonsi gst-launch-1.0 ximagesrc display-name=:0 use-damage=0 startx=0 starty=0 endx=1919 endy=1079 ! queue ! videoconvert ! queue ! video/x-raw,framerate=25/1,format=NV12 ! vaapih264enc ! h264parse ! matroskamux ! filesink location=output.avi

              2 on intel: gst-launch-1.0 ximagesrc display-name=:0 use-damage=0 startx=0 starty=0 endx=1919 endy=1079 ! queue ! videoconvert ! queue ! video/x-raw,framerate=25/1,format=NV12 ! vaapih264enc ! h264parse ! matroskamux ! filesink location=output.avi

              1 produces super choppy video as you can see on my youtube link.
              2 is relatively smooth video as you expect.

              So it definitely is something with encoding on the radeon VCE.

              It's actually pretty cool that the radeon GPU wakes up from runpm on its own (and I don't even need to use DRI_PRIME=1 as I first thought).
              Edit: Wait, DRI_PRIME=1 IS needed...
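To keep the radeonsi and Intel runs identical apart from the environment, the pipeline can be assembled once and reused. This is only a sketch: the helper names are invented, it assumes endx/endy are given as the last pixel index (as the 1919/1079 values above suggest for a 1920x1080 screen), and it sets DRI_PRIME=1 together with the driver name because of the edit above.

```python
def screen_record_argv(width, height, fps, output):
    """Build the gst-launch-1.0 argument list used for both test runs."""
    return [
        "gst-launch-1.0",
        "ximagesrc", "display-name=:0", "use-damage=0",
        "startx=0", "starty=0",
        f"endx={width - 1}", f"endy={height - 1}",  # last pixel index, not size
        "!", "queue",
        "!", "videoconvert",
        "!", "queue",
        "!", f"video/x-raw,framerate={fps}/1,format=NV12",
        "!", "vaapih264enc",
        "!", "h264parse",
        "!", "matroskamux",
        "!", "filesink", f"location={output}",
    ]


def driver_env(base_env, driver=None):
    """Copy the environment; select a VA driver on the second GPU if requested."""
    env = dict(base_env)
    if driver is not None:
        env["LIBVA_DRIVER_NAME"] = driver
        env["DRI_PRIME"] = "1"
    return env


argv = screen_record_argv(1920, 1080, 25, "output.mkv")
radeon_env = driver_env({}, "radeonsi")
intel_env = driver_env({})
print(" ".join(argv))
```

The two runs would then differ only in the env passed to subprocess.run(argv, env=...), which makes it easier to attribute the choppiness to the encoder rather than to the pipeline.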

              Edit: Ah, it's just very, very slow.
              Using this 5 second 720p example video: http://www.sample-videos.com/video/m...y_720p_1mb.mp4
              Code:
              DRI_PRIME=1 LIBVA_DRIVER_NAME=radeonsi /usr/bin/time gst-launch-1.0 -e filesrc location=SampleVideo_1280x720_1mb.mp4 ! decodebin ! queue ! videoconvert ! queue ! video/x-raw,format=NV12 ! vaapih264enc ! h264parse ! matroskamux ! filesink location=output.mkv
              Setting pipeline to PAUSED ...
              libva info: VA-API version 0.39.2
              libva info: va_getDriverName() returns 0
              libva info: User requested driver 'radeonsi'
              libva info: Trying to open /usr/lib/dri/radeonsi_drv_video.so
              libva info: Found init function __vaDriverInit_0_39
              libva info: va_openDriver() returns 0
              Pipeline is PREROLLING ...
              Got context from element 'vaapiencodeh264-0': gst.vaapi.Display=context, gst.vaapi.Display=(GstVaapiDisplay)NULL;
              Redistribute latency...
              Redistribute latency...
              Pipeline is PREROLLED ...
              Setting pipeline to PLAYING ...
              New clock: GstSystemClock
              Got EOS from element "pipeline0".
              Execution ended after 0:01:05.027487617
              Setting pipeline to PAUSED ...
              Setting pipeline to READY ...
              Setting pipeline to NULL ...
              Freeing pipeline ...
              0.83user 1.19system 1:09.60elapsed 2%CPU (0avgtext+0avgdata 91792maxresident)k
              0inputs+51168outputs (0major+14087minor)pagefaults 0swaps
              Last edited by haagch; 14 July 2016, 06:56 PM.
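To put a number on "very, very slow", the elapsed time reported by gst-launch can be compared with the clip length. A small sketch, taking the 5 second clip length from the post; the function names are made up, and the wall time also includes decoding and format conversion, so this is an upper bound on how slow the encoder itself is.

```python
import re


def parse_gst_duration(text):
    """Extract seconds from gst-launch's 'Execution ended after H:MM:SS.nnn' line."""
    match = re.search(r"Execution ended after (\d+):(\d+):(\d+(?:\.\d+)?)", text)
    if match is None:
        raise ValueError("no 'Execution ended after' line found")
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def realtime_factor(clip_seconds, wall_seconds):
    """1.0 means the pipeline keeps up with playback speed; below 1.0 is slower."""
    return clip_seconds / wall_seconds


elapsed = parse_gst_duration("Execution ended after 0:01:05.027487617")
factor = realtime_factor(5.0, elapsed)
print(f"encode took {elapsed:.1f} s, {factor:.3f}x realtime")
# prints: encode took 65.0 s, 0.077x realtime
```

If the clip were 25 fps (an assumption, the post does not say), that factor would correspond to roughly 2 frames per second.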


              • #17
                Originally posted by haagch
                Finally found a pipeline that does not segfault gst-launch. Apparently you need to convert the video format before feeding it to vaapih264enc. I tried format=I420, as is hardcoded in gstreamer-screenrecording for Intel VA-API encoding, then I tried format=YV12 because that's a format the commits are talking about ("st/va: add conversion for yv12 to nv12 in putimage"), but with both the image is badly garbled.

                Then I tried the other format that the commit talks about, NV12, and this seems to be the only one of those three that is actually supported by the driver.

                So this is the pipeline:
                LIBVA_DRIVER_NAME=radeonsi gst-launch-1.0 ximagesrc display-name=:0 use-damage=0 startx=0 starty=0 endx=1919 endy=1079 ! videoconvert ! video/x-raw,framerate=25/1,format=NV12 ! vaapih264enc ! h264parse ! matroskamux ! filesink location=output.avi

                And this is the video it produced:
                LIBVA_DRIVER_NAME=radeonsi gst-launch-1.0 ximagesrc display-name=:0 use-damage=0 startx=0 starty=0 endx=1919 endy=1079 ! videoconvert ! video/x-raw,framerate...

                The colors in htop still look pretty washed out, as on intel's encoding, but much more problematic is the performance. That looks more like 3 fps than 25.
                Yeah, the performance looks like it still has some issues, but at least it appears functional.

                1) Make it work.
                2) Make it fast.


                • #18
                  Originally posted by haagch
                  Oh cool, just saw this. Here's the patchset for easy download: https://patchwork.freedesktop.org/series/9839/

                  I never got OpenMAX encoding to work, I think precisely because of the bellagio stuff.
                  There are newer patches now.

                  On the omx issue: it is a bit tricky to get working. I use LFS, so I probably didn't do it the same way a distro would. Some random thoughts:

                  gst-omx needs setting up as well as bellagio.

                  Assuming you have libomx_mesa.so in /usr/lib/bellagio/, try running as root

                  omxregister-bellagio -v (maybe repeat this at various stages, I can't remember what I did!)

                  gst-omx needs to know where gstomx.conf is. Mine is in /etc and contains

                  [omxh264dec]
                  type-name=GstOMXH264Dec
                  core-name=/usr/lib/libomxil-bellagio.so.0
                  component-name=OMX.mesa.video_decoder.avc
                  rank=256
                  in-port-index=0
                  out-port-index=1

                  [omxmpeg2dec]
                  type-name=GstOMXMPEG2VideoDec
                  core-name=/usr/lib/libomxil-bellagio.so.0
                  component-name=OMX.mesa.video_decoder.mpeg2
                  rank=256
                  in-port-index=0
                  out-port-index=1

                  [omxh264enc]
                  type-name=GstOMXH264Enc
                  core-name=/usr/lib/libomxil-bellagio.so.0
                  component-name=OMX.mesa.video_encoder.avc
                  rank=256
                  in-port-index=0
                  out-port-index=1
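A quick sanity check for a gstomx.conf like the one above can be sketched with Python's configparser, which reads this key=value layout. The set of required keys is an assumption taken from the listing itself, not from the gst-omx documentation, and the function name is invented for the example.

```python
import configparser

# Assumption: every element section needs at least the keys used in the listing.
REQUIRED_KEYS = ("type-name", "core-name", "component-name",
                 "in-port-index", "out-port-index")


def missing_keys(conf_text):
    """Map each section name to the required keys it lacks (empty dict if none)."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    problems = {}
    for section in parser.sections():
        missing = [key for key in REQUIRED_KEYS if key not in parser[section]]
        if missing:
            problems[section] = missing
    return problems


# First section copied from the post; second one deliberately incomplete.
sample = """\
[omxh264dec]
type-name=GstOMXH264Dec
core-name=/usr/lib/libomxil-bellagio.so.0
component-name=OMX.mesa.video_decoder.avc
rank=256
in-port-index=0
out-port-index=1

[omxh264enc]
type-name=GstOMXH264Enc
core-name=/usr/lib/libomxil-bellagio.so.0
rank=256
in-port-index=0
out-port-index=1
"""

print(missing_keys(sample))
```

Here the deliberately broken encoder section is reported as lacking component-name, while the decoder section passes.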

                  Also, gstreamer keeps a cache under $HOME which seems to need deleting and regenerating (I am crap with gstreamer, there must be a better way!)

                  If I want to use omx I do:

                  export GST_OMX_CONFIG_DIR=/etc
                  rm -r ~/.cache/gstreamer-1.0/
                  gst-inspect-1.0 | grep omx

                  Which should output

                  omx: omxh264enc: OpenMAX H.264 Video Encoder
                  omx: omxmpeg2dec: OpenMAX MPEG2 Video Decoder
                  omx: omxh264dec: OpenMAX H.264 Video Decoder

                  As for the commands, I wouldn't bother asking for the high profile, as baseline is what's supported so far.


                  • #19
                    On the issue of screen recording: on my old Phenom II X4 965 the bottleneck for me would be doing the BGR0 -> NV12 conversion. It's just too expensive, and unless someone wrote a shader so the GPU did it, having VCE isn't really going to help. (Maybe 4:4:4 support would, if it would take RGB.)

                    Currently, to do 1920x1080 at 60 fps I would need to use libx264rgb --preset ultrafast and convert to something more normal later. Given that the result is going to be big and how lame the Linux disk cache can be, on a long run I wouldn't be surprised if it ground to a halt during a massive flush to disk, so use a tmpfs ramdisk if possible.

                    If I were doing lower fps and converting to NV12, I would have to set the cpufreq governor to performance, as ondemand is useless for me at detecting SIMD loads, which is what ffmpeg would be using to do the conversion.
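For readers wondering what the BGR0 -> NV12 step involves, here is a deliberately naive pure-Python sketch. It assumes BT.601 limited-range coefficients in the common 8-bit integer approximation and averages each 2x2 block for chroma; real converters (ffmpeg's swscale, videoconvert) use SIMD and may pick different coefficients or chroma siting, so treat this as an illustration of the per-pixel work rather than what those tools actually do.

```python
def bgr0_to_nv12(frame: bytes, width: int, height: int) -> bytes:
    """Convert packed BGR0 (4 bytes per pixel) to NV12 (Y plane + interleaved UV)."""
    assert width % 2 == 0 and height % 2 == 0, "NV12 needs even dimensions"
    assert len(frame) == width * height * 4
    y_plane = bytearray(width * height)
    uv_plane = bytearray(width * height // 2)

    def rgb_at(x, y):
        i = (y * width + x) * 4
        return frame[i + 2], frame[i + 1], frame[i]  # memory order is B, G, R, 0

    # Luma: one sample per pixel.
    for y in range(height):
        for x in range(width):
            r, g, b = rgb_at(x, y)
            y_plane[y * width + x] = ((66 * r + 129 * g + 25 * b + 128) >> 8) + 16

    # Chroma: one U/V pair per 2x2 block, computed from the block average.
    for y in range(0, height, 2):
        for x in range(0, width, 2):
            block = [rgb_at(x + dx, y + dy) for dy in (0, 1) for dx in (0, 1)]
            r = sum(p[0] for p in block) // 4
            g = sum(p[1] for p in block) // 4
            b = sum(p[2] for p in block) // 4
            j = (y // 2) * width + x
            uv_plane[j] = ((-38 * r - 74 * g + 112 * b + 128) >> 8) + 128
            uv_plane[j + 1] = ((112 * r - 94 * g - 18 * b + 128) >> 8) + 128

    return bytes(y_plane + uv_plane)


white = bytes([255, 255, 255, 0]) * 4  # 2x2 white frame
print(list(bgr0_to_nv12(white, 2, 2)))
```

At 1920x1080 and 60 fps the input alone is 1920 * 1080 * 4 * 60 = 497,664,000 bytes per second that have to be read and converted, which gives a sense of the memory traffic involved.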


                    • #20
                      Originally posted by legume
                      On the issue of screen recording: on my old Phenom II X4 965 the bottleneck for me would be doing the BGR0 -> NV12 conversion. It's just too expensive, and unless someone wrote a shader so the GPU did it, having VCE isn't really going to help. (Maybe 4:4:4 support would, if it would take RGB.)
                      Isn't that a simple color space conversion? I don't know much about video processing but I thought you should be able to do that in real time with no problems.

                      As you can see in my video, htop shows basically no significant CPU usage from any process.

                      Also, I tried running the exact same pipeline on Intel VA-API and radeonsi VA-API (including the conversion to NV12), and the Intel one was quite smooth while the radeonsi one was as choppy as you can see here.
