H.264 VA-API Encode Comes To Gallium3D


  • #21
    Originally posted by legume View Post
    On the issue of screen recording. On my old Phenom II x4 965 the bottleneck for me would be doing the BGR0 -> nv12 conversion. It's just too expensive and unless someone wrote a shader so the GPU did it then having VCE isn't really going to help. (Maybe 444 support would, if it would take rgb).
    Isn't that a simple color space conversion? I don't know much about video processing but I thought you should be able to do that in real time with no problems.

    As you can see in my video, htop shows basically no significant CPU usage from any process.

    Also I tried running the exact same pipeline on intel vaapi and radeonsi vaapi (including conversion to nv12) and the intel one was quite smooth while the radeonsi one was as choppy as you can see here.



    • #22
      Originally posted by atomsymbol View Post

      $ man fdatasync

      LD_PRELOAD
      Well, I didn't write the apps I use. Anyway, I was talking about default behavior: there's a tunable that lets you change the % of RAM allowed to fill before flush, and that would help.
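For reference, the post does not name the tunable, so this is an assumption: it is presumably one of the kernel's `vm.dirty_*` sysctls. `vm.dirty_background_ratio` is the percentage of memory at which background writeback starts, and `vm.dirty_ratio` is the percentage at which a writing process has to start flushing itself. A minimal Go sketch that reads both from `/proc/sys/vm` (the helper name `parsePercent` is made up for illustration):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parsePercent turns the contents of a /proc/sys/vm/*_ratio file
// (a decimal number followed by a newline) into an int.
func parsePercent(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

func main() {
	// Assumption: these two sysctls are the tunables the post refers to.
	for _, name := range []string{"dirty_background_ratio", "dirty_ratio"} {
		raw, err := os.ReadFile("/proc/sys/vm/" + name)
		if err != nil {
			fmt.Println(name, "not readable:", err)
			continue
		}
		pct, err := parsePercent(string(raw))
		if err != nil {
			fmt.Println(name, "has unexpected content:", err)
			continue
		}
		fmt.Printf("vm.%s = %d%%\n", name, pct)
	}
}
```

Lower values make the kernel start writing back sooner, which should shorten the stalls when a large backlog finally gets flushed.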



      • #23
        Originally posted by haagch View Post
        Isn't that a simple color space conversion? I don't know much about video processing but I thought you should be able to do that in real time with no problems.

        As you can see in my video, htop shows basically no significant CPU usage from any process.

        Also I tried running the exact same pipeline on intel vaapi and radeonsi vaapi (including conversion to nv12) and the intel one was quite smooth while the radeonsi one was as choppy as you can see here.
        I wouldn't say it's simple - for years the reverse has been offloaded to GPU to save CPU.
        For me top, like cpufreq, doesn't seem to register SIMD usage very well.
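To give an idea of the work involved, here is a minimal, unoptimized Go sketch of a BGR0 -> NV12 conversion. It is not what ffmpeg or gstreamer actually run (those use SIMD paths); the function name `bgr0ToNV12` is made up, and the coefficients are the common BT.601 limited-range integer approximation, which is an assumption about the target color space. Every pixel is read once for luma and again for the 2x2 chroma average, and a 1920x1080 BGR0 frame is about 8.3 MB, so the per-frame cost adds up:

```go
package main

import "fmt"

// clamp8 limits v to the 0..255 range of a byte.
func clamp8(v int) byte {
	if v < 0 {
		return 0
	}
	if v > 255 {
		return 255
	}
	return byte(v)
}

// bgr0ToNV12 converts a packed BGR0 frame (4 bytes per pixel: B, G, R, unused)
// into NV12: a full-resolution Y plane followed by one interleaved,
// half-resolution UV plane. Width and height must be even.
// Assumption: BT.601 limited-range coefficients.
func bgr0ToNV12(src []byte, width, height int) []byte {
	dst := make([]byte, width*height*3/2)
	yPlane := dst[:width*height]
	uvPlane := dst[width*height:]

	// Luma: one Y sample per pixel.
	for y := 0; y < height; y++ {
		for x := 0; x < width; x++ {
			p := (y*width + x) * 4
			b, g, r := int(src[p]), int(src[p+1]), int(src[p+2])
			yPlane[y*width+x] = clamp8(((66*r + 129*g + 25*b + 128) >> 8) + 16)
		}
	}

	// Chroma: one U,V pair per 2x2 block, computed from the block average.
	for y := 0; y < height; y += 2 {
		for x := 0; x < width; x += 2 {
			var b, g, r int
			for dy := 0; dy < 2; dy++ {
				for dx := 0; dx < 2; dx++ {
					p := ((y+dy)*width + x + dx) * 4
					b += int(src[p])
					g += int(src[p+1])
					r += int(src[p+2])
				}
			}
			b, g, r = b/4, g/4, r/4
			o := (y/2)*width + x // x is even, so this is the U byte of the pair
			uvPlane[o] = clamp8(((-38*r - 74*g + 112*b + 128) >> 8) + 128)
			uvPlane[o+1] = clamp8(((112*r - 94*g - 18*b + 128) >> 8) + 128)
		}
	}
	return dst
}

func main() {
	// A 2x2 all-white frame: four Y samples, then one U,V pair.
	white := []byte{255, 255, 255, 0, 255, 255, 255, 0, 255, 255, 255, 0, 255, 255, 255, 0}
	fmt.Println(bgr0ToNV12(white, 2, 2)) // prints [235 235 235 235 128 128]
}
```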

        Of course my CPU is old and I only have ffmpeg to easily bench with (benching with gstreamer is possible in theory, but I couldn't get it to work).

        I could do 25fps so your observation about intel vs radeon is quite valid.

        If you are not running the latest patches, there's quite a big memory leak lurking to mess things up.

        Also, if you don't specify cbr you get cqp, which currently seems to default to 0 = huge/slow.

        Your prime setup is quite different from mine, so I guess there will be some difference there (I am nowhere near as slow as what you see), but vaapi is still somewhat slower than omx for me. It's still early days, though.



        • #24
          Originally posted by legume View Post
          Well, I didn't write the apps I use. Anyway, I was talking about default behavior: there's a tunable that lets you change the % of RAM allowed to fill before flush, and that would help.
          Well, automatic recognition of streaming applications isn't coming to the Linux kernel any time soon. Torvalds' style of thinking is preventing it.

          Test code (golang):
          Code:
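          package main
          
          import (
              "log"
              "math/rand"
              "os"
              "time"
          )
          
          func main() {
              // Create the output file; stop early if that fails.
              f, err := os.Create("stream")
              if err != nil {
                  log.Fatal(err)
              }
              defer f.Close()
          
              // Write 2^24 chunks of 1 KiB each (16 GiB in total).
              data := make([]byte, 1<<10)
              nwritten := 0
              for i := 0; i < 1<<(34-10); i++ {
                  // Change one random byte so consecutive chunks differ.
                  data[rand.Intn(len(data))] = byte(rand.Int())
                  n, err := f.Write(data)
                  if err != nil {
                      log.Fatal(err)
                  }
                  nwritten += n
                  // Report progress at every full MiB.
                  if nwritten&((1<<20)-1) == 0 {
                      println(nwritten>>20, "MiB")
                  }
                  // Short pause between writes to mimic a steady stream.
                  time.Sleep(10 * time.Microsecond)
              }
          }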



          • #25
            I made another reply hours ago which still hasn't turned up - makes typing out stuff rather pointless on here :-(



            • #26
              Originally posted by legume View Post
              I made another reply hours ago which still hasn't turned up - makes typing out stuff rather pointless on here :-(
              Blame the crappy vBulletin software. Indirectly, blame Phoronix for using said crappy vBulletin software. This vBulletin abomination is such immense crap, I can't believe people are paying money for it.



              • #27
                Originally posted by legume View Post
                Of course my CPU is old and I only have ffmpeg to easily bench with (benching with gstreamer is possible in theory, but I couldn't get it to work).
                Well, you could do what I did and take http://www.sample-videos.com/video/m...y_720p_1mb.mp4 and run
                LIBVA_DRIVER_NAME=radeonsi /usr/bin/time gst-launch-1.0 -e filesrc location=SampleVideo_1280x720_1mb.mp4 ! decodebin ! queue ! videoconvert ! queue ! video/x-raw,format=NV12 ! vaapih264enc ! h264parse ! matroskamux ! filesink location=output.mkv
                Originally posted by legume View Post
                If you are not running the latest patches, there's quite a big memory leak lurking to mess things up.
                Well, this is the newest that is available as a patchset on patchwork...

                Originally posted by legume View Post
                Also, if you don't specify cbr you get cqp, which currently seems to default to 0 = huge/slow.
                Well, I had to look that one up. It's vaapih264enc rate-control=2.
                No difference though, still takes 1:09 minutes.


                So I revisited omx again.
                /etc/xdg/gstomx.conf is the default location I believe. Mine looks a bit different:
                Code:
                [omxh264dec]
                type-name=GstOMXH264Dec
                core-name=/usr/lib/libomxil-bellagio.so.0
                component-name=OMX.radeonsi.video_decoder.avc
                rank=256
                in-port-index=0
                out-port-index=1
                
                [omxmpeg2dec]
                type-name=GstOMXMPEG2VideoDec
                core-name=/usr/lib/libomxil-bellagio.so.0
                component-name=OMX.radeonsi.video_decoder.mpeg2
                rank=256
                in-port-index=0
                out-port-index=1
                
                [omxh264enc]
                type-name=GstOMXH264Enc
                core-name=/usr/lib/libomxil-bellagio.so.0
                component-name=OMX.radeonsi.video_encoder.avc
                rank=256
                in-port-index=0
                out-port-index=1
                Not sure where exactly I got OMX.radeonsi.video_encoder.avc from. I tried your component-name=OMX.mesa.video_encoder.avc but it doesn't make a difference.

                omxregister-bellagio produces this file: http://haagch.frickel.club/files/.omxregister
                I believe this is totally wrong, but I have no working one to compare.



                • #28
                  Hi,
                  will this work with ffmpeg's vaapi encoder? I have tried to use VCE encoding on my kabini some time ago, but building these gstreamer pipelines and setting up omx bellagio was a pain...



                  • #29
                    ffmpeg will work in the future - it was almost there last time I tried (needed patching).



                    • #30
                      Originally posted by haagch View Post
                      Well, you could do what I did and take http://www.sample-videos.com/video/m...y_720p_1mb.mp4 and run
                      LIBVA_DRIVER_NAME=radeonsi /usr/bin/time gst-launch-1.0 -e filesrc location=SampleVideo_1280x720_1mb.mp4 ! decodebin ! queue ! videoconvert ! queue ! video/x-raw,format=NV12 ! vaapih264enc ! h264parse ! matroskamux ! filesink location=output.mkv
                      That's not really what I meant when talking about benching screencap + bgr0 -> nv12, but FWIW, on that very, very short sample using s/w decode (your exact command doesn't work for me, and hwdec -> hwenc has issues currently) I get -
                      Code:
                      time gst-launch-1.0 -e filesrc location=~/big_buck_bunny_720p_1mb.mp4 ! qtdemux ! h264parse ! avdec_h264 ! queue ! videoconvert ! queue ! video/x-raw,format=NV12 ! vaapih264enc rate-control=cbr bitrate=2000 ! h264parse ! matroskamux ! filesink location=output.mkv
                      Setting pipeline to PAUSED ...
                      libva info: VA-API version 0.38.1
                      libva info: va_getDriverName() returns 0
                      libva info: Trying to open /usr/lib/dri/radeonsi_drv_video.so
                      libva info: Found init function __vaDriverInit_0_38
                      libva info: va_openDriver() returns 0
                      Pipeline is PREROLLING ...
                      Got context from element 'vaapiencodeh264-0': gst.vaapi.Display=context, gst.vaapi.Display=(GstVaapiDisplay)NULL;
                      Redistribute latency...
                      Pipeline is PREROLLED ...
                      Setting pipeline to PLAYING ...
                      New clock: GstSystemClock
                      Got EOS from element "pipeline0".
                      Execution ended after 0:00:00.797613942
                      Setting pipeline to PAUSED ...
                      Setting pipeline to READY ...
                      Setting pipeline to NULL ...
                      Freeing pipeline ...
                      
                      real    0m1.236s
                      user    0m1.284s
                      sys     0m0.069s
                      The bench I couldn't get to work was the gstreamer equivalent of these two ffmpeg runs -

                      ffmpeg -f x11grab -r 300 -s 1920x1080 -i :0.0 -f null -

                      and

                      ffmpeg -f x11grab -r 300 -s 1920x1080 -i :0.0 -pix_fmt nv12 -f null -

                      both of which output an fps result. I get -

                      100 and 22fps with the cpufreq ondemand governor, and
                      138 and 35fps with the CPUs at full frequency. Maybe ffmpeg/my box is just exceptionally crap at this task!
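Turning those fps figures into time per frame makes the cost of the nv12 conversion easier to see. A small Go sketch of the arithmetic (the function name `conversionCostMs` is made up, and it assumes the two runs differ only by the pixel format conversion):

```go
package main

import "fmt"

// conversionCostMs estimates the extra time per frame, in milliseconds,
// that a processing step adds, given the fps measured without and with it.
func conversionCostMs(fpsWithout, fpsWith float64) float64 {
	return 1000/fpsWith - 1000/fpsWithout
}

func main() {
	// Figures from the two ffmpeg runs above.
	fmt.Printf("ondemand governor: %.1f ms per frame\n", conversionCostMs(100, 22)) // 35.5
	fmt.Printf("full frequency:    %.1f ms per frame\n", conversionCostMs(138, 35)) // 21.3
}
```

At 25fps the whole budget is 40 ms per frame, so on this box the conversion alone takes roughly half of it or more.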


                      Originally posted by haagch View Post
                      omxregister-bellagio produces this file: http://haagch.frickel.club/files/.omxregister
                      I believe this is totally wrong, but I have no working one to compare.
                      Hmm, what does gstreamer say if you try to use it?
                      Mine looks like -

                      Code:
                      /usr/lib/bellagio/libomx_mesa.so
                       ==> OMX.mesa.video_decoder ==> OMX.mesa.video_decoder.mpeg2:OMX.mesa.video_decoder.avc:
                       ==> OMX.mesa.video_encoder ==> OMX.mesa.video_encoder.avc:
                      /usr/lib/bellagio/libomxvideosched.so
                       ==> OMX.st.video.scheduler ==> OMX.st.video.scheduler: ==> 2 1,456192 1,304128
                      /usr/lib/bellagio/libomxaudio_effects.so
                       ==> OMX.st.volume.component ==> OMX.st.volume.component: ==> 2 1,65536 1,32768
                       ==> OMX.st.audio.mixer ==> OMX.st.audio.mixer: ==> 1 50,60000
                      /usr/lib/bellagio/libomxclocksrc.so
                       ==> OMX.st.clocksrc ==> OMX.st.clocksrc:
