See How Your Linux System Compares To A $300 Broadwell-EP CPU That Lacks Turbo Boost


  • #21
    Originally posted by DanL View Post
    Wow. I didn't know it was such a dramatic difference. I have a GTX 950, so I'll have to play with NVENC when I get a chance (and the AC isn't on in the house). Unfortunately, my tool of choice has been Handbrake and the devs have no interest in NVENC.


    You'll find the supported hardware feature list at the bottom (Codec Support Matrix). H.265/HEVC encoding needs a Maxwell GM20x chip or better. You will need the Nvidia proprietary driver (367.35 with kernel 4.6 or 370.23 with kernel 4.7), Nvidia Video SDK 7.0.1 and Nvidia CUDA 7.5. The Video SDK and CUDA should be installed into /usr/local/. Then you'll need to get the latest ffmpeg (i.e. 3.1.2). It's then just a matter of compiling ffmpeg with the paths all pointing to the right places.
    Code:
    # Personal compiler and linker flags are read from dotfiles
    export CFLAGS="$(cat "$HOME/.cflags") -fipa-pta"
    export LDFLAGS="$(cat "$HOME/.ldflags")"
    
    # Build out of tree in a fresh directory
    rm -rf objdir; mkdir objdir; pushd objdir
    # Let the tools find the CUDA libraries at run time and record the same
    # path as the run path of the linked binaries
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
    export LD_RUN_PATH=$LD_LIBRARY_PATH
    ../ffmpeg-3.1.2/configure --prefix="$HOME/av" \
                              --enable-gpl --enable-version3 --enable-nonfree \
                              --arch=x86_64 --cpu=native \
                              --disable-debug --disable-stripping \
                              --enable-opengl --enable-opencl \
                              --enable-vaapi --enable-vdpau \
                              --enable-cuda --enable-cuvid --enable-nvenc \
                              --enable-libnpp \
                              --extra-cflags="-I/usr/local/cuda/include -I/usr/local/Video_Codec_SDK_7.0.1/Samples/common/inc" \
                              --extra-ldflags="-L/usr/local/cuda/lib64" \
                              --ar=gcc-ar --nm=gcc-nm --ranlib=true
    make -j16 && make install
    popd
    # Link the NPP libraries next to the installed ffmpeg libraries
    ln -s /usr/local/cuda/lib64/libnpp[csi].so* "$HOME/av/lib"
    This is roughly what your configure should look like. It will enable nvenc and npp (CUDA-based scaling support).
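    To confirm that the build picked up NVENC, the encoder list of the new binary can be checked. A minimal sketch, assuming the binary ended up in $HOME/av/bin and that `ffmpeg -encoders` prints the encoder name in the second column; the helper name `has_encoder` is made up for illustration.

```shell
# Hypothetical helper: reads an `ffmpeg -encoders` listing on stdin and
# succeeds if an encoder with the given name appears in the second column.
has_encoder() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Usage (not run here, it needs the freshly built binary):
#   "$HOME/av/bin/ffmpeg" -hide_banner -encoders | has_encoder hevc_nvenc \
#       && echo "hevc_nvenc is available"
```

    If the check fails, the configure log is the first place to look, since a missing SDK or CUDA header is the usual suspect.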

    Once it's compiled, you can run ffmpeg like this, for example:
    Code:
    ffmpeg -i "$*" -v error \
           -map_metadata -1 -sn \
           -c:a aac -ac 2 -b:a 128k \
           -filter:v "hwupload_cuda,scale_npp=w=852:h=480:format=nv12:interp_algo=lanczos,hwdownload,format=nv12" \
           -c:v hevc_nvenc -b:v 768k \
           -preset slow -level 6.2 -tier high \
           -y "$out"
    I've been using this to transcode videos so I can watch them on a mobile phone. Hence the size reduction down to hd480 (done here with CUDA to 852x480), the stereo conversion (from any multi-channel audio), the stripping of subtitles and metadata, and a total bit rate of 896kbps (128kbps audio + 768kbps video). Depending on the input, I get between 200 fp/s and 380 fp/s encoding speed from this.
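    As a rough sketch of what these bit rates mean for file size: the output is approximately the total bit rate times the duration. The helper `estimate_size_mb` is hypothetical, ignores container overhead, and assumes the encoder actually hits its target average bit rate. The one-hour duration is only an example value, not a measurement.

```shell
# Hypothetical helper: estimate output size in MB (10^6 bytes).
#   $1 = video bit rate in kbit/s
#   $2 = audio bit rate in kbit/s
#   $3 = duration in seconds
estimate_size_mb() {
    awk -v v="$1" -v a="$2" -v d="$3" \
        'BEGIN { printf "%.1f\n", (v + a) * d / 8 / 1000 }'
}

# 768 kbit/s video + 128 kbit/s audio over an assumed 3600 seconds
estimate_size_mb 768 128 3600   # prints 403.2
```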

    Hope this gives you a quick intro to it. It's definitely worth it.

    Comment


    • #22
      Originally posted by sdack View Post



      You'll find the supported hardware feature list at the bottom (Codec Support Matrix). H.265/HEVC encoding needs a Maxwell GM20x chip or better. You will need the Nvidia proprietary driver (367.35 with kernel 4.6 or 370.23 with kernel 4.7), Nvidia Video SDK 7.0.1 and Nvidia CUDA 7.5. The Video SDK and CUDA should be installed into /usr/local/. Then you'll need to get the latest ffmpeg (i.e. 3.1.2). It's then just a matter of compiling ffmpeg with the paths all pointing to the right places.
      But output quality on same bitrate(size) is better on software encoder!

      Comment


      • #23
        Originally posted by miskol View Post
        But output quality on same bitrate(size) is better on software encoder!
        I've read about it, but I couldn't make out any differences. I am guessing it's a minor detail that gets recited to make a technical argument, but it's practically not an issue. The result is also never going to get better than what you have as the source, and at these speeds I really don't care if libx265 can squeeze in a few extra bits in software. And of course the result with nvenc-based H.265 beats H.264 compression. That said, I only have a GTX 960 myself, which supports just the Main profile, whereas Pascal cards support Main, Baseline and High. It might be that the argument is that libx265 gives you the High profile and the Maxwell cards don't. Anyhow, I couldn't make out a difference and so I don't care.

        Comment


        • #24
          With H.264, the GPU encoders' worse quality was visible to the naked eye. It would be interesting to see what the difference is with H.265. Even if the GPU encode can't be distinguished visually, it might still use 10% more space than a software encoder at that exact quality.

          Comment


          • #25
            Originally posted by Floyddotnet View Post
            I had the same turbo boost problem with my Pentium G3258 Anniversary Edition (Haswell). Here is the script that writes the correct values into the MSR registers to enable turbo boost.

            #!/bin/sh

            ...
            fi
            Interesting. I never saw that turbo boost was not activated by default...

            By the way, what frequency is it? I already overclock mine to 4GHz without any other change, so maybe this could improve single processes.

            Comment


            • #26
              These are my results with my asus n752vx laptop:
              OpenBenchmarking.org (result link)


              It complains that openmpi is not installed even though it is (I'm using Arch Linux with openmpi from the repos), and some tests give errors.

              Comment


              • #27
                Originally posted by curaga View Post
                With H.264, the GPU encoders' worse quality was visible to the naked eye. It would be interesting to see what the difference is with H.265. Even if the GPU encode can't be distinguished visually, it might still use 10% more space than a software encoder at that exact quality.
                That's what I just did, only to be sure. The "10-20 fp/s" figure for the software encoder came from my distro's ffmpeg, and it wasn't quite fair to compare this to a self-compiled version of ffmpeg. So I've compiled one which contains both x265 and nvenc and is optimized for an AMD FX 8350 Piledriver.

                As input I downloaded the latest episode of "Braindead" in 720p, H.264 encoded, as found on EZTV. The file is 833MB in size.

                ffmpeg+libx265 with MMX2, SSE2Fast, SSSE3, SSE4.2, AVX, XOP, FMA4, FMA3, LZCNT, BMI1 on 8 cores at 4GHz transcoded it in 22m7.721s or at 45fp/s. So it is quite a bit faster than what I originally got from my distro. Output file size is 269MB.

                ffmpeg+hevc_nvenc on the same CPU transcoded it in 3m55.830s or at 257fp/s. It is still 5.7 times faster than the software encoder. Output file size is 266MB.
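                The speed-up can be recomputed from the two wall-clock times. A small sketch with hypothetical helper names (`to_seconds`, `speedup`), assuming durations in the `XmY.ZZZs` form printed by the shell's `time`. The ratio of the times comes out at about 5.6, while the 5.7 figure matches the ratio of the rounded frame rates (257 / 45).

```shell
# Hypothetical helper: convert a `time`-style duration such as "22m7.721s"
# into seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN {
        split(t, p, "m")          # p[1] = minutes, p[2] = seconds plus "s"
        sub(/s$/, "", p[2])       # drop the trailing "s"
        printf "%.3f\n", p[1] * 60 + p[2]
    }'
}

# Hypothetical helper: how many times longer the first run took than the second.
speedup() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

speedup 22m7.721s 3m55.830s   # libx265 time vs. hevc_nvenc time, prints 5.63
```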

                What is the quality like after shrinking it from 720p to 480p and transcoding it from H.264 into H.265? See here: http://i.imgur.com/6i1i0sf.png The source is at the centre, top-right is the result after it ran through nvenc and top-left is a magnification of it. Bottom-right is the result from x265 and bottom-left is the magnification for libx265.

                The parameters for both encoders aren't quite compatible, so I left them at their defaults while just using the Main profile. The bit rates are identical, however (128kbps audio, 768kbps video).

                Comment


                • #28
                  I don't understand why the Xeon E5-2609 v4 costs $300 while the AMD FX-8320E is only $90 at microcenter.
                  From cpu benchmark:

                  Xeon E5-2609 v3:
                    No of Cores: 6
                    Max TDP: 80 W
                    Average CPU Mark: 5878
                    Single Thread Rating: 1113

                  FX-8320E:
                    No of Cores: 4 (2 logical cores per physical)
                    Max TDP: 95 W
                    Average CPU Mark: 7451
                    Single Thread Rating: 1355
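                  Put as benchmark points per dollar, the gap looks like this. A rough sketch using only the numbers quoted above; `marks_per_dollar` is a hypothetical helper, and note that the Xeon marks quoted are for the v3 part while the $300 price is for the v4.

```shell
# Hypothetical helper: benchmark points per dollar.
#   $1 = average CPU Mark, $2 = price in dollars
marks_per_dollar() {
    awk -v m="$1" -v p="$2" 'BEGIN { printf "%.1f\n", m / p }'
}

marks_per_dollar 5878 300   # Xeon figures as quoted, prints 19.6
marks_per_dollar 7451 90    # FX-8320E figures as quoted, prints 82.8
```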

                  Comment


                  • #29
                    Originally posted by sdack View Post
                    ffmpeg+hevc_nvenc on the same CPU transcoded it in 3m55.830s or at 257fp/s. It is still 5.7 times faster than the software encoder. Output file size is 266MB.
                    The GPU was an Nvidia GeForce GTX 960. Because one can run two encoding sessions in parallel on this GPU, I also did this...

                    Two ffmpeg+hevc_nvenc processes transcoding in parallel complete the same task in 4m27.701s or at 227fp/s each, making it a theoretical 454fp/s.
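                    Running two sessions side by side from a shell comes down to starting both jobs in the background and waiting for them. A minimal sketch: `run_parallel` is a hypothetical helper, and the ffmpeg command lines in the usage comment are placeholders with made-up file names, not the exact ones used for the timings above.

```shell
# Hypothetical helper: run each argument as a shell command in the
# background, wait for all of them, and fail if any of them failed.
run_parallel() {
    pids=""
    for cmd in "$@"; do
        sh -c "$cmd" &
        pids="$pids $!"
    done
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}

# Usage (placeholder commands, not run here):
#   time run_parallel \
#       "ffmpeg -i a.mkv -c:v hevc_nvenc -b:v 768k -y a.mp4" \
#       "ffmpeg -i b.mkv -c:v hevc_nvenc -b:v 768k -y b.mp4"
```

                    The combined throughput is then simply the sum of the per-process rates, here 2 x 227 = 454 fp/s.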

                    Comment


                    • #30
                      "TurboBoost" as Intel calls it wreaks havoc on apps that are sensitive to timings. Same with SpeedStep. We have to turn off all of the frequency/power mgmt. in BIOS and in the OS. Also HT is overrated, especially when used with ESX or any other virtualization OS. People keep thinking it works and acts like a real CPU, when it clearly doesn't. Can't tell you how many apps that need low latency processing fall apart on these power management/speed bumping/faux cores stuff. Physical cores scale linear, HT cores are incredibly steep slopes.

                      So while it's great and we love HT coming to Zen, few people know how to leverage it properly, other than staring at their perfmon tool and watching twice as many grids scrolling.

                      Comment
