**sdack** · 29 August 2016, 03:03 AM

Originally posted by DanL

Wow. I didn't know it was such a dramatic difference. I have a GTX 950, so I'll have to play with NVENC when I get a chance (and the AC isn't on in the house). Unfortunately, my tool of choice has been Handbrake and the devs have no interest in NVENC.

https://developer.nvidia.com/nvidia-video-codec-sdk

You'll find the supported hardware feature list at the bottom (Codec Support Matrix). H.265/HEVC encoding needs a Maxwell GM20x chip or better. You will need the Nvidia proprietary driver (367.35 with kernel 4.6 or 370.23 with kernel 4.7), Nvidia Video SDK 7.0.1 and Nvidia CUDA 7.5. You should install the Video SDK and CUDA into /usr/local/. Then you'll need to get the latest ffmpeg (i.e. 3.1.2). It's then just a matter of compiling ffmpeg with the paths all pointing to the right places.

Code:

export CFLAGS="$(cat $HOME/.cflags) -fipa-pta"
export LDFLAGS="$(cat $HOME/.ldflags)"

rm -rf objdir; mkdir objdir; pushd objdir
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export LD_RUN_PATH=$LD_LIBRARY_PATH
../ffmpeg-3.1.2/configure --prefix=$HOME/av \
                          --enable-gpl --enable-version3 --enable-nonfree \
                          --arch=x86_64 --cpu=native \
                          --disable-debug --disable-stripping \
                          --enable-opengl --enable-opencl \
                          --enable-vaapi --enable-vdpau \
                          --enable-cuda --enable-cuvid --enable-nvenc \
                          --enable-libnpp \
                          --extra-cflags="-I/usr/local/cuda/include -I/usr/local/Video_Codec_SDK_7.0.1/Samples/common/inc" \
                          --extra-ldflags="-L/usr/local/cuda/lib64" \
                          --ar=gcc-ar --nm=gcc-nm --ranlib=true
make -j16 && make install
popd
ln -s /usr/local/cuda/lib64/libnpp[csi].so* $HOME/av/lib

This is roughly what your configure should look like. It will enable nvenc and npp (CUDA-based scaling support).
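A quick way to confirm that the build really picked up NVENC is to look for the encoders in the output of `ffmpeg -encoders`. This is only a sketch: the helper name `has_encoder` is made up for illustration, and the binary path assumes the `--prefix=$HOME/av` from the configure line above.

```shell
# has_encoder FFMPEG_BIN ENCODER_NAME
# Succeeds if the given ffmpeg binary lists the encoder as a whole word.
# (has_encoder is a hypothetical helper name, not part of ffmpeg.)
has_encoder() {
    "$1" -hide_banner -encoders 2>/dev/null | grep -qw -- "$2"
}

# Only probe if the binary from the build above is actually there.
ffmpeg_bin="$HOME/av/bin/ffmpeg"
if [ -x "$ffmpeg_bin" ]; then
    for enc in hevc_nvenc h264_nvenc; do
        if has_encoder "$ffmpeg_bin" "$enc"; then
            echo "$enc: available"
        else
            echo "$enc: missing"
        fi
    done
fi
```

If `hevc_nvenc` shows up as missing, the likely suspects are the include and library paths passed to configure.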

Once it's compiled, you can run ffmpeg like this, for example:

Code:

ffmpeg -i "$*" -v error \
       -map_metadata -1 -sn \
       -c:a aac -ac 2 -b:a 128k \
       -filter:v "hwupload_cuda,scale_npp=w=852:h=480:format=nv12:interp_algo=lanczos,hwdownload,format=nv12" \
       -c:v hevc_nvenc -b:v 768k \
       -preset slow -level 6.2 -tier high \
       -y "$out"

I've been using this to transcode videos so I can watch them on a mobile phone. Hence the size reduction down to hd480 (done here with CUDA to 852x480), the stereo conversion (from any multi-channel audio), the stripping of subtitles and metadata, and a total bit rate of 896kbps (128kbps audio + 768kbps video). Depending on the input, I get between 200 fps and 380 fps encoding speed from this.
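As a sanity check on those bit rates, the output size is roughly (video + audio bit rate) × duration / 8. Here is a small sketch with awk. The helper name `estimate_mb` is made up, the 40-minute duration is just an example value, and container overhead is ignored.

```shell
# estimate_mb VIDEO_KBPS AUDIO_KBPS DURATION_SECONDS
# Prints the approximate output size in MB (1 MB = 1,000,000 bytes),
# ignoring container overhead. Hypothetical helper for illustration.
estimate_mb() {
    awk -v v="$1" -v a="$2" -v s="$3" \
        'BEGIN { printf "%.0f\n", (v + a) * 1000 * s / 8 / 1000000 }'
}

# 768 kbps video + 128 kbps audio over an assumed 40-minute (2400 s) video.
estimate_mb 768 128 2400
```

This helps when picking a bit rate for a target file size, since the size scales linearly with both the bit rate and the duration.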

Hope this gives you a quick intro to it. It's definitely worth it.

**miskol** · 29 August 2016, 03:07 AM

Originally posted by sdack

https://developer.nvidia.com/nvidia-video-codec-sdk

You'll find the supported hardware feature list at the bottom (Codec Support Matrix). H.265/HEVC encoding needs a Maxwell GM20x chip or better. You will need the Nvidia proprietary driver (367.35 with kernel 4.6 or 370.23 with kernel 4.7), Nvidia Video SDK 7.0.1 and Nvidia CUDA 7.5. You should install the Video SDK and CUDA into /usr/local/. Then you'll need to get the latest ffmpeg (i.e. 3.1.2). It's then just a matter of compiling ffmpeg with the paths all pointing to the right places.

But output quality at the same bitrate (size) is better with a software encoder!

**sdack** · 29 August 2016, 03:32 AM

Originally posted by miskol

But output quality at the same bitrate (size) is better with a software encoder!

I've read about it, but I couldn't make out any differences. I am guessing it's a minor detail that gets recited to make a technical argument, but in practice it's not an issue. It's also never going to get better than what you have as a source, and at these speeds I really don't care if libx265 can squeeze in a few extra bits in software. And of course the result with nvenc-based H.265 beats H.264 compression. That said, I only have a GTX 960 myself, which supports just the Main profile, whereas Pascal cards support Main, Baseline and High. It might be that the argument is that libx265 gives you the High profile and the Maxwell cards don't. Anyhow, I couldn't make out a difference and so I don't care.

**curaga** · 29 August 2016, 06:07 AM

With H.264, the GPU encoders' worse quality was visible to the naked eye. It would be interesting to see what the difference is with H.265. Even if the GPU encode can't be distinguished visually, it might still use 10% more space than a software encoder at that exact quality.
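One way to put a number on such a difference, rather than judging by eye, is ffmpeg's `ssim` filter. The following is a rough sketch, not anyone's actual method from this thread: the helper name `ssim_compare` and the file names are made up, and it assumes the source has to be scaled down to the encode's resolution first, since the filter needs both inputs at the same size. Frame rates and frame counts also need to match for the score to mean anything.

```shell
# ssim_compare FFMPEG_BIN ENCODED SOURCE [WIDTH:HEIGHT]
# Scales the source to the encode's size (default 852:480) and runs
# ffmpeg's ssim filter with the encode as main input and the scaled
# source as reference. ffmpeg reports the SSIM score on stderr.
# (ssim_compare is a hypothetical helper name.)
ssim_compare() {
    "$1" -hide_banner -i "$2" -i "$3" \
        -lavfi "[1:v]scale=${4:-852:480}[ref];[0:v][ref]ssim" \
        -f null -
}

# Example call (file names are placeholders):
#   ssim_compare "$HOME/av/bin/ffmpeg" out_nvenc.mkv source.mkv
#   ssim_compare "$HOME/av/bin/ffmpeg" out_x265.mkv  source.mkv
```

Running it once per encoder against the same source gives two scores that can be compared at equal bit rate.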

**Passso** · 29 August 2016, 08:15 AM

Originally posted by Floyddotnet

I had the turbo boost problem with my Pentium G3258 Anniversary Edition (Haswell) too. Here is the script that writes the correct values into the MSR registers to enable turbo boost.

#!/bin/sh

...
fi

Interesting. I never saw that turbo boost was not activated by default...

By the way, what frequency is it? I've already overclocked it to 4GHz without any change, so maybe this could improve single processes.

**Xwaang** · 29 August 2016, 08:45 AM

These are my results with my ASUS N752VX laptop:

Intel Xeon E5-2609 V4 Linux Testing Benchmarks - OpenBenchmarking.org

http://openbenchmarking.org/result/1608296-HA-1608277LO71

It complains that openmpi is not installed even though it is installed (I'm using Arch Linux and openmpi from the repos), and some tests give errors.

**sdack** · 29 August 2016, 09:19 AM

Originally posted by curaga

With H.264, the GPU encoders' worse quality was visible to the naked eye. It would be interesting to see what the difference is with H.265. Even if the GPU encode can't be distinguished visually, it might still use 10% more space than a software encoder at that exact quality.

That's what I just did, to be sure. The "10-20 fp/s" figure for the software encoder came from my distro's ffmpeg, and it wasn't quite fair to compare this to a self-compiled version of ffmpeg. So I've compiled one that contains both x265 and nvenc and is optimized for an AMD FX 8350 Piledriver.

As input, I downloaded the latest episode of "Braindead" in 720p, H.264 encoded, as found on EZTV. The file is 833MB in size.

ffmpeg+libx265 with MMX2, SSE2Fast, SSSE3, SSE4.2, AVX, XOP, FMA4, FMA3, LZCNT, BMI1 on 8 cores at 4GHz transcoded it in 22m7.721s or at 45 fps. So it is quite a bit faster than what I got originally from my distro. Output file size is 269MB.

ffmpeg+hevc_nvenc on the same CPU transcoded it in 3m55.830s or at 257 fps. It is still 5.7 times faster than the software encoder. Output file size is 266MB.

What is the quality like after shrinking it from 720p to 480p and transcoding it from H.264 into H.265? See here: http://i.imgur.com/6i1i0sf.png The source is at the centre, top-right is the result after it ran through nvenc and top-left is a magnification of it. Bottom-right is the result from x265 and bottom-left is the magnification for libx265.

The parameters for both encoders aren't quite compatible, so I left them at their defaults while just using the Main profile. Bit rates are, however, identical (128kbps audio, 768kbps video).
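For anyone who wants to redo the comparison from the wall-clock times reported by `time`, a small sketch with awk follows. The helper names `to_seconds` and `speedup` are made up for illustration. Going by the times alone, the ratio works out to about 5.6, in line with the 5.7 figure that follows from the rounded frame rates (257/45).

```shell
# to_seconds MmS.SSSs
# Converts a duration as printed by `time` (e.g. 22m7.721s) to seconds.
# (Hypothetical helper; handles only the minutes-and-seconds form.)
to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# speedup SLOW FAST
# Prints how many times faster the second run was than the first.
speedup() {
    awk -v slow="$(to_seconds "$1")" -v fast="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", slow / fast }'
}

to_seconds 22m7.721s          # libx265 run
to_seconds 3m55.830s          # hevc_nvenc run
speedup 22m7.721s 3m55.830s   # software time divided by nvenc time
```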

**dxxvi** · 29 August 2016, 11:05 AM

I don't understand why the Xeon E5-2609 v4 costs $300 while the AMD FX-8320E is only $90 at microcenter.

From cpu benchmark:

Xeon E5-2609 v3:
No. of Cores: 6
Max TDP: 80 W
Average CPU Mark: 5878
Single Thread Rating: 1113

FX-8320E:
No. of Cores: 4 (2 logical cores per physical)
Max TDP: 95 W
Average CPU Mark: 7451
Single Thread Rating: 1355

**sdack** · 29 August 2016, 11:32 AM

Originally posted by sdack

ffmpeg+hevc_nvenc on the same CPU transcoded it in 3m55.830s or at 257 fps. It is still 5.7 times faster than the software encoder. Output file size is 266MB.

The GPU was an Nvidia GeForce GTX 960. Because one can run two encoding sessions in parallel on this GPU, I also tried that...

Two ffmpeg+hevc_nvenc processes transcoding in parallel complete the same task in 4m27.701s or at 227 fps each, making it a theoretical 454 fps.
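The arithmetic behind that figure, and the gain over a single session at 257 fps, can be sketched as follows. The helper names `aggregate_fps` and `ratio` are made up for illustration.

```shell
# aggregate_fps SESSIONS FPS_PER_SESSION
# Prints the combined throughput of several parallel encoding sessions.
# (Hypothetical helper for illustration.)
aggregate_fps() {
    awk -v n="$1" -v fps="$2" 'BEGIN { printf "%d\n", n * fps }'
}

# ratio A B
# Prints A divided by B with two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

total=$(aggregate_fps 2 227)   # two sessions at 227 fps each
echo "combined: $total fps"
echo "gain over one session at 257 fps: $(ratio "$total" 257)x"
```

So the second session does not double the throughput, but it still adds a good chunk on top of a single one.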

**edwaleni** · 29 August 2016, 11:07 PM

"TurboBoost", as Intel calls it, wreaks havoc on apps that are sensitive to timings. Same with SpeedStep. We have to turn off all of the frequency/power management in the BIOS and in the OS. Also, HT is overrated, especially when used with ESX or any other virtualization OS. People keep thinking it works and acts like a real CPU, when it clearly doesn't. I can't tell you how many apps that need low-latency processing fall apart on this power management/speed bumping/faux cores stuff. Physical cores scale linearly; HT cores are incredibly steep slopes.

So while it's great that we love HT coming to Zen, few people know how to leverage it properly other than staring at their perfmon tool and watching twice as many grids scrolling.

See How Your Linux System Compares To A $300 Broadwell-EP CPU That Lacks Turbo Boost
