UVD/hw acceleration If, when?


  • #61
    Yeah, I know I'm probably getting a bit boring at this point, but bridgman, what's the status of the legal review?

    Also, I've read that a significant portion of h.264 decoding is not really amenable to parallelization using shaders, so, AFAIK, it's not suitable for Gallium3D at all. Does this mean that decent 1080p h.264 decoding on the open source ATI drivers will never work?



    • #62
      First, UVD is not easy to get out of AMD, because the DMCA crowd will push hard to protect what's left of the HDCP protection (which nobody here cares about, btw). So even if AMD wants to, it probably won't ever happen, and that will be a third party's fault (i.e. the DMCA), so don't wait for it with much hope.

      For now, the solution would be to wait until OpenCL on Gallium3D arrives, because with that we can accelerate the decoding process inside the codecs (this will take some work at first, since most codecs aren't designed for massive parallelization, but it's solvable) as well as the scaling and color management inside Xv. That way we bypass the HDCP protection crap entirely and can even support more and more codecs in the future.

      As for playing Blu-rays in Linux, most of the necessary protection has already been cracked, so OpenCL-accelerated decoding/color conversion is the last missing piece.

      This has been possible for some months now, but only on NVIDIA hardware, so maybe someone with some OpenCL knowledge and an NVIDIA card can step up and start coding in OpenCL.



      • #63
        The first pass through failed, but that's no surprise; it just means we need a new plan/proposal. BTW, it's a technical review, not a legal review.

        Yep, a portion of the h.264 decode is not currently practical to parallelize and so won't be a good fit with shaders. The obvious counterpoint is that a portion of the h.264 decode *is* practical to parallelize. If the question is "will we see 100% of h.264 decode happening on the GPU ?" the answer is probably not, but we've all been saying that from the start. If the question is "will we see enough h.264 run on the GPU that a typical CPU can handle what's left ?" I think the answer is still "yes".

        The main risk to shader-based decoding IMO is multi-thread decoders maturing and fast CPUs becoming sufficiently cheap and power-efficient that the community loses interest in GPU decode before it gets fully implemented.


        • #64
          Originally posted by bridgman View Post
          The main risk to shader-based decoding IMO is multi-thread decoders maturing and fast CPUs becoming sufficiently cheap and power-efficient that the community loses interest in GPU decode before it gets fully implemented.
          John, is that a risk...or a hope?



          • #65
            Hi, I would say it will go the other way around.
            More users are using less powerful CPUs like the Athlon Neo or Intel Atom,
            because they have enough power for everyday use.

            Greetings



            • #66
              It seems that for an ATI/AMD gfx card to be used for High Profile AVC/H.264 encode/decode without access to the UVD ASIC, you NEED someone to write viable SAD and SATD OpenCL kernels capable of replacing the current x264 CPU SAD/SATD assembly.

              "Sum of absolute transformed differences (SATD) is a widely used video quality metric used for block-matching in motion estimation for video compression. It works by taking a frequency transform, usually a Hadamard transform, of the differences between the pixels in the original block and the corresponding pixels in the block being used for comparison. The transform itself is often of a small block rather than the entire macroblock. For example, in x264, a series of 4?4 blocks are transformed rather than doing the more processor-intensive 16?16 transform."

              Dark Shikari, over three years ago now, made an attempt at SAD CUDA code and apparently made some good progress without knowing the API or GPU coding, so you could take that as a proof of concept and recode it into ATI-capable OpenCL, perhaps, if you want to be practical about actually advancing things and produce viable current code.

              Overview: see http://en.wikipedia.org/wiki/Motion_estimation. Parallelizing motion estimation: the biggest problem of doing motion estimation on CUDA is getting the MVp for a given macroblock, since the H.264 standard defines the MVp based on the macroblocks surrounding a given macroblock (...)

              "SATD (Sum of Average Transformed Differences)


              SATD uses the Hadamard transform (see p. 451ff of http://www.jjj.de/fxt/fxtbook.pdf if you want to know more). The Hadamard transform is performed on the difference between the original block and the predicted block, then the absolute values of the Hadamard transform's coefficients are summed to give the SATD score.

              It's one of the metrics for similarity of blocks.

              It's used mostly in qpel search, for deciding intra prediction directions, and for MB mode decision (this is when RDO is disabled).
              SATD isn't used during the integer part of the search except when '--me tesa' is used. So if --me is hex (as it is by default), or dia, umh or esa, SATD won't be used in the fullpel search. SATD is also used when 'i_subpel_refine > 1': mbcmp_init in encoder.c sets fpelcmp to SAD or SATD, and mbcmp is always SATD except when 'subme <= 1'.

              How much gain is there from tesa vs esa for motion estimation? Very little, almost a placebo effect.

              <Setsuna-Xero> if you want to optimize one thing that will speed everything up SATD and SAD are the two that will really show it but that said they're optimized beyond all reason
              <holger> setsuna-xero: already being in the works. you'll be surprised soon
              <Setsuna-Xero> you mean theres more? 0_0
              <holger> no, satd isn't. there's a lot more

              <Dark_Shikari> hadamard transform is trivial
              <Dark_Shikari> its the frequency transform that can be represented by a matrix multiply with a matrix of nothing but 1s and -1s
              <Dark_Shikari> i.e. the frequency transform calculatable with only adds and subtracts
              <Dark_Shikari> the simplest possible transform
              <holger> imagine the array going on top [0..x]. in each row, if there's a star below the array element, add it, otherwise sub it. row sums are transformed values. abs and sum them to get satd."


              "Dark Shikari
              Dec 4 2007, 10:12 PM
              I'm an x264 developer and I'm beginning the long and dreadful process of porting x264's motion estimation functions to CUDA. To begin with, I need to get a SAD (sum of absolute differences) function working. Here's what I have now:"
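
              The CUDA code itself isn't reproduced above, but the plain C reference that such a kernel replaces is tiny, and that's exactly why it's a tempting GPU target: each GPU thread can evaluate one candidate position of the motion search independently. A minimal sketch (illustrative names, not x264's actual function table):

              Code:
              #include <stdint.h>
              #include <stdlib.h>

              /* Reference scalar 16x16 SAD (sum of absolute differences).
               * In a CUDA/OpenCL port, each thread would compute one of
               * these for a different candidate motion vector. */
              static int sad_16x16(const uint8_t *cur, int cur_stride,
                                   const uint8_t *ref, int ref_stride)
              {
                  int sum = 0;
                  for (int y = 0; y < 16; y++) {
                      for (int x = 0; x < 16; x++)
                          sum += abs(cur[x] - ref[x]);
                      cur += cur_stride;
                      ref += ref_stride;
                  }
                  return sum;
              }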

              Of course, to be viable, anything you might produce doesn't need to cover all the most advanced uses from the start, but it does need to work and be faster than the slow fallback C code you might run on the CPU today, I'd think!

              Also, bridgman himself tells us that some ATI cards and the current API do now come with a form of hardware SAD. So, are you good enough and interested enough to try to make a viable ATI/AMD OpenCL GPU SAD/SATD proof of concept that plugs into the current x264 and ffmpeg codebases, for fun?



              • #67
                "Quote:
                Originally Posted by bridgman
                The main risk to shader-based decoding IMO is multi-thread decoders maturing and fast CPUs becoming sufficiently cheap and power-efficient that the community loses interest in GPU decode before it gets fully implemented.
                "
                Originally posted by rbmorse View Post
                John, is that a risk...or a hope?
                LOL, I thought that too.

                Alas, it's not clear that ATI/AMD are actually optimising their embedded microcode, or adding instructions that are faster at the very common SIMD SAD/SATD operations that x264 and other codebases use and optimise for a given processor. Hence, it seems, the poor showing of the new stopgap Phenom II X6 1090T / 890FX right now, even though it has two more SIMD-capable cores to throw at x264 encoding.




                • #68
                  Originally posted by bridgman View Post
                  The main risk to shader-based decoding IMO is multi-thread decoders maturing and fast CPUs becoming sufficiently cheap and power-efficient that the community loses interest in GPU decode before it gets fully implemented.
                  I'm starting to trend towards this view, but only because I realized how far along ffmpeg-mt has come. Price and power efficiency are one thing, but quite frankly there hasn't been much of a GPU decode implementation, and getting CoreAVC to work under Linux was a huge pain.

                  ffmpeg-mt is still not quite as good as the single-threaded version; it still shows some small corruption at times, so it's not entirely bug-free, but my CPU use is down to 30-40% on a Core i7 860 and playback is silky smooth and doesn't crash.

                  However, I did get myself the new Panasonic TM-700, which records 1080p60; it'll still struggle with that, but using -fast even that is doable in software. Then I'm running the CPU at near full load, though, with 70-80% spikes.

                  I realize this might not be the cheapest nor the most power-efficient way, but well... right now I'm more interested in getting good 3D working.



                  • #69
                    Yep. My current thinking is that we're still likely to see some shader-based decode acceleration on the GPU, probably just motion comp or similar (but the H.264/VC1 flavours, not MPEG2) aimed mostly at reducing CPU load enough that you can get decent playback on a low-ish end CPU.
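
                    (For anyone wondering what "motion comp" boils down to computationally: at full-pel precision it's just a displaced block copy from the reference frame, and every block is independent, which is what makes it shader-friendly. A minimal C sketch with illustrative names; real H.264 MC adds quarter-pel interpolation and weighted prediction, and this assumes the caller keeps the vector inside the padded reference frame:)

                    Code:
                    #include <stdint.h>

                    /* Full-pel luma motion compensation for one block: copy the
                     * reference block displaced by the motion vector (mvx, mvy).
                     * Real H.264 MC adds 6-tap + bilinear quarter-pel filtering. */
                    static void mc_copy_block(uint8_t *dst, int dst_stride,
                                              const uint8_t *ref, int ref_stride,
                                              int x, int y, int mvx, int mvy,
                                              int bw, int bh)
                    {
                        const uint8_t *src = ref + (y + mvy) * ref_stride + (x + mvx);
                        for (int j = 0; j < bh; j++)
                            for (int i = 0; i < bw; i++)
                                dst[j * dst_stride + i] = src[j * ref_stride + i];
                    }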

                    What I'm seeing is that most of the systems out there with CPUs that need some decode help also tend not to have big honkin' GPUs either, which implies that you don't want to do too much on shaders or you'll just overload the GPU instead of the CPU.

                    I do have a nagging concern that going through Gallium3D for motion comp is going to require a CPU premium over implementing MC natively, and since the target hardware generally will have a wussy CPU this might be enough to steer the implementation back to a native one rather than being based on Gallium3D.

                    On the other hand, the really attractive thing about Gallium3D is that it is sufficiently portable that it would probably be feasible to add Gallium3D-based code to ffmpeg, whereas I don't know how the ffmpeg team would feel about adding a heap of GPU-specific code.

                    I guess we can worry about that once the current dev priorities get further along and someone can spend some time prototyping MC etc... on Gallium3D and/or natively.


                    • #70
                      Thanks bridgman

                      So to sum up, please correct me if I'm wrong:
                      1. There are no plans to use UVD in the open source Radeon stack, probably due to legal issues.
                      2. There are other priorities to fulfil before HW accel is even considered by AMD devs.
                      3. Not all of h.264 will be decoded on the GPU (as with UVD), but rather the easily parallelizable parts like motion compensation etc. Correct me if I'm wrong, but this would still require a decent processor (circa 2.2 GHz?) to do the serial (bitstream?) operations on high-bitrate streams (1080p?).
                      4. The HW accel will either be done on top of Gallium3D, for easy portability across gfx cores at the expense of higher CPU requirements, or in native ATI shader code, for more performance at the expense of portability.
