There May Still Be Hope For R600g Supporting XvMC, VDPAU

  • popper
    replied
    jrch2k8, have a read of the content and comments on http://x264dev.multimedia.cx/archives/71,
    http://x264dev.multimedia.cx/archives/486 and
    http://x264dev.multimedia.cx/archives/377 if you haven't already, to get an idea of the people most skilled in assembly etc.

  • popper
    replied
    "
    #46 jrch2k8
    see it more like a study phase to understand gallium and how video decode and encode works from a gpu point of view that in a near future could produce some code good enough to reproduce mpeg2 video using the GPU and compliant with va-api.

    the idea at least in my head is after we master TGSI and video algorithm theory decently enough is sit an have a brainstorm of a more focused project aiming to be codec agnostic and maybe even process some sections of audio compression in the gpu among other stuffs"

    Oh sure, I don't expect you or the team to produce anything in the short term. In fact, many university GPU guys with great potential pass through the x264dev channels, say they will produce X, then never come back after talking with the devs, LOL.

    Even those OpenCL guys never did the simple thing: pop onto IRC, ask if DS is around (he is there at all hours of the day, an odd sleeper apparently), tell him this OpenCL test/prototype code exists at such-and-such location, and ask the devs to review it and say what small changes to make so it can be pushed to master on the next commit. Done.

    That IS the way to get any patch committed to x264, and even FFmpeg is getting better as more devs get commit privileges.

    BUT you forgot the most important thing in all this: you need to have FUN learning this stuff.

    And I'm sure you would love talking to Holger etc. on #x264dev; he will have ideas, and perhaps new code to use if it's better than what's there now, for "a smarter way to multiply 3 matrices" and the 3 IDCT versions you outline, and probably lots more FUN for you.

  • jrch2k8
    replied
    Originally posted by bridgman View Post
    The reality is that IDCT is going to have to be implemented anyway, if only to determine whether running it on the GPU was a good idea or not.
    Well, I'm not an expert in IDCT or anything like it, of course, but so far I have found 3 ways of implementing a fast IDCT:

    1. shifts and masks (butterfly IDCT)
    2. mostly integer (FFT-based IDCT)
    3. matrix multiplication (float M'TM DCT)

    CUDA developers proved that FFT-based IDCT is slow on GPUs, so I discarded it. The butterfly approach is horrible to vectorize (look at the PDF that explains the butterfly algorithm and you will understand me), so it is not discarded yet, but I want to put my hopes on getting a pure float M'TM version first and then, once I am more confident with TGSI, benchmarking it against the butterfly version, especially since M'TM is beautiful to vectorize and prefetch and can be reduced to nothing but vector dot products (i.e. sums and mults) by transposing the incoming macroblock and the coefficient matrix.

    So far, using the third method simply converted to SSE2 (i.e. no prefetch or memory optimization at all, since the GPU is different in this respect), it can process 64000 DCT'ed 8x8 blocks in around 0.35 ms on 1 core of my Phenom II X4 (it can be faster once I find a way to calculate a dot4 without leaving the cache, or find a smarter way to multiply 3 matrices). A rough sketch of the approach is below.
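
    To illustrate, a minimal scalar C sketch of the M'TM approach (illustrative only, not my actual SSE2/TGSI code; it assumes the standard orthonormal DCT-II basis):

    #include <math.h>

    /* DCT-II basis: C[k][n] = s(k) * cos((2n+1)*k*pi/16),
     * with s(0) = sqrt(1/8) and s(k>0) = sqrt(2/8). */
    static float C[8][8];

    static void init_basis(void)
    {
        const float pi = 3.14159265358979f;
        for (int k = 0; k < 8; k++) {
            float s = (k == 0) ? sqrtf(1.0f / 8.0f) : sqrtf(2.0f / 8.0f);
            for (int n = 0; n < 8; n++)
                C[k][n] = s * cosf((2 * n + 1) * k * pi / 16.0f);
        }
    }

    /* 2D IDCT of one 8x8 block: X = C^T * Y * C. Every output element
     * is a plain dot product, which is why this form vectorizes well. */
    static void idct8x8(const float Y[8][8], float X[8][8])
    {
        float T[8][8];
        for (int i = 0; i < 8; i++)          /* T = C^T * Y */
            for (int j = 0; j < 8; j++) {
                float acc = 0.0f;
                for (int k = 0; k < 8; k++)
                    acc += C[k][i] * Y[k][j];
                T[i][j] = acc;
            }
        for (int i = 0; i < 8; i++)          /* X = T * C */
            for (int j = 0; j < 8; j++) {
                float acc = 0.0f;
                for (int k = 0; k < 8; k++)
                    acc += T[i][k] * C[k][j];
                X[i][j] = acc;
            }
    }

    Transposing the macroblock and the coefficient matrix up front, as mentioned, makes both inner loops walk rows contiguously, so each output element really is one dot product over contiguous memory.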

    From a performance point of view, I believe this code should scale almost linearly on a GPU (as far as I understand, adds and muls can be done in 1 cycle) and give nice performance, or at least beat the CPU by a nice factor.

    Another interesting aspect of moving the IDCT to the GPU is bandwidth saving: since quantization zeroes most coefficients, the DCT'ed data is very small, so it is obviously cheaper to pass it to GPU memory than to pass the IDCT output, which is much bigger.

    Another benefit could be the color conversion pass (maybe forcing it naturally to RGBA), since by the time the IDCT has finished and MC kicks in you almost have a full frame, so in theory it could save time to add the color conversion step inside the IDCT and hack MC to process everything in RGBA (see the sketch below). (For now this is a crazy idea; remember I'm learning this stuff as I speak. Of course, I need to think of a fast way to do color conversion that keeps the linear behavior of the IDCT algorithm. I'm still learning and looking for different ways to get the job done, so maybe I'll find out later that this is impossible, or barely faster, or the same, but I'll try it, and that's a +1 for me.)
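
    For reference, the per-pixel math such a fused pass would do is the textbook BT.601 conversion (sketch only, not project code; full-range 8-bit values are assumed for simplicity, real video is usually limited range, 16-235):

    /* BT.601 YCbCr -> RGBA for one pixel. */
    static unsigned char clamp8(float v)
    {
        return v < 0.0f ? 0 : v > 255.0f ? 255 : (unsigned char)(v + 0.5f);
    }

    static void ycbcr_to_rgba(float y, float cb, float cr, unsigned char rgba[4])
    {
        rgba[0] = clamp8(y + 1.402f * (cr - 128.0f));               /* R */
        rgba[1] = clamp8(y - 0.344136f * (cb - 128.0f)
                           - 0.714136f * (cr - 128.0f));            /* G */
        rgba[2] = clamp8(y + 1.772f * (cb - 128.0f));               /* B */
        rgba[3] = 255;                                              /* A */
    }

    Since each channel is just a linear combination of Y, Cb and Cr plus a constant, folding this into the end of the IDCT pass would keep the linear behavior mentioned above.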

    But I strongly believe that IDCT on the GPU would be way nicer XD (for now: tu tu tu tun, tu tu tun).

  • jrch2k8
    replied
    Originally posted by popper View Post
    Well, I'm in favor of the engineering approach: taking the best in class of whatever's available at the time of design and connecting the dots, as it were.

    Make them all fit together in a consistent way, but allow for pulling out a given element and replacing it with something totally different; rinse and repeat.

    And of course any such design must allow for and include some form of baseline code path you can turn on to actually test each element/block for speed/time, so you can find the bottlenecks and write test cases to see if your new idea is actually better, faster, and actually works.

    But most apps/HW seem to go for 'it's good enough, for that is how I think it should work' without getting meaningful feedback from a third party that's gone through that type of design already; that doesn't allow for simple future speed improvements by others, and that's a shame.

    That's how UVD came to be virtually unused by open-source third parties, it seems, as someone inside AMD/ATI had a 'that is how I think it should work' moment of the form 'I think I'll put the protected DRM/Blu-ray code in this UVD along with the decode logic and save some money/time/kudos this quarter'.
    Well, you are right up to a point. We all have many ideas about what is good and wrong with ffmpeg/x264/vaapi/players/TV apps/etc. when you approach them from the point of view of an integrated video framework for Linux and other Unix-like OSes, so it's not like we are expecting anything to be widely accepted or even used in the short term.

    See it more as a study phase to understand Gallium and how video decoding and encoding work from a GPU point of view, which in the near future could produce code good enough to decode MPEG-2 video on the GPU in a VA-API-compliant way.

    The idea, at least in my head, is that after we master TGSI and video algorithm theory well enough, we sit down and brainstorm a more focused project that aims to be codec-agnostic and maybe even processes some stages of audio compression on the GPU, among other things.

  • popper
    replied
    Well, I'm in favor of the engineering approach: taking the best in class of whatever's available at the time of design and connecting the dots, as it were.

    Make them all fit together in a consistent way, but allow for pulling out a given element and replacing it with something totally different; rinse and repeat.

    And of course any such design must allow for and include some form of baseline code path you can turn on to actually test each element/block for speed/time, so you can find the bottlenecks and write test cases to see if your new idea is actually better, faster, and actually works.

    But most apps/HW seem to go for 'it's good enough, for that is how I think it should work' without getting meaningful feedback from a third party that's gone through that type of design already; that doesn't allow for simple future speed improvements by others, and that's a shame.

    That's how UVD came to be virtually unused by open-source third parties, it seems, as someone inside AMD/ATI had a 'that is how I think it should work' moment of the form 'I think I'll put the protected DRM/Blu-ray code in this UVD along with the decode logic and save some money/time/kudos this quarter'.

  • bridgman
    replied
    I guess it depends on what you mean by "vaapi backend". If the plan is to implement a VLD-level vaapi entry point, then that will be "clean" from a user perspective but "dirty" from a developer perspective, since a lot of decode-on-the-CPU code will need to be (re)implemented.

    If, on the other hand, you are talking about implementing a lower-level vaapi entrypoint (IDCT, MoComp or Deblocking) that corresponds to the functionality you are likely to be able to implement, and modifying an existing decoder stack to use that lower-level entry point, then that seems a lot more doable.

    * DISCLAIMER - when I looked at the vaapi spec I saw the entry points, but I didn't think I saw enough of the right kind of data structures to implement an IDCT- or MC-level interface... I guess I expected it to look a bit more like XvMC.
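
    For reference, the entrypoint enumeration itself is easy to probe from a client; a minimal libva/X11 sketch (illustrative only - whether the accompanying buffer types carry enough data is exactly the open question above):

    #include <stdio.h>
    #include <stdlib.h>
    #include <va/va.h>
    #include <va/va_x11.h>
    #include <X11/Xlib.h>

    int main(void)
    {
        Display *x11 = XOpenDisplay(NULL);
        VADisplay dpy = vaGetDisplay(x11);
        int major, minor, num = 0;

        vaInitialize(dpy, &major, &minor);

        /* Ask which entrypoint levels the driver exposes for MPEG-2. */
        VAEntrypoint *eps = malloc(vaMaxNumEntrypoints(dpy) * sizeof(*eps));
        vaQueryConfigEntrypoints(dpy, VAProfileMPEG2Main, eps, &num);

        for (int i = 0; i < num; i++) {
            switch (eps[i]) {
            case VAEntrypointVLD:        puts("VLD (full bitstream decode)"); break;
            case VAEntrypointIDCT:       puts("IDCT level");                  break;
            case VAEntrypointMoComp:     puts("motion compensation level");   break;
            case VAEntrypointDeblocking: puts("deblocking level");            break;
            default:                     puts("other entrypoint");            break;
            }
        }
        free(eps);
        vaTerminate(dpy);
        XCloseDisplay(x11);
        return 0;
    }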

  • tball
    replied
    Originally posted by popper View Post
    No, you miss the point, tball. x264 may only be targeting H.264 right now, but they know MPEG-2 and VP8 encode AND decode assembly and C inside out, as do the FFmpeg devs.

    They can help you and your team get this prototype up and running, and you can port and refactor whatever code sections suit you, replacing the CPU code with GPU code where needed/wanted and using the other CPU parts unchanged to start with.

    It's perfectly usual and expected to start by ripping out things you don't need for a prototype and building a basic, simple test case per new function you want to write and test, just as the OpenCL university guys did ("..Considering the fact that only a fraction of the motion estimation capabilities have been ported to OpenCL....") - section by section, one routine at a time.

    The only other codebase of significance that really matters is LinuxTV/Media, and they too use the x264/FFmpeg code frameworks inside their hardware devices etc.; everything else is essentially just secondary apps and code that wraps/ports code from these three.
    That would be the easiest way, yes. But that approach has already been considered.

    Implementing a vaapi backend seems somewhat cleaner, don't you agree?
    If you like, please join the #gallium-vdpau IRC channel and discuss it with us.

  • popper
    replied
    No, you miss the point, tball. x264 may only be targeting H.264 right now, but they know MPEG-2 and VP8 encode AND decode assembly and C inside out, as do the FFmpeg devs.

    They can help you and your team get this prototype up and running, and you can port and refactor whatever code sections suit you, replacing the CPU code with GPU code where needed/wanted and using the other CPU parts unchanged to start with.

    It's perfectly usual and expected to start by ripping out things you don't need for a prototype and building a basic, simple test case per new function you want to write and test, just as the OpenCL university guys did ("..Considering the fact that only a fraction of the motion estimation capabilities have been ported to OpenCL....") - section by section, one routine at a time.

    The only other codebase of significance that really matters is LinuxTV/Media, and they too use the x264/FFmpeg code frameworks inside their hardware devices etc.; everything else is essentially just secondary apps and code that wraps/ports code from these three.

  • tball
    replied
    Originally posted by popper View Post
    So hang on, keeping in mind you also say:
    "We are still in the very early state. I can't [say?] anything about how we will reach any usable state."

    So if I'm getting this right, jrch2k8 has made or ported someone's existing SSE2 IDCT, and he likes and understands x86 assembly?

    I also get the impression (reading some post or other from him here) that he knows TGSI assembly and will be writing that too?

    So, you're at a very early stage and basically want to take all the other missing parts for your decoder (MC, CABAC, CAVLC, etc.) from elsewhere, or write new wrapper code where needed, yes?

    Hmmm. Yet it appears jrch2k8, you, and whoever else have not actually shown up and asked questions where you're most likely to get in-depth help, feedback, and actual working code to look at and learn from,

    that being the codebases and IRC dev channels for x264 and FFmpeg, or Doom10.

    Dark Shikari and team are VERY helpful (but will call you out if you claim to have understood something and it turns out you haven't, lol) and encourage all new devs to ask video encode/decode and related code questions as soon as possible, preferably before they make too many beginner mistakes or waste time working out problems that don't exist. They don't care who you are, as a high-ranking Intel guy found out once he got on IRC and finally understood their working practice.

    For instance, see http://doom10.org/index.php?topic=658.0

    For the x264 OpenCL code that some non-x264 devs patched together, see http://www.gpucomputing.net/?q=node/1143 and http://li5.ziti.uni-heidelberg.de/x264gpu/. No one seems to know if it even runs, never mind whether it might give you hints for your DEcode project, but it's worth a look perhaps. Apparently they never did talk to DS etc. to get it submitted, and submitting is easy.

    There's also http://x264dev.multimedia.cx/archives/157; search for x264 IDCT or whatever video-related term you like, and the chances are Dark Shikari or a related x264 dev has written an in-depth blog post about it.

    For instance, the x264 devs helped make the FFmpeg VP8 decoder the world's fastest, LOL: http://x264dev.multimedia.cx/

    They adjusted and expanded the x264 code framework to help, and remember also that some of the x264 devs are right now working on x262, the MPEG-2 encoder, using this same x264 framework. FFmpeg is their preferred input/decode codebase and they hack on that too; DS etc. have rights there to post patches and new code, with write access to the repo.

    All told, you and indeed any devs wanting to know the detailed inner workings of video are well advised to go over to freenode.net and actually talk to Dark Shikari and the other x264/FFmpeg devs in #x264dev and #ffmpeg-devel, as that is where actual development happens.

    You can even use any part of their code and get algorithm advice and details...
    Thx for your advice. I will look into it.
    But as I have stated before, we haven't even reached H.264 yet. The first milestone is to get an MPEG-2 decoder up and running, with e.g. a vaapi state_tracker.

  • tball
    replied
    Originally posted by bridgman View Post
    tball; search for "deathsimple koenig" and you'll get an email address... I'd put it here but don't want to make his spam situation any worse
    Thx. Found him :-)
