Will H.264 VA-API / VDPAU Finally Come To Gallium3D?
-
Originally posted by popper: Did you put your current alpha/beta prototype code on GitHub, or do you intend to soon? And when do you expect your thesis will be done?
There are all those other bits and pieces of OpenCL/CUDA code mentioned on the x264dev logs etc. too. It's not clear whether they cover anything besides your "I've managed to convert Subpixel Prediction (sixtap + bilinear), IDCT, Dequantization, and the VP8 Loop filter" routines, as no one has bothered to collect them up and list the GitHub etc. locations on a web page somewhere.
OpenCL Decoder work for libvpx' VP8 decoder. Contribute to awatry/libvpx.opencl development by creating an account on GitHub.
The bound copy of my thesis is due in 3 weeks; the final draft is due 3/31 or 4/1, I don't remember which.
I've only had Nvidia hardware to test on since my Radeon 4770 doesn't support the byte_addressable_store extension (5000-series and up only), but it runs on my GF9400m and a GTX 480 in current Ubuntu just fine. It also works fine on AMD Stream CPU-based OpenCL. I've gotten it working in Mac OS using CPU CL, but there's a bug in the Mac GPU-based acceleration that kills it every time and I haven't had time to track it down yet.
Like I said, I'm hoping to keep working on this after graduation, either as a hobby, or professionally if someone's willing to pay. I've gotten the OpenCL initialization framework in place, have all of the memory management taken care of, and have most of the major parts of the decoding available as CL kernels.
The next step is increasing the parallelism, as I'm currently capping out at 336 threads max, and the common case is only a few dozen threads, not enough to even approach performance parity with the CPU-only paths. I've figured out a few ways to do that, especially in the loop filter (which accounts for roughly 50% of the CPU-only execution time on a few of the 1080p videos I've profiled). The sub-pixel prediction/motion compensation and dequantization/IDCT will take a bit more work to thread effectively, but I think it can be done.
-
Originally posted by pingufunkybeat: Now we need Clover
I'm sick of using the binary Nvidia drivers on my desktop/laptop, and I'd love to be able to switch back to the OSS drivers.
-
If anyone is interested, or would pick up this GSoC project, I have some very early VA-API state tracker code. I've just had more important things to do, so I haven't touched it for a while. But whoever does the GSoC project could have it if he/she wants it.
-
Originally posted by tball: If anyone is interested, or would pick up this GSoC project, I have some very early VA-API state tracker code. I've just had more important things to do, so I haven't touched it for a while. But whoever does the GSoC project could have it if he/she wants it.
Then someone might reference it and encourage uptake, and of course then there's always an off-site backup if you lose your local hard drive with all that work on it.
-
Originally posted by popper: ... and of course then there's always an off-site backup if you lose your local hard drive with all that work on it
-
Originally posted by Veerappan: This... I feel a bit more comfortable knowing that I have a minimum of 7 identical copies of my thesis code spread across at least 5 physical locations.
By the way, although it's of no direct use for the gfx code side, I noticed that on one of Jason Garrett-Glaser's latest ffmpeg VP8 optimization patches, Diego Elio Pettenò (flameeyes) mentioned that the pahole utility from acmel's dwarves is designed to find the cache-line boundaries in structures. I don't know if it's any good for the CPU side, but it's worth mentioning anyway, just in case.
-
Originally posted by popper: LOL, I thought you might.
So now my desktop is running hardware RAID 1 with git checkouts in both Linux and Windows partitions, and my laptop has git checkouts of my stuff on all 3 of its operating systems (Win7, Mac, Linux). Both laptop and desktop are periodically backed up to external drives (separate drives for each system). Eventually, I'll probably store those drives in my desk at work, but for now they're on a shelf.
I've got a co-located server in another state, the github master repository, and a checkout on my work computer. My HTPC has a copy as well (also RAID 1), just to provide another machine to test on.
I know it's excessive, but I really don't want to try to use the "hard drive ate my homework" excuse. I knew people in undergrad who used that one, and it sounded lame even then.
As far as the cache-line software goes, it could come in handy for profiling the CPU decoder. The reference VP8 decoder does force alignment to certain boundaries on many of its structures, but I haven't seen any work on cache line boundary detection (it may have happened, I just haven't seen it).