Intel Is Still Working On G45 VA-API Video Acceleration

mark5 replied

01 April 2011, 02:46 PM
Intel is doing a great job, and their remarkable work sets them apart from others... this work will also create a new race ...
const replied

22 March 2011, 02:42 PM
actually, here:

http://lists.freedesktop.org/archives/intel-gfx/2011-February/009458.html

Gordon from Intel says about GM45 (4500MHD):

> Just H.264 hw decoding not supported yet. (it's already supported on
> newer hw, but with lower priority to port to G45/GM45.)

he wrote "to port to G45/GM45", which suggests the code would be similar to what the newer, already supported hardware uses.

so, along this line of thought, maybe it's possible even for someone from the community to port it. i'm not sure, though, whether enough information about G45/GM45 is publicly available for a developer independent of Intel to handle such a port, based on the existing code for the newer hardware plus the changes needed for G45/GM45. also, maybe it's not hard to do at all, just "lower priority" for Intel, as Gordon mentioned, and it's probably waiting until their developers have nothing more important to do.
rafirafi replied

22 March 2011, 10:40 AM
Indeed, "Intel HD Graphics" is the name for the GPU in the Core i processors.
const replied

22 March 2011, 10:30 AM
Originally posted by rafirafi

If you look at the Chipset item for the T410 on their page, it says "Intel HD Graphics", which is not "GMA 4500"...

well, it's really confusing and also misleading - i googled more and it seems there are Lenovo T410s with "GMA 4500" and some new ones with "GMA 5700", and the latter is often also called "Intel HD Graphics" or "GMA HD" in different articles (BTW, it's not built into the chipset, but rather into the new Intel "Core i" processors). so, it seems the information is probably for the T410 with "GMA 5700".
rafirafi replied

22 March 2011, 09:20 AM
If you look at the Chipset item for the T410 on their page, it says "Intel HD Graphics", which is not "GMA 4500"... As for "support is being worked on", that probably means you will never see it in your lifetime, but who knows (I think you can ask on their IRC channel; I did some months ago and it was really funny).
const replied

22 March 2011, 08:33 AM
Originally posted by gbeauche

MPEG-2 VLD is already implemented on GMA 4500MHD. H.264 support is being worked on. I think it was also mentioned that VC-1 won't be supported on those older chips.

ok, but here: http://intellinuxgraphics.org/user.html they state:

Originally posted by intellinuxgraphics.org

Laptop Lenovo T410 Intel HD Graphics Debian Squeeze works out of the box, with Intel 2010Q2 graphics package, Mpeg4 offloading to GPU works fine

and the Lenovo T410 has two versions: one with "Intel GMA X4500 HD" and one with "Nvidia NVS 3100M". so, i don't understand why the information is so contradictory, or maybe by "Mpeg4" they don't mean H.264. i have access to X4500 hardware but no time to test it; in any case, it's really confusing that no one seems to know the real state.
Veerappan replied

03 March 2011, 05:25 PM
Not Edit:
Whether the loop filter's threading can be increased by more than a factor of 3-5 is tough to say, as I haven't really looked at the loop filter algorithm in enough detail to know if it can be truly threaded the way it needs to be to get good performance and still produce correct output.
Veerappan replied

03 March 2011, 05:19 PM
Originally posted by curaga

Out of curiosity, what kind of speeds are you getting now?

Very low. I haven't had time to parallelize most of the algorithms, only to get the CL framework in place to do the work and a direct port of the C code into CL kernels to prove the correctness of the ported code's output. For sub-pixel prediction of inter-coded Macroblocks, it's something like 5-10% of the C speed, and 2-5% of the assembly-optimized paths. This is doing 16x16, 8x8, 8x4, and 4x4 inter-prediction on the GPU, but one block at a time (far from optimal).

If I only did 16x16 and 8x8 prediction in CL, the numbers would probably be closer (I'm not currently sure how much closer, as CPU/GPU transfers might then be needed and could become a bottleneck), as the 16x16 and 8x8 kernels do a lot more work and launch a lot more threads than the 8x4 and 4x4 kernels. This is also only predicting one block within a Macroblock at a time, not batching all of the inter-coded Macroblocks together to launch a multi-dimensional kernel which predicts all blocks of the same size/type simultaneously. That would probably provide an actual improvement over the C code.
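
The batching idea can be sketched on the host side roughly like this. This is only an illustration in Python, not the actual decoder code: the block dictionaries and the function names `batch_by_size` and `count_launches` are hypothetical, and a real implementation would enqueue one multi-dimensional CL kernel per batch instead of counting launches.

```python
from collections import defaultdict

def batch_by_size(blocks):
    """Group inter-coded blocks by (width, height) so that every
    size can be predicted by a single batched kernel launch."""
    batches = defaultdict(list)
    for block in blocks:
        batches[(block["w"], block["h"])].append(block)
    return dict(batches)

def count_launches(blocks, batched):
    """Number of kernel launches needed: one per block when done
    one at a time, one per distinct block size when batched."""
    if batched:
        return len(batch_by_size(blocks))
    return len(blocks)

# Hypothetical frame content: a mix of block sizes.
blocks = (
    [{"w": 16, "h": 16, "id": i} for i in range(40)]
    + [{"w": 8, "h": 8, "id": i} for i in range(24)]
    + [{"w": 4, "h": 4, "id": i} for i in range(64)]
)

print(count_launches(blocks, batched=False))
print(count_launches(blocks, batched=True))
```

With the made-up numbers above, 128 separate launches collapse into 3, which is where the expected gain over one-block-at-a-time prediction would come from.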

There's definitely still a lot of work to be done before this project gets close to speed parity with the C code.

As for what I've implemented and am getting correct output from:
Six-Tap and Bilinear Subpixel Prediction
IDCT/De-quantization
Loop Filtering (normal and simple filters)

The IDCT/dequantization has not gotten much attention, so it's practically single-threaded and therefore much slower than the subpixel prediction. The loop filter also uses very few threads, 8-16 at most currently, although that could be increased by a factor of at least 3 without too much work.
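
For readers unfamiliar with six-tap subpixel prediction, here is a minimal one-dimensional sketch in Python. It is not taken from the CL port. The tap values are an assumption: they are the half-pixel taps from the VP8 specification (RFC 6386), and the post itself does not name the codec. The function name `six_tap_row` and the edge handling (replicating border pixels) are also just illustrative choices.

```python
# Assumed half-pixel taps (VP8, RFC 6386); they sum to 128.
HALF_PEL_TAPS = (3, -16, 77, 77, -16, 3)

def six_tap_row(row, taps=HALF_PEL_TAPS):
    """Interpolate the sample between row[x] and row[x + 1] for every x,
    using the six pixels row[x - 2] .. row[x + 3]. Border pixels are
    replicated where the filter window leaves the row (an assumption)."""
    last = len(row) - 1
    out = []
    for x in range(len(row)):
        acc = 0
        for k, tap in enumerate(taps):
            idx = min(max(x - 2 + k, 0), last)  # clamp index to the row
            acc += tap * row[idx]
        value = (acc + 64) >> 7                 # round, divide by 128
        out.append(min(max(value, 0), 255))     # clamp to 8-bit range
    return out

flat = six_tap_row([100] * 8)
edge = six_tap_row([0, 0, 0, 255, 255, 255])
print(flat)
print(edge[2])
```

Each output sample depends only on a small window of input pixels, which is why this stage maps naturally onto one GPU thread per output pixel.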
agd5f replied

27 February 2011, 01:05 PM
Certain parts do parallelize well (MC and to a certain extent iDCT). Prior to UVD, we used the 3D engine on those asics for these sorts of tasks.
curaga replied

27 February 2011, 01:00 PM
Out of curiosity, what kind of speeds are you getting now?
