AMD Releases R600/700 3D Documentation


  • #31
    Originally posted by bridgman


    It seems that quad-core processors are able to keep up with H.264 at 1080p, at least with some software decoders.

    I believe we have already released enough information to implement both motion comp and the in-loop deblocking filter on shaders. My guess is that implementing those two would drop CPU utilization enough that a modern dual-core CPU (like yours) could handle the rest. Won't know for sure until we see it running, of course.
    It's not something I'm proud of, but a single-core CPU is what I have

    But if a quad core can handle 1080p, then no problem. AMD's roadmap says the 45 W, 45 nm quad-core Propus will arrive in May 2009, and I read at AMDzone.com yesterday that AM3-socket Deneb quad cores are rolling out in February.

    Comment


    • #32
      Originally posted by chaos386
      Not sure about what the minimum is, but my 2.2 GHz C2D laptop can play back 1080p H.264 just fine. AFAIK, none of the open-source video players are multithreaded, either, so the number of cores shouldn't be too important.
      Strange that no one has taken up the challenge. Shouldn't video decoding be one of those cases that's perfect for multithreading, i.e. scaling almost perfectly with each extra core?
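      "Almost perfect" scaling is actually a high bar — any serial part of the decode (bitstream parsing, for instance) caps the benefit. Here's a quick Amdahl's-law sketch; the 95% parallel fraction is just an illustrative assumption, not a measured number:

```python
# Amdahl's law: speedup = 1 / (serial_fraction + parallel_fraction / n_cores)
def amdahl_speedup(parallel_fraction, n_cores):
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)

# Even if 95% of the decode parallelizes (an assumed figure),
# 4 cores give well under a 4x speedup.
for cores in (1, 2, 4, 8):
    print(cores, round(amdahl_speedup(0.95, cores), 2))
```

      So even a well-threaded decoder won't scale perfectly per core.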

      Comment


      • #33
        I took a quick skim through the forums; looks like the work has been started but is not fully there. It may be that slice-level decoding is working (for video which has multiple slices per frame) but apparently not all encoders make heavy use of slices.

        Slices sure seem like the most obvious option for multi-threading and the only one which doesn't involve building and balancing a pipeline.

        EDIT - here we go :

        FFmpeg supports slice-based threading. That means it can use as many threads as the H.264 file has slices.

        The implications are the following:
        - if you didn't create the file you want to play, you don't know whether you'll be able to play it using several threads until you actually play it

        - recent x264 revisions use frame-based parallelism and no longer support slices, so the main open-source provider of H.264 streams isn't "thread compatible" with ffmpeg at the present time

        - most professional encoders use a slice-based approach to encoding, so it's highly probable that you'll be able to decode Apple's trailers and broadcast HD video using multiple threads.

        http://lists.mplayerhq.hu/pipermail/...er/069275.html
        In short, ffmpeg only multithreads when the video is encoded with multiple slices per frame, so the most common H.264 encoder now produces video that can't be multi-threaded by the most common H.264 decoder. Boo
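        To make the slice idea concrete, here's a toy sketch (nothing to do with real ffmpeg internals — the `decode_slice` body is just a placeholder): slices of one frame have no cross-slice dependencies, so each can go to its own worker, with no pipeline to build or balance.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "decoder": each slice is independent, so the slices of one frame
# can be decoded concurrently. decode_slice is a stand-in for the real
# per-slice work (entropy decode, IDCT, motion comp, ...).
def decode_slice(slice_data):
    return [b ^ 0xFF for b in slice_data]  # placeholder transform

def decode_frame(slices, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(decode_slice, slices))

frame = [bytes([1, 2, 3]), bytes([4, 5, 6])]  # a frame with 2 slices
decoded = decode_frame(frame)
```

        With one slice per frame (what recent x264 emits), the pool degenerates to serial work — which is exactly the problem described above.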
        Last edited by bridgman; 01-27-2009, 12:15 PM.

        Comment


        • #34
          What does the current -lavdopts threads=X do, then? The mplayer manpage says this option can be used for MPEG-1/2 and H.264 decoding.

          Comment


          • #35
            Isn't video decoding supposed to be solved in Gallium3D in vendor-independent way? I would like AMD (Intel, Via, ...) to follow nVidia route of creating own video decoding library when there is chance to have something generic that can be reused for all current and future codecs/GPUs.

            Comment


            • #36
              oh sorry, the last sentence is wrong... "I would _not_ like ..."

              Comment


              • #37
                Understood. As long as decoding over Gallium continues to work out I think we would want to support that effort rather than creating something new -- the issue is that the existing APIs don't seem to run at the right level to match what we definitely know can be done in the open drivers, so something new *may* be needed anyways.

                The existing video-over-Gallium3D work uses XvMC, but that won't handle H.264 or VC-1 without some API changes. There's going to be some iteration involved -- implement something, see what is feasible on a broad range of GPUs, figure out the right API level(s) and whether existing APIs can be used, adjust the implementation, retest etc...

                I want to make sure we end up with something that takes full advantage of 780-class shader power, so probably MC + deblocking filter will be about right. Back-of-envelope calculations suggest that 1080p H.264 might need ~8B FLOPS* for MC+deblock and the 780 can crunch maybe 20B FLOPS when not memory-limited, so that seems reasonable.

                * you don't really need floating point for video decode but modern GPUs are floating point processors so...
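                For what it's worth, the back-of-envelope number is easy to sanity-check in a few lines. The ops-per-pixel figure here is an assumed round number chosen to illustrate the arithmetic, not a measured cost:

```python
# Back-of-envelope check of the MC + deblocking estimate above.
# ops_per_pixel is an assumed figure, not a measured one.
width, height, fps = 1920, 1080, 30
ops_per_pixel = 130            # rough guess for motion comp + deblock
pixels_per_second = width * height * fps
gflops_needed = pixels_per_second * ops_per_pixel / 1e9
print(round(gflops_needed, 1))  # -> 8.1, i.e. the ~8B FLOPS ballpark
```

                Against ~20B FLOPS of shader throughput on a 780, that leaves comfortable headroom.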
                Last edited by bridgman; 01-27-2009, 12:09 PM.

                Comment


                • #38
                  Originally posted by chaos386 View Post
                  Not sure about what the minimum is, but my 2.2 GHz C2D laptop can play back 1080p H.264 just fine. AFAIK, none of the open-source video players are multithreaded, either, so the number of cores shouldn't be too important.
                  I think Xine can multithread. You can change the number of threads in the Xine settings.
                  It uses both cores of my AMD 5000x2. Together with my Sapphire X550 and the latest xorg driver (ati) + ffmpeg, I can now watch 1080i BBC HD. 1080p is not possible.


                  Greetings,
                  Steven

                  Comment


                  • #39
                    Wow, nice to see people wanting to get involved.

                    I'm one of those that wasn't born when X was around. (I'm only 20!) I got into this kind of work because AMD put out the r5xx documentation, and at the time my only working computer was an Asus laptop with a Radeon Mobility X1700.

                    So, being the enterprising entity that I am, I walked into the IRC channel (#radeon on Freenode), and inquired. Turns out that there wasn't really anybody working on it, but there was only one piece of the puzzle missing, so if somebody could write a fragment program compiler for r5xx, it should all just magically work.

                    So I did. It was not exactly easy; it took me a few months before I came anywhere near actual understanding of the code. I knew C, but I didn't *know* C. But, as I worked, I kept reading code, and reading docs, and bugging airlied and glisse with stupid questions, and eventually, things started to come together.

                    I'll even dish a few pointers for free. Doing r6xx support on Mesa is kind of silly in my opinion, but Gallium work requires a bunch of experimental pieces, and there are still bugs here and there. If I were to start r6xx drivers today, I'd start by getting a mug of hot chocolate, sitting down with the r6xx docs, and reading them front to back a few times.

                    ~ C.

                    Comment


                    • #40
                      Originally posted by MostAwesomeDude
                      Doing r6xx support on Mesa is kind of silly in my opinion, but Gallium work requires a bunch of experimental pieces, and there's still bugs here and there.
                      Heck, even we agree with that, but there's a "but..."

                      From a developer's perspective, working in the classic Mesa HW driver model is silly. Nobody feels that more strongly than the devs actually doing the work. From a user perspective, though, it's different -- until all the experimental bits and pieces show up in at least a few distros, anything 6xx-ish we do in Gallium is not going to be broadly accessible to them.

                      The best compromise we could come up with was to get the basic programming sequences worked out in classic Mesa so that users of current distros will have Compiz support, then port the working 6xx code across to Gallium and never look back.

                      BTW, for anyone not following IRC, not only did MostAwesomeDude implement a lot of the 5xx 3D support (including the shader compiler for ARB_vertex_program and ARB_fragment_program) but he has been working on a Gallium3D implementation for 3xx-5xx and saw the first screen output from that in the last few days.
                      Last edited by bridgman; 01-27-2009, 01:53 PM.

                      Comment


                      • #41
                        Originally posted by bridgman
                        BTW, for anyone not following IRC, not only did MostAwesomeDude implement a lot of the 5xx 3D support (including the shader compiler for ARB_vertex_program and ARB_fragment_program) but he has been working on a Gallium3D implementation for 3xx-5xx and saw the first screen output from that in the last few days.
                        Respect

                        Excellent news about the 3D documentation.
                        Last edited by tmpdir; 01-27-2009, 04:45 PM.

                        Comment


                        • #42
                          Originally posted by bridgman
                          I took a quick skim through the forums; looks like the work has been started but is not fully there. It may be that slice-level decoding is working (for video which has multiple slices per frame) but apparently not all encoders make heavy use of slices.

                          Slices sure seem like the most obvious option for multi-threading and the only one which doesn't involve building and balancing a pipeline.

                          EDIT - here we go :



                          In short, ffmpeg only multithreads when the video is encoded with multiple slices per frame, so the most common H.264 encoder now produces video that can't be multi-threaded by the most common H.264 decoder. Boo
                          I'd just like to add that this is (thankfully) not totally correct: there's an experimental tree that adds frame-level parallel decoding for MPEG-1/2/4 and H.264. If you check out http://gitorious.org/projects/ffmpeg/repos/ffmpeg-mt you can get that tree. It still needs some work, so if you'd like to see multi-threaded decoding of H.264 videos in the main tree, poke the owner of that tree.

                          If you look at the ffdshow-tryout thread on doom9, you can see some benchmark numbers for ffmpeg-mt; they're pretty good.

                          Comment


                          • #43
                            OpenCL hasn't been mentioned as a way to decode H.264.

                            Is that because OpenCL isn't well suited to it?

                            Comment


                            • #44
                              OpenCL falls into the same category as Gallium3D, CUDA, or any purely shader-based implementation. It's probably going to be quite good for the back-end part of the decoding pipe (motion comp, filtering), might or might not be good for the middle part of the pipe (inverse quantization, IDCT) depending on the implementation, and probably not good for the start of the pipe (bitstream processing, entropy decoding).

                              The good news is that the processing at the front of the pipe tends to be easier to do on the CPU than the processing at the end of the pipe, so with luck it will all balance out.
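                              A rough illustration of why the front of the pipe resists GPU offload: in a toy unary-coded bitstream (a simplified stand-in for real entropy coding like CAVLC/CABAC), you can't even find where symbol N+1 starts until you've decoded symbol N, so there's no independent work to hand to parallel shader units.

```python
# Toy variable-length (unary) bitstream decoder. Each symbol's length
# depends on its value, so symbol boundaries are only known after the
# previous symbol is decoded -- an inherently serial dependency, unlike
# per-block IDCT or motion comp, which parallelize across blocks.
def decode_unary(bits):
    symbols, count = [], 0
    for b in bits:
        if b == 1:
            count += 1
        else:            # a 0 terminates the current symbol
            symbols.append(count)
            count = 0
    return symbols

# "110" -> 2, "0" -> 0, "10" -> 1
assert decode_unary([1, 1, 0, 0, 1, 0]) == [2, 0, 1]
```

                              That's why the usual split keeps entropy decoding on the CPU and ships the reconstructed coefficients/motion vectors to the GPU for the back half of the pipe.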

                              Comment


                              • #45
                                Has an OpenCL implementation (even a non-accelerated reference one) actually been released yet? From my perspective, that's the biggest barrier to OpenCL work right now - the spec is out, but there doesn't seem to be any way to actually run OpenCL code (unless you're a developer at ATI/Nvidia/Apple). I think that has more to do with the lack of excitement over it than anything about its technical merits.

                                Comment
