AOMedia's "AVM" Repository Serves As Reference Implementation For Eventual AV1 Successor


  • #11
    Was there any progress with Daala's approach of lapped transforms?

    Comment


    • #12
      Originally posted by Mathias View Post

      I agree that some next-generation video codec will have an AI enhancement layer baked in. I guess this will improve quality at very low bitrates significantly. IMO one of the biggest issues will be what metric to use to judge quality improvements. AI can generate good-looking pictures that don't resemble the original well; how will the codec and the developers know something looks good if it doesn't have to resemble the original perfectly? We need an AI-based metric for that as well, which then has the problem that an AI judges what an AI generates. So that metric has to be exceptionally good, and that needs a lot of training data...

      That will of course also make comparisons to other codecs difficult, because if one codec is tuned for a specific metric, it naturally beats every other codec tuned to a different metric. This alone will give "a 300% improvement in image quality!!!!".
      We already have one: VMAF. It combines several traditional metrics and lots of human-generated rankings in an AI model. The AOM devs use it extensively. And, in my experience, it's great, but not perfect either.
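      As a rough illustration of what using it looks like (not how the AOM tooling invokes it internally; this assumes an ffmpeg build with libvmaf enabled, placeholder file names, and the libvmaf 2.x JSON layout), scoring an encode against its source is roughly:

```python
import json
import subprocess

def vmaf_score(distorted: str, reference: str, log: str = "vmaf.json") -> float:
    """Score a distorted clip against its reference with ffmpeg's libvmaf filter."""
    subprocess.run(
        [
            # First input is the distorted clip, second is the reference.
            "ffmpeg", "-i", distorted, "-i", reference,
            "-lavfi", f"libvmaf=log_fmt=json:log_path={log}",
            "-f", "null", "-",
        ],
        check=True,
    )
    # libvmaf 2.x pools the per-frame scores under "pooled_metrics".
    with open(log) as f:
        return json.load(f)["pooled_metrics"]["vmaf"]["mean"]

print(vmaf_score("av1_encode.mkv", "source.mkv"))  # hypothetical file names
```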


      Also, resolution scaling and enhancement are obviously big ones, but other steps can benefit as well. Denoising video and then synthesizing that noise back afterwards, for instance, is something AI is already very good at, and something CPUs struggle to do quickly with traditional algorithms. Various other decisions, like what data to throw away or what parameters to use based on the input, could also leverage trained neural nets.


      This of course raises the question of hardware requirements. I can see a future where encoding AV2 on a CPU alone is basically impossible, and even decoding on a CPU may skip some quality-enhancing steps... not that you *need* hardware acceleration for neural net implementations, but you do basically need it for certain steps.
      Last edited by brucethemoose; 06 May 2022, 04:52 PM.

      Comment


      • #13
        Originally posted by cl333r View Post
        I imagine the biggest issue is to develop a next-gen solution while not stepping on any landmine in the giant and complicated patent minefield.
        That has always been the problem, even for AV1 itself and for its predecessors (VP8/9 etc).

        I think the solution to this is not to try to develop a codec that would not infringe on a single MPEG-LA patent, which, for all we know, might be strictly impossible. The solution is to scare the MPEG-LA away. The AOMedia alliance is a behemoth, and its own patent pool can be as lethal to the MPEG-LA as the other way around. Mutually Assured Destruction: that's what stopped the Cold War from turning into a real war, and it's still the best known strategy against patent trolls.

        Comment


        • #14
          Originally posted by brucethemoose View Post
          IIRC there was talk of incorporating neural nets into parts of the next gen codec.

          I know AI gets thrown around as a buzzword and slapped on things that don't deserve the term, but there are stages of the AV1 pipeline that it could really dramatically help.
          I always imagined AI being really helpful for encoding tasks. Just think of an autoencoder: you have, say, a 64x64x3 block as input, which narrows down to something like a 256-wide vector and widens up again to 64x64x3.
          You train that network on common video blocks, so it learns to efficiently encode common blocks in its 256-wide middle section.
          Then you cut the network in half: use the first part during encoding, store the 256-wide vector as the "encoded" block in the compressed video file, and on the decoding side use the second half of the pre-trained network to decode that vector back into a 64x64x3 matrix.

          With a large enough network and sufficient training, the network will probably do things like encoding gradients and some chroma-from-luma prediction on its own. The thing is: you don't have to think about which algorithms would be most efficient, it just "learns" them automatically during training.

          You can even use networks for encoding motion / doing inter-frame coding, by having a network that takes, say, the last 3-6 frames as an additional input to the right-side (decoder) layers of the autoencoder, with the current block as the left-side input and as the target output. In the bitstream you just save the intermediate vector; during decoding, the past frames are known, since they have already been decoded.

          Such a network would probably encode motion-vector-like information in its intermediate layer.

          To trade off bandwidth <-> quality, just have multiple networks with different intermediate layer sizes.
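
          A minimal PyTorch sketch of that intra-block idea (purely illustrative, not anything from AVM; the layer shapes are arbitrary assumptions):

```python
import torch
import torch.nn as nn

class BlockAutoencoder(nn.Module):
    """Toy block autoencoder: 64x64x3 block -> 256-d code -> 64x64x3 block."""
    def __init__(self, code_size: int = 256):
        super().__init__()
        # Encoder half: runs on the encoder side; its output is what gets stored.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, code_size),
        )
        # Decoder half: shipped with the player; reconstructs the block from the code.
        self.decoder = nn.Sequential(
            nn.Linear(code_size, 64 * 16 * 16),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),   # 32 -> 64
            nn.Sigmoid(),
        )

    def forward(self, block):
        code = self.encoder(block)   # the "compressed" representation
        return self.decoder(code)

# Training sketch: minimize reconstruction error over a batch of video blocks.
model = BlockAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
blocks = torch.rand(8, 3, 64, 64)    # stand-in for real training blocks
opt.zero_grad()
loss = nn.functional.mse_loss(model(blocks), blocks)
loss.backward()
opt.step()
```

          The inter-frame variant described above would additionally feed the previously decoded frames into the decoder half, so the stored 256-wide vector only has to carry whatever the past frames cannot predict.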

          Comment


          • #15
            Originally posted by shmerl View Post
            Was there any progress with Daala's approach of lapped transforms?
            It wasn't even ready to merge as an experiment in AV1.
            Even PVQ, which is used in the Opus codec, didn't fit in AV1, because the way AV1 is structured made it a dozen times slower.
            Other experiments that originated in Daala were successful, though, like Chroma from Luma, CDEF, and multi-symbol entropy coding.
            Last edited by juarezr; 06 May 2022, 09:29 PM.

            Comment


            • #16
              Why would you want the noise back that you cleaned up? Is that to avoid surfaces that look too clean and perfect?

              Comment


              • #17
                Originally posted by bemerk View Post
                Why would you want the noise back that you cleaned up? Is that to avoid surfaces that look too clean and perfect?
                Film grain is nice if you use it as a visual tool that gives a specific look; that's why movies use it. And if you re-encode, the noise/grain can hide a lot of banding that would be obvious on the denoised output. Cleaned-up video sometimes really doesn't look good.

                The idea in AV1 is that the end user's player recreates the grain as a filter over the video, instead of the codec itself wasting quite a lot of bitrate on nearly random noise.
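
                In toy numpy terms, the workflow looks something like this (just the general idea; AV1's actual film grain synthesis signals an autoregressive grain model in the bitstream rather than a single strength value):

```python
import numpy as np

rng = np.random.default_rng(0)

# "Source" frame: clean content (a smooth gradient) plus camera grain.
clean = np.tile(np.linspace(0.0, 1.0, 256), (256, 1))
source = clean + rng.normal(0.0, 0.03, clean.shape)

# Encoder side: denoise (here just a crude box blur) and keep only a tiny
# parametric description of the grain instead of spending bits coding it.
denoised = np.copy(source)
for axis in (0, 1):
    denoised = (np.roll(denoised, 1, axis) + denoised + np.roll(denoised, -1, axis)) / 3
grain_strength = float(np.std(source - denoised))   # the only "grain" data transmitted

# Decoder/player side: reconstruct the look by adding fresh synthetic grain
# on top of the decoded (denoised) picture.
decoded = denoised
displayed = decoded + rng.normal(0.0, grain_strength, decoded.shape)

print(f"transmitted grain strength: {grain_strength:.4f} (true sigma was 0.03)")
```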

                Comment


                • #18
                  Originally posted by Toggleton View Post

                  And if you re-encode, the noise/grain can hide a lot of banding that would be obvious on the denoised output. Cleaned-up video sometimes really doesn't look good.
                  Banding should be a thing of the past, since 10-bit should be standard in AV1. And modern devices should dither the odd video that was mistakenly made in 8-bit.
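
                  What "dither the 8-bit video" means in practice, as a toy numpy sketch (not any particular player's algorithm): add a little sub-LSB noise before expanding the values to 10-bit, so the hard quantization steps turn into fine noise instead of visible bands.

```python
import numpy as np

rng = np.random.default_rng(0)

# A smooth horizontal gradient quantized to 8 bits: the rounding creates bands.
frame8 = np.round(np.tile(np.linspace(16, 40, 1920), (4, 1))).astype(np.uint8)

# Naive expansion: every 8-bit step becomes a 4-code jump in 10-bit, banding stays.
naive10 = frame8.astype(np.uint16) << 2

# Dithered expansion: add roughly +/- half an 8-bit LSB of noise before
# re-quantizing, trading the hard bands for fine, far less visible noise.
dither = rng.uniform(-0.5, 0.5, frame8.shape)
dithered10 = np.clip(np.round((frame8 + dither) * 4), 0, 1023).astype(np.uint16)

print(len(np.unique(naive10[0])), "hard steps vs",
      len(np.unique(dithered10[0])), "dithered levels")
```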



                  That aside, I have mixed feelings about film grain. But grain synthesis is useful outside of that.

                  Comment


                  • #19
                    FYI to all - Sisvel posted a patent list for AV1 and VP9 back in May 2020. https://www.sisvel.com/blog/audio-vi...9-patent-pools

                    The list is here. https://www.sisvel.com/images/docume...ntList_AV1.pdf

                    Comment


                    • #20
                      Originally posted by brucethemoose View Post
                      I know AI gets thrown around as a buzzword and slapped on things that don't deserve the term, but there are stages of the AV1 pipeline that it could really dramatically help.
                      Well, when "AI" gets involved with compression, we can probably expect shiny pictures after decompression that may just not resemble the semantics of the original image. Like, in a much simpler way, the infamous Xerox compression did.
                      We already see this in video games, were artificial neural networks are used to "guess" details that just are not there in an to-be-upscaled rendered image. Often those "guessed" details are plausible, sometimes they are just nonsense.
                      For a video game output, adding some nonsense to a picture is not as problematic though, then let's say guessing stuff in a war crime video.

                      Comment
