AOMedia's "AVM" Repository Serves As Reference Implementation For Eventual AV1 Successor
-
Originally posted by Mathias View Post
I agree that some next generation video codec will have an AI enhancement layer baked in. I guess this will improve the quality for very low bitrates significantly. IMO one of the biggest issues will be what metric to use to judge quality improvements. AI can generate good looking pictures that don't resemble the original well; how will the codec and the developers know something looks good, if it doesn't have to resemble the original perfectly? We need an AI-based metric for that as well. Which then has the problem that an AI judges what an AI generates. So that metric has to be exceptionally good. So that needs a lot of training data...
That will of course also make comparisons to other codecs difficult. Because if one codec is tuned for a specific metric, it of course beats every other codec tuned to a different metric. This alone will give "a 300% improvement in image quality!!!!".
Also, resolution scaling and enhancement are obviously big ones, but other steps can benefit as well. Denoising video, and then generating that noise back, for instance, is something AI is already very good at, and that CPUs struggle to do quickly with traditional algorithms. Various other decisions, like deciding what data to throw away or what parameters to use based on the input, could also leverage trained neural nets.
This of course opens the question of hardware requirements. I can see a future where encoding AV2 on a CPU alone is basically impossible, and even decoding it on a CPU may skip some quality-enhancing steps... not that you *need* hardware acceleration for neural net implementations, but you do basically need it for certain steps.
-
Originally posted by cl333r View Post
I imagine the biggest issue is to develop a next-gen solution while not stepping on any landmine in the giant and complicated patent minefield.
I think the solution is not to try to develop a codec that avoids every single MPEG-LA patent, which, for all we know, might be strictly impossible. The solution is to scare MPEG-LA away. The AOMedia alliance is a behemoth, and its own patent pool can be as lethal to MPEG-LA as the other way around. Mutually Assured Destruction: that's what stopped the Cold War from turning into a real war, and it's still the best known strategy against patent trolls.
-
Originally posted by brucethemoose View Post
IIRC there was talk of incorporating neural nets into parts of the next gen codec.
I know AI gets thrown around as a buzzword and slapped on things that don't deserve the term, but there are stages of the AV1 pipeline that it could really dramatically help.
You train that network with common video blocks, so it learns to efficiently encode common blocks in its 256-wide middle layer.
Then you cut up the network: use the first half during encoding, store the 256-wide vector as the "encoded" block in the compressed video file, and on the decoding side, use the second half of the pre-trained network to decode that vector back into a 64x64x3 matrix.
With a large enough network and sufficient training, the network will probably do things like encode gradients and do some chroma-from-luma prediction and so on. The thing is: you don't have to think about which algorithms would be most efficient, it just "learns" them automatically during training.
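A minimal sketch of that split-autoencoder idea, assuming PyTorch; the layer shapes and the 256-wide bottleneck are illustrative numbers taken from this comment, not anything AVM actually does:

```python
import torch
import torch.nn as nn

class BlockAutoencoder(nn.Module):
    """Autoencoder over 64x64x3 blocks with a 256-wide bottleneck.
    After training, the encoder half runs at encode time, the decoder
    half at decode time; only the 256-dim latent vector would be
    written to the bitstream (quantization omitted here)."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, latent_dim),        # the bottleneck
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 16 * 16),
            nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),   # 32 -> 64
        )

    def forward(self, block):
        return self.decoder(self.encoder(block))

# Training: learn to reconstruct common video blocks.
model = BlockAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
blocks = torch.rand(8, 3, 64, 64)  # stand-in for a batch of real blocks
loss = nn.functional.mse_loss(model(blocks), blocks)
loss.backward()
opt.step()

# "Cut up" the trained network:
latent = model.encoder(blocks)         # encoder side: 256-dim code per block
reconstructed = model.decoder(latent)  # decoder side: back to 64x64x3
```

In a real codec the latent would still need quantization, entropy coding, and a rate-distortion term in the loss; this only shows how the trained network splits into an encoding half and a decoding half.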
You can even use networks for encoding motion / doing inter-frame encoding, by having a network that takes something like the last 3-6 frames as an extra input to the right-side (decoder) layers of an autoencoder, with the current block as input to the left side and as the output. In the bitstream you just save the intermediate vector; during decoding the past frames are known, as they are already decoded.
Such a network would probably encode motion-vector-like things in its intermediate layer.
To scale bandwidth <-> quality, just have multiple networks with different sizes of intermediate layers.
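The same toy PyTorch setup extends to that conditional inter-frame idea: the decoder half also sees the already-decoded past blocks, so the latent only has to carry what the past frames can't predict. Again, every size here is made up for illustration:

```python
import torch
import torch.nn as nn

class ConditionalBlockCoder(nn.Module):
    """Inter-frame variant: the decoder also receives co-located blocks
    from the last N already-decoded frames, so the latent only needs to
    carry the residual information (motion-like features tend to emerge
    in the intermediate layer)."""
    def __init__(self, latent_dim=64, past_frames=3):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 64 * 64, latent_dim))
        # Decoder input = latent vector + flattened past blocks.
        self.decode = nn.Sequential(
            nn.Linear(latent_dim + past_frames * 3 * 64 * 64, 3 * 64 * 64),
            nn.Unflatten(1, (3, 64, 64)))

    def forward(self, block, past_blocks):
        latent = self.encode(block)    # this is what goes in the bitstream
        side = past_blocks.flatten(1)  # already known to the decoder
        return self.decode(torch.cat([latent, side], dim=1))

coder = ConditionalBlockCoder()
cur = torch.rand(2, 3, 64, 64)
past = torch.rand(2, 3 * 3, 64, 64)  # 3 past frames, stacked on channels
out = coder(cur, past)               # train with MSE against `cur`
```

Swapping in different `latent_dim` values is exactly the bandwidth <-> quality knob the comment describes.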
-
Originally posted by shmerl View Post
Was there any progress with Daala's approach of lapped transforms?
Even PVQ, which is used in the Opus codec, didn't fit in AV1, because the way AV1 is structured made it a dozen times slower.
Other experiments that originated in Daala were successful, though, like Chroma from Luma, CDEF, and multi-symbol entropy coding.
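For context, the core of PVQ is coding a gain plus a direction: find the integer vector with a fixed number of unit pulses that points closest to the input. A greedy search sketch in Python, showing the textbook idea rather than Opus's or Daala's actual implementation:

```python
import numpy as np

def pvq_quantize(x, k):
    """Greedy PVQ search: find integer vector y with sum(|y|) == k
    maximizing cosine similarity with x. The decoder only needs y
    (enumerable as a single index) plus a separately coded gain."""
    xa = np.abs(x)
    y = np.zeros(len(x), dtype=int)
    corr, energy = 0.0, 0.0  # running x.y and y.y
    for _ in range(k):
        # Adding one pulse at position i changes the score to
        # (corr + xa[i])^2 / (energy + 2*y[i] + 1); pick the best i.
        score = (corr + xa) ** 2 / (energy + 2 * y + 1)
        i = int(np.argmax(score))
        y[i] += 1
        corr += xa[i]
        energy += 2 * y[i] - 1
    return np.sign(x).astype(int) * y  # restore signs at the end

x = np.array([0.7, -0.5, 0.1, 0.5])
print(pvq_quantize(x, 4))  # -> [ 2 -1  0  1]
```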
-
Originally posted by bemerk View Post
Why would you want noise back that you cleaned up? Is that to avoid too clean and perfect surfaces?
The idea in AV1 is that the player of the end user recreates the grain like a filter over the video, instead of the codec itself wasting quite a lot of bitrate on nearly random noise.
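Roughly what that looks like on the decoder side. This toy Python sketch is not AV1's real film grain tool (which signals autoregressive filter coefficients and scaling look-up tables in the bitstream), just the "denoise at encode, re-synthesize at playback" idea:

```python
import numpy as np

def synthesize_grain(frame, strength=8.0, corr=0.5, seed=0):
    """Toy stand-in for decoder-side grain synthesis: generate white
    noise, correlate it slightly so it looks like film grain rather
    than static, and overlay it on the decoded (denoised) frame."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(frame.shape)
    # Cheap spatial correlation: blend each sample with its top/left
    # neighbours (a crude stand-in for AV1's AR filter).
    noise[1:, :] = corr * noise[:-1, :] + (1 - corr) * noise[1:, :]
    noise[:, 1:] = corr * noise[:, :-1] + (1 - corr) * noise[:, 1:]
    grained = frame + strength * noise
    return np.clip(grained, 0, 255).astype(np.uint8)

decoded = np.full((1080, 1920), 128, dtype=np.float64)  # flat grey frame
with_grain = synthesize_grain(decoded)
```

Since the grain is regenerated from a few signaled parameters and a seed, the encoder spends essentially no bits on the noise itself.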
-
Originally posted by Toggleton View Post
And if you re-encode, the noise/grain can hide a lot of banding that would be obvious on the denoised output. Cleaned up sometimes really does not look good.
That aside, I have mixed feelings about film grain. But grain synthesis is useful outside of that.
-
FYI to all - Sisvel posted a patent list for AV1 and VP9 back in May 2020. https://www.sisvel.com/blog/audio-vi...9-patent-pools
The list is here. https://www.sisvel.com/images/docume...ntList_AV1.pdf
-
Originally posted by brucethemoose View Post
I know AI gets thrown around as a buzzword and slapped on things that don't deserve the term, but there are stages of the AV1 pipeline that it could really dramatically help.
We already see this in video games, where artificial neural networks are used to "guess" details that just are not there in a to-be-upscaled rendered image. Often those "guessed" details are plausible, sometimes they are just nonsense.
For video game output, adding some nonsense to a picture is not as problematic, though, as, let's say, guessing stuff in a war crime video.