
by Christos G. Bampis, Li-Heng Chen and Zhi Li
When you find yourself binge-watching the latest season of Stranger Things or Ozark, we strive to deliver the best video quality to your eyes. To do so, we continuously push the boundaries of streaming video quality and leverage the best video technologies. For example, we invest in next-generation, royalty-free codecs and sophisticated video encoding optimizations. Recently, we added another powerful tool to our arsenal: neural networks for video downscaling. In this tech blog, we describe how we improved Netflix video quality with neural networks, the challenges we faced and what lies ahead.
There are, roughly speaking, two steps to encode a video in our pipeline:
- Video preprocessing, which encompasses any transformation applied to the high-quality source video prior to encoding. Video downscaling is the most pertinent example here: it tailors our encoding to the screen resolutions of different devices and optimizes picture quality under varying network conditions. With video downscaling, multiple resolutions of a source video are produced. For example, a 4K source video will be downscaled to 1080p, 720p, 540p and so on. This is typically done by a conventional resampling filter, like Lanczos.
- Video encoding using a conventional video codec, like AV1. Encoding drastically reduces the amount of video data that needs to be streamed to your device, by leveraging spatial and temporal redundancies that exist in a video.
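The resolution ladder mentioned in the first step can be sketched in a few lines. This is purely illustrative: the target heights below mirror the examples in the text, while the actual Netflix ladder is more elaborate and content-dependent.

```python
# Illustrative sketch: derive a downscaling ladder from a 4K source.
# The target heights mirror the examples above (1080p, 720p, 540p);
# the real encoding ladder is richer and chosen per title.
SOURCE = (3840, 2160)  # 4K UHD, width x height

def ladder(source, target_heights=(1080, 720, 540)):
    """Return (width, height) pairs that preserve the source aspect ratio."""
    w, h = source
    aspect = w / h
    rungs = []
    for th in target_heights:
        tw = round(th * aspect / 2) * 2  # keep dimensions even for 4:2:0 chroma
        rungs.append((tw, th))
    return rungs

print(ladder(SOURCE))  # [(1920, 1080), (1280, 720), (960, 540)]
```

Each rung is then produced by a resampling filter such as Lanczos, which is the step the deep downscaler replaces.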
We identified that we can leverage neural networks (NN) to improve Netflix video quality, by replacing conventional video downscaling with a neural network-based one. This approach, which we dub the “deep downscaler,” has a few key advantages:
- A learned approach to downscaling can improve video quality and be tailored to Netflix content.
- It can be integrated as a drop-in solution, i.e., we do not need any other changes on the Netflix encoding side or the client device side. Tens of millions of devices that support Netflix streaming automatically benefit from this solution.
- A distinct, NN-based, video processing block can evolve independently, be used beyond video downscaling and be combined with different codecs.
Of course, we believe in the transformative potential of NN throughout video applications, beyond video downscaling. While conventional video codecs remain prevalent, NN-based video encoding tools are flourishing and closing the performance gap in terms of compression efficiency. The deep downscaler is our pragmatic approach to improving video quality with neural networks.
The deep downscaler is a neural network architecture designed to improve end-to-end video quality by learning a higher-quality video downscaler. It consists of two building blocks: a preprocessing block and a resizing block. The preprocessing block aims to prefilter the video signal prior to the subsequent resizing operation. The resizing block yields the lower-resolution video signal that serves as input to an encoder. We employed an adaptive network design that is applicable to the wide variety of resolutions we use for encoding.
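To make the two-block structure concrete, here is a minimal numpy sketch: a prefilter followed by a resizing step. The 3x3 averaging kernel and the 2x box-resize are stand-ins for illustration; the actual network, its layers and its learned weights are not public.

```python
import numpy as np

# Minimal sketch of the two-block structure: a (learned) prefilter applied to
# the full-resolution signal, followed by a resizing block that produces the
# lower-resolution output fed to the encoder. Kernel weights are placeholders.

def prefilter(frame, kernel):
    """Preprocessing block stand-in: 2D filtering with reflect padding."""
    kh, kw = kernel.shape
    padded = np.pad(frame, ((kh // 2,) * 2, (kw // 2,) * 2), mode="reflect")
    out = np.zeros_like(frame, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + frame.shape[0], j:j + frame.shape[1]]
    return out

def resize_2x(frame):
    """Resizing block stand-in: 2x downscale by 2x2 averaging."""
    h, w = frame.shape
    return frame[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

kernel = np.full((3, 3), 1.0 / 9.0)  # placeholder for learned weights
frame = np.arange(64, dtype=float).reshape(8, 8)
low_res = resize_2x(prefilter(frame, kernel))
print(low_res.shape)  # (4, 4)
```

In the real system the prefilter is learned end-to-end, which is what allows it to preserve detail that a fixed filter would smear away before resizing.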
During training, our goal is to generate the best downsampled representation such that, after upscaling, the mean squared error is minimized. Since we cannot directly optimize for a conventional video codec, which is non-differentiable, we exclude the effect of lossy compression from the loop. We focus on a robust downscaler that is trained given a conventional upscaler, like bicubic. Our training approach is intuitive and results in a downscaler that is not tied to a specific encoder or encoding implementation. Nevertheless, it requires thorough evaluation to demonstrate its potential for broad use in Netflix encoding.
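The training objective can be sketched as follows. Nearest-neighbor upscaling stands in for the bicubic upscaler mentioned above, and the box-downscale plays the role of a candidate downscaler output; in training, gradients of this loss would flow back into the downscaler network.

```python
import numpy as np

# Sketch of the training objective: the downscaler's output is upscaled with a
# fixed conventional upscaler, and the loss is the MSE against the source.
# Nearest-neighbor upscaling is a stand-in for the bicubic upscaler.

def upscale_2x(low):
    """Fixed, non-learned upscaler (nearest-neighbor stand-in for bicubic)."""
    return np.repeat(np.repeat(low, 2, axis=0), 2, axis=1)

def end_to_end_mse(source, downscaled):
    """Loss minimized during training: MSE between source and upscaled output."""
    reconstructed = upscale_2x(downscaled)
    return float(np.mean((source - reconstructed) ** 2))

source = np.arange(16, dtype=float).reshape(4, 4)
candidate = source.reshape(2, 2, 2, 2).mean(axis=(1, 3))  # a 2x2 box-downscale
print(end_to_end_mse(source, candidate))  # 4.25 for this toy input
```

Because the fixed upscaler sits inside the loss, the downscaler learns to produce low-resolution frames that reconstruct well, rather than frames that merely look smooth on their own.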
The goal of the deep downscaler is to improve end-to-end video quality for the Netflix member. Through our experimentation, involving objective measurements and subjective visual tests, we found that the deep downscaler improves quality across various conventional video codecs and encoding configurations.
For example, for VP9 encoding and assuming a bicubic upscaler, we measured an average VMAF Bjøntegaard-Delta (BD) rate gain of ~5.4% over traditional Lanczos downscaling. We also measured a ~4.4% BD rate gain for VMAF-NEG. We showcase an example result from one of our Netflix titles below. The deep downscaler (purple points) delivered higher VMAF at similar bitrates, or yielded similar VMAF scores at a lower bitrate.
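For readers unfamiliar with BD-rate, the figures above summarize the average bitrate difference between two rate-quality curves at equal quality. A common way to compute it (a hedged sketch with made-up rate-distortion points, not Netflix's exact implementation) fits log-bitrate as a cubic in the quality score and integrates the gap between the fitted curves:

```python
import numpy as np

# Hedged sketch of a Bjontegaard-Delta (BD) rate computation.
# Fits log10(bitrate) as a cubic in quality, then averages the gap between the
# two fitted curves over the overlapping quality range. RD points are synthetic.

def bd_rate(rates_ref, quality_ref, rates_test, quality_test):
    """Average bitrate difference (%) at equal quality; negative = test is better."""
    p_ref = np.polyfit(quality_ref, np.log10(rates_ref), 3)
    p_test = np.polyfit(quality_test, np.log10(rates_test), 3)
    lo = max(min(quality_ref), min(quality_test))
    hi = min(max(quality_ref), max(quality_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (10 ** avg_diff - 1) * 100

# Synthetic example: the "test" curve reaches each VMAF score at 10% lower bitrate.
ref_rates = np.array([1000.0, 2000.0, 4000.0, 8000.0])  # kbps, made up
vmaf = np.array([70.0, 80.0, 88.0, 94.0])
print(round(bd_rate(ref_rates, vmaf, ref_rates * 0.9, vmaf), 2))  # -10.0
```

A BD rate "gain" of ~5.4% thus means that, on average, the deep downscaler reached the same VMAF at roughly 5.4% lower bitrate than Lanczos.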
Besides objective measurements, we also conducted human subject studies to validate the visual improvements of the deep downscaler. In our preference-based visual tests, we found that the deep downscaler was preferred by ~77% of test subjects, across a wide range of encoding recipes and upscaling algorithms. Subjects reported better detail preservation and a sharper visual look. A visual example is shown below.
We also performed A/B testing to understand the overall streaming impact of the deep downscaler and to detect any device playback issues. Our A/B tests showed QoE improvements without any adverse streaming impact. This demonstrates the benefit of deploying the deep downscaler across all devices streaming Netflix, without playback risks or quality degradation for our members.
Given our scale, applying neural networks can lead to a significant increase in encoding costs. In order to have a viable solution, we took several steps to improve efficiency.
- The neural network architecture was designed to be computationally efficient while avoiding any negative visual quality impact. For example, we found that just a few neural network layers were sufficient for our needs. To reduce the input channels even further, we only apply NN-based scaling on luma and scale chroma with a standard Lanczos filter.
- We implemented the deep downscaler as an FFmpeg-based filter that runs together with other video transformations, like pixel format conversions. Our filter can run on both CPU and GPU. On a CPU, we leveraged oneDNN to further reduce latency.
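The luma-only design in the first bullet can be sketched as a simple plane split: only the Y plane goes through the neural path, while the chroma planes take the conventional one. Box averaging stands in for both the neural network and the Lanczos filter here; the point is the routing, not the filters.

```python
import numpy as np

# Sketch of the luma/chroma split: the (stand-in) neural downscaler sees only
# the Y plane; U and V go through a conventional filter. In 4:2:0, chroma
# planes are already half the luma resolution in each dimension.

def box_2x(plane):
    """Stand-in 2x downscaler (for both the NN path and the Lanczos path)."""
    h, w = plane.shape
    return plane[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def downscale_yuv420(y, u, v, nn_downscale, chroma_downscale):
    """Route Y through the NN path; U/V through the conventional path."""
    return nn_downscale(y), chroma_downscale(u), chroma_downscale(v)

y, u, v = np.ones((8, 8)), np.ones((4, 4)), np.ones((4, 4))
y2, u2, v2 = downscale_yuv420(y, u, v, box_2x, box_2x)
print(y2.shape, u2.shape, v2.shape)  # (4, 4) (2, 2) (2, 2)
```

Processing a single channel instead of three cuts the network's input size substantially, and luma is where the eye is most sensitive to detail, so the quality benefit is largely preserved.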
The Encoding Technologies and Media Cloud Engineering teams at Netflix have jointly innovated to bring Cosmos, our next-generation encoding platform, to life. Our deep downscaler effort was an excellent opportunity to showcase how Cosmos can drive future media innovation at Netflix. The following diagram shows a top-down view of how the deep downscaler was integrated within a Cosmos encoding microservice.
A Cosmos encoding microservice can serve multiple encoding workflows. For example, a service can be called to perform complexity analysis on a high-quality input video, or to generate encodes meant for actual Netflix streaming. Within a service, a Stratum function is a serverless layer dedicated to running stateless and computationally intensive functions. Within a Stratum function invocation, our deep downscaler is applied prior to encoding. Fueled by Cosmos, we can leverage the underlying Titus infrastructure and run the deep downscaler on all our multi-CPU/GPU environments at scale.
The deep downscaler paves the path for more NN applications for video encoding at Netflix. But our journey is not finished yet and we strive to improve and innovate. For example, we are studying a few other use cases, such as video denoising. We are also exploring more efficient solutions for applying neural networks at scale. We are interested in how NN-based tools can shine as part of next-generation codecs. At the end of the day, we are passionate about using new technologies to improve Netflix video quality. For your eyes only!
We would like to acknowledge the following individuals for their help with the deep downscaler project:
Lishan Zhu, Liwei Guo, Aditya Mavlankar, Kyle Swanson and Anush Moorthy (Video Image and Encoding team), Mariana Afonso and Lukas Krasula (Video Codecs and Quality team), Ameya Vasani (Media Cloud Engineering team), Prudhvi Kumar Chaganti (Streaming Encoding Pipeline team), Chris Pham and Andy Rhines (Data Science and Engineering team), Amer Ather (Netflix performance team), the Netflix Metaflow team and Prof. Alan Bovik (University of Texas at Austin).