Painpoints in New Video Service Development
As video evolves towards ultra HD, immersive experience, and strong interactivity, many new video services are emerging in the media industry, such as virtual reality (VR), multi-viewpoint video (MVV), and free-viewpoint video (FVV). Meanwhile, the pandemic has accelerated the adoption of online education, making it a must-have for many households. However, telecom operators still face many painpoints when launching these services.
Painpoints for VR services include: a lack of high-quality content, with flat videos undermining the immersive experience; high bandwidth requirements for content transmission; and high transmission latency, which may cause motion sickness.
Painpoints for MVV and FVV include: poor user experience when multiple views of the same content are displayed out of sync; high cost and complicated deployment resulting from the many cameras required by FVV; and difficulty in meeting users' personalized requirements with a uniform viewpoint direction.
Painpoints for online education include: difficulty in implementing bidirectional interaction on a TV screen; high network latency, severe packet loss, and stuttering; and high network distribution costs for open classes and large classes.
ZTE's Key Technologies to Solve the Painpoints
ZTE has developed a series of key technologies to solve the above-mentioned painpoints.
Key Technologies for VR
To address the lack of VR content, ZTE developed image quality enhancement and 2D-to-3D video conversion technologies.
—Image quality enhancement: This technology segments a video into scenes based on an understanding of the content, applies inter-frame and single-frame optimization algorithms based on deep neural networks to each scene, and then recombines the result with the audio to generate the final video content. It covers the major enhancement functions, including video repair, super-resolution, frame rate conversion, image denoising and sharpening, and color enhancement, and supports live and VOD content in various formats, bitrates, and frame rates. It has achieved impressive results at CVPR and in many other competitions at home and abroad.
—2D-to-3D video conversion: Traditional 2D-to-3D video conversion assigns depths to the regions of an image, produces a parallax map, and finally synthesizes the left and right images. ZTE's smart depth perception and prediction network automatically and rapidly extracts depth information from the original images and videos to form a depth map, and then applies the depth-image-based rendering (DIBR) algorithm to handle complicated scenes such as smog and running water. It supports output formats such as left/right, up/down, and interleaved, as well as terminals such as VR HMDs, 2D/3D TVs, laptops, and smartphones. The quality of the converted video, including overall image brightness, clarity, and the sharpness of object edges, has gained recognition from many operators and has been well received at major exhibitions at home and abroad.
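The DIBR step mentioned above can be illustrated with a minimal sketch. It is not ZTE's implementation: the linear depth-to-disparity mapping, the max_disparity parameter, the function name, and the simple hole-filling rule are all assumptions chosen for brevity. Each pixel of one image row is shifted horizontally by a disparity derived from its depth, nearer pixels win when two land on the same target, and gaps are filled from the nearest rendered pixel to the left.

```python
def render_view(row, depth, max_disparity=2, direction=1):
    """Warp one image row into a new viewpoint (simplified DIBR sketch).

    row       -- list of pixel values
    depth     -- list of depths in [0.0, 1.0]; 1.0 means nearest to the camera
                 (an assumed convention for this example)
    direction -- +1 for one eye, -1 for the other
    """
    width = len(row)
    out = [None] * width
    out_depth = [-1.0] * width

    for x in range(width):
        # Assumed mapping: disparity grows linearly with nearness.
        shift = direction * round(depth[x] * max_disparity)
        target = x + shift
        # Keep the nearest pixel when several map to the same target.
        if 0 <= target < width and depth[x] > out_depth[target]:
            out[target] = row[x]
            out_depth[target] = depth[x]

    # Naive hole filling: copy the last rendered pixel to the left.
    last = row[0]
    for x in range(width):
        if out[x] is None:
            out[x] = last
        else:
            last = out[x]
    return out


row = [10, 20, 30, 40, 50]
depth = [0.0, 0.0, 1.0, 0.0, 0.0]
left = render_view(row, depth, max_disparity=1, direction=1)
right = render_view(row, depth, max_disparity=1, direction=-1)
```

Calling the function twice with opposite directions yields a left/right pair from a single 2D row and its depth map. A production renderer would work on full frames and use far more careful hole filling.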
To meet the high bandwidth requirements for VR, ZTE proposed four key technologies to reduce the transmission bandwidth of UHD VR.
—FOV+: Entry-level (8K) VR content requires a transmission bandwidth of 120-150 Mbps, posing huge challenges to the access and transmission networks, the server, and the client. The latest tile-based FOV transmission overlays a high-quality FOV stream on a low-quality background stream, which can save about 50% of the bandwidth. The FOV+ technology, first proposed by ZTE, transmits images with a slightly larger FOV to cope with network and processing latency. If a user rotates their head at 120 degrees per second, transmitting an additional 6 degrees of image in all directions (120 degrees/s × 0.05 s = 6 degrees) compensates for 50 ms of RTT latency. With this approach, an extra 20% of transmission bandwidth can be saved compared with the tile-based FOV solution.
—Region-wise packing: To avoid content gaps when users rotate their heads quickly, ZTE proposed the industry's first viewport-dependent transmission solution based on region-wise packing. This solution processes the original omnidirectional spherical video using non-uniform mapping, so that high quality is guaranteed within the viewport while lower quality is provided in other regions, which reduces the overall bitrate.
—mABR+VR FOV: Live VR content is transmitted over HTTP unicast using the four mainstream ABR protocols: Apple HLS, Microsoft MSS, Adobe HDS, and MPEG DASH. Massive numbers of concurrent VR users can therefore put considerable pressure on networks. The mABR+VR FOV transmission solution, first proposed by ZTE, transmits the low-quality background stream through mABR multicast (instead of ABR unicast) and overlays it with the high-quality FOV stream transmitted through unicast, significantly reducing the transmission pressure on networks and CDN nodes.
—Asymmetric stitching: 3D VR content requires twice the transmission bandwidth of 2D VR content. Based on the mask effect (when the two eyes perceive different image quality, the sensory experience is determined by the eye perceiving the better quality), ZTE transmits high-quality content for the left eye and low-quality content for the right eye, which further reduces the demands of 3D VR content on transmission bandwidth, the server, and the client.
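The FOV+ margin quoted above follows from simple arithmetic: extra angle = head speed × latency. The sketch below reproduces that calculation and, as a rough illustration of viewport-dependent delivery, marks which horizontal tiles of a 360-degree frame would be fetched in high quality. The 30-degree tile layout, the 90-degree viewport, and the function names are assumptions made for this example, not details of ZTE's solution.

```python
def fov_margin_deg(head_speed_deg_per_s, rtt_ms):
    """Extra field of view (degrees) needed to cover head motion during rtt_ms."""
    return head_speed_deg_per_s * rtt_ms / 1000.0


def tile_qualities(view_yaw_deg, viewport_deg=90.0, margin_deg=6.0, tile_deg=30.0):
    """Return 'high' or 'low' for each yaw tile of a 360-degree frame.

    A tile is 'high' when it overlaps the viewport extended by margin_deg
    on each side. The tile grid is a hypothetical layout for illustration.
    """
    half_extent = viewport_deg / 2.0 + margin_deg
    tile_count = int(360 / tile_deg)
    qualities = []
    for i in range(tile_count):
        center = i * tile_deg + tile_deg / 2.0
        # Smallest angular distance between tile centre and view direction,
        # taking the 0/360 wrap-around into account.
        distance = abs((center - view_yaw_deg + 180.0) % 360.0 - 180.0)
        overlaps = distance - tile_deg / 2.0 < half_extent
        qualities.append("high" if overlaps else "low")
    return qualities


margin = fov_margin_deg(120, 50)          # 120 deg/s for 50 ms
plan = tile_qualities(180.0, margin_deg=margin)
```

With the figures from the text, fov_margin_deg(120, 50) evaluates to 6.0 degrees, matching the margin stated for FOV+.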
To further reduce the end-to-end latency, ZTE proposed an AI-based viewpoint prediction technology. By predicting changes in users' viewpoints, this technology downloads the corresponding content in advance and in parallel, so that users experience a lower motion-to-photon (MTP) latency even when the actual MTP latency is high. Currently, the smart prediction algorithm can predict up to 80 ms ahead, which is leading in the industry.
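The idea of prefetching for a predicted viewpoint can be sketched with a deliberately simple stand-in. The source does not describe ZTE's AI model, so the example below substitutes plain linear extrapolation of the head yaw from the last two samples; the function name and the sample format are assumptions.

```python
def predict_yaw(samples, lead_ms=80.0):
    """Predict the yaw angle lead_ms into the future.

    samples -- list of (timestamp_ms, yaw_deg), oldest first, at least two
    This is linear extrapolation, used here only as a placeholder for a
    learned predictor.
    """
    (t0, yaw0), (t1, yaw1) = samples[-2], samples[-1]
    if t1 == t0:
        return yaw1 % 360.0
    # Shortest signed angular change, so 359 -> 0 counts as +1 degree.
    delta = (yaw1 - yaw0 + 180.0) % 360.0 - 180.0
    velocity_deg_per_ms = delta / (t1 - t0)
    return (yaw1 + velocity_deg_per_ms * lead_ms) % 360.0


# Head turning at about 120 degrees per second.
history = [(0, 10.0), (10, 11.2)]
predicted = predict_yaw(history, lead_ms=80.0)
```

A client could request the tiles around the predicted yaw while the current frame is still being displayed. A learned model would replace the extrapolation but could keep the same interface.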
Key Technologies for MVV and FVV
ZTE proposed the SRT+RTP and AVC SEI technologies to solve the frame-level synchronization problem for multi-channel video. Relying on coordination between the encoding and playback ends, they achieve frame-level synchronization for the encoding and broadcasting of multiple streams.
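A player-side view of frame-level synchronization can be sketched as follows. The sketch assumes that every frame arrives tagged with a capture timestamp shared across channels; how ZTE actually carries that information is not described in the text, so the class name, the timestamp field, and the drop policy are all hypothetical. A set of frames is released only when every channel has a frame with the same timestamp.

```python
from collections import deque


class MultiViewSynchronizer:
    """Release frames only when all channels hold the same timestamp."""

    def __init__(self, channel_ids):
        self.queues = {cid: deque() for cid in channel_ids}

    def push(self, channel_id, timestamp, frame):
        # Timestamps are assumed to increase within each channel.
        self.queues[channel_id].append((timestamp, frame))

    def pop_synced(self):
        """Return {channel: frame} for the next common timestamp, or None."""
        while all(self.queues.values()):
            heads = {cid: q[0][0] for cid, q in self.queues.items()}
            target = max(heads.values())
            if all(ts == target for ts in heads.values()):
                return {cid: q.popleft()[1] for cid, q in self.queues.items()}
            # Drop frames that can no longer be matched by the other channels.
            for q in self.queues.values():
                while q and q[0][0] < target:
                    q.popleft()
        return None


sync = MultiViewSynchronizer(["cam_a", "cam_b"])
sync.push("cam_a", 1, "a1")
sync.push("cam_a", 2, "a2")
sync.push("cam_b", 2, "b2")
aligned = sync.pop_synced()
```

Here the unmatched frame a1 is discarded and the two frames stamped 2 are released together. A real player would also bound the queue length and tolerate small timestamp jitter.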
Considering the shooting and deployment complexity of the FVV service, ZTE proposed surround virtual viewpoint synthesis. In a surround video capture scenario, this AI-based technology uses images from a reduced number of cameras to automatically and rapidly generate video content for any viewpoint. This guarantees smooth control when users enable rotary viewing and bullet time, while reducing the deployment complexity. The technology has helped operators broadcast large sporting events live, for example, the wrestling events of the 2nd National Youth Games of China and the 2019 World Wushu Championships. It delivers a great user experience, significantly shortens deployment and debugging time, and reduces deployment cost.
ZTE also proposed the personalized video solution, which adopts technologies such as AI-powered facial recognition and comparison, automatic tracking, video analysis, and automatic editing to generate personalized multimedia content for each player or actor/actress, meeting the audience's needs to watch personalized content and spread it on social media.
Key Technologies for Online Education
ZTE launched an online education platform that supports display on the TV screen, together with the C200 AI STB and its accessories, which are designed to support online education. These make it more flexible to start a class and enable more interactive functions and a more intelligent teaching system. The solution solves the problem of two-way interaction on the TV screen and protects eyes from the damage caused by small displays, opening a new chapter for telecom operators to develop online education on the TV screen.
To address the common and complicated problems of online education, such as the difficulty of supporting large-scale concurrency with a traditional MCU, the high latency of ABR transmission, low transmission quality, and stuttering, ZTE launched an RTC-based interaction and distribution solution. This solution reduces response latency by 30% compared with the industry-average RTC latency, supports massive numbers of concurrent users, and achieves seamless switching between online watching and real-time interaction. It has already been put into commercial use by China Mobile.
The CDN over bit index explicit replication (BIER) solution, first proposed by ZTE, efficiently addresses the severe shortage of live streams over the multicast protocol that telecom operators face when launching popular services such as online education, conference live streaming, video surveillance, live broadcast on social media, and e-commerce live streaming, helping them develop their own video services and the video services in the public cloud.
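To show why BIER suits large-scale live distribution, the sketch below illustrates the core forwarding idea: a packet carries a bitstring with one bit per egress router, and each hop replicates the packet only once per next-hop neighbor. It is a simplified rendering of the general BIER forwarding procedure, not ZTE's CDN over BIER design. The table contents and neighbor names are invented for the example, and sub-domains, set identifiers, and local delivery are omitted.

```python
def bier_forward(bitstring, bift):
    """Compute the copies one router sends for a BIER packet.

    bitstring -- int; bit (i - 1) set means the egress router with id i
                 must receive the packet
    bift      -- dict: bit position (1-based) -> (forwarding_bitmask, neighbor)
                 forwarding_bitmask covers every egress reached via neighbor
    Returns a list of (neighbor, bitstring_for_that_copy).
    """
    copies = []
    remaining = bitstring
    while remaining:
        lowest = remaining & -remaining        # isolate the lowest set bit
        position = lowest.bit_length()         # 1-based bit position
        entry = bift.get(position)
        if entry is None:
            remaining &= ~lowest               # unreachable egress: skip it
            continue
        mask, neighbor = entry
        copies.append((neighbor, remaining & mask))
        # Clear every egress already served by this copy (and the current
        # bit, to guarantee progress even with a malformed table).
        remaining &= ~(mask | lowest)
    return copies


# Hypothetical table: egress 1 and 2 sit behind neighbor "B", egress 3 behind "C".
table = {1: (0b011, "B"), 2: (0b011, "B"), 3: (0b100, "C")}
copies = bier_forward(0b111, table)
```

For three receivers the router emits two copies rather than three, and no per-stream multicast state is needed in the core, because the receiver set travels in the packet itself.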
In the future, further development of the key technologies for AI, encoding and transmission will enable a better video experience and more diversified business models and applications.