Abstract
Compact representation of visual signals is of fundamental importance in many image- and video-centered applications. Although numerous approaches have been developed to improve image and video coding performance by removing redundancies within visual signals, much less work has been dedicated to transforming visual signals into another well-established modality for better representation capability. In this paper, we propose a new scheme for visual signal representation that leverages the philosophy of transferable modality. In particular, a deep learning model, which characterizes and absorbs the statistics of the input scene through online training, can be efficiently represented in the sense of rate-utility optimization to serve as the enhancement layer in the bitstream. As such, the overall performance can be further guaranteed by optimizing the newly incorporated modality. The proposed framework is implemented on the state-of-the-art video coding standard (i.e., Versatile Video Coding), and significantly better representation capability has been observed in extensive evaluations.
Original language | English |
---|---|
Title of host publication | MM '20: Proceedings of the 28th ACM International Conference on Multimedia |
Publisher | Association for Computing Machinery |
Pages | 3705-3714 |
Number of pages | 10 |
ISBN (Print) | 9781450379885 |
DOIs | |
Publication status | Published - Oct 2020 |
Externally published | Yes |
Event | 28th ACM International Conference on Multimedia (MM 2020), Virtual, Seattle, United States. Duration: 12 Oct 2020 → 16 Oct 2020. https://2020.acmmm.org/ |
Conference
Conference | 28th ACM International Conference on Multimedia (MM 2020) |
---|---|
Country/Territory | United States |
City | Seattle |
Period | 12 Oct 2020 → 16 Oct 2020 |
Internet address |
Funding
This work was supported in part by the Hong Kong RGC General Research Funds under Grant 9042322 (CityU 11200116), Grant 9042489 (CityU 11206317), and Grant 9042816 (CityU 11209819); in part by the Natural Science Foundation of China under Grant 61672443 and Grant 61901459; and in part by the China Postdoctoral Science Foundation under Grant 2019M653127.
Keywords
- deep learning
- deep learning model communication
- rate-utility optimization
- visual signal representation