Towards Modality Transferable Visual Information Representation with Optimal Model Compression

Rongqun LIN, Linwei ZHU, Shiqi WANG, Sam KWONG

Research output: Book Chapters | Papers in Conference Proceedings; Conference paper (refereed); Research; peer-review

1 Citation (Scopus)


Compactly representing visual signals is of fundamental importance in various image/video-centered applications. Although numerous approaches have been developed to improve image and video coding performance by removing the redundancies within visual signals, much less work has been dedicated to transforming the visual signals into another well-established modality for better representation capability. In this paper, we propose a new scheme for visual signal representation that leverages the philosophy of transferable modality. In particular, the deep learning model, which characterizes and absorbs the statistics of the input scene through online training, can be efficiently represented in the sense of rate-utility optimization to serve as the enhancement layer in the bitstream. As such, the overall performance can be further guaranteed by optimizing the newly incorporated modality. The proposed framework is implemented on the state-of-the-art video coding standard (i.e., Versatile Video Coding), and significantly better representation capability has been observed in extensive evaluations.
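The abstract does not spell out the rate-utility optimization, so the following is only a minimal sketch of one common way such a trade-off is posed: pick, among several compressed versions of the online-trained model, the one that maximizes utility minus a Lagrangian penalty on its bit cost. The `Candidate` class, the `select_candidate` function, the multiplier `lam`, and all numbers are hypothetical illustrations, not the authors' method or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One hypothetical compressed version of the online-trained model."""
    name: str
    rate_bits: float  # bits needed to signal the model in the enhancement layer
    utility: float    # quality gain it brings (assumed measured, e.g. in dB)


def select_candidate(candidates, lam):
    """Return the candidate maximizing utility - lam * rate_bits.

    A zero-rate, zero-utility entry stands for "send no model", so the
    enhancement layer is only used when its gain outweighs its bit cost.
    """
    skip = Candidate("none", 0.0, 0.0)
    return max([skip, *candidates], key=lambda c: c.utility - lam * c.rate_bits)


# Made-up operating points for illustration only (not from the paper).
candidates = [
    Candidate("8-bit weights", rate_bits=80_000, utility=0.50),
    Candidate("4-bit weights", rate_bits=40_000, utility=0.40),
    Candidate("2-bit weights", rate_bits=20_000, utility=0.10),
]

best = select_candidate(candidates, lam=5e-6)
print(best.name)
```

In this sketch, a larger `lam` makes bits more expensive and pushes the choice toward coarser models or toward sending no model at all, while `lam = 0` always picks the highest-utility candidate.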
Original language: English
Title of host publication: MM '20: Proceedings of the 28th ACM International Conference on Multimedia
Publisher: Association for Computing Machinery
Number of pages: 10
ISBN (Print): 9781450379885
Publication status: Published - Oct 2020
Externally published: Yes
Event: 28th ACM International Conference on Multimedia (MM 2020) - Virtual, Seattle, United States
Duration: 12 Oct 2020 to 16 Oct 2020


Conference: 28th ACM International Conference on Multimedia (MM 2020)
Country/Territory: United States

Bibliographical note

This work was supported in part by the Hong Kong RGC General Research Funds under Grant 9042322 (CityU 11200116), Grant 9042489 (CityU 11206317), and Grant 9042816 (CityU 11209819). This work was also supported in part by the Natural Science Foundation of China under Grant 61672443, Grant 61901459, and in part by China Postdoctoral Science Foundation under Grant 2019M653127.


  • deep learning
  • deep learning model communication
  • rate-utility optimization
  • visual signal representation

