Complexity-Configurable Learning-based Genome Compression

Zhenhao SUN, Meng WANG, Shiqi WANG, Sam KWONG

Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed) › Research › peer-review

1 Citation (Scopus)


In this paper, we propose a complexity-configurable, learning-based genome data compression method that aims to balance coding complexity against compression performance in lossless DNA compression. We first introduce the concept of a Group of Bases (GoB), which serves as the foundation of the method and enables parallel implementation of learning-based genome data compression. A Markov model is then used to model the initial content, and learning-based inference is applied to the remaining base data. Finally, the data are compressed with efficient arithmetic coding. Across a set of configurations that trade off compression ratio against inference speed, the proposed method is shown to be more efficient and to provide more flexibility in real-world applications.
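As a rough illustration of the pipeline described in the abstract, the sketch below splits a sequence into fixed-length groups and estimates, for each group independently, the ideal arithmetic-coding cost under a small adaptive Markov model. It is not the authors' implementation: the fixed-length group layout, the model order, the add-one smoothing, and the names `split_into_gobs` and `markov_code_length` are all assumptions made for illustration. The learned inference stage is omitted, and the code length is computed as a sum of -log2(p) rather than by running a real arithmetic coder.

```python
import math
from collections import defaultdict

BASES = "ACGT"  # assumption: input contains only these four uppercase symbols


def split_into_gobs(seq, gob_size):
    """Cut the sequence into fixed-length groups (a hypothetical GoB layout)."""
    return [seq[i:i + gob_size] for i in range(0, len(seq), gob_size)]


def markov_code_length(gob, order=2):
    """Ideal code length in bits of one group under an adaptive Markov model.

    The model is reset for every group, so groups do not depend on each
    other and could be processed in parallel. Counts start at 1 for each
    base (add-one smoothing), so no symbol ever has probability zero.
    """
    counts = defaultdict(lambda: dict.fromkeys(BASES, 1))
    bits = 0.0
    for i, base in enumerate(gob):
        context = gob[max(0, i - order):i]  # up to `order` preceding bases
        table = counts[context]
        p = table[base] / sum(table.values())
        bits += -math.log2(p)  # cost an ideal arithmetic coder would pay
        table[base] += 1       # update the model after coding the symbol
    return bits


seq = "ACGT" * 64
gobs = split_into_gobs(seq, gob_size=64)
costs = [markov_code_length(g) for g in gobs]  # each group coded on its own
total_bits = sum(costs)
print(f"{len(gobs)} groups, {total_bits / len(seq):.3f} bits per base")
```

In this sketch the group size acts as the complexity knob: smaller groups expose more parallelism, but each group restarts from an empty model, so the cost per base tends to rise.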
Original language: English
Title of host publication: Proceedings of the Picture Coding Symposium
ISBN (Print): 9781665425452
Publication status: Published - 2021
Externally published: Yes
Event: 2021 Picture Coding Symposium (PCS) - Bristol, United Kingdom
Duration: 29 Jun 2021 - 2 Jul 2021


Symposium: 2021 Picture Coding Symposium (PCS)
Country/Territory: United Kingdom

Bibliographical note

This work is supported in part by the National Natural Science Foundation of China under Grant 62022002 and in part by the Hong Kong ITF 9440264 (MHP/087/19).


Keywords

  • Deep learning
  • Genome compression
  • Markov model
  • Parallel implementation

