Complexity-Configurable Learning-based Genome Compression

Zhenhao SUN, Meng WANG, Shiqi WANG, Sam KWONG

Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed) › Research › peer-review

1 Citation (Scopus)

Abstract

In this paper, we propose a complexity-configurable learning-based genome data compression method, in an effort to achieve a good balance between coding complexity and performance in lossless DNA compression. In particular, we first introduce the concept of Group of Bases (GoB), which serves as the foundation of the method and enables a parallel implementation of learning-based genome data compression. Subsequently, a Markov model is introduced for modeling the initial content, and learning-based inference is applied to the remaining base data. The compression is finally achieved with efficient arithmetic coding. Across a set of configurations that trade off compression ratio against inference speed, the proposed method is shown to be more efficient and to provide more flexibility in real-world applications.
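The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed group size, the Laplace-smoothed adaptive counts, and the choice to model every group independently are all assumptions made for illustration. The sketch uses only the Markov model (the learned inference stage is omitted) and reports the ideal code length, -log2 p per base, which an arithmetic coder would approach, rather than producing an actual bitstream.

```python
import math
from collections import defaultdict

BASES = "ACGT"  # assumption: input contains only these four symbols


def split_into_gobs(seq, gob_size):
    """Split a base sequence into fixed-size groups; the last may be shorter."""
    return [seq[i:i + gob_size] for i in range(0, len(seq), gob_size)]


def markov_code_length(gob, order=2):
    """Ideal code length in bits of one group under an adaptive
    order-`order` Markov model with Laplace smoothing (hypothetical choice)."""
    # Every context starts with a count of 1 per base (Laplace smoothing).
    counts = defaultdict(lambda: [1] * len(BASES))
    bits = 0.0
    for i, base in enumerate(gob):
        context = gob[max(0, i - order):i]  # up to `order` preceding bases
        ctx_counts = counts[context]
        symbol = BASES.index(base)
        # Cost an ideal arithmetic coder would pay for this symbol.
        bits += -math.log2(ctx_counts[symbol] / sum(ctx_counts))
        ctx_counts[symbol] += 1  # update the model after coding
    return bits


def estimate_bits(seq, gob_size=64, order=2):
    """Total estimated size; groups share no state, so each term of the
    sum could be computed by a separate worker."""
    return sum(markov_code_length(g, order) for g in split_into_gobs(seq, gob_size))
```

Because no model state crosses a group boundary in this sketch, smaller groups expose more parallelism but give each model fewer bases to adapt on, which is one simple way to picture a trade-off between speed and compression ratio.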
Original language: English
Title of host publication: Proceedings of the Picture Coding Symposium
Publisher: IEEE
Pages: 241-245
ISBN (Print): 9781665425452
Publication status: Published - 2021
Externally published: Yes
Event: 2021 Picture Coding Symposium (PCS) - Bristol, United Kingdom
Duration: 29 Jun 2021 to 2 Jul 2021

Symposium

Symposium: 2021 Picture Coding Symposium (PCS)
Country/Territory: United Kingdom
City: Bristol
Period: 29/06/21 to 2/07/21

Bibliographical note

This work is supported in part by the National Natural Science Foundation of China under Grant 62022002 and in part by the Hong Kong ITF 9440264 (MHP/087/19).

Keywords

  • Deep learning
  • Genome compression
  • Markov model
  • Parallel implementation
