Abstract
In this paper, we propose a complexity-configurable, learning-based genome data compression method that aims to balance coding complexity against performance in lossless DNA compression. We first introduce the concept of a Group of Bases (GoB), which serves as the foundation of the method and enables a parallel implementation of learning-based genome data compression. A Markov model is then used to model the initial content of the data, and learning-based inference is applied to the remaining bases. Compression is finally achieved with efficient arithmetic coding. With a set of configurations that trade compression ratio against inference speed, the proposed method is shown to be more efficient and to provide more flexibility in real-world applications.
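The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption: a GoB is taken to be a fixed-length chunk of the sequence, the function names `split_into_gobs` and `markov_code_length` are hypothetical, the learned model is left out entirely, and the arithmetic coder is replaced by the ideal code length (the sum of -log2 p) that such a coder would approach. The sketch only shows why independent groups lend themselves to parallel processing: each group keeps its own adaptive Markov statistics, so no group depends on another.

```python
import math
from collections import defaultdict

BASES = "ACGT"


def split_into_gobs(seq, gob_size):
    """Cut the sequence into fixed-size groups; the last group may be shorter.

    Assumption: a GoB is a contiguous, fixed-length chunk of bases.
    """
    return [seq[i:i + gob_size] for i in range(0, len(seq), gob_size)]


def markov_code_length(gob, order=2):
    """Ideal code length in bits for one group.

    Uses an adaptive order-`order` Markov model with add-one smoothing.
    The model state is local to the group, so groups can be processed
    independently of each other.
    """
    # Every unseen context starts with a count of 1 for each base.
    counts = defaultdict(lambda: [1, 1, 1, 1])
    bits = 0.0
    for i, base in enumerate(gob):
        context = gob[max(0, i - order):i]
        context_counts = counts[context]
        symbol = BASES.index(base)
        probability = context_counts[symbol] / sum(context_counts)
        bits += -math.log2(probability)
        context_counts[symbol] += 1  # update the model after coding the base
    return bits


seq = "ACGTACGTACGTACGTTTGACA"
gobs = split_into_gobs(seq, gob_size=8)
total_bits = sum(markov_code_length(g) for g in gobs)
print(f"{len(gobs)} groups, {total_bits:.1f} bits for {len(seq)} bases")
```

Because `markov_code_length` touches no shared state, the per-group calls could be distributed across worker processes. In this sketch, the group size and the Markov order stand in for the kind of settings that trade compression ratio against speed.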
Original language | English |
---|---|
Title of host publication | Picture Coding Symposium Proceedings: PCS2021 Bristol |
Publisher | IEEE |
Pages | 241-245 |
ISBN (Electronic) | 9781665425452 |
ISBN (Print) | 9781665425452 |
DOIs | |
Publication status | Published - 2021 |
Externally published | Yes |
Event | 2021 Picture Coding Symposium (PCS) - Bristol, United Kingdom; Duration: 29 Jun 2021 → 2 Jul 2021 |
Symposium
Symposium | 2021 Picture Coding Symposium (PCS) |
---|---|
Country/Territory | United Kingdom |
City | Bristol |
Period | 29/06/21 → 2/07/21 |
Bibliographical note
Publisher Copyright: © 2021 IEEE.
Funding
This work is supported in part by the National Natural Science Foundation of China under Grant 62022002 and in part by the Hong Kong ITF 9440264 (MHP/087/19).
Keywords
- Deep learning
- Genome compression
- Markov model
- Parallel implementation