In this paper, we propose a complexity-configurable, learning-based genome data compression method that aims to balance coding complexity against compression performance in lossless DNA compression. In particular, we first introduce the concept of a Group of Bases (GoB), which serves as the foundation of the method and enables parallel implementation of learning-based genome data compression. A Markov model is then used to model the initial content, and learning-based inference is applied to the remaining base data. Compression is finally performed with efficient arithmetic coding. Across a set of configurations that trade off compression ratio against inference speed, the proposed method is shown to be more efficient and to provide more flexibility in real-world applications.
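As a rough illustration of why grouping bases permits parallel coding, the sketch below splits a sequence into fixed-size groups and estimates, for each group independently, the number of bits an ideal arithmetic coder would spend under an adaptive order-k Markov model. This is not the method from the paper: the function names, the group size, the model order, and the add-one smoothing are all assumptions made for this example, the learned inference stage is omitted entirely, and the arithmetic coder is replaced by its ideal code length.

```python
import math
from collections import defaultdict

BASES = "ACGT"  # assumption: only the four canonical bases appear (no 'N')


def split_into_gobs(seq, gob_size):
    """Split a base sequence into fixed-size groups (hypothetical GoB layout).

    The last group may be shorter than gob_size.
    """
    return [seq[i:i + gob_size] for i in range(0, len(seq), gob_size)]


def markov_code_length(gob, order=2):
    """Ideal code length in bits of one group under an adaptive Markov model.

    Each context (the previous `order` bases, fewer at the start of the
    group) keeps its own counts, initialised to 1 (add-one smoothing).
    The cost of a base is -log2 of its current estimated probability,
    which is what an ideal arithmetic coder would approach.
    """
    counts = defaultdict(lambda: {b: 1 for b in BASES})
    bits = 0.0
    for i, base in enumerate(gob):
        context = gob[max(0, i - order):i]
        ctx_counts = counts[context]
        total = sum(ctx_counts.values())
        bits += -math.log2(ctx_counts[base] / total)
        ctx_counts[base] += 1  # adapt after coding the base
    return bits


# Each group is modelled from scratch, so the groups share no state and
# could be processed by separate workers.
seq = "ACGT" * 8
gobs = split_into_gobs(seq, 8)
costs = [markov_code_length(g) for g in gobs]
```

Because every group restarts its model, a smaller group size increases the available parallelism but gives the model less context to learn from, which is one plausible way a complexity and compression-ratio trade-off could arise.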
|Title of host publication||Proceedings of the Picture Coding Symposium|
|Publication status||Published - 2021|
|Event||2021 Picture Coding Symposium (PCS) - Bristol, United Kingdom (29 Jun 2021 → 2 Jul 2021)|
|Symposium||2021 Picture Coding Symposium (PCS)|
|Period||29/06/21 → 2/07/21|
Bibliographical note: This work is supported in part by the National Natural Science Foundation of China under Grant 62022002 and in part by the Hong Kong ITF 9440264 (MHP/087/19).
Keywords:
- Deep learning
- Genome compression
- Markov model
- Parallel implementation