Abstract
Building on the well-known k-mer model, we propose a k-mer natural vector model that represents a genetic sequence by the numbers and distributions of the k-mers it contains. We show that there exists a one-to-one correspondence between a genetic sequence and its associated k-mer natural vector. The k-mer natural vector method can be used easily and quickly to perform phylogenetic analysis of genetic sequences without requiring evolutionary models or human intervention. Whole or partial genomes can be handled more effectively with the proposed method. We apply it to the phylogenetic analysis of genetic sequences, and the results obtained demonstrate that the k-mer natural vector method is a very powerful tool for analysing and annotating genetic sequences and for determining evolutionary relationships, in terms of both accuracy and efficiency.
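As a rough illustration of the idea, the sketch below builds a vector from the counts and positional distribution of each k-mer and compares two sequences by Euclidean distance. The abstract does not give the exact definition, so this assumes the usual natural vector construction (count, mean position, and a normalized second central moment per k-mer). The function names `kmer_natural_vector` and `euclidean` and the choice of distance are hypothetical, not the authors' implementation.

```python
from collections import defaultdict
from itertools import product


def kmer_natural_vector(seq, k, alphabet="ACGT"):
    """Return [counts..., mean positions..., second moments...] over all k-mers.

    Assumed construction (not taken from the abstract): for each k-mer,
    the count n, the mean 1-based start position mu, and the second
    central moment of the positions divided by n * m, where m is the
    number of k-mer positions in the sequence.
    """
    seq = seq.upper()
    m = len(seq) - k + 1  # number of k-mer start positions
    positions = defaultdict(list)
    for i in range(m):
        positions[seq[i:i + k]].append(i + 1)  # 1-based start position

    counts, means, moments = [], [], []
    # Enumerate k-mers in a fixed lexicographic order so vectors are comparable.
    for kmer in map("".join, product(alphabet, repeat=k)):
        pos = positions.get(kmer, [])
        n = len(pos)
        mu = sum(pos) / n if n else 0.0
        d2 = sum((p - mu) ** 2 for p in pos) / (n * m) if n else 0.0
        counts.append(float(n))
        means.append(mu)
        moments.append(d2)
    return counts + means + moments


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
```

A pairwise distance matrix computed this way could then be passed to a standard clustering or tree-building routine. No alignment or evolutionary model is needed at any step, which matches the claim made in the abstract.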
Original language | English |
---|---|
Pages (from-to) | 25-34 |
Number of pages | 10 |
Journal | Gene |
Volume | 546 |
Issue number | 1 |
DOIs | |
Publication status | Published - 1 Aug 2014 |
Externally published | Yes |
Funding
We thank Dr. Max Benson for critically reading and editing our manuscript. This work is supported by Youth Funding of Suihua University (KQ1202004, KQ1202002), Scientific Research Funding of Heilongjiang Education Department (12513097), U.S. NSF grant (DMS-1120824, 1119612), NIH grant (5 SC3 GM098180-04), China NSF grant (31271408), Tsinghua University start-up funding, and Tsinghua University independent research project grant.
Keywords
- K-mer model
- Natural vector
- Phylogenetic analysis