An information-based sequence distance and its application to whole mitochondrial genome phylogeny

Ming LI, Jonathan H. BADGER, Xin CHEN, Sam KWONG, Paul KEARNEY, Haoyong ZHANG

Research output: Journal Publications, Journal Article (refereed), peer-reviewed

431 Citations (Scopus)

Abstract

Motivation: Traditional sequence distances require an alignment and are therefore not directly applicable to whole-genome phylogeny, where events such as rearrangements make full-length alignments impossible. We present a sequence distance that works on unaligned sequences, based on the information-theoretic concept of Kolmogorov complexity, together with a program to estimate this distance.

Results: We establish the mathematical foundations of our distance and illustrate its use by constructing a phylogeny of the Eutherian orders from complete, unaligned mitochondrial genomes. This phylogeny is consistent with the commonly accepted one for the Eutherians. A second, larger mammalian dataset is also analyzed, yielding a phylogeny generally consistent with the commonly accepted one for the mammals.
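The abstract does not state the distance formula. Because Kolmogorov complexity is uncomputable, distances of this kind are estimated in practice with a real compressor. As a rough illustration only, the sketch below swaps in the normalized compression distance (NCD), a related compression-based measure from later work, and uses Python's general-purpose zlib in place of the authors' estimation program. The function names `compressed_size`, `ncd`, and `distance_matrix` are hypothetical, and zlib compresses DNA far less well than a DNA-specific compressor, so the values are only indicative.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, a crude upper-bound proxy for K(data).

    Assumption: zlib stands in for a DNA-specific compressor.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two unaligned sequences.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    No alignment is needed: shared substrings, wherever they occur,
    make the concatenation xy compress better.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(seqs: list[bytes]) -> list[list[float]]:
    """Pairwise NCD matrix with a zero diagonal.

    Real compressors make NCD only approximately symmetric, so this
    sketch computes each pair once and mirrors the value.
    """
    n = len(seqs)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = ncd(seqs[i], seqs[j])
    return m
```

A matrix like this could then be passed to a distance-based tree-building method such as neighbor joining; the abstract does not say which tree method the paper used.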
Original language: English
Pages (from-to): 149-154
Journal: Bioinformatics
Volume: 17
Issue number: 2
DOIs
Publication status: Published - 2001
Externally published: Yes

Bibliographical note

We thank Ford Doolittle, Brian Golding, Masami Hasegawa, and Huaichun Wang for providing very useful information, and Tao Jiang, Pavel Pevzner, David Sankoff, and Huaichun Wang. We especially thank Paul Vitányi, who pointed out several loopholes in an earlier version of this paper and suggested the new formulation of Theorem 2. A referee suggested the Reyes et al. (2000) data to us. J.H.B. was supported by a CITO grant. X.C., S.K., and M.L. were supported in part by CityU research grant 7000 875. P.K. was supported by NSERC Research Grant 160321 and a CITO grant. M.L. was also supported in part by NSERC Research Grant OGP0046506, a CITO grant, and an NSERC Steacie Fellowship. H.Z. was supported by NSERC Research Grants OGP0046506 and 160321.
