Abstract
Pathway similarity search plays a vital role in the post-genomics era. Unfortunately, it involves the subgraph isomorphism problem, which is NP-complete, so efficient search algorithms are desirable. In this work, we propose a novel global pathway similarity search approach named ToBio, which considers both topological and biological features. Specifically, motivated by nature, ToBio considers various topological and biological features, including subgraph signature similarities, sequence similarities, and gene ontology similarities. Since different features carry different functional importance and dependencies, we report three schemes of ToBio using different sets of features. In addition, to enhance the existing search algorithms for rigorous comparison, post-processing pipelines are proposed to investigate how different features contribute to search performance. ToBio and other state-of-the-art methods are benchmarked on gold-standard pathway datasets from three species; the results demonstrate the competitive edge of ToBio over the state-of-the-art methods in both topological and biological aspects. Case studies have been conducted to reveal mechanistic insights into the search performance of ToBio.
Original language | English |
---|---|
Pages (from-to) | 336-349 |
Journal | IEEE/ACM Transactions on Computational Biology and Bioinformatics |
Volume | 16 |
Issue number | 1 |
Early online date | 3 Nov 2017 |
Publication status | Published - Jan 2019 |
Externally published | Yes |
Bibliographical note
This work was supported by two grants from the Research Grants Council of the Hong Kong Special Administrative Region [CityU 21200816] and [CityU 11203217]. In addition, the work described in this paper was partially supported by a grant from City University of Hong Kong (CityU Project No. 7200444/CS), an Amazon Web Services (AWS) Research Grant, and a Microsoft Azure Research Award.
Keywords
- biological network
- BLAST score
- GO annotation
- Pathway
- random forest regression
- subgraph signature