Abstract
Pathway analysis is a cornerstone of systems biology. In particular, pathway similarity search plays a key role in establishing structural, functional, and evolutionary relationships between different biological entities. Given a query pathway and a database, a pathway similarity search aims to identify novel pathways that are homologous to the query pathway. Unfortunately, pathway similarity search is computationally expensive because the underlying graph matching (subgraph isomorphism) problem is NP-complete. In this study, we introduce a novel algorithmic framework for pathway similarity search, named PathEmb (Pathway Embedding), which is analogous to the Skip-gram model, with each pathway represented as a "document". PathEmb uses a second-order random walk strategy to explore diverse pathway patterns. Every signaling path traversed by a random walk is regarded as a "sentence", and the sentences from one pathway are then assembled into a "document". The "document" for each pathway is then mapped into a low-dimensional feature space for downstream tasks. Furthermore, PathEmb is a topology-free pathway similarity search algorithm, so it can handle pathways of arbitrary structure. We have extensively evaluated PathEmb and other cutting-edge methods on three pathway datasets. The experimental results demonstrate that PathEmb outperforms the existing methods in terms of computational efficiency and search accuracy. The source code of PathEmb is freely available online at https://github.com/zhangjiaobxy/PathEmb.
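To make the "walks as sentences, pathway as document" idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a node2vec-style second-order walk with return parameter `p` and in-out parameter `q`, which the abstract does not specify. The helper names `second_order_walk` and `pathway_to_document` and the four-node toy pathway are hypothetical and used only for illustration.

```python
import random


def second_order_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Return one walk whose next step depends on the current AND previous node.

    Assumption: node2vec-style biasing (1/p to return, 1 to stay close to the
    previous node, 1/q to move away); the abstract only says "second order".
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj.get(cur, ()))
        if not nbrs:
            break  # dead end: no outgoing edges from the current node
        if len(walk) == 1:
            # The first step has no previous node, so it is chosen uniformly.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in adj.get(prev, ()):
                weights.append(1.0)      # neighbor shared with the previous node
            else:
                weights.append(1.0 / q)  # step further away from the previous node
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def pathway_to_document(adj, walks_per_node=2, length=4, p=1.0, q=1.0, seed=0):
    """Collect walks ("sentences") from every node into one "document"."""
    rng = random.Random(seed)
    document = []
    for _ in range(walks_per_node):
        for node in sorted(adj):
            document.append(second_order_walk(adj, node, length, p, q, rng))
    return document


# Hypothetical toy pathway: a simple chain A - B - C - D (undirected).
toy_pathway = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}

document = pathway_to_document(toy_pathway)
for sentence in document:
    print(" ".join(sentence))
```

Each printed line is one "sentence". The full list of sentences could then be passed to a document-embedding model to obtain a fixed-length vector per pathway; which embedding implementation PathEmb uses is not stated in the abstract.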
Original language | English |
---|---|
Pages (from-to) | 1329-1335 |
Journal | IEEE Journal of Biomedical and Health Informatics |
Volume | 23 |
Issue number | 3 |
Early online date | 27 Apr 2018 |
Publication status | Published - May 2019 |
Externally published | Yes |
Funding
The work was supported by the Research Grants Council of the Hong Kong Special Administrative Region under Grants CityU 21200816 and CityU 11203217.
Keywords
- AdaBoost regression
- document embedding
- feature learning
- Global pathway search
- random walks