Abstract
Original language | English |
---|---|
Title of host publication | Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 198-205 |
Number of pages | 8 |
ISBN (Print) | 9781467358750 |
DOIs | 10.1109/CIBCB.2013.6595409 |
Publication status | Published - 12 Sep 2013 |
Bibliographical note
Paper presented at the 10th Annual IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Apr 16-19, 2013, Singapore.
Genetic algorithm for dimer-led and error-restricted spaced motif discovery. / CHAN, Tak Ming; LO, Leung Yau; WONG, Man Leung; LIANG, Yong; LEUNG, Kwong Sak.
Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013. Institute of Electrical and Electronics Engineers Inc., 2013. p. 198-205. Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed) › Research › peer-review
TY - GEN
T1 - Genetic algorithm for dimer-led and error-restricted spaced motif discovery
AU - CHAN, Tak Ming
AU - LO, Leung Yau
AU - WONG, Man Leung
AU - LIANG, Yong
AU - LEUNG, Kwong Sak
N1 - Paper presented at the 10th Annual IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Apr 16-19, 2013, Singapore.
PY - 2013/9/12
Y1 - 2013/9/12
AB - DNA motif discovery is an important problem in deciphering protein-DNA binding in gene regulation. To discover generic spaced motifs, which consist of multiple conserved patterns separated by wild-card regions called spacers, the genetic algorithm (GA)-based method GASMEN has been proposed and shown to outperform related methods. However, its over-generic modeling of an arbitrary number of spacers makes the optimization more difficult in practice. In protein-DNA binding case studies, complicated spaced motifs are rare, whereas dimers with a single spacer are the more common form of spaced motif. Moreover, errors (mismatches) in a conserved pattern are not arbitrarily distributed, because certain highly conserved nucleotides are essential for maintaining binding. To achieve better optimization in real applications, we have developed a new method, GA for Dimer-led and Error-restricted Spaced Motifs (GADESM), which pays special attention to common spaced motifs through dimer-led population initialization. Results on real datasets show that the dimer-led initialization in GADESM achieves better fitness than GASMEN with statistical significance. With the addition of error-restricted motif occurrence retrieval, GADESM also outperforms GASMEN on both comprehensive simulated data and a real ChIP-seq case study.
UR - http://commons.ln.edu.hk/sw_master/6584
U2 - 10.1109/CIBCB.2013.6595409
DO - 10.1109/CIBCB.2013.6595409
M3 - Conference paper (refereed)
SN - 9781467358750
SP - 198
EP - 205
BT - Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013
PB - Institute of Electrical and Electronics Engineers Inc.
ER -
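The abstract combines two ideas: a spaced motif whose conserved patterns are separated by a wild-card spacer (here, the common dimer case with a single spacer), and error-restricted occurrence retrieval, in which mismatches may not fall on certain highly conserved nucleotides. The Python sketch below is one possible way to express both ideas together; it is not GADESM's actual representation, fitness, or retrieval procedure, which this record does not describe. The `SpacedDimer` class, the `protected` position set, the `max_errors` budget, the fixed spacer length, and the single-strand, uppercase-ACGT scan are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class SpacedDimer:
    """Hypothetical dimer model: two conserved halves around one fixed-length spacer."""
    left: str                 # first conserved half, e.g. "TGA"
    right: str                # second conserved half, e.g. "TCA"
    spacer: int               # number of wild-card positions between the halves
    protected: FrozenSet[int] = frozenset()  # indices into left+right where no mismatch is allowed
    max_errors: int = 0       # mismatches allowed at unprotected positions

    @property
    def span(self) -> int:
        """Total window length covered by one occurrence."""
        return len(self.left) + self.spacer + len(self.right)

    def conserved_offsets(self) -> List[int]:
        """Window offsets of the conserved bases; spacer offsets are skipped."""
        right_start = len(self.left) + self.spacer
        return list(range(len(self.left))) + [right_start + j for j in range(len(self.right))]


def count_errors(motif: SpacedDimer, window: str) -> Optional[int]:
    """Mismatches between the conserved bases and window, or None if a protected base mismatches."""
    pattern = motif.left + motif.right
    errors = 0
    for k, (base, offset) in enumerate(zip(pattern, motif.conserved_offsets())):
        if window[offset] != base:
            if k in motif.protected:
                return None  # error restriction: this position must match exactly
            errors += 1
    return errors


def find_occurrences(motif: SpacedDimer, sequence: str) -> List[Tuple[int, int]]:
    """Scan one strand and return (start, mismatches) for every admissible occurrence."""
    hits = []
    for start in range(len(sequence) - motif.span + 1):
        errors = count_errors(motif, sequence[start:start + motif.span])
        if errors is not None and errors <= motif.max_errors:
            hits.append((start, errors))
    return hits


if __name__ == "__main__":
    # Protect the first base (T, index 0) and the last base (A, index 5); allow one other mismatch.
    motif = SpacedDimer("TGA", "TCA", spacer=2, protected=frozenset({0, 5}), max_errors=1)
    print(find_occurrences(motif, "CCTGACGTCAGG"))  # [(2, 0)]
```

Restricting errors this way prunes candidate sites that a plain mismatch budget would accept, for example a window that differs from the pattern only at the protected final A. Scanning the reverse complement, allowing variable spacer lengths, and scoring occurrences inside a GA fitness function are deliberately left out of this sketch.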