Genetic algorithm for dimer-led and error-restricted spaced motif discovery

Tak Ming CHAN, Leung Yau LO, Man Leung WONG, Yong LIANG, Kwong Sak LEUNG

Research output: Book Chapters | Papers in Conference Proceedings, Conference paper (refereed), Research, peer-review

Abstract

DNA motif discovery is an important problem for deciphering protein-DNA bindings in gene regulation. To discover generic spaced motifs, which have multiple conserved patterns separated by wildcards called spacers, the genetic algorithm (GA)-based GASMEN was proposed and shown to outperform related methods. However, its over-generic modeling of an arbitrary number of spacers makes optimization harder in practice. In protein-DNA binding case studies, complicated spaced motifs are rare, whereas dimers with a single spacer are the more common spaced motifs. Moreover, errors (mismatches) in a conserved pattern are not arbitrarily distributed, because certain highly conserved nucleotides are essential for maintaining binding. To improve optimization in real applications, we have developed a new method, the GA for Dimer-led and Error-restricted Spaced Motifs (GADESM). GADESM pays special attention to common spaced motifs through dimer-led population initialization. Results on real datasets show that dimer-led initialization enables GADESM to achieve statistically significantly better fitness than GASMEN. With the addition of error-restricted motif occurrence retrieval, GADESM outperforms GASMEN on both comprehensive simulation data and a real ChIP-seq case study.
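To make the error restriction concrete, here is a minimal Python sketch of how an occurrence of a dimer spaced motif (two conserved patterns separated by a single spacer of wildcards) might be checked when mismatches are forbidden at designated essential nucleotides. The `DimerMotif` class, the `essential` index set, the `max_errors` budget, and the functions `match_at` and `find_occurrences` are hypothetical illustrations; they are not GADESM's actual representation or retrieval procedure, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimerMotif:
    """Hypothetical dimer spaced motif: left pattern, spacer, right pattern."""
    left: str
    spacer: int  # number of wildcard positions between the two patterns
    right: str
    # Indices into (left + right) that must match exactly (assumed convention).
    essential: frozenset = frozenset()
    max_errors: int = 1  # mismatches tolerated at non-essential positions

    @property
    def width(self) -> int:
        return len(self.left) + self.spacer + len(self.right)


def match_at(motif: DimerMotif, seq: str, start: int) -> bool:
    """Return True if the motif occurs at seq[start:] under the error restriction."""
    if start < 0 or start + motif.width > len(seq):
        return False
    # Collect only the conserved positions; spacer positions are wildcards.
    left_window = seq[start:start + len(motif.left)]
    right_start = start + len(motif.left) + motif.spacer
    right_window = seq[right_start:right_start + len(motif.right)]
    pattern = motif.left + motif.right
    observed = left_window + right_window
    errors = 0
    for i, (expected, actual) in enumerate(zip(pattern, observed)):
        if expected != actual:
            if i in motif.essential:
                return False  # a mismatch at an essential nucleotide is never allowed
            errors += 1
            if errors > motif.max_errors:
                return False  # error budget for non-essential positions exceeded
    return True


def find_occurrences(motif: DimerMotif, seq: str) -> list:
    """Scan every start position and return those where the motif occurs."""
    return [s for s in range(len(seq) - motif.width + 1) if match_at(motif, seq, s)]
```

For example, with `m = DimerMotif("TGA", 3, "TCA", frozenset({0, 5}))`, `find_occurrences(m, "CCTGAGCATCAGG")` returns `[2]`; a single mismatch at a non-essential position (as in `"TGCAAATCA"`) is tolerated, while a mismatch at essential index 5 (as in `"TGAAAATCG"`) rejects the site.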
Original language: English
Title of host publication: Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 198-205
Number of pages: 8
ISBN (Print): 9781467358750
DOIs: https://doi.org/10.1109/CIBCB.2013.6595409
Publication status: Published - 12 Sep 2013

Bibliographical note

Paper presented at the 10th Annual IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Apr 16-19, 2013, Singapore.

Cite this

CHAN, T. M., LO, L. Y., WONG, M. L., LIANG, Y., & LEUNG, K. S. (2013). Genetic algorithm for dimer-led and error-restricted spaced motif discovery. In Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013 (pp. 198-205). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CIBCB.2013.6595409
CHAN, Tak Ming; LO, Leung Yau; WONG, Man Leung; LIANG, Yong; LEUNG, Kwong Sak. / Genetic algorithm for dimer-led and error-restricted spaced motif discovery. Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013. Institute of Electrical and Electronics Engineers Inc., 2013. pp. 198-205
@inproceedings{b372bec136b54ed0953ff036b5842a13,
title = "Genetic algorithm for dimer-led and error-restricted spaced motif discovery",
author = "CHAN, {Tak Ming} and LO, {Leung Yau} and WONG, {Man Leung} and LIANG, Yong and LEUNG, {Kwong Sak}",
note = "Paper presented at the 10th Annual IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Apr 16-19, 2013, Singapore.",
year = "2013",
month = "9",
day = "12",
doi = "10.1109/CIBCB.2013.6595409",
language = "English",
isbn = "9781467358750",
pages = "198--205",
booktitle = "Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

CHAN, TM, LO, LY, WONG, ML, LIANG, Y & LEUNG, KS 2013, Genetic algorithm for dimer-led and error-restricted spaced motif discovery. in Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013. Institute of Electrical and Electronics Engineers Inc., pp. 198-205. https://doi.org/10.1109/CIBCB.2013.6595409

Genetic algorithm for dimer-led and error-restricted spaced motif discovery. / CHAN, Tak Ming; LO, Leung Yau; WONG, Man Leung; LIANG, Yong; LEUNG, Kwong Sak.

Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013. Institute of Electrical and Electronics Engineers Inc., 2013. p. 198-205.

TY - GEN

T1 - Genetic algorithm for dimer-led and error-restricted spaced motif discovery

AU - CHAN, Tak Ming

AU - LO, Leung Yau

AU - WONG, Man Leung

AU - LIANG, Yong

AU - LEUNG, Kwong Sak

N1 - Paper presented at the 10th Annual IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Apr 16-19, 2013, Singapore.

PY - 2013/9/12

Y1 - 2013/9/12

UR - http://commons.ln.edu.hk/sw_master/6584

U2 - 10.1109/CIBCB.2013.6595409

DO - 10.1109/CIBCB.2013.6595409

M3 - Conference paper (refereed)

SN - 9781467358750

SP - 198

EP - 205

BT - Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013

PB - Institute of Electrical and Electronics Engineers Inc.

ER -

CHAN TM, LO LY, WONG ML, LIANG Y, LEUNG KS. Genetic algorithm for dimer-led and error-restricted spaced motif discovery. In Proceedings of the IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2013 - 2013 IEEE Symposium Series on Computational Intelligence, SSCI 2013. Institute of Electrical and Electronics Engineers Inc. 2013. p. 198-205. https://doi.org/10.1109/CIBCB.2013.6595409