TY - JOUR
T1 - ICN: a normalization method for gene expression data considering the over-expression of informative genes
AU - CHENG, Lixin
AU - WANG, Xuan
AU - WONG, Pak-Kan
AU - LEE, Kwan-Yeung
AU - LI, Le
AU - XU, Bin
AU - WANG, Dong
AU - LEUNG, Kwong-Sak
PY - 2016
Y1 - 2016
N2 - The global increase of gene expression has been frequently established in cancer microarray studies. However, many genes may not deliver informative signals for a given experiment, due to insufficient expression or even non-expression, despite the DNA microarrays massively measuring genes in parallel. Hence the informative gene set, rather than the whole genome, should be more reasonable to represent the genome expression level. We observed that the trend of over-expression for informative genes is more obvious in human cancers, which is to some extent masked using the whole genome without any filtering. Accordingly we proposed a novel normalization method, Informative CrossNorm (ICN), which performs the cross normalization (CrossNorm) on the expression matrix merely containing the informative genes. ICN outperforms other methods with a consistently high precision, F-score, and Matthews correlation coefficient as well as an acceptable recall based on three available spiked-in datasets with ground truth. In addition, nine potential therapeutic target genes for esophageal squamous cell carcinoma (ESCC) were identified using ICN integrated with a protein-protein interaction network, which biologically demonstrates that ICN shows superior performance. Consequently, it is expected that ICN could be applied routinely in cancer microarray studies.
AB - The global increase of gene expression has been frequently established in cancer microarray studies. However, many genes may not deliver informative signals for a given experiment, due to insufficient expression or even non-expression, despite the DNA microarrays massively measuring genes in parallel. Hence the informative gene set, rather than the whole genome, should be more reasonable to represent the genome expression level. We observed that the trend of over-expression for informative genes is more obvious in human cancers, which is to some extent masked using the whole genome without any filtering. Accordingly we proposed a novel normalization method, Informative CrossNorm (ICN), which performs the cross normalization (CrossNorm) on the expression matrix merely containing the informative genes. ICN outperforms other methods with a consistently high precision, F-score, and Matthews correlation coefficient as well as an acceptable recall based on three available spiked-in datasets with ground truth. In addition, nine potential therapeutic target genes for esophageal squamous cell carcinoma (ESCC) were identified using ICN integrated with a protein-protein interaction network, which biologically demonstrates that ICN shows superior performance. Consequently, it is expected that ICN could be applied routinely in cancer microarray studies.
UR - http://www.scopus.com/inward/record.url?scp=84988649851&partnerID=8YFLogxK
U2 - 10.1039/c6mb00386a
DO - 10.1039/c6mb00386a
M3 - Journal Article (refereed)
C2 - 27452923
AN - SCOPUS:84988649851
SN - 1742-206X
VL - 12
SP - 3057
EP - 3066
JO - Molecular BioSystems
JF - Molecular BioSystems
IS - 10
ER -