Mixture distribution approach for identifying differentially expressed genes in microarray data of Arabidopsis thaliana

ARFA ANJUM; SEEMA JAGGI; ELDHO VARGHESE; SHWETANK LALL; ANIL RAI; ARPAN BHOWMIK; DWIJESH CHANDRA MISHRA; SARIKA SARIKA

doi:10.56093/ijas.v90i10.107977

Authors

ARFA ANJUM Ph D Scholar, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
SEEMA JAGGI Head (DE), ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
ELDHO VARGHESE Scientist, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
SHWETANK LALL Ph D Scholar, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
ANIL RAI Head (CABIN) and ADG (ICT), ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
ARPAN BHOWMIK Scientist and corresponding author, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
DWIJESH CHANDRA MISHRA Scientist, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
SARIKA SARIKA Senior Scientist, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India


Keywords:

Differential gene expression, Microarray, Mixture distribution, Normal distribution

Abstract

The basic aim of analyzing gene expression data is to identify genes whose expression patterns differ between treatment samples and control or healthy samples. Microarray technology is a tool for measuring the simultaneous relative expression of thousands of genes within a particular cell population or tissue in a single experiment through RNA hybridization. The present paper applies a mixture distribution approach to identify differentially expressed genes in sequence data of Arabidopsis thaliana under two conditions, salt-stressed and control. A two-component normal mixture model was fitted to the normalized data, and its parameters were estimated using the EM algorithm. A likelihood ratio test (LRT) was performed to test goodness of fit. The two-component normal mixture model captured more variability than a single-component normal distribution and identified the differentially expressed genes more accurately.
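The abstract names two computational steps. The first is EM estimation of a two-component normal mixture. The second is a likelihood ratio test against a single normal distribution.

The Python sketch below illustrates both steps under several assumptions:
- The normalized values are a plain list of floats, one per gene.
- The EM starting values come from splitting the sorted data into a lower and an upper half.
- The function names (fit_one_normal, fit_two_normal_em, lrt_statistic) are hypothetical. This is not the authors' code.

It reports only the statistic 2 × (logL2 − logL1). When testing one versus two mixture components, the usual chi-square reference distribution does not apply directly.

```python
import math
import random
import statistics


def log_normal_pdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def fit_one_normal(x):
    """MLE of a single normal; returns (mu, sigma, log-likelihood)."""
    mu = statistics.fmean(x)
    sigma = statistics.pstdev(x, mu)  # MLE standard deviation uses divisor n
    loglik = sum(log_normal_pdf(xi, mu, sigma) for xi in x)
    return mu, sigma, loglik


def _e_step(x, pi, mu1, mu2, s1, s2):
    """Posterior probability of component 1 for each value, plus the log-likelihood."""
    resp, loglik = [], 0.0
    for xi in x:
        a = math.log(pi) + log_normal_pdf(xi, mu1, s1)
        b = math.log(1.0 - pi) + log_normal_pdf(xi, mu2, s2)
        m = max(a, b)  # log-sum-exp trick for numerical stability
        lse = m + math.log(math.exp(a - m) + math.exp(b - m))
        resp.append(math.exp(a - lse))
        loglik += lse
    return resp, loglik


def fit_two_normal_em(x, max_iter=1000, tol=1e-8):
    """Fit pi*N(mu1, s1^2) + (1 - pi)*N(mu2, s2^2) by the EM algorithm."""
    n = len(x)
    xs = sorted(x)
    half = n // 2
    # Hypothetical crude start: means of the lower and upper halves of the sorted data.
    pi = 0.5
    mu1, mu2 = statistics.fmean(xs[:half]), statistics.fmean(xs[half:])
    s1 = s2 = statistics.pstdev(xs)
    old_ll = -math.inf
    for _ in range(max_iter):
        resp, ll = _e_step(x, pi, mu1, mu2, s1, s2)
        if ll - old_ll < tol:  # stop when the log-likelihood gain falls below tol
            break
        old_ll = ll
        # M-step: weighted proportion, means and MLE standard deviations.
        n1 = max(sum(resp), 1e-12)
        n2 = max(n - n1, 1e-12)
        pi = min(max(n1 / n, 1e-6), 1.0 - 1e-6)
        mu1 = sum(r * xi for r, xi in zip(resp, x)) / n1
        mu2 = sum((1.0 - r) * xi for r, xi in zip(resp, x)) / n2
        s1 = max(math.sqrt(sum(r * (xi - mu1) ** 2 for r, xi in zip(resp, x)) / n1), 1e-6)
        s2 = max(math.sqrt(sum((1.0 - r) * (xi - mu2) ** 2 for r, xi in zip(resp, x)) / n2), 1e-6)
    else:
        # max_iter reached: recompute responsibilities for the final parameters.
        resp, ll = _e_step(x, pi, mu1, mu2, s1, s2)
    return {"pi": pi, "mu": (mu1, mu2), "sigma": (s1, s2), "loglik": ll, "resp": resp}


def lrt_statistic(x):
    """-2 log(Lambda) = 2 * (loglik of two-component fit - loglik of one-component fit)."""
    _, _, ll1 = fit_one_normal(x)
    fit2 = fit_two_normal_em(x)
    return 2.0 * (fit2["loglik"] - ll1), fit2


if __name__ == "__main__":
    random.seed(1)  # simulated values only, not the paper's data
    data = [random.gauss(0, 1) for _ in range(300)] + [random.gauss(4, 1) for _ in range(100)]
    stat, fit = lrt_statistic(data)
    print(f"LRT statistic = {stat:.1f}, pi = {fit['pi']:.2f}, means = {fit['mu']}")
```

Running the script prints the statistic and the fitted parameters for simulated data. The list fit["resp"] holds each value's posterior probability of belonging to the first component. Turning those probabilities into a list of differentially expressed genes requires a decision rule, such as a posterior-probability cutoff. The abstract does not specify such a rule, so none is hard-coded here.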

