Effect of influential observation in genomic prediction using LASSO diagnostic
Keywords:
GEBVs, Genomic prediction, Influential observation, LASSO, MSE, Prediction accuracy
Abstract
Detection of influential observations is a crucial pre-processing step for identifying suspicious data points that may arise from error or some other unknown source. Several statistical measures have been developed for this purpose, but detecting a truly influential observation remains challenging in high-dimensional data such as gene expression and genotyping data. In this article we demonstrate the effect of influential observations on genomic prediction accuracy using the recently proposed LASSO diagnostics, i.e. Df-Model, Df-Regpath, Df-Cvpath, Df-Lambda and Influence-LASSO. The effect was explored by observing the change in estimated and true accuracies for datasets with and without influential observations. For this purpose we used publicly available wheat and maize datasets. Influential observations were found to have significant effects on genomic prediction accuracy. The study shows that implementing an efficient diagnostic measure for detecting influential observations can improve the accuracy of genomic prediction.
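The with/without comparison described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions: the data are synthetic (not the wheat or maize datasets), the LASSO is fitted by a small hand-written coordinate-descent routine, and the influence score is a plain leave-one-out change in fitted values, used only as a simplified stand-in for the published Df-Model, Df-Regpath, Df-Cvpath, Df-Lambda and Influence-LASSO measures. All function and variable names (`lasso_cd`, `loo_influence`, `flagged`, and so on) are hypothetical.

```python
import random


def soft_threshold(z, g):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    return z - g if z > g else z + g if z < -g else 0.0


def lasso_cd(X, y, lam, sweeps=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - Xb, kept up to date as beta changes
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def predict(X, beta):
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def loo_influence(X, y, lam):
    """Leave-one-out score: mean squared change in fitted values over all
    observations when observation i is dropped (an assumed, simplified measure)."""
    full = predict(X, lasso_cd(X, y, lam))
    scores = []
    for i in range(len(X)):
        beta_i = lasso_cd(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
        scores.append(mse(full, predict(X, beta_i)))
    return scores


# Synthetic training data with one deliberately contaminated observation.
rng = random.Random(0)
n, p, lam = 40, 6, 0.05
true_beta = [2.0, -1.5, 0.0, 0.0, 1.0, 0.0]
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
X[0] = [3.0 * v for v in X[0]]  # observation 0: high leverage
y = [s + rng.gauss(0, 0.3) for s in predict(X, true_beta)]
y[0] += 20.0  # observation 0: gross error in the response

# Clean test set; the noise-free signal plays the role of the "true" values.
X_test = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(200)]
y_test = predict(X_test, true_beta)

scores = loo_influence(X, y, lam)
flagged = max(range(n), key=scores.__getitem__)  # most influential observation

beta_with = lasso_cd(X, y, lam)
beta_without = lasso_cd(X[:flagged] + X[flagged + 1:], y[:flagged] + y[flagged + 1:], lam)
mse_with = mse(predict(X_test, beta_with), y_test)
mse_without = mse(predict(X_test, beta_without), y_test)
print("flagged observation:", flagged)
print("test MSE with / without it:", mse_with, mse_without)
```

Comparing `mse_with` and `mse_without` mirrors the paper's with/without scenario on a toy scale. A real analysis would use marker matrices, a cross-validated penalty and the published diagnostics rather than this simplified score.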
License
Copyright (c) 2020 The Indian Journal of Agricultural Sciences

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The copyright of the articles published in The Indian Journal of Agricultural Sciences is vested with the Indian Council of Agricultural Research, which reserves the right to enter into any agreement with any organization in India or abroad, for reprography, photocopying, storage and dissemination of information. The Council has no objection to using the material, provided the information is not being utilized for commercial purposes and wherever the information is being used, proper credit is given to ICAR.