Identification of genetic markers for increasing agricultural productivity: An empirical study
Keywords:
BLUP, Genomic Selection, LASSO, mRMR, QTL, Regression, SpAM
Abstract
Genomic selection (GS) has been used globally to increase agricultural production and productivity. It is applied to complex quantitative traits by selecting breeding material after predicting the Genomic Estimated Breeding Values (GEBVs) of the target species. The accuracy of GS in estimating GEBVs depends on several factors, including the sampling population, the genetic architecture of the target species and the statistical model used. Feature (marker) selection is one of the important steps in developing GS models. A large number of models have been proposed in the literature for GS. However, the applicability of these models depends on many factors, including the extent of additive and epistatic effects in the breeding population. Therefore, there is a strong need to evaluate the performance of these models and feature selection techniques under different situations. In this study, the performance of linear/additive effect models, viz. linear least squares regression, BLUP, LASSO, ridge regression and SpAM, as well as non-linear/epistatic effect models, viz. mRMR and HSIC LASSO, was evaluated through a simulation study on the R platform. In general, SpAM performed better for GS than all other models considered in this study when additive effects were present and epistatic effects were absent. However, under low heritability and high epistatic effects, HSIC LASSO outperformed all other models. This study will assist researchers in selecting an appropriate feature selection technique for a given situation.
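To make the idea of simulation-based marker selection concrete, the following is a minimal sketch of one case only: a purely additive trait analysed with LASSO. It is not the authors' R simulation. The sample size, number of markers, QTL positions, unit effect sizes, heritability value, penalty `lam` and the function names `simulate` and `lasso_cd` are all assumptions chosen for illustration, and the LASSO is fitted with a plain coordinate descent loop rather than any package named in the study.

```python
import random

def simulate(n=200, p=40, qtl=(3, 17, 29), h2=0.7, seed=1):
    """Simulate 0/1/2 marker genotypes and a purely additive trait.

    All defaults are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    # Each genotype is the count of one allele at a biallelic locus (freq 0.5).
    X = [[(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(p)]
         for _ in range(n)]
    g = [sum(row[j] for j in qtl) for row in X]  # unit additive effect per QTL
    mean_g = sum(g) / n
    var_g = sum((v - mean_g) ** 2 for v in g) / n
    # Residual SD chosen so that var_g / (var_g + var_e) equals h2.
    sd_e = (var_g * (1 - h2) / h2) ** 0.5
    y = [v + rng.gauss(0, sd_e) for v in g]
    return X, y

def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=50):
    """LASSO on standardized markers via cyclic coordinate descent.

    Minimizes (1/2n) * ||y - Xb||^2 + lam * ||b||_1 with centered y.
    """
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        c = [row[j] for row in X]
        m = sum(c) / n
        s = (sum((v - m) ** 2 for v in c) / n) ** 0.5 or 1.0  # guard constant column
        cols.append([(v - m) / s for v in c])
    mean_y = sum(y) / n
    r = [v - mean_y for v in y]  # residuals for beta = 0
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            xj = cols[j]
            # Correlation of marker j with the partial residual.
            rho = sum(xj[i] * r[i] for i in range(n)) / n + beta[j]
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                d = new - beta[j]
                for i in range(n):
                    r[i] -= d * xj[i]
                beta[j] = new
    return beta

qtl = (3, 17, 29)
X, y = simulate(qtl=qtl)
beta = lasso_cd(X, y, lam=0.3)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("markers with non-zero LASSO coefficient:", selected)
```

Comparing `selected` with the simulated QTL positions gives a simple check of how well the selection step recovers the true markers. Repeating the run with lower `h2` or with interaction terms added to the trait would be one way to probe the low-heritability and epistatic situations the abstract describes.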
License
Copyright (c) 2019 The Indian Journal of Agricultural Sciences

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The copyright of the articles published in The Indian Journal of Agricultural Sciences is vested with the Indian Council of Agricultural Research, which reserves the right to enter into any agreement with any organization in India or abroad, for reprography, photocopying, storage and dissemination of information. The Council has no objection to using the material, provided the information is not being utilized for commercial purposes and wherever the information is being used, proper credit is given to ICAR.