Identification of genetic markers for increasing agricultural productivity: An empirical study

SAYANTI GUHA MAJUMDAR; ANIL RAI; D C MISHRA

doi:10.56093/ijas.v89i10.94633

Authors

SAYANTI GUHA MAJUMDAR, PhD Scholar, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
ANIL RAI, Head and Principal Scientist, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
D C MISHRA, Scientist, Division of Bioinformatics, ICAR-IASRI, New Delhi


Keywords:

BLUP, Genomic Selection, LASSO, mRMR, QTL, Regression, SpAM

Abstract

Genomic selection (GS) is used worldwide to increase agricultural production and productivity. For complex quantitative traits, breeding material is selected after predicting the Genomic Estimated Breeding Values (GEBVs) of the target species. The accuracy with which GS estimates GEBVs depends on several factors, including the sampled population, the genetic architecture of the target species and the statistical model used. Feature (marker) selection is one of the important steps in developing GS models. A large number of models have been proposed in the literature for GS. However, the applicability of these models depends on many factors, including the extent of additive and epistatic effects in the breeding population. Therefore, there is a strong need to evaluate the performance of these models and feature selection techniques under different situations. In this study, the performance of linear/additive effect models, viz. linear least squares regression, BLUP, LASSO, ridge regression and SpAM, as well as non-linear/epistatic effect models, viz. mRMR and HSIC LASSO, was evaluated through a simulation study on the R platform. In general, SpAM performed better for GS than all other models considered in this study when additive effects were present and epistatic effects were absent. However, under low heritability and high epistatic effect, HSIC LASSO outperformed all other models. This study will assist researchers in selecting an appropriate feature selection technique for a given situation.
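
To make the kind of comparison described in the abstract concrete, the following is a minimal sketch of one simulation replicate. It is not the authors' code: the study was carried out in R, whereas this illustration uses Python with NumPy, and every function name, sample size, penalty value and heritability setting below is a hypothetical choice made for the example. It covers only two of the models named above (ridge regression and LASSO) on a purely additive trait, and measures accuracy as the correlation between predicted and true genetic values in a held-out set. BLUP, SpAM, mRMR, HSIC LASSO and epistatic effects are not sketched here.

```python
import numpy as np


def simulate_additive(n=200, p=500, n_qtl=10, h2=0.7, seed=0):
    """Simulate 0/1/2 genotypes and a purely additive trait (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    X = rng.binomial(2, 0.5, size=(n, p)).astype(float)      # marker matrix
    qtl = np.sort(rng.choice(p, size=n_qtl, replace=False))  # causal markers
    beta = np.zeros(p)
    beta[qtl] = rng.normal(0.0, 1.0, size=n_qtl)             # additive effects
    g = X @ beta                                             # true genetic values
    noise_var = g.var() * (1.0 - h2) / h2                    # yields heritability h2
    y = g + rng.normal(0.0, np.sqrt(noise_var), size=n)
    return X, y, g, qtl


def ridge_fit(X, y, lam):
    """Ridge marker effects via the dual form, which is cheap when p > n."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(X.shape[0]), yc)
    return Xc.T @ alpha


def lasso_fit(X, y, lam, n_iter=100):
    """LASSO by cyclic coordinate descent on (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    n, p = Xc.shape
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta, resid = np.zeros(p), yc.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic marker carries no information
            rho = Xc[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft threshold
            if new != beta[j]:
                resid -= Xc[:, j] * (new - beta[j])
                beta[j] = new
    return beta


def accuracy(X_test, beta, g_test):
    """Correlation between predicted and true genetic values."""
    return float(np.corrcoef(X_test @ beta, g_test)[0, 1])


# One replicate: fit on the first 150 individuals, evaluate on the remaining 50.
X, y, g, qtl = simulate_additive()
train, test = np.arange(150), np.arange(150, 200)
b_ridge = ridge_fit(X[train], y[train], lam=100.0)  # penalty values are arbitrary
b_lasso = lasso_fit(X[train], y[train], lam=0.2)
selected = np.flatnonzero(b_lasso)                  # markers kept by LASSO

print("ridge accuracy:", round(accuracy(X[test], b_ridge, g[test]), 3))
print("LASSO accuracy:", round(accuracy(X[test], b_lasso, g[test]), 3))
print("markers selected by LASSO:", selected.size, "of", X.shape[1])
print("true QTL among them:", len(set(selected.tolist()) & set(qtl.tolist())))
```

A fuller study along the lines of the abstract would repeat such replicates over a grid of heritabilities and effect architectures, tune the penalties by cross-validation, and add the remaining models; the single replicate here only illustrates the mechanics of simulating, fitting and scoring.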

