Comparative analysis of machine learning based classification for abiotic stress proteins



Authors

  • BULBUL AHMED ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
  • ANIL RAI ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
  • MIR ASIF IQUEBAL ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
  • SARIKA JAISWAL ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India

https://doi.org/10.56093/ijas.v91i6.114287

Keywords:

Classification, Deep learning, LSTM, Poaceae, Random forest, SVM

Abstract

For thousands of years, cereals such as rice, wheat, maize, sorghum and millets have played a major role in human civilization. They are principal components of the human diet and important staples for the daily survival of billions of people globally. Cereal crops belong to the Poaceae family and are rich in vitamins, minerals and fiber. They are reported to reduce the risk of coronary heart disease and other serious diseases. These crops are adversely affected by biotic stresses and by abiotic stresses such as cold, drought, heat and salinity. With the advent of modern NGS technologies, the abundance of molecular data makes it possible to infer many unexplored facts about cereal crops using in silico approaches. In the present work, computational techniques were applied to study the classification of genes responsive to four abiotic stresses (cold, drought, heat and salinity) in cereals. Datasets of genes responsive to these four stresses in the Poaceae family were retrieved from the public domain. Three machine learning methodologies were applied: Random forest, Support Vector Machines (SVM) and Deep Learning-Long Short-Term Memory (DL-LSTM). A comparative analysis was carried out by classifying the retrieved data with k-fold cross-validation, applying each technique under different parameter settings. For all four datasets, DL-LSTM achieved the highest accuracy: 95.11%, 76.88%, 94.31% and 82.04% for cold, drought, heat and salinity, respectively. The comparison shows that deep learning outperformed the other methods. Such computational approaches will help researchers study complex biological problems of gene classification more efficiently.
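The comparison protocol described in the abstract (several classifiers scored on the same data with k-fold cross-validation) can be sketched as follows. This is a minimal illustration, not the authors' pipeline. The amino acid composition features, the synthetic toy sequences, and the two stand-in classifiers (`MajorityClass` and `NearestCentroid`) are all assumptions made for this example; the stand-ins take the place of Random forest, SVM and DL-LSTM only to keep the sketch free of external dependencies.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition(seq):
    """Fraction of each standard amino acid in a sequence (assumed feature set)."""
    counts = Counter(seq)
    n = max(len(seq), 1)
    return [counts.get(a, 0) / n for a in AMINO_ACIDS]

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

class MajorityClass:
    """Baseline stand-in: always predicts the most frequent training label."""
    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self
    def predict(self, X):
        return [self.label] * len(X)

class NearestCentroid:
    """Stand-in classifier: assigns each sample to the closest class mean."""
    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for j, v in enumerate(x):
                acc[j] += v
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self
    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda c: sq_dist(x, self.centroids[c]))
                for x in X]

def cross_val_accuracy(make_model, X, y, k=5, seed=0):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = make_model().fit([X[j] for j in train_idx],
                                 [y[j] for j in train_idx])
        pred = model.predict([X[j] for j in test_idx])
        correct = sum(p == y[j] for p, j in zip(pred, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / k

def make_toy_dataset(n_per_class=40, length=60, seed=1):
    """Synthetic sequences only; NOT the Poaceae data used in the paper."""
    rng = random.Random(seed)
    alphabets = {"stress": "KE" * 10 + AMINO_ACIDS,   # enriched in K and E
                 "control": "LA" * 10 + AMINO_ACIDS}  # enriched in L and A
    seqs, labels = [], []
    for label, alphabet in alphabets.items():
        for _ in range(n_per_class):
            seqs.append("".join(rng.choice(alphabet) for _ in range(length)))
            labels.append(label)
    return seqs, labels

seqs, y = make_toy_dataset()
X = [composition(s) for s in seqs]

# Every model is scored on the same folds, so the accuracies are comparable.
results = {cls.__name__: cross_val_accuracy(cls, X, y, k=5)
           for cls in (MajorityClass, NearestCentroid)}
for name, acc in results.items():
    print(f"{name}: mean 5-fold accuracy = {acc:.3f}")
```

Using the same seed for the fold split across models keeps the comparison fair, since each classifier sees identical training and held-out partitions. Any classifier exposing `fit` and `predict` could be passed to `cross_val_accuracy` in place of the stand-ins.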




Submitted

2021-08-24

Published

2021-08-24

Section

Articles

How to Cite

AHMED, B., RAI, A., IQUEBAL, M. A., & JAISWAL, S. (2021). Comparative analysis of machine learning based classification for abiotic stress proteins. The Indian Journal of Agricultural Sciences, 91(6), 861–866. https://doi.org/10.56093/ijas.v91i6.114287