Comparative analysis of machine learning based classification for abiotic stress proteins
Keywords:
Classification, Deep learning, LSTM, Poaceae, Random forest, SVM
Abstract
For thousands of years, cereals such as rice, wheat, maize, sorghum and millets have played a major role in human civilization. They are principal components of the human diet and important staples for the daily survival of billions of people globally. Cereal crops belong to the Poaceae family and are rich in vitamins, minerals and fiber. They are reported to reduce the risk of coronary heart disease and other serious diseases. These crops are adversely affected by biotic stresses and by abiotic stresses such as cold, drought, heat and salinity. With the advent of modern next-generation sequencing (NGS) technologies, the plethora of molecular data makes it possible to infer many unexplored facts about cereal crops using in silico approaches. In the present work, computational techniques were applied to study the classification of abiotic stress (cold, drought, heat and salinity) responsive genes in cereals. Datasets of genes responsive to the four stresses in the Poaceae family were retrieved from the public domain. Three machine learning methodologies, namely Random forest, Support Vector Machines (SVM) and Deep Learning-Long Short-Term Memory (DL-LSTM), were applied. A comparative analysis was carried out for classification of the retrieved data using k-fold cross validation, applying the machine learning techniques with different parameter settings. For all four datasets, accuracy was highest with DL-LSTM: 95.11%, 76.88%, 94.31% and 82.04% for cold, drought, heat and salinity, respectively. Comparison of the methodologies shows that deep learning outperformed the other methods. Such computational approaches will help researchers study complex biological problems of gene classification more efficiently.
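The k-fold cross validation procedure mentioned in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the amino-acid composition features, the synthetic toy sequences, and the simple nearest-centroid classifier (standing in for Random forest, SVM or DL-LSTM) are all hypothetical choices made so that the example stays self-contained.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Assumed feature: fraction of each of the 20 standard residues."""
    counts = Counter(seq)
    n = max(len(seq), 1)
    return [counts.get(a, 0) / n for a in AMINO_ACIDS]


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_centroid_fit(X, y):
    """Stand-in classifier (not from the paper): one mean vector per class."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def nearest_centroid_predict(model, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))


def cross_val_accuracy(X, y, k=5, seed=0):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        model = nearest_centroid_fit([X[j] for j in train_idx],
                                     [y[j] for j in train_idx])
        correct = sum(nearest_centroid_predict(model, X[j]) == y[j]
                      for j in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k


def make_toy_data(n_per_class=20, length=60, seed=0):
    """Synthetic sequences (hypothetical): class 1 enriched in K/E, class 0 in L/A."""
    rng = random.Random(seed)
    alphabets = ((1, AMINO_ACIDS + "KE" * 10), (0, AMINO_ACIDS + "LA" * 10))
    X, y = [], []
    for label, alphabet in alphabets:
        for _ in range(n_per_class):
            seq = "".join(rng.choices(alphabet, k=length))
            X.append(composition(seq))
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(f"5-fold mean accuracy: {cross_val_accuracy(X, y, k=5):.2%}")
```

To compare several methods in this way, the same folds would be reused for each classifier so that the reported accuracies are evaluated on identical held-out data.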
License
Copyright (c) 2021 The Indian Journal of Agricultural Sciences

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The copyright of the articles published in The Indian Journal of Agricultural Sciences is vested with the Indian Council of Agricultural Research, which reserves the right to enter into any agreement with any organization in India or abroad, for reprography, photocopying, storage and dissemination of information. The Council has no objection to using the material, provided the information is not being utilized for commercial purposes and wherever the information is being used, proper credit is given to ICAR.