Topic Modelling for Discovering Themes in the Queries Raised at Farmers’ Call Center
Keywords:
Topic models; Latent Dirichlet Allocation; Text analysis; Kisan call center.
Abstract
Topic modelling has gained prominence in recent years owing to the availability of large volumes of unstructured text data and the need to analyse them. In agriculture, a huge amount of text data is generated at kisan call centers in the form of queries raised by farmers. This study attempts to use the Latent Dirichlet Allocation method of topic modelling to discover the hidden topics in the queries raised at kisan call centers of five south Indian states. Exploratory text analysis showed that the most common terms appearing in the query texts are ‘weather’, ‘management’ and ‘market’. The topic modelling led to the identification of 12 topics, of which the topic ‘pest management in paddy, cotton and chilli’ accounted for the maximum number of queries.
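The two steps described in the abstract, counting common terms and then fitting a Latent Dirichlet Allocation model, can be illustrated with a minimal sketch. This is not the study's own code: the queries, the stopword list, the hyperparameters (`alpha`, `beta`), the choice of a collapsed Gibbs sampler and the use of 3 topics instead of 12 are all assumptions made purely for illustration, and only the Python standard library is used.

```python
import random
from collections import Counter

# Hypothetical toy queries; the study's call-center data is not reproduced here.
QUERIES = [
    "pest management in paddy",
    "weather forecast for the coming week",
    "market price of cotton",
    "pest management in cotton and chilli",
    "weather information for sowing paddy",
    "market rate of chilli",
]
STOPWORDS = {"in", "for", "the", "of", "and"}  # assumed, illustrative list


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def term_frequencies(docs):
    """Count how often each term occurs across all tokenized documents."""
    return Counter(w for doc in docs for w in doc)


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns (doc_topic counts, topic_word counts, vocabulary).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]             # tokens per (doc, topic)
    topic_word = [[0] * n_vocab for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []
    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, vocab


docs = [tokenize(q) for q in QUERIES]
freqs = term_frequencies(docs)
doc_topic, topic_word, vocab = lda_gibbs(docs, n_topics=3)

print("Most common terms:", freqs.most_common(3))
for k, counts in enumerate(topic_word):
    top = sorted(zip(counts, vocab), reverse=True)[:3]
    print("Topic", k, [w for _, w in top])
```

On a real corpus, the number of topics would be chosen by comparing candidate values on a fit or coherence measure rather than fixed in advance, and each topic would be labelled by inspecting its highest-weight terms.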
B.S. Yashavanth and P.D. Sreekanth / Journal of the Indian Society of Agricultural Statistics 76(1) 2022 7–16.