Harnessing Multi-label Classification Approaches for Economic Phenomena Categorization


Machine learning
Performance analysis


One way to report a country’s economic state is to compile economic phenomena from several sources. The collected data can then be explored by sentiment and by economic category. This research applied and compared multiple approaches to multi-label text classification, and also performed sentiment analysis on the economic phenomena. Sentiment and single-label category classification were performed with a logistic regression model. Multi-label category classification was performed with logistic regression, support vector machines, k-nearest neighbor, naïve Bayes, and decision trees as base classifiers, each combined with binary relevance, classifier chain, and label power set as the implementation approaches. The results showed that logistic regression works well in sentiment and single-label classification, with classification accuracies of 80.08% and 92.71%, respectively. However, it works poorly as a base classifier in multi-label classification: its classification accuracy dropped to 13.35%, 15.40%, and 30.65% for binary relevance, classifier chain, and label power set, respectively. In contrast, naïve Bayes worked best as a base classifier in the label power set approach for multi-label classification, with a classification accuracy of 63.22%, followed by decision trees and support vector machines.
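The three multi-label approaches named above differ mainly in how they reshape the label data before a base classifier is trained. The following is a minimal sketch of those reshaping steps in plain Python, not the authors' implementation. The category names, the toy label matrix, and the function names are hypothetical. The exact-match (subset) accuracy shown at the end is an assumption about how multi-label accuracy might be scored, since the abstract does not state the metric.

```python
from typing import Dict, List, Tuple

# Hypothetical economic categories and a toy label matrix
# (rows = documents, columns = categories); not data from the study.
LABELS = ["consumption", "investment", "export"]
Y = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
]
# Hypothetical one-dimensional feature vectors, one per document.
X = [[0.1], [0.2], [0.3], [0.4]]


def binary_relevance_targets(Y: List[List[int]]) -> List[List[int]]:
    """Binary relevance: one independent binary target vector per label."""
    return [[row[j] for row in Y] for j in range(len(Y[0]))]


def classifier_chain_features(
    X: List[List[float]], Y: List[List[int]]
) -> List[List[List[float]]]:
    """Classifier chain (training view): the feature set for label j is
    the original features plus the true values of labels 0..j-1."""
    return [[x + y[:j] for x, y in zip(X, Y)] for j in range(len(Y[0]))]


def label_powerset_targets(
    Y: List[List[int]],
) -> Tuple[List[int], Dict[Tuple[int, ...], int]]:
    """Label power set: each distinct label combination becomes one class
    of a single multi-class problem."""
    class_ids: Dict[Tuple[int, ...], int] = {}
    targets = []
    for row in Y:
        key = tuple(row)
        class_ids.setdefault(key, len(class_ids))
        targets.append(class_ids[key])
    return targets, class_ids


def subset_accuracy(Y_true: List[List[int]], Y_pred: List[List[int]]) -> float:
    """Exact-match accuracy: a document counts as correct only if every
    label is predicted correctly (assumed metric, see lead-in)."""
    matches = sum(1 for t, p in zip(Y_true, Y_pred) if t == p)
    return matches / len(Y_true)


br_targets = binary_relevance_targets(Y)
chain_inputs = classifier_chain_features(X, Y)
lp_targets, lp_classes = label_powerset_targets(Y)
# One wrong label in the second document makes the whole document count as wrong.
score = subset_accuracy(Y, [[1, 0, 0], [1, 0, 0], [0, 0, 1], [1, 1, 0]])
```

Under an exact-match metric a single wrong label makes the whole prediction count as an error, which is one plausible reason multi-label accuracy figures can be much lower than single-label ones.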





This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Copyright (c) 2021 The Author(s)

