Combination of Support Vector Machine and Lexicon-Based Algorithm in Twitter Sentiment Analysis

Bibliographic details
Main authors:	Muhammadi, Rindu Hafil; Laksana, Tri Ginanjar; Arifa, Amalia Beladinna
Format:	UMS Journal (OJS)
Language:	English
Published:	Department of Informatics, Universitas Muhammadiyah Surakarta, Indonesia, 2022
Subjects:	sentiment analysis; tapera; public housing; lexicon-based; confusion matrix
Online access:	https://journals.ums.ac.id/index.php/khif/article/view/15213

author	Muhammadi, Rindu Hafil; Laksana, Tri Ginanjar; Arifa, Amalia Beladinna
collection	OJS
description	Data from the Ministry of Public Works and Public Housing (Kementerian PUPR) in 2019 show that around 81 million millennials do not own a house. Government Regulation Number 25 of 2020 on the Implementation of Public Housing Savings, commonly called PP 25 Tapera 2020, is one of the government's efforts to ensure that Indonesian people can afford houses. Tapera is a savings scheme in which workers make deposits for housing finance; the deposits are refundable after the term expires. Immediately after its enactment, the regulation drew many public responses. We investigate public sentiment toward the regulation using a Support Vector Machine (SVM), chosen for its good level of accuracy. Because SVM is a supervised method, it requires labeled training data; to speed up labeling, we use a lexicon-based method. The main weakness of the lexicon-based method lies in its dictionary, which is the most significant factor in its performance. Combining the lexicon-based method with SVM makes it possible to update the dictionary automatically: SVM can contribute to the lexicon-based method, and the lexicon-based method can label the SVM dataset to produce good accuracy. The research pipeline consists of collecting data from Twitter, preprocessing the raw, unstructured data into ready-to-use data, labeling the data with the lexicon-based method, weighting terms with TF-IDF, classifying with SVM, and evaluating model performance with a confusion matrix. The results show that the combination of the lexicon-based method and SVM works well. The lexicon-based method labeled 519 tweets. SVM achieved an accuracy of 81.73% with the RBF kernel function. In another test, the Sigmoid kernel attained the highest precision, 78.68%. The RBF kernel attained the highest recall, 81.73%. The F1-score for both the RBF and Sigmoid kernels is 79.60%.
format	UMS Journal (OJS)
id	oai:ojs2.journals.ums.ac.id:article-15213
institution	Universitas Muhammadiyah Surakarta
language	eng
publishDate	2022
publisher	Department of Informatics, Universitas Muhammadiyah Surakarta, Indonesia
record_format	ojs
date	2022-03-10
type	info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion
source	Khazanah Informatika: Jurnal Ilmu Komputer dan Informatika; Vol. 8 No. 1, April 2022; pp. 59-71
doi	10.23917/khif.v8i1.15213
issn	2477-698X; 2621-038X
fulltext	https://journals.ums.ac.id/index.php/khif/article/view/15213/7397 (application/pdf)
rights	Copyright (c) 2022 Khazanah Informatika: Jurnal Ilmu Komputer dan Informatika; http://creativecommons.org/licenses/by/4.0
title	Combination of Support Vector Machine and Lexicon-Based Algorithm in Twitter Sentiment Analysis
topic	sentiment analysis; tapera; public housing; lexicon-based; confusion matrix
url	https://journals.ums.ac.id/index.php/khif/article/view/15213
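
The description field outlines a four-stage pipeline: lexicon-based labeling, TF-IDF weighting, SVM classification, and confusion-matrix evaluation. Below is a minimal Python sketch of the labeling, weighting, and evaluation stages, offered as an illustration under stated assumptions and not as the authors' implementation. Assumptions: the LEXICON entries are hypothetical placeholders (the record does not give the authors' dictionary), tweets are already preprocessed into token lists, a zero score maps to a neutral class, TF-IDF uses raw term count times log(N/df), and precision, recall, and F1 are averaged with class-support weights. The SVM stage is omitted; in Python it is commonly handled by a library classifier such as scikit-learn's SVC with kernel="rbf" or kernel="sigmoid".

```python
import math
from collections import Counter

# HYPOTHETICAL toy lexicon (word -> polarity weight). The authors' real
# dictionary is not given in this record; these entries are placeholders.
LEXICON = {"bagus": 2, "setuju": 1, "bantu": 1, "rugi": -2, "tolak": -2, "potong": -1}


def lexicon_label(tokens):
    """Label one preprocessed tweet by the sign of its summed lexicon weights."""
    score = sum(LEXICON.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumption: a zero score maps to a neutral class


def tf_idf(docs):
    """Return one {term: weight} dict per document: raw count * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def confusion_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1 from label lists."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    scores = {"accuracy": sum(t == p for t, p in pairs) / n,
              "precision": 0.0, "recall": 0.0, "f1": 0.0}
    # Confusion matrix convention here: rows = actual class, columns = predicted class.
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)  # diagonal cell
        predicted = sum(p == c for p in y_pred)         # column total
        actual = sum(t == c for t in y_true)            # row total (support)
        prec = tp / predicted if predicted else 0.0
        rec = tp / actual if actual else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = actual / n  # weight each class by its share of true labels
        scores["precision"] += weight * prec
        scores["recall"] += weight * rec
        scores["f1"] += weight * f1
    return scores


if __name__ == "__main__":
    tweets = [["tapera", "bantu", "rakyat"],
              ["tolak", "tapera", "potong", "gaji"],
              ["tapera", "berlaku"]]
    print([lexicon_label(t) for t in tweets])  # ['positive', 'negative', 'neutral']
    print(tf_idf(tweets)[0])
    print(confusion_metrics(["positive", "negative", "negative", "neutral"],
                            ["positive", "negative", "neutral", "neutral"]))
```

With support-weighted averaging, recall reduces to the total number of true positives divided by the number of samples, which is exactly accuracy; the demo's scores show this. That is one possible reason the abstract reports the same 81.73% for accuracy and RBF recall, although the record does not state which averaging the authors used.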