Combination of Support Vector Machine and Lexicon-Based Algorithm in Twitter Sentiment Analysis

Data from the Ministry of Civil Works and Public Housing (Kementrian PUPR) in 2019 shows that around 81 million millennials do not own houses. Government Regulation Number 25 of 2020 on the Implementation of Public Housing Savings, commonly called PP 25 Tapera 2020, is one of the government's e...

Bibliographic Details
Main Authors: Muhammadi, Rindu Hafil; Laksana, Tri Ginanjar; Arifa, Amalia Beladinna
Format: UMS Journal (OJS)
Language: eng
Published: Department of Informatics, Universitas Muhammadiyah Surakarta, Indonesia 2022
Subjects: sentiment analysis; tapera; public housing; lexicon-based; confusion matrix
Online Access: https://journals.ums.ac.id/index.php/khif/article/view/15213
author Muhammadi, Rindu Hafil
Laksana, Tri Ginanjar
Arifa, Amalia Beladinna
collection OJS
description Data from the Ministry of Civil Works and Public Housing (Kementrian PUPR) in 2019 show that around 81 million millennials do not own a house. Government Regulation Number 25 of 2020 on the Implementation of Public Housing Savings, commonly called PP 25 Tapera 2020, is one of the government's efforts to ensure that Indonesians can afford houses. Tapera is a workers' deposit for house financing that is refundable after the term expires. Immediately after its enactment, the regulation drew many public responses. We investigate public sentiment toward the regulation using a Support Vector Machine (SVM), chosen for its good accuracy. Because SVM requires labeled training data, we speed up labeling with a lexicon-based method. The main weakness of the lexicon-based method is its dictionary, which is the most significant factor in its performance. Combining the lexicon-based method with SVM makes it possible to update the dictionary automatically: SVM can contribute to the lexicon-based method, and the lexicon-based method can label datasets for SVM to produce good accuracy. The research proceeds by collecting data from Twitter, preprocessing the raw, unstructured data into ready-to-use data, labeling the data with the lexicon-based method, weighting terms with TF-IDF, classifying with SVM, and evaluating model performance with a confusion matrix. The results show that the combination of the lexicon-based method and SVM worked well. The lexicon-based method labeled 519 tweets. SVM achieved an accuracy of 81.73% with the RBF kernel. The Sigmoid kernel attained the highest precision, 78.68%, while the RBF kernel attained the highest recall, 81.73%. The F1-score for both the RBF and Sigmoid kernels is 79.60%.
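The pipeline steps named in the description (lexicon-based labeling, TF-IDF weighting, and confusion-matrix evaluation) can be illustrated with a minimal standard-library Python sketch. Everything here is an assumption made for illustration, not the authors' implementation: the toy English `LEXICON`, the function names, the whitespace tokenizer, the `tf * log(N / df)` weighting variant, and the support-weighted averaging of per-class metrics. The SVM training step itself is omitted. One observation that may help in reading the reported figures: under support-weighted averaging, recall is always equal to accuracy, which would be consistent with both being reported as 81.73% for the RBF kernel, although the record does not state which averaging was used.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; the dictionary used in the paper is not given in this record.
LEXICON = {"good": 1, "helpful": 1, "support": 1, "bad": -1, "burden": -1, "reject": -1}


def lexicon_label(text):
    """Label a text by summing the polarity scores of its whitespace tokens."""
    score = sum(LEXICON.get(token, 0) for token in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    # confusion[(t, p)] counts items whose true label is t and predicted label is p.
    confusion = Counter(zip(y_true, y_pred))
    total = len(y_true)
    accuracy = sum(confusion[(label, label)] for label in labels) / total
    precision = recall = f1 = 0.0
    for label in labels:
        tp = confusion[(label, label)]
        predicted = sum(confusion[(t, label)] for t in labels)
        actual = sum(confusion[(label, p)] for p in labels)
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = actual / total  # class support as a fraction of all items
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return accuracy, precision, recall, f1


tweets = ["tapera is good support", "tapera is a burden", "tapera starts in 2020"]
labels = [lexicon_label(t) for t in tweets]
weights = tf_idf(tweets)
```

In a fuller pipeline, the labels produced by `lexicon_label` would serve as training targets and the TF-IDF vectors as features for an SVM classifier; a term such as "tapera" that occurs in every document receives zero weight under this TF-IDF variant, which is why it contributes nothing to separating the classes.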
format UMS Journal (OJS)
id oai:ojs2.journals.ums.ac.id:article-15213
institution Universitas Muhammadiyah Surakarta
language eng
publishDate 2022
publisher Department of Informatics, Universitas Muhammadiyah Surakarta, Indonesia
record_format ojs
spelling oai:ojs2.journals.ums.ac.id:article-15213
2022-03-10
info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
application/pdf
10.23917/khif.v8i1.15213
Khazanah Informatika : Jurnal Ilmu Komputer dan Informatika; Vol. 8 No. 1 April 2022; 59-71
2477-698X
2621-038X
https://journals.ums.ac.id/index.php/khif/article/view/15213/7397
Copyright (c) 2022 Khazanah Informatika: Jurnal Ilmu Komputer dan Informatika
http://creativecommons.org/licenses/by/4.0
title Combination of Support Vector Machine and Lexicon-Based Algorithm in Twitter Sentiment Analysis
topic sentiment analysis; tapera; public housing; lexicon-based; confusion matrix
url https://journals.ums.ac.id/index.php/khif/article/view/15213