Combination of Support Vector Machine and Lexicon-Based Algorithm in Twitter Sentiment Analysis
Data from the Ministry of Public Works and Housing (Kementerian PUPR) for 2019 show that around 81 million millennials do not own a house. Government Regulation Number 25 of 2020 on the Implementation of Public Housing Savings, commonly called PP 25 Tapera 2020, is one of the government's e...
Main Authors: | Muhammadi, Rindu Hafil; Laksana, Tri Ginanjar; Arifa, Amalia Beladinna |
---|---|
Format: | UMS Journal (OJS) |
Language: | eng |
Published: | Department of Informatics, Universitas Muhammadiyah Surakarta, Indonesia, 2022 |
Subjects: | sentiment analysis; tapera; public housing; lexicon-based; confusion matrix |
Online Access: | https://journals.ums.ac.id/index.php/khif/article/view/15213 |
_version_ | 1805342485978808320 |
---|---|
author | Muhammadi, Rindu Hafil; Laksana, Tri Ginanjar; Arifa, Amalia Beladinna |
author_sort | Muhammadi, Rindu Hafil |
collection | OJS |
description | Data from the Ministry of Public Works and Housing (Kementerian PUPR) for 2019 show that around 81 million millennials do not own a house. Government Regulation Number 25 of 2020 on the Implementation of Public Housing Savings, commonly called PP 25 Tapera 2020, is one of the government's efforts to make housing affordable for Indonesians. Tapera is a savings deposit made by workers to finance housing, refundable when the savings term ends. Immediately after its enactment, the regulation drew many public responses. We investigate public sentiment toward the regulation using a Support Vector Machine (SVM), chosen because it generally achieves good accuracy. As a supervised method, SVM requires labeled training data; to speed up labeling, we use a lexicon-based method. The main weakness of the lexicon-based method is its dictionary, which is the most significant factor in its performance. Combining the lexicon-based method with SVM makes it possible to update the dictionary automatically: SVM can contribute to the lexicon, and the lexicon-based method can label the datasets used to train SVM, producing good accuracy. The research proceeds by collecting data from Twitter, preprocessing the raw, unstructured tweets into ready-to-use data, labeling them with the lexicon-based method, weighting terms with TF-IDF, classifying with SVM, and evaluating the model's performance with a confusion matrix. The results show that the combination of lexicon-based labeling and SVM works well. The lexicon-based method labeled 519 tweets. SVM achieved an accuracy of 81.73% with the RBF kernel. A test with the Sigmoid kernel attained the highest precision, 78.68%. The RBF kernel had the highest recall, 81.73%. Both the RBF and Sigmoid kernels reached an F1-score of 79.60%. |
format | UMS Journal (OJS) |
id | oai:ojs2.journals.ums.ac.id:article-15213 |
institution | Universitas Muhammadiyah Surakarta |
language | eng |
publishDate | 2022 |
publisher | Department of Informatics, Universitas Muhammadiyah Surakarta, Indonesia |
record_format | ojs |
spelling | oai:ojs2.journals.ums.ac.id:article-15213 Department of Informatics, Universitas Muhammadiyah Surakarta, Indonesia 2022-03-10 info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion application/pdf https://journals.ums.ac.id/index.php/khif/article/view/15213 10.23917/khif.v8i1.15213 Khazanah Informatika: Jurnal Ilmu Komputer dan Informatika; Vol. 8 No. 1 April 2022; 59-71 2477-698X 2621-038X eng https://journals.ums.ac.id/index.php/khif/article/view/15213/7397 Copyright (c) 2022 Khazanah Informatika: Jurnal Ilmu Komputer dan Informatika http://creativecommons.org/licenses/by/4.0 |
title | Combination of Support Vector Machine and Lexicon-Based Algorithm in Twitter Sentiment Analysis |
title_sort | combination of support vector machine and lexicon based algorithm in twitter sentiment analysis |
topic | sentiment analysis; tapera; public housing; lexicon-based; confusion matrix |
url | https://journals.ums.ac.id/index.php/khif/article/view/15213 |
work_keys_str_mv | AT muhammadirinduhafil combinationofsupportvectormachineandlexiconbasedalgorithmintwittersentimentanalysis AT laksanatriginanjar combinationofsupportvectormachineandlexiconbasedalgorithmintwittersentimentanalysis AT arifaamaliabeladinna combinationofsupportvectormachineandlexiconbasedalgorithmintwittersentimentanalysis |
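The abstract says the tweets were preprocessed and then labeled with a lexicon-based method, but this record gives neither the dictionary nor the scoring rule. The following is a minimal sketch of the usual approach, assuming a summed word-weight score and three classes (positive, negative, neutral). The lexicon `LEXICON`, its English entries, and the helper names `preprocess` and `lexicon_label` are hypothetical stand-ins, not the authors' dictionary or code.

```python
import re

# Hypothetical toy lexicon (word -> sentiment weight). The study's actual
# dictionary and weights are not given in this record.
LEXICON = {
    "good": 2, "helpful": 2, "support": 1,
    "bad": -2, "burden": -2, "reject": -1,
}


def preprocess(tweet: str) -> list[str]:
    """Lowercase, drop URLs and @mentions, strip '#', keep runs of letters."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    text = text.replace("#", " ")              # keep hashtag words, drop '#'
    return re.findall(r"[a-z]+", text)


def lexicon_label(tokens: list[str], lexicon: dict[str, int] = LEXICON) -> str:
    """Sum the weights of known words; the sign of the total decides the label."""
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for t in ["Tapera is a good and helpful program https://t.co/x",
              "@gov this Tapera deduction is a burden, I reject it",
              "Tapera starts next year"]:
        print(lexicon_label(preprocess(t)), "|", t)
```

Under this rule a tweet with no lexicon hits, or with hits that cancel out, is labeled neutral; how the paper handled such cases is not stated in this record.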
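For the TF-IDF weighting step named in the abstract, one common variant multiplies a term's raw count in a tweet by log(N / df), where N is the number of tweets and df is the number of tweets containing the term. The sketch below assumes that variant and represents each tweet as a sparse `dict` of term weights. The function name `tfidf` is illustrative; the record does not specify which TF-IDF formula the authors used.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by raw term frequency times log(N / document frequency)."""
    n_docs = len(docs)
    # Document frequency: number of tokenized tweets containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each term in this tweet
        vectors.append({term: count * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return vectors


if __name__ == "__main__":
    docs = [["tapera", "good", "good"], ["tapera", "bad"]]
    for vec in tfidf(docs):
        print(vec)
```

A term that occurs in every tweet (here `tapera`) gets weight 0 under this variant. Library implementations differ: scikit-learn's `TfidfVectorizer`, for example, uses a smoothed IDF and L2-normalizes each row by default, so its weights will not match these exactly.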
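The abstract compares SVM kernels. For reference, the RBF kernel is K(x, z) = exp(-gamma * ||x - z||^2) and the sigmoid kernel is K(x, z) = tanh(gamma * x.z + coef0). A trained binary SVM classifies x by the sign of the sum over support vectors of (alpha_i * y_i) * K(x_i, x), plus a bias b. The sketch below evaluates these on sparse TF-IDF dictionaries, assuming `gamma` and `coef0` are given. Training (solving for the coefficients and bias) is omitted, and all function names are hypothetical.

```python
import math


def dot(x: dict[str, float], z: dict[str, float]) -> float:
    """Dot product of two sparse vectors stored as term -> weight dicts."""
    if len(x) > len(z):
        x, z = z, x  # iterate over the smaller dict
    return sum(w * z.get(t, 0.0) for t, w in x.items())


def rbf_kernel(x, z, gamma: float = 1.0) -> float:
    """K(x, z) = exp(-gamma * ||x - z||^2), with ||x - z||^2 = x.x - 2 x.z + z.z."""
    sq_dist = dot(x, x) - 2.0 * dot(x, z) + dot(z, z)
    return math.exp(-gamma * max(sq_dist, 0.0))  # clamp tiny negative round-off


def sigmoid_kernel(x, z, gamma: float = 1.0, coef0: float = 0.0) -> float:
    """K(x, z) = tanh(gamma * x.z + coef0)."""
    return math.tanh(gamma * dot(x, z) + coef0)


def decision_function(x, support_vectors, dual_coefs, bias, kernel) -> float:
    """Binary SVM score: sum_i dual_coefs[i] * K(sv_i, x) + bias.

    dual_coefs[i] stands for alpha_i * y_i, obtained from training (not shown);
    the sign of the returned score gives the predicted class.
    """
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + bias
```

In scikit-learn, for example, these kernels correspond to `SVC(kernel="rbf", gamma=...)` and `SVC(kernel="sigmoid", gamma=..., coef0=...)`; this record does not state which library or hyperparameters the authors used. With three or more sentiment classes, libraries such as libsvm combine several binary classifiers of this form (one-vs-one).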
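The evaluation step derives accuracy, precision, recall, and F1-score from a confusion matrix. The record does not say how per-class scores were averaged across sentiment classes. The sketch below assumes support-weighted averaging (the "weighted avg" row of scikit-learn's `classification_report`) and reports 0 for a class whose precision or recall is undefined. The helper names `confusion_matrix` and `weighted_scores` are hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """m[i][j] = number of samples with true label labels[i] predicted as labels[j]."""
    index = {lab: k for k, lab in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def weighted_scores(y_true, y_pred, labels):
    """Return accuracy and support-weighted precision, recall and F1."""
    m = confusion_matrix(y_true, y_pred, labels)
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    accuracy = sum(m[k][k] for k in range(len(labels))) / n
    precision = recall = f1 = 0.0
    for k, lab in enumerate(labels):
        tp = m[k][k]
        predicted = sum(m[i][k] for i in range(len(labels)))  # column sum
        actual = sum(m[k])                                    # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        w = support[lab] / n  # weight each class by its share of true samples
        precision += w * p
        recall += w * r
        f1 += w * f
    return accuracy, precision, recall, f1
```

Under weighted averaging, recall reduces algebraically to accuracy: each class's recall TP_k / n_k is weighted by n_k / N, so the weighted sum is the total of TP_k divided by N. If the reported scores are weighted averages, this would explain why the RBF kernel's recall (81.73%) equals its accuracy.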