Performance Analysis of Isolation Forest Algorithm in Fraud Detection of Credit Card Transactions
Losses from fraud in e-commerce transactions, especially credit card transactions, continue to grow each year. One way to reduce the risk of fraudulent credit card transactions is to apply a detection technique to ongoing transactions. Credit c...
Main Authors: | Waspada, Indra; Bahtiar, Nurdin; Wirawan, Panji Wisnu; Awan, Bagus Dwi Ari |
---|---|
Format: | UMS Journal (OJS) |
Language: | eng |
Published: | Department of Informatics, Universitas Muhammadiyah Surakarta, Indonesia, 2020 |
Subjects: | credit card; fraud; Isolation forest; unsupervised; precision; recall; ROC-AUC; AUCPR |
Online Access: | https://journals.ums.ac.id/index.php/khif/article/view/10520 |
_version_ | 1805342473911795712 |
---|---|
author | Waspada, Indra; Bahtiar, Nurdin; Wirawan, Panji Wisnu; Awan, Bagus Dwi Ari |
author_sort | Waspada, Indra |
collection | OJS |
description | Losses from fraud in e-commerce transactions, especially credit card transactions, continue to grow each year. One way to reduce the risk of fraudulent credit card transactions is to apply a detection technique to ongoing transactions. Credit card transaction data in its original state is unlabeled, the share of fraud in the training data is very small, which makes the data highly imbalanced, and fraud patterns keep changing. Isolation Forest is an unsupervised algorithm that detects anomalies efficiently, and several techniques can be applied to improve its performance. Previous studies analyzed the performance of Isolation Forest with the ROC-AUC metric, which can give a misleading picture. This study makes two contributions. The first is a performance analysis using both ROC-AUC and AUCPR, which shows that a high ROC-AUC value does not guarantee that the model detects fraud reliably; AUCPR describes the model's ability to capture fraudulent data more appropriately. The second is a set of techniques for improving the performance of the Isolation Forest model: optimizing the amount of training data, feature selection, the amount of fraud contamination, and the hyper-parameter settings at the modeling (training) stage. Experiments were carried out on a real-life dataset from ULB. The best results were obtained with a validation data split ratio of 60:40, the five most important features, only 60% of the fraud data, and hyper-parameters of 100 trees, a maximum of 128 samples, and 0.001 contamination. The validation performance of this model is precision 0.809917, recall 0.710145, F1-score 0.756757, ROC-AUC 0.969728, and AUCPR 0.637993; the testing results are precision 0.807143, recall 0.763514, F1-score 0.784722, ROC-AUC 0.97371, and AUCPR 0.759228. |
format | UMS Journal (OJS) |
id | oai:ojs2.journals.ums.ac.id:article-10520 |
institution | Universitas Muhammadiyah Surakarta |
language | eng |
publishDate | 2020 |
publisher | Department of Informatics, Universitas Muhammadiyah Surakarta, Indonesia |
record_format | ojs |
spelling | oai:ojs2.journals.ums.ac.id:article-10520 Performance Analysis of Isolation Forest Algorithm in Fraud Detection of Credit Card Transactions Waspada, Indra; Bahtiar, Nurdin; Wirawan, Panji Wisnu; Awan, Bagus Dwi Ari credit card; fraud; Isolation forest; unsupervised; precision; recall; ROC-AUC; AUCPR Department of Informatics, Universitas Muhammadiyah Surakarta, Indonesia 2020-10-27 info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion application/pdf https://journals.ums.ac.id/index.php/khif/article/view/10520 10.23917/khif.v6i2.10520 Khazanah Informatika : Jurnal Ilmu Komputer dan Informatika; Vol. 6 No. 2 October 2020 2477-698X 2621-038X eng https://journals.ums.ac.id/index.php/khif/article/view/10520/6095 Copyright (c) 2020 Khazanah Informatika: Jurnal Ilmu Komputer dan Informatika http://creativecommons.org/licenses/by/4.0 |
title | Performance Analysis of Isolation Forest Algorithm in Fraud Detection of Credit Card Transactions |
topic | credit card; fraud; Isolation forest; unsupervised; precision; recall; ROC-AUC; AUCPR |
url | https://journals.ums.ac.id/index.php/khif/article/view/10520 |
work_keys_str_mv | AT waspadaindra performanceanalysisofisolationforestalgorithminfrauddetectionofcreditcardtransactions AT bahtiarnurdin performanceanalysisofisolationforestalgorithminfrauddetectionofcreditcardtransactions AT wirawanpanjiwisnu performanceanalysisofisolationforestalgorithminfrauddetectionofcreditcardtransactions AT awanbagusdwiari performanceanalysisofisolationforestalgorithminfrauddetectionofcreditcardtransactions |
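
The description field reports an Isolation Forest configured with 100 trees, a maximum of 128 samples, and 0.001 contamination, evaluated with both ROC-AUC and AUCPR on a 60:40 split. The following is a minimal sketch of that setup using scikit-learn, not the authors' code. It assumes synthetic data in place of the ULB dataset, uses average precision as one common estimate of AUCPR, and omits the feature selection and fraud subsampling steps; the function names `make_synthetic_transactions` and `evaluate_isolation_forest` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import (average_precision_score, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split


def make_synthetic_transactions(n_normal=20000, n_fraud=40, n_features=5, seed=0):
    """Hypothetical stand-in for real transaction data (assumption, not the ULB dataset).

    Normal rows are standard Gaussian; fraud rows are shifted so they look anomalous.
    """
    rng = np.random.default_rng(seed)
    normal = rng.normal(0.0, 1.0, size=(n_normal, n_features))
    fraud = rng.normal(4.0, 1.5, size=(n_fraud, n_features))
    X = np.vstack([normal, fraud])
    y = np.concatenate([np.zeros(n_normal, dtype=int), np.ones(n_fraud, dtype=int)])
    return X, y


def evaluate_isolation_forest(X, y, seed=0):
    """Fit on 60% of the data and report metrics on the remaining 40%."""
    # Stratify so the rare fraud rows appear in both parts of the split.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.4, stratify=y, random_state=seed
    )
    # Hyper-parameter values taken from the description field above.
    model = IsolationForest(
        n_estimators=100, max_samples=128, contamination=0.001, random_state=seed
    )
    model.fit(X_train)  # unsupervised: labels are not used for training

    # score_samples is higher for normal points, so negate it to get a fraud score.
    fraud_score = -model.score_samples(X_val)
    # predict returns -1 for anomalies and 1 for inliers.
    y_pred = (model.predict(X_val) == -1).astype(int)

    return {
        "precision": precision_score(y_val, y_pred, zero_division=0),
        "recall": recall_score(y_val, y_pred, zero_division=0),
        "f1": f1_score(y_val, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_val, fraud_score),
        "aucpr": average_precision_score(y_val, fraud_score),
    }


metrics = evaluate_isolation_forest(*make_synthetic_transactions())
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting `roc_auc` and `aucpr` side by side mirrors the comparison the record describes: on heavily imbalanced data the two can diverge, so the second number is worth reading before trusting the first. The values printed here come from synthetic data and are not comparable to the figures in the record.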