Performance Analysis of Isolation Forest Algorithm in Fraud Detection of Credit Card Transactions

Losses from fraud in e-commerce transactions, especially those based on credit cards, continue to grow each year. One way to reduce the risk of fraudulent credit card transactions is to apply a detection technique to ongoing transactions. Credit c...

Bibliographic Details
Main Authors: Waspada, Indra, Bahtiar, Nurdin, Wirawan, Panji Wisnu, Awan, Bagus Dwi Ari
Format: UMS Journal (OJS)
Language: eng
Published: Department of Informatics, Universitas Muhammadiyah Surakarta, Indonesia 2020
Subjects: credit card; fraud; Isolation forest; unsupervised; precision; recall; ROC-AUC; AUCPR
Online Access: https://journals.ums.ac.id/index.php/khif/article/view/10520
description Losses from fraud in e-commerce transactions, especially those based on credit cards, continue to grow each year. One way to reduce the risk of fraudulent credit card transactions is to apply a detection technique to ongoing transactions. Credit card transaction data in its original state is unlabeled, fraud makes up only a very small share of the training data, so the data are highly imbalanced, and fraud patterns keep changing. Isolation Forest is an unsupervised algorithm that detects anomalies efficiently, and several techniques can be applied to improve its performance. Previous studies analyzed the performance of Isolation Forest with the ROC-AUC metric, which can give misleading information. This study makes two contributions. The first is a performance analysis using both ROC-AUC and AUCPR, which shows that a high ROC-AUC value does not guarantee that the model detects fraud reliably; AUCPR describes the model's ability to capture fraudulent data more appropriately. The second is a set of techniques for improving the performance of the Isolation Forest model: optimizing the amount of training data, feature selection, the amount of fraud contamination, and the hyper-parameters set at the modeling (training) stage. Experiments were carried out on a real-life dataset from ULB. The best results were obtained with a validation data split ratio of 60:40, the five most important features, only 60% of the fraud data, and hyper-parameters of 100 trees, a maximum of 128 samples, and a contamination of 0.001.
The validation performance of this model is precision 0.809917, recall 0.710145, F1-score 0.756757, ROC-AUC 0.969728, and AUCPR 0.637993; the testing results are precision 0.807143, recall 0.763514, F1-score 0.784722, ROC-AUC 0.97371, and AUCPR 0.759228.
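
The abstract's point that a high ROC-AUC does not guarantee reliable fraud detection can be illustrated with a minimal sketch. It is not the authors' code or data: the scores below are synthetic, the function names are hypothetical, and average precision is used as one common estimator of AUCPR (the record does not say which estimator the study used). On 1,000 normal and 10 fraudulent transactions, where 20 normal transactions outrank every fraud, ROC-AUC is 0.98 while average precision is only about 0.21. The last function checks that the reported F1-scores follow from the reported precision and recall.

```python
def roc_auc(labels, scores):
    """Chance that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Synthetic anomaly scores (assumption: higher means more anomalous).
# 1,000 normal transactions spread over [0, 1); 10 frauds placed so that
# exactly 20 normal transactions score higher than every fraud.
neg_scores = [i / 1000 for i in range(1000)]
pos_scores = [0.9795 + j * 1e-5 for j in range(10)]
labels = [0] * 1000 + [1] * 10
scores = neg_scores + pos_scores

auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(f"ROC-AUC = {auc:.4f}, average precision = {ap:.4f}")

# Consistency check on the figures reported in the abstract.
print(f"validation F1 = {f1_score(0.809917, 0.710145):.6f}")
print(f"testing F1    = {f1_score(0.807143, 0.763514):.6f}")
```

ROC-AUC stays high because the 980 normal transactions ranked below the frauds dominate the pairwise comparison, whereas average precision is driven by the 20 false alarms at the top of the ranking. If one wanted to reproduce the reported configuration with scikit-learn (an assumption; the record does not name the library used), the hyper-parameters would correspond to `IsolationForest(n_estimators=100, max_samples=128, contamination=0.001)`.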
id oai:ojs2.journals.ums.ac.id:article-10520
spelling oai:ojs2.journals.ums.ac.id:article-10520 Department of Informatics, Universitas Muhammadiyah Surakarta, Indonesia 2020-10-27 info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion application/pdf https://journals.ums.ac.id/index.php/khif/article/view/10520 10.23917/khif.v6i2.10520 Khazanah Informatika : Jurnal Ilmu Komputer dan Informatika; Vol. 6 No. 2 October 2020 2477-698X 2621-038X eng https://journals.ums.ac.id/index.php/khif/article/view/10520/6095 Copyright (c) 2020 Khazanah Informatika: Jurnal Ilmu Komputer dan Informatika http://creativecommons.org/licenses/by/4.0
topic credit card; fraud; Isolation forest; unsupervised; precision; recall; ROC-AUC; AUCPR