Mitigating data imbalance for enhanced third-party insurance claim prediction using machine learning | ||
| Journal of Mathematics and Modeling in Finance | ||
| Volume 5, Issue 1, Mehr (September/October) 2025, Pages 175-187 | ||
| Article type: Research Article | ||
| DOI: 10.22054/jmmf.2025.84807.1169 | ||
| Authors | ||
| Maryam Esna-Ashari* 1; Hamideh Badi 2; Majid Chahkandi 2; Hamid Saadatfar 3 | ||
| 1 Insurance Research Center, Tehran, Iran | ||
| 2 Department of Statistics, University of Birjand, Birjand, Iran | ||
| 3 Department of Computer Engineering, University of Birjand, Birjand, Iran | ||
| Abstract | ||
| Accurate prediction of third-party insurance claims is critical for pricing policies and managing risk. However, insurance data are highly imbalanced: non-claim cases vastly outnumber claim cases, which poses significant challenges to standard predictive models. This study explores the use of machine learning algorithms to enhance claim prediction by directly addressing this imbalance. We use real data from the Insurance Research Center of Iran, incorporating variables such as driver characteristics, vehicle features, location, and claims history. Five models are evaluated: logistic regression, decision tree, bagging, random forest, and boosting. To handle the imbalance, we apply random undersampling, random oversampling, and SMOTE. Model performance is assessed using accuracy, sensitivity, specificity, precision, and F-score. Results indicate that, when data imbalance is properly treated, decision trees and the ensemble methods bagging and random forest significantly outperform logistic regression and boosting, especially in detecting actual claim cases. The study underscores the importance of using appropriate resampling techniques and evaluation metrics in imbalanced settings. These findings can help insurers develop more reliable models for pricing and risk classification. | ||
| Keywords | ||
| Machine learning algorithms; Third-party insurance; Imbalanced data | ||
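The resampling and evaluation ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' code or data: the helper names (`random_undersample`, `random_oversample`, `binary_metrics`), the toy 95/5 class split, and the use of plain Python lists are all assumptions made for illustration. SMOTE, which synthesizes new minority-class points rather than duplicating existing ones, is not shown.

```python
import random
from collections import Counter


def _group_by_label(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so every class shrinks to the smallest class size."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    n_min = min(len(g) for g in groups.values())
    pairs = [(r, y) for y, g in groups.items() for r in rng.sample(g, n_min)]
    rng.shuffle(pairs)
    return [r for r, _ in pairs], [y for _, y in pairs]


def random_oversample(rows, labels, seed=0):
    """Randomly duplicate rows so every class grows to the largest class size."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    n_max = max(len(g) for g in groups.values())
    pairs = []
    for y, g in groups.items():
        extra = [rng.choice(g) for _ in range(n_max - len(g))]
        pairs.extend((r, y) for r in g + extra)
    rng.shuffle(pairs)
    return [r for r, _ in pairs], [y for _, y in pairs]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute the five metrics listed in the abstract from confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f_score": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical toy data: 95 non-claims (0) and 5 claims (1).
labels = [0] * 95 + [1] * 5
rows = list(range(100))

_, under_labels = random_undersample(rows, labels)
_, over_labels = random_oversample(rows, labels)
print(Counter(under_labels), Counter(over_labels))

# A model that always predicts "no claim" looks accurate but finds no claims.
print(binary_metrics(labels, [0] * 100))
```

On the toy data, the always-"no claim" predictor reaches 95% accuracy with zero sensitivity, which is why accuracy alone is a poor guide in imbalanced settings. In practice, resampling is normally applied only to the training split, so that the test set keeps the original class proportions.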