Warranty Claims Fraud Prediction

Project Overview

This project analyzes warranty claims by region, product, claim value, and other features in order to predict whether a claim is genuine or fraudulent. The dataset is taken from Kaggle and contains 358 rows and 21 columns.

Data Dictionary

| Column Name | Description |
|---|---|
| Unnamed: 0 | Index |
| Region | Region of the claim |
| State | State of the claim |
| Area | Area of the claim |
| City | City of the claim |
| Consumer_profile | Consumer profile (Business/Personal) |
| Product_category | Product category (Household/Entertainment) |
| Product_type | Product type (AC/TV) |
| AC_1001_Issue | 0 = no issue / no component, 1 = repair, 2 = replacement |
| AC_1002_Issue | 0 = no issue / no component, 1 = repair, 2 = replacement |
| AC_1003_Issue | 0 = no issue / no component, 1 = repair, 2 = replacement |
| TV_2001_Issue | 0 = no issue / no component, 1 = repair, 2 = replacement |
| TV_2002_Issue | 0 = no issue / no component, 1 = repair, 2 = replacement |
| TV_2003_Issue | 0 = no issue / no component, 1 = repair, 2 = replacement |
| Claim_Value | Claim value in INR |
| Service_Center | Service center code |
| Product_Age | Product age in days |
| Purchased_from | Purchase channel (Dealer, Manufacturer, Internet) |
| Call_details | Call duration |
| Purpose | Purpose of the call |
| Fraud | Fraudulent (1) or Genuine (0) |
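
The coded columns in the data dictionary can be made readable with a small lookup. The following is a minimal sketch in plain Python, assuming each claim is held as a dictionary keyed by column name. The helper name `decode_claim` and the sample record are hypothetical and are not taken from the project notebook or the dataset.

```python
# Mapping taken from the data dictionary:
# 0 = no issue / no component, 1 = repair, 2 = replacement.
ISSUE_LABELS = {0: "no issue / no component", 1: "repair", 2: "replacement"}

ISSUE_COLUMNS = [
    "AC_1001_Issue", "AC_1002_Issue", "AC_1003_Issue",
    "TV_2001_Issue", "TV_2002_Issue", "TV_2003_Issue",
]


def decode_claim(row):
    """Return a copy of `row` without the index column and with codes decoded."""
    # "Unnamed: 0" is only a row index, so it is dropped.
    decoded = {k: v for k, v in row.items() if k != "Unnamed: 0"}
    for col in ISSUE_COLUMNS:
        if col in decoded:
            decoded[col] = ISSUE_LABELS[int(decoded[col])]
    if "Fraud" in decoded:
        decoded["Fraud"] = "fraudulent" if int(decoded["Fraud"]) == 1 else "genuine"
    return decoded


# Hypothetical record for illustration only; not a row from the real dataset.
sample = {
    "Unnamed: 0": 0,
    "Product_type": "TV",
    "AC_1001_Issue": 0,
    "TV_2001_Issue": 2,
    "Claim_Value": 15000.0,
    "Fraud": 1,
}
decoded_sample = decode_claim(sample)
print(decoded_sample)
```

Decoding like this is meant for inspection and plotting; a model would normally keep the numeric codes.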

Conclusion

The exploratory data analysis suggests that fraudulent claims tend to have higher claim values, and that certain regions and purchase channels are associated with a higher likelihood of fraud.

Notable findings from the analysis include:

  • Warranty claims are most frequent in the southern region of India, particularly in Andhra Pradesh and Tamil Nadu.
  • Fraudulent claims are more common in urban areas such as Hyderabad and Chennai.
  • Among products purchased for personal use, TVs had more warranty claims than ACs.
  • Fraudulent AC claims were filed even when no AC part had an issue, whereas fraudulent TV claims occurred both with and without issues in TV parts.
  • Fraudulent claims were more frequent when the product was purchased directly from the manufacturer.
  • Fraudulent claims tend to have higher claim values than genuine claims.
  • Service center 13 had the highest number of fraudulent claims despite handling relatively few warranty claims overall.
  • Fraudulent claims were more frequent when the customer care call lasted less than about 3 to 4 minutes.
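
Several of the findings above compare how often fraud occurs across groups (purchase channel, city, service center). A comparison of that kind can be sketched as a grouped fraud rate. The snippet below is an illustration in plain Python, not the notebook's actual code: the function name `fraud_rate_by` is hypothetical, and the toy rows are made up and do not reproduce any figure from the dataset.

```python
from collections import defaultdict


def fraud_rate_by(rows, key):
    """Return {group: (fraud_count, total_count, fraud_rate)} for column `key`."""
    counts = defaultdict(lambda: [0, 0])  # group -> [fraud, total]
    for row in rows:
        group = row[key]
        counts[group][0] += int(row["Fraud"])  # Fraud is 1 (fraudulent) or 0 (genuine)
        counts[group][1] += 1
    return {g: (f, n, f / n) for g, (f, n) in counts.items()}


# Made-up toy rows for illustration only; these are NOT values from the dataset.
toy_rows = [
    {"Purchased_from": "Manufacturer", "Fraud": 1},
    {"Purchased_from": "Manufacturer", "Fraud": 0},
    {"Purchased_from": "Dealer", "Fraud": 0},
    {"Purchased_from": "Dealer", "Fraud": 0},
    {"Purchased_from": "Internet", "Fraud": 0},
]

rates = fraud_rate_by(toy_rows, "Purchased_from")
for group, (fraud, total, rate) in sorted(rates.items()):
    print(f"{group}: {fraud}/{total} fraudulent ({rate:.0%})")
```

Reporting the count alongside the rate matters for a dataset of this size, since a group with only a handful of claims can show a high rate by chance.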

Three machine learning models (Decision Tree Classifier, Random Forest Classifier, and Logistic Regression) achieved high accuracy, but with only 358 rows there is room for improvement with more data.
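
A comparison of the three model families named above could look roughly like the sketch below, which uses scikit-learn. Everything beyond the model classes is an assumption: the data is a synthetic stand-in generated with `make_classification` (358 rows to mirror the dataset size, 10 arbitrary features), and the 80/20 stratified split and the hyperparameters are illustrative choices, not the settings used in the project notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): NOT the real warranty claims.
X, y = make_classification(n_samples=358, n_features=10, random_state=0)

# Hold out 20% of the rows for evaluation, keeping class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # For classifiers, score() returns mean accuracy on the given data.
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

With the real data, the categorical columns (for example Region or Purchased_from) would first need to be encoded numerically before being passed to these estimators.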
