Telecom Customer Churn Prediction

Project Overview

This project analyzes customer demographics, subscribed services, tenure, and billing variables to predict whether a given customer will churn.

Data Dictionary

Variable          Description
CustomerID        Unique customer ID
Gender            Customer's gender
SeniorCitizen     Whether the customer is a senior citizen or not (1, 0)
Partner           Whether the customer has a partner or not (Yes, No)
Dependents        Whether the customer has dependents or not (Yes, No)
Tenure            Number of months the customer has stayed with the company
PhoneService      Whether the customer has a phone service or not (Yes, No)
MultipleLines     Whether the customer has multiple lines or not (Yes, No, No phone service)
InternetService   Customer’s internet service provider (DSL, Fiber optic, No)
OnlineSecurity    Whether the customer has online security or not (Yes, No, No internet service)
OnlineBackup      Whether the customer has online backup or not (Yes, No, No internet service)
DeviceProtection  Whether the customer has device protection or not (Yes, No, No internet service)
TechSupport       Whether the customer has tech support or not (Yes, No, No internet service)
StreamingTV       Whether the customer has streaming TV or not (Yes, No, No internet service)
StreamingMovies   Whether the customer has streaming movies or not (Yes, No, No internet service)
Contract          The contract term of the customer (Month-to-month, One year, Two year)
PaperlessBilling  Whether the customer has paperless billing or not (Yes, No)
PaymentMethod     The customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))
MonthlyCharges    The amount charged to the customer monthly
TotalCharges      The total amount charged to the customer
Churn             Whether the customer churned or not (Yes, No)
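
The value sets listed in the data dictionary can be written down as a small schema and used to sanity-check rows before any modelling. The sketch below is only an illustration: ALLOWED_VALUES and find_invalid_fields are hypothetical names that do not come from the project's notebook, and only a subset of the columns is included.

```python
# Allowed values copied from the data dictionary (subset of columns).
ALLOWED_VALUES = {
    "SeniorCitizen": {0, 1},
    "Partner": {"Yes", "No"},
    "MultipleLines": {"Yes", "No", "No phone service"},
    "InternetService": {"DSL", "Fiber optic", "No"},
    "Contract": {"Month-to-month", "One year", "Two year"},
    "Churn": {"Yes", "No"},
}


def find_invalid_fields(row):
    """Return the names of fields whose value is not in the allowed set.

    Fields that are missing from the row are reported as invalid too.
    """
    return [
        name
        for name, allowed in ALLOWED_VALUES.items()
        if row.get(name) not in allowed
    ]


# Hypothetical row with one deliberately misspelled value.
example_row = {
    "SeniorCitizen": 0,
    "Partner": "Yes",
    "MultipleLines": "No phone service",
    "InternetService": "Fibre optic",  # misspelled on purpose
    "Contract": "Month-to-month",
    "Churn": "No",
}
print(find_invalid_fields(example_row))
```

A check like this catches spelling variants and unexpected categories early, before they silently become extra categories during encoding.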

Conclusion

The exploratory data analysis shows that senior citizens have a lower churn count, whereas customers without a partner or without dependents have a higher churn count. Customers with streaming services also have a lower churn count than customers with other services such as online backup and device protection, which suggests that customers are more satisfied with the streaming services.
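
Churn counts depend on group size, so it can help to look at the churn rate of each group next to its count. The sketch below uses made-up numbers, not results from this project, and churn_summary is a hypothetical helper name.

```python
from collections import defaultdict


def churn_summary(records, group_key):
    """Return {group: (customers, churned, churn_rate)} for a list of dict rows."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        if row["Churn"] == "Yes":
            churned[group] += 1
    return {
        group: (totals[group], churned[group], churned[group] / totals[group])
        for group in totals
    }


# Made-up toy data: 10 non-seniors (3 churned) and 4 seniors (2 churned).
records = (
    [{"SeniorCitizen": 0, "Churn": "Yes"}] * 3
    + [{"SeniorCitizen": 0, "Churn": "No"}] * 7
    + [{"SeniorCitizen": 1, "Churn": "Yes"}] * 2
    + [{"SeniorCitizen": 1, "Churn": "No"}] * 2
)

for group, (n, n_churned, rate) in churn_summary(records, "SeniorCitizen").items():
    print(f"SeniorCitizen={group}: customers={n}, churned={n_churned}, rate={rate:.0%}")
```

In this toy data the smaller group has the lower churn count (2 versus 3) but the higher churn rate (50% versus 30%), which is why both numbers are worth reporting.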

Tenure is inversely related to churn: customers with a tenure shorter than 5 months have a higher churn count. Moreover, customers on a month-to-month contract have a higher churn count than those on one-year or two-year contracts, which indicates that customers with a longer contract are less likely to churn.
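
A comparison of this kind can be sketched with pandas by bucketing tenure and grouping on the contract type. The data below is made up for illustration, the bucket edges are an arbitrary choice, and the Churned and TenureBucket columns are helper columns introduced here, not part of the dataset.

```python
import pandas as pd

# Made-up toy data; column names follow the data dictionary.
df = pd.DataFrame(
    {
        "Tenure": [1, 3, 4, 8, 15, 30, 48, 60],
        "Contract": [
            "Month-to-month", "Month-to-month", "Month-to-month", "One year",
            "Month-to-month", "One year", "Two year", "Two year",
        ],
        "Churn": ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No"],
    }
)

# Encode the target as 0/1 so that the mean of a group is its churn rate.
df["Churned"] = (df["Churn"] == "Yes").astype(int)

# Arbitrary illustrative buckets: [0, 5], (5, 24], (24, 72] months.
df["TenureBucket"] = pd.cut(
    df["Tenure"],
    bins=[0, 5, 24, 72],
    labels=["0-5", "6-24", "25-72"],
    include_lowest=True,
)

rate_by_tenure = df.groupby("TenureBucket", observed=True)["Churned"].mean()
rate_by_contract = df.groupby("Contract")["Churned"].mean()

print(rate_by_tenure)
print(rate_by_contract)
```

Using the mean of a 0/1 column gives a rate per group, which stays comparable even when the buckets contain very different numbers of customers.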

Customers with higher monthly charges and lower total charges have a higher churn count, so the company should consider lowering monthly charges to reduce churn. The feature importances indicate that tenure, contract, monthly charges, and total charges are the most important features for predicting customer churn, so retention efforts should concentrate on these factors.
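
Feature importances of this kind are commonly read from a fitted tree ensemble. The sketch below shows one way to do that with scikit-learn's RandomForestClassifier on synthetic stand-in data. The data, the labelling rule, and the Noise feature are assumptions made for illustration and are not the project's dataset or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in data (NOT the project's dataset).
tenure = rng.integers(1, 73, size=n)            # months
monthly_charges = rng.uniform(20, 120, size=n)  # arbitrary currency units
noise = rng.normal(size=n)                      # unrelated to churn by design

# Assumed toy rule: short-tenure customers churn, with about 10% of labels flipped.
churn = (tenure < 12).astype(int)
flip = rng.random(n) < 0.10
churn[flip] = 1 - churn[flip]

feature_names = ["Tenure", "MonthlyCharges", "Noise"]
X = np.column_stack([tenure, monthly_charges, noise])

model = RandomForestClassifier(n_estimators=100, max_depth=4, random_state=0)
model.fit(X, churn)

# Impurity-based importances: non-negative and normalized to sum to 1.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances describe what the model relied on, not what causes churn, and they can split credit between correlated features, so they are best read as a ranking to investigate further.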

For the machine learning models, I used three classifiers: Decision Tree, Random Forest, and K-Nearest Neighbors. The Random Forest Classifier achieved the highest accuracy (82%) and F1 score, as well as the lowest mean squared error and mean absolute error. Of the three models, it is therefore the best fit for predicting customer churn.
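
To make the relationship between these metrics concrete, the following sketch computes them by hand for labels encoded as 0 (stayed) and 1 (churned). The labels are made up, and binary_metrics is a hypothetical helper, not code from the notebook.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, F1, MAE and MSE for labels encoded as 0 (stayed) / 1 (churned)."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    return {
        "accuracy": correct / n,
        "f1": f1,
        "mae": sum(abs(t - p) for t, p in pairs) / n,
        "mse": sum((t - p) ** 2 for t, p in pairs) / n,
    }


# Made-up labels: 10 customers, 3 of whom churned.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With 0/1 labels every error has magnitude 1, so the mean absolute error and mean squared error are both equal to the misclassification rate (1 minus accuracy) and rank models exactly as accuracy does. The F1 score adds separate information: in this toy example the accuracy is 0.7 while the F1 score for the churn class is only 0.4.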
