E-Commerce Product Delivery Prediction
Project Overview
The aim of this project is to predict whether products from an international e-commerce company will reach customers on time or experience delays. Additionally, the project analyzes various factors influencing product delivery and studies customer behavior. The company primarily sells electronic products.
Data Dictionary
Variable | Description |
---|---|
ID | Customer ID number |
Warehouse_block | The company's warehouse is divided into blocks such as A, B, C, D, E |
Mode_of_Shipment | How the product is shipped: Ship, Flight, or Road |
Customer_care_calls | Number of calls made to enquire about the shipment |
Customer_rating | Rating given by the customer, from 1 (worst) to 5 (best) |
Cost_of_the_Product | Cost of the product in US dollars |
Prior_purchases | Number of prior purchases |
Product_importance | Importance category assigned by the company: low, medium, or high |
Gender | Male or Female |
Discount_offered | Discount offered on that specific product |
Weight_in_gms | Weight of the product in grams |
Reached.on.Time_Y.N | Target variable: 1 indicates the product did NOT reach on time; 0 indicates it reached on time |
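
Because the target is coded with 1 meaning late, the mean of that column is the late rate, not the on-time rate, which is easy to misread. A minimal sketch of deriving a more readable flag, assuming pandas and a tiny made-up DataFrame (the rows and the `on_time` column name are illustrative and not part of the dataset):

```python
import pandas as pd

# Toy rows for illustration only; these are not values from the real dataset.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Reached.on.Time_Y.N": [1, 0, 1, 1],
})

# In the source column, 1 means the product did NOT arrive on time,
# so the readable flag is the logical inverse.
df["on_time"] = df["Reached.on.Time_Y.N"] == 0

late_rate = df["Reached.on.Time_Y.N"].mean()  # share of late deliveries
on_time_rate = df["on_time"].mean()           # share of on-time deliveries
print(f"late: {late_rate:.2f}, on time: {on_time_rate:.2f}")
```

Keeping the original column untouched and adding a derived flag avoids silently changing the meaning of the target for any later modeling step.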
Conclusion
The aim of the project was to predict whether products from an e-commerce company will reach customers on time. The project also analyzes factors that affect delivery and studies customer behavior. The exploratory data analysis shows that product weight and cost are associated with the delivery outcome: products weighing between 2500 and 3500 grams and costing less than 250 dollars had a higher rate of on-time delivery. Most products were shipped from Warehouse F by ship, so it is possible that Warehouse F is close to a seaport.
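
The weight and cost observation can be checked by comparing the on-time rate inside and outside that segment. The following is a sketch under stated assumptions: it uses pandas, the column names from the data dictionary, a hypothetical helper `on_time_rate`, and toy rows whose values are arbitrary and say nothing about the real data.

```python
import pandas as pd

def on_time_rate(frame: pd.DataFrame) -> float:
    """Share of rows delivered on time (target value 0 means on time)."""
    return float((frame["Reached.on.Time_Y.N"] == 0).mean())

# Toy rows for illustration only; not taken from the real dataset.
df = pd.DataFrame({
    "Weight_in_gms":       [2600, 3400, 3000, 1200, 4500, 5000],
    "Cost_of_the_Product": [180, 240, 260, 150, 200, 300],
    "Reached.on.Time_Y.N": [0, 0, 1, 1, 1, 0],
})

# Segment described in the text: 2500-3500 g (inclusive) and cost below 250 USD.
in_segment = (
    df["Weight_in_gms"].between(2500, 3500)
    & (df["Cost_of_the_Product"] < 250)
)

segment_rate = on_time_rate(df[in_segment])
rest_rate = on_time_rate(df[~in_segment])
print(f"segment: {segment_rate:.2f}, rest: {rest_rate:.2f}")
```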
Customer behavior also helps in predicting timely delivery. The more calls a customer makes, the more likely the delivery is delayed. Interestingly, customers with more prior purchases have a higher count of products delivered on time, which may be why they purchase from the company again. Products with a 0-10% discount have a higher count of late deliveries, whereas products with a discount above 10% have a higher count of on-time deliveries.
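
A comparison by discount band like the one above could be produced by binning `Discount_offered` and cross-tabulating it against the target. This is a minimal sketch, assuming pandas; the band labels are my own and the toy rows are arbitrary, so the printed counts carry no meaning about the real dataset.

```python
import pandas as pd

# Toy rows for illustration only; not taken from the real dataset.
df = pd.DataFrame({
    "Discount_offered":    [2, 5, 10, 8, 15, 40, 65],
    "Reached.on.Time_Y.N": [1, 0, 1, 1, 0, 0, 1],
})

# Two bands matching the text: up to 10%, and above 10%.
bands = pd.cut(
    df["Discount_offered"],
    bins=[0, 10, 100],
    labels=["0-10%", ">10%"],
    include_lowest=True,
)

# Rows are discount bands; columns are the target (0 = on time, 1 = late).
counts = pd.crosstab(bands, df["Reached.on.Time_Y.N"])
print(counts)
```

The same pattern applies to `Customer_care_calls` or `Prior_purchases` by passing that column directly as the first argument to `pd.crosstab`, since both are already discrete counts.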
Among the machine learning models, the decision tree classifier had the highest accuracy at 69%. The random forest classifier and logistic regression had accuracies of 68% and 67%, respectively. K Nearest Neighbors had the lowest accuracy at 65%.
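
A comparison of these four model types can be sketched as follows, assuming scikit-learn. The data here is synthetic (generated with `make_classification`) and the hyperparameters and train/test split are assumptions, since the text does not state them, so the accuracies it prints will not match the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the real features would come from the dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hyperparameters here are assumptions; the source does not state them.
models = {
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "K nearest neighbors": KNeighborsClassifier(),
}

# Fit each model on the same split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Report models from highest to lowest accuracy.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.2%}")
```

Evaluating every model on the same held-out split keeps the accuracies comparable. With scores only a few points apart, cross-validation would give a more reliable ranking than a single split.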