E-Commerce Product Delivery Prediction

Project Overview

The aim of this project is to predict whether products from an international e-commerce company will reach customers on time or experience delays. Additionally, the project analyzes various factors influencing product delivery and studies customer behavior. The company primarily sells electronic products.

Data Dictionary

Variable               Description
ID                     Customer ID number
Warehouse_block        Block of the company's warehouse, which is divided into blocks such as A, B, C, D, E
Mode_of_Shipment       How the company ships the product: Ship, Flight, or Road
Customer_care_calls    Number of calls made to enquire about the shipment
Customer_rating        Rating given by the customer; 1 is the lowest (worst), 5 is the highest (best)
Cost_of_the_Product    Cost of the product in US dollars
Prior_purchases        Number of prior purchases
Product_importance     Importance category assigned by the company: low, medium, or high
Gender                 Male or Female
Discount_offered       Discount offered on that specific product
Weight_in_gms          Weight of the product in grams
Reached.on.Time_Y.N    Target variable; 1 indicates that the product did NOT reach on time, 0 indicates that it reached on time

Conclusion

The aim of the project was to predict whether products from an e-commerce company reach customers on time. The project also analyzes the factors that affect delivery and studies customer behavior. From the exploratory data analysis, I found that product weight and cost have an impact on delivery: products that weigh between 2500 and 3500 grams and cost less than 250 dollars had a higher rate of being delivered on time. Most of the products were shipped from Warehouse F by ship, so it is quite possible that Warehouse F is close to a seaport.
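
One way to check the weight and cost observation is to compare the on-time rate inside that segment with the rate for all other products. The sketch below assumes the column names from the data dictionary; the function name `on_time_rate_by_segment` is hypothetical, and the rows are made up for illustration, not taken from the dataset.

```python
import pandas as pd


def on_time_rate_by_segment(df):
    """Compare on-time rates for the 2500-3500 g, under-250-dollar segment
    against all other products. Hypothetical helper for illustration."""
    in_segment = (
        df["Weight_in_gms"].between(2500, 3500)  # inclusive on both ends
        & (df["Cost_of_the_Product"] < 250)
    )
    labels = in_segment.map({True: "in segment", False: "other"})
    # The target is inverted: 0 means the product arrived on time.
    on_time = df["Reached.on.Time_Y.N"].eq(0)
    return on_time.groupby(labels).mean()


# Made-up rows for illustration only.
sample = pd.DataFrame(
    {
        "Weight_in_gms": [3000, 2800, 3200, 1500, 4500, 5000],
        "Cost_of_the_Product": [200, 180, 260, 150, 240, 270],
        "Reached.on.Time_Y.N": [0, 0, 1, 1, 1, 0],
    }
)
rates = on_time_rate_by_segment(sample)
print(rates)
```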

Customer behavior also helps in predicting timely delivery. The more calls a customer makes, the higher the chance that the delivery is delayed. Interestingly, customers who have made more prior purchases have a higher count of products delivered on time, which may be the reason they purchase from the company again. Products with a 0-10% discount have a higher count of late deliveries, whereas products with a discount of more than 10% have a higher count of on-time deliveries.
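
Counts like the ones described for discount bands can be tabulated with a cross-tabulation. This is a rough sketch, assuming the column names from the data dictionary and a split at 10%; the function name `delivery_counts_by_discount` is hypothetical, and the rows are made up and do not reflect the dataset.

```python
import pandas as pd


def delivery_counts_by_discount(df):
    """Count on-time and late deliveries per discount band.
    Hypothetical helper for illustration."""
    band = df["Discount_offered"].le(10).map({True: "0-10%", False: ">10%"})
    # The target is inverted: 1 means the product arrived late.
    status = df["Reached.on.Time_Y.N"].map({0: "on time", 1: "late"})
    return pd.crosstab(band.rename("discount_band"), status.rename("delivery"))


# Made-up rows for illustration only.
sample = pd.DataFrame(
    {
        "Discount_offered": [5, 8, 10, 15, 40, 65],
        "Reached.on.Time_Y.N": [1, 0, 1, 0, 0, 1],
    }
)
counts = delivery_counts_by_discount(sample)
print(counts)
```

The same pattern applies to Customer_care_calls or Prior_purchases by swapping the row variable.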

Among the machine learning models, the decision tree classifier had the highest accuracy, at 69%. The random forest classifier and logistic regression had accuracies of 68% and 67%, respectively. K-nearest neighbors had the lowest accuracy, at 65%.
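
A comparison of this kind can be sketched with scikit-learn as shown below. The split ratio, random seed, and default hyperparameters are assumptions, not the settings used in the project, and the data is synthetic (from `make_classification`), so the printed accuracies will not match the figures above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, random_state=0):
    """Fit four classifiers on one train/test split and return test accuracy
    per model. Hypothetical helper; hyperparameters are assumed defaults."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state, stratify=y
    )
    models = {
        "decision_tree": DecisionTreeClassifier(random_state=random_state),
        "random_forest": RandomForestClassifier(random_state=random_state),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "knn": KNeighborsClassifier(),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in for the encoded features and the target column.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
scores = compare_models(X, y)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.2f}")
```

With accuracies only a few points apart, a single split can reorder the models, so cross-validation (for example `cross_val_score`) would give a steadier comparison.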
