Hotel Reservations Cancellation Prediction
Project Overview
This project aims to predict which hotel reservations customers are likely to cancel by analyzing the features associated with each reservation. Online hotel reservation channels have dramatically changed booking possibilities and customer behavior. A significant number of hotel reservations end in cancellations or no-shows, typically because of a change of plans or a scheduling conflict. Canceling is often made easier by the option to do so free of charge or at a low cost, which benefits hotel guests but is less desirable, and potentially revenue-diminishing, for hotels.
Data Dictionary
Column Name | Description |
---|---|
Booking_ID | unique identifier of each booking |
no_of_adults | number of adults |
no_of_children | number of children |
no_of_weekend_nights | number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel |
no_of_week_nights | number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel |
meal_type | meal type booked by the customer |
required_car_parking_spaces | Does the customer require a car parking space? (0 - No, 1- Yes) |
lead_time | Number of days between the date of booking and the arrival date |
arrival_year | Year of arrival |
arrival_month | Month of arrival |
arrival_date | Date of arrival |
market_segment | Market segment designation |
repeated_guest | Is the customer a repeated guest? (0 - No, 1- Yes) |
no_previous_cancellations | Number of previous bookings that were canceled by the customer prior to the current booking |
previous_bookings_not_canceled | Number of previous bookings not canceled by the customer prior to the current booking |
avg_price_per_room | Average price per day of the reservation; prices of the rooms are dynamic. (in euros) |
no_of_special_requests | Total number of special requests made by the customer (e.g. high floor, view from the room, etc) |
booking_status | Flag indicating if the booking was canceled or not |
Conclusion
From the exploratory data analysis, I found that most reservations were made for two adults with no children, probably couples, and that this group also had the highest cancellation count. The cancellation count decreases when children are included in the reservation. Most reservations were made for week nights, and these had far more cancellations than reservations made for weekend nights.
The year 2018 had a higher cancellation rate than 2017, with most cancellations occurring in July and October. Visualizing the services chosen at reservation against the booking status showed that these services have no apparent impact on cancellation.
Lead time had a large impact on cancellation, which is also evident from the feature importance. Guests with a short lead time are less likely to cancel than guests with a long lead time. With a longer lead time, guests have more time to reconsider the reservation and are therefore more likely to cancel it. The hotel should try to encourage reservations with shorter lead times.
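The lead-time relationship described above can be checked by bucketing `lead_time` and computing the share of canceled bookings in each bucket. The following is a minimal sketch rather than the notebook's actual code: the small DataFrame is made-up toy data, the bin edges are arbitrary, the helper name `cancellation_rate_by_lead_time` is hypothetical, and it assumes `booking_status` has already been encoded as 1 for canceled and 0 otherwise.

```python
import pandas as pd

# Toy data for illustration only (not taken from the project's dataset).
# Assumption: booking_status is already encoded as 1 = canceled, 0 = not canceled.
bookings = pd.DataFrame({
    "lead_time": [2, 5, 20, 45, 80, 120, 200, 300],
    "booking_status": [0, 0, 0, 1, 0, 1, 1, 1],
})


def cancellation_rate_by_lead_time(df, bin_edges):
    """Return the share of canceled bookings within each lead-time bucket."""
    buckets = pd.cut(df["lead_time"], bins=bin_edges, include_lowest=True)
    return df.groupby(buckets, observed=True)["booking_status"].mean()


# Arbitrary bin edges (in days), chosen only for this example.
rates = cancellation_rate_by_lead_time(bookings, [0, 30, 90, 365])
print(rates)
```

On the real data, a rate that rises from the shortest bucket to the longest would be consistent with the pattern reported here.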
The market segment also had an impact on cancellation. Reservations made through online platforms had the highest number of cancellations. This points to the importance of the hotel's reputation and presence on online platforms, so the hotel should try to improve its online reputation to reduce cancellations.
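Because a raw cancellation count can partly reflect how many bookings a segment receives, it may help to view the count next to the cancellation rate. The sketch below shows one way to do that with pandas. The data is a made-up toy example, the segment labels are hypothetical, and `booking_status` is again assumed to be encoded as 1 for canceled and 0 otherwise.

```python
import pandas as pd

# Toy data for illustration only; the segment labels are hypothetical.
# Assumption: booking_status is encoded as 1 = canceled, 0 = not canceled.
bookings = pd.DataFrame({
    "market_segment": ["Online"] * 6 + ["Offline"] * 2,
    "booking_status": [1, 1, 0, 0, 0, 0, 1, 0],
})

summary = bookings.groupby("market_segment")["booking_status"].agg(
    n_bookings="count",        # total reservations in the segment
    cancellations="sum",       # number of canceled reservations
    cancellation_rate="mean",  # share of reservations that were canceled
)
print(summary)
```

In this toy example the segment with the most cancellations is not the one with the highest rate, which is why the two columns are worth reading together.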
For the classification models, I used a Decision Tree Classifier, a Random Forest Classifier, and Logistic Regression to predict reservation cancellation. The Decision Tree Classifier had the highest accuracy of the three, at 85%.
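A comparison of this kind can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the notebook's code: it uses synthetic data from `make_classification` in place of the hotel dataset, an 80/20 split, and mostly default hyperparameters, so its accuracies say nothing about the results reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the project itself uses the hotel reservations dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Hyperparameters here are assumptions; the notebook's settings may differ.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Fit each model on the training split and score it on the held-out split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name}: {acc:.3f}")
```

Scoring all three models on the same held-out split keeps the comparison fair. Since cancellations are usually a minority class, metrics such as precision and recall may be worth reporting alongside accuracy.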