Hotel Reservations Cancellation Prediction

Project Overview

The aim of this project is to predict which hotel reservations are likely to be canceled by analyzing the features associated with each reservation. Online hotel reservation channels have dramatically changed booking possibilities and customer behavior. A significant number of hotel reservations are called off through cancellations or no-shows. Typical reasons include changes of plans and scheduling conflicts. Cancelling is often made easier by the option to do so free of charge or at a low cost, which benefits hotel guests but is less desirable, and possibly revenue-diminishing, for hotels.

Data Dictionary

Column Name                      Description
Booking_ID                       Unique identifier of each booking
no_of_adults                     Number of adults
no_of_children                   Number of children
no_of_weekend_nights             Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel
no_of_week_nights                Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel
meal_type                        Meal type booked by the customer
required_car_parking_spaces      Does the customer require a car parking space? (0 - No, 1 - Yes)
lead_time                        Number of days between the date of booking and the arrival date
arrival_year                     Year of arrival
arrival_month                    Month of arrival
arrival_date                     Date of arrival
market_segment                   Market segment designation
repeated_guest                   Is the customer a repeated guest? (0 - No, 1 - Yes)
no_previous_cancellations        Number of previous bookings canceled by the customer prior to the current booking
previous_bookings_not_canceled   Number of previous bookings not canceled by the customer prior to the current booking
avg_price_per_room               Average price per day of the reservation, in euros (room prices are dynamic)
no_of_special_requests           Total number of special requests made by the customer (e.g. high floor, view from the room, etc.)
booking_status                   Flag indicating whether the booking was canceled

Conclusion

The Exploratory Data Analysis showed that most reservations were made for 2 adults with no children, probably couples, and that this group also had the highest cancellation count. The cancellation count decreases when children are involved. Most reservations were made for week nights, and these had substantially more cancellations than reservations made for weekend nights.
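
The group comparison above can be sketched with pandas as follows. This is a minimal illustration, not the notebook's code: the rows are made up, and the `"Canceled"` / `"Not_Canceled"` labels, the `is_canceled` and `has_children` helper columns, and the summary column names are assumptions. It reports the cancellation rate next to the raw count, since the largest group will tend to have the highest count regardless of how often it cancels.

```python
import pandas as pd

# Tiny made-up sample; the real analysis would load the project's dataset instead.
bookings = pd.DataFrame(
    {
        "no_of_adults": [2, 2, 2, 1, 2, 2, 1, 2],
        "no_of_children": [0, 0, 1, 0, 0, 2, 0, 0],
        "booking_status": [
            "Canceled", "Not_Canceled", "Not_Canceled", "Not_Canceled",
            "Canceled", "Not_Canceled", "Canceled", "Not_Canceled",
        ],
    }
)

# Assumed encoding: the label "Canceled" marks a canceled booking.
bookings["is_canceled"] = (bookings["booking_status"] == "Canceled").astype(int)
bookings["has_children"] = bookings["no_of_children"] > 0

# Count, number of cancellations, and cancellation rate per party composition.
summary = (
    bookings.groupby(["no_of_adults", "has_children"])["is_canceled"]
    .agg(n_bookings="count", cancellations="sum", cancel_rate="mean")
    .reset_index()
)
print(summary)
```

Sorting `summary` by `cancellations` reproduces the count-based view, while sorting by `cancel_rate` shows which groups cancel most often relative to their size.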

The year 2018 had a higher cancellation rate than 2017, with most cancellations occurring in July and October. Visualizing the services selected at reservation against booking status showed that these services had no apparent impact on cancellation.

Lead time had a large impact on reservation cancellation, which is also evident from the feature importance. Guests with a short lead time are less likely to cancel than guests with a long lead time. A plausible explanation is that a longer lead time gives guests more time to reconsider the reservation, making them more likely to cancel. The hotel should therefore try to encourage reservations with shorter lead times.
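
One way to check the lead-time relationship is to bucket `lead_time` and compare the cancellation rate per bucket. The sketch below uses made-up rows; the bucket edges, the `lead_time_bucket` and `is_canceled` column names, and the `"Canceled"` label are assumptions chosen for illustration rather than values taken from the notebook.

```python
import pandas as pd

# Made-up sample rows, for illustration only.
bookings = pd.DataFrame(
    {
        "lead_time": [3, 10, 25, 45, 80, 120, 200, 300],
        "booking_status": [
            "Not_Canceled", "Not_Canceled", "Not_Canceled", "Canceled",
            "Not_Canceled", "Canceled", "Canceled", "Canceled",
        ],
    }
)
bookings["is_canceled"] = (bookings["booking_status"] == "Canceled").astype(int)

# Hypothetical bucket edges in days; intervals are closed on the right,
# and include_lowest=True keeps a lead time of 0 in the first bucket.
edges = [0, 30, 90, 180, 365]
labels = ["0-30", "31-90", "91-180", "181-365"]
bookings["lead_time_bucket"] = pd.cut(
    bookings["lead_time"], bins=edges, labels=labels, include_lowest=True
)

# Share of canceled bookings within each lead-time bucket.
rate_by_bucket = bookings.groupby("lead_time_bucket", observed=False)["is_canceled"].mean()
print(rate_by_bucket)
```

On real data, a rate that rises steadily from the shortest to the longest bucket would support the conclusion above; a flat or irregular pattern would call for a closer look.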

The market segment of the reservation also had an impact on cancellation. Reservations made through online platforms had the highest number of cancellations. This highlights the importance of the hotel's reputation and presence on online platforms. The hotel should try to improve its reputation on these platforms to reduce cancellations.

For the classification models, I used a Decision Tree Classifier, a Random Forest Classifier, and Logistic Regression to predict reservation cancellation. The Decision Tree Classifier had the highest accuracy of all the models, at 85%.
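
A comparison of the three model types could look roughly like the scikit-learn sketch below. It is not the notebook's code: the data is synthetic (generated from a made-up rule in which longer lead times cancel more often), only two features are used, and the hyperparameters, split size, and random seeds are assumptions. Its accuracies therefore say nothing about the results reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real dataset (assumption, for illustration only).
rng = np.random.default_rng(0)
n = 500
lead_time = rng.integers(0, 365, size=n)
special_requests = rng.integers(0, 4, size=n)
X = np.column_stack([lead_time, special_requests])

# Made-up rule: cancellation probability grows with lead time.
p_cancel = 1 / (1 + np.exp(-(lead_time - 150) / 50))
y = (rng.random(n) < p_cancel).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Hyperparameters here are illustrative defaults, not tuned values.
models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    # Scaling helps the logistic regression solver converge.
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# Fit each model on the training split and score it on the held-out split.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in sorted(accuracies.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Because cancellation data is often imbalanced, it may be worth reporting precision and recall for the canceled class alongside accuracy when comparing models on the real dataset.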
