Wine Quality Prediction

Project Overview

This data science project aims to predict the quality of the red variant of the Portuguese "Vinho Verde" wine. The dataset consists of physicochemical (input) and sensory (output) variables, with no information about grape types, wine brand, or selling price due to privacy and logistical constraints.

The dataset can be approached as either a classification or a regression task. The quality scores are ordered and the classes are imbalanced: there are significantly more average-quality wines than excellent or poor ones.
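To make the two framings concrete, here is a minimal sketch using only the standard library. The thresholds in `to_class`, the label names, and the list of scores are illustrative assumptions, not values taken from the dataset or from this project's notebook.

```python
from collections import Counter


def to_class(quality: int, low: int = 4, high: int = 7) -> str:
    """Map an integer quality score to a coarse label.

    The cut-offs `low` and `high` are assumed for illustration only.
    """
    if quality <= low:
        return "poor"
    if quality >= high:
        return "excellent"
    return "normal"


# Made-up scores for illustration, not real rows from the dataset.
scores = [5, 6, 5, 7, 5, 6, 4, 6, 5, 8, 6, 5]

# Regression framing: predict the score itself as a number.
regression_targets = [float(s) for s in scores]

# Classification framing: predict a discrete label derived from the score.
class_targets = [to_class(s) for s in scores]

# Counting the labels shows how unevenly the classes are populated.
class_counts = Counter(class_targets)
print(class_counts)
```

With an imbalance like this, plain accuracy can look good for a model that always predicts the majority class, so a per-class metric or class weighting is worth considering in the classification framing.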

About the Dataset

The dataset used in this project is sourced from the UCI Machine Learning Repository and is also available on Kaggle for convenience. If there are any licensing concerns, the dataset will be promptly removed upon request. Additional details can be found in the reference [Cortez et al., 2009].

Input Variables (Physicochemical Tests)

  • Fixed acidity
  • Volatile acidity
  • Citric acid
  • Residual sugar
  • Chlorides
  • Free sulfur dioxide
  • Total sulfur dioxide
  • Density
  • pH
  • Sulphates
  • Alcohol

Output Variable (Sensory Data)

  • Quality (score between 0 and 10)

For further insights and analysis, refer to the research paper by Cortez et al. (2009). This project aims to use machine learning techniques to build a predictive model that estimates the quality of red wines from their physicochemical properties.
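The split between the eleven physicochemical inputs and the quality target could be sketched as follows. The lowercase column names and the comma delimiter are assumptions about how the CSV is laid out (some distributions of the file use `;` instead), and the single data row is illustrative only.

```python
import csv
import io

# Assumed column names for the eleven physicochemical inputs.
FEATURES = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]
TARGET = "quality"

# One illustrative row; in practice this would be the downloaded CSV file.
sample_csv = io.StringIO(
    ",".join(FEATURES + [TARGET]) + "\n"
    "7.4,0.7,0.0,1.9,0.076,11,34,0.9978,3.51,0.56,9.4,5\n"
)

X, y = [], []
for row in csv.DictReader(sample_csv):
    # Inputs are numeric measurements; the target is an integer score.
    X.append([float(row[name]) for name in FEATURES])
    y.append(int(row[TARGET]))

print(len(X[0]), y)
```

Keeping the feature names in one list makes it harder to accidentally leak the target column into the model inputs.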
