Crop Yield
Prediction
Project Overview
The aim of this data science project is to predict crop yield using the dataset provided from Crop Yield Prediction. The dataset includes various environmental and agricultural factors such as rainfall, temperature, fertilizer usage, and macronutrient levels, along with the corresponding crop yield in Quintals per acre.
Data Dictionary
Column Name | Description |
---|---|
Rain Fall (mm) | Rainfall in millimeters |
Temperature (C) | Temperature in Celsius |
Fertilizer (kg) | Fertilizer in kilograms |
Nitrogen (N) | Nitrogen macro nutrient |
Phosphorous (P) | Phosphorous macro nutrient |
Potassium (K) | Potassium macro nutrient |
Yield (Q/acres) | Crop yield Quintals per acre |
Data Preprocessing
Upon examining the dataset, it was found that the temperature column contained invalid values represented as ":". These entries were removed, and the column was converted to a float data type. Missing values were replaced with the median of their respective columns.
Exploratory Data Analysis (EDA)
EDA revealed several insights:
- The dataset likely represents two distinct crops, indicated by clusters in graphs of rainfall, temperature, and crop yield.
- Fertilizer usage appears to have a proportional relationship with crop yield, with some exceptions.
- Temperature and rainfall also exhibit distinct patterns possibly corresponding to different crop seasons.
- Macronutrients (Nitrogen, Phosphorus, Potassium) show varying distributions, indicating potential differences in crop requirements.
- Temperature emerges as the most influential factor in predicting crop yield, followed by rainfall.
Model Building and Evaluation
Two regression models, Decision Tree Regressor and Random Forest Regressor, were trained and evaluated. Random Forest Regressor outperformed Decision Tree Regressor, achieving an R2 score of 0.802 compared to 0.77. Both models indicate the importance of temperature and rainfall in predicting crop yield.
Conclusion
The dataset represents two distinct crops with varying environmental and agricultural factors influencing crop yield. Temperature emerges as the most significant predictor, followed by rainfall, while fertilizer and macronutrient levels also play a role. The Random Forest Regressor model proves effective in predicting crop yield based on these factors. However, further analysis could explore additional variables and factors impacting crop yield.
View Notebook Download Dataset