☢️ Air Quality Index (AQI) Prediction Using Random Forest Regressor
October 2025
🧠 Overview
This project aims to predict the next day’s Air Quality Index (AQI) using historical pollutant measurements. The dataset contains hourly readings of pollutants such as CO, NH₃, NO₂, O₃, PM10, PM2.5, and SO₂, aggregated into daily averages for modeling.
A Random Forest Regressor was used to capture nonlinear relationships between pollutant levels and AQI. Features include both pollutant concentrations and temporal attributes such as month and weekday.
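The pipeline described above might look roughly like the sketch below. It is a minimal illustration, not the project's actual code: the column names (`CO`, `NH3`, ..., `AQI`), the helper `build_daily_features`, the target column `aqi_next_day`, and the hyperparameters are all assumptions, and the data is random placeholder data rather than the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def build_daily_features(hourly: pd.DataFrame) -> pd.DataFrame:
    """Average hourly readings per day and add month/weekday features.

    Assumes `hourly` has a DatetimeIndex and numeric columns, including "AQI".
    """
    daily = hourly.resample("D").mean()
    daily["month"] = daily.index.month
    daily["weekday"] = daily.index.dayofweek  # Monday = 0
    # Target: the AQI observed on the following day.
    daily["aqi_next_day"] = daily["AQI"].shift(-1)
    # The last day has no "next day" yet, so it is dropped.
    return daily.dropna()


# Synthetic placeholder data (random numbers, NOT the project's dataset).
rng = np.random.default_rng(0)
index = pd.date_range("2025-01-01", periods=24 * 30, freq=pd.Timedelta(hours=1))
columns = ["CO", "NH3", "NO2", "O3", "PM10", "PM2.5", "SO2", "AQI"]
hourly = pd.DataFrame(
    rng.uniform(10, 200, size=(len(index), len(columns))),
    index=index,
    columns=columns,
)

daily = build_daily_features(hourly)
X = daily.drop(columns=["aqi_next_day"])
y = daily["aqi_next_day"]

# Hyperparameters here are illustrative defaults, not tuned values.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Predict the day after the most recent row of features.
next_day_prediction = model.predict(X.tail(1))
```

Shifting the AQI column by one day is what turns same-day measurements into a next-day forecasting problem; it also means the current day's AQI stays in `X` as a feature.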
💡 Motivation
With Diwali approaching, I wanted to observe how AQI levels deviate during the festival period, since air quality typically worsens significantly due to fireworks and increased emissions.
My blog post, 🔗 Effects of Diwali on AQI: Insights from my model, focuses on this question.
⚙️ Tech Stack
- Python
- scikit-learn
- pandas, numpy
- matplotlib, seaborn
📊 Model Performance
✅ Pre-Festival (Seen Data)
On data from before Diwali, predictions tracked the actual AQI closely. The model was evaluated with standard regression metrics:
- MAE, MSE, and R² all indicated solid performance (R² ≈ 0.86 on seen data).
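For reference, the three metrics can be computed with scikit-learn as in the sketch below. The helper name `regression_report` and the four hand-made values are assumptions for illustration only; they are not results from this project.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return MAE, MSE, and R² for a set of predictions."""
    return {
        "mae": mean_absolute_error(y_true, y_pred),
        "mse": mean_squared_error(y_true, y_pred),
        "r2": r2_score(y_true, y_pred),
    }


# Hand-made placeholder values (NOT project data).
y_true = np.array([100.0, 120.0, 140.0, 160.0])
y_pred = np.array([110.0, 115.0, 150.0, 155.0])

report = regression_report(y_true, y_pred)
# For these values: MAE = 7.5, MSE = 62.5, R² = 0.875
```

MAE is in AQI points and is the easiest of the three to read; R² expresses how much of the variance in AQI the predictions explain.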
Predicted vs Actual Graph:
⚠️ Festival Period (Unseen Data)
Predictions during Diwali were less consistent, often off by 20–25 AQI points. Diwali fell earlier in the calendar than usual, so the month and weekday features could not signal the sudden festival-related change in air quality.
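One way to quantify this kind of gap is to compare the mean absolute error inside and outside a festival window. The sketch below is only an illustration: the helper `error_by_period`, the window dates, and every number in it are made-up placeholders, not measurements from this project.

```python
import pandas as pd


def error_by_period(actual: pd.Series, predicted: pd.Series, start: str, end: str) -> dict:
    """Mean absolute error inside and outside the [start, end] date window.

    Assumes both series share the same DatetimeIndex.
    """
    abs_err = (actual - predicted).abs()
    in_window = (abs_err.index >= pd.Timestamp(start)) & (abs_err.index <= pd.Timestamp(end))
    return {
        "festival_mae": abs_err[in_window].mean(),
        "other_mae": abs_err[~in_window].mean(),
    }


# Placeholder values for illustration only (NOT project results).
days = pd.date_range("2025-10-17", periods=6, freq="D")
actual = pd.Series([150.0, 160.0, 170.0, 260.0, 280.0, 200.0], index=days)
predicted = pd.Series([148.0, 163.0, 168.0, 238.0, 256.0, 195.0], index=days)

# Hypothetical two-day festival window.
summary = error_by_period(actual, predicted, "2025-10-20", "2025-10-21")
# For these values: festival_mae = 23.0, other_mae = 3.0
```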
💢 Stress Test (Removing Previous Day AQI Feature)
In the feature importance graph, the previous day's AQI was the most important feature. I ran a stress test to see how the model performs without this key feature.
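A stress test of this kind can be sketched as follows: fit the model twice, once with and once without the AQI column, and compare held-out R² and feature importances. The helper `fit_and_score`, the column names, the 80/20 chronological split, and the synthetic data (where the target is deliberately driven by `AQI`) are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score


def fit_and_score(X_train, y_train, X_test, y_test):
    """Fit a Random Forest and return (held-out R², sorted feature importances)."""
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    importances = pd.Series(model.feature_importances_, index=X_train.columns)
    score = r2_score(y_test, model.predict(X_test))
    return score, importances.sort_values(ascending=False)


# Synthetic placeholder data (NOT the project's dataset); the target is
# constructed to depend mostly on the AQI column.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "AQI": rng.uniform(50, 300, n),
    "PM2.5": rng.uniform(10, 150, n),
    "month": rng.integers(1, 13, n),
})
y = 0.9 * X["AQI"] + 0.1 * X["PM2.5"] + rng.normal(0, 5, n)

# Chronological-style split: first 80% for training, last 20% for testing.
split = int(0.8 * n)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

r2_with, imp_with = fit_and_score(X_train, y_train, X_test, y_test)

# Ablation: drop the AQI column and refit on the remaining features.
r2_without, imp_without = fit_and_score(
    X_train.drop(columns=["AQI"]), y_train,
    X_test.drop(columns=["AQI"]), y_test,
)
```

Comparing `r2_with` against `r2_without` shows how much the model leans on the AQI feature; `imp_without` shows which of the remaining features take over.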
With AQI Feature:
Without AQI Feature:
🧩 Summary
- Model: Random Forest Regressor
- Data: Daily averages of major air pollutants
- Goal: Next-day AQI prediction
- Result: Strong general accuracy (R² ≈ 0.86) on seen data; reduced accuracy during unmodeled festival events