
☢️ Air Quality Index (AQI) Prediction Using Random Forest Regressor

October, 2025

🧠 Overview

This project aims to predict the next day’s Air Quality Index (AQI) using historical pollutant measurements. The dataset contains hourly readings of pollutants such as CO, NH₃, NO₂, O₃, PM10, PM2.5, and SO₂, aggregated into daily averages for modeling.
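The aggregation step described above can be sketched roughly as follows. This is a minimal illustration using pandas on synthetic data; the column names (`pm2_5`, `pm10`, `no2`, `aqi`) and the `aqi_next_day` target column are assumptions, not the project's actual schema.

```python
import numpy as np
import pandas as pd

# Synthetic hourly readings standing in for the real dataset (column names are assumed).
rng = np.random.default_rng(0)
hours = pd.date_range("2025-01-01", periods=24 * 10, freq=pd.Timedelta(hours=1))
hourly = pd.DataFrame(
    {
        "pm2_5": rng.uniform(20, 200, len(hours)),
        "pm10": rng.uniform(40, 300, len(hours)),
        "no2": rng.uniform(5, 80, len(hours)),
        "aqi": rng.uniform(50, 400, len(hours)),
    },
    index=hours,
)

# Collapse the hourly rows into one row per calendar day by averaging.
daily = hourly.resample("D").mean()

# Target: the AQI of the following day, aligned to today's row.
daily["aqi_next_day"] = daily["aqi"].shift(-1)

# The last day has no "tomorrow", so it cannot be used for training.
daily = daily.dropna(subset=["aqi_next_day"])
```

Shifting the daily AQI by one row keeps each day's pollutant averages on the same row as the value the model is asked to predict.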

A Random Forest Regressor was used to capture nonlinear relationships between pollutant levels and AQI. Features include pollutant concentrations, the previous day’s AQI, and temporal attributes such as month and weekday.
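A rough sketch of this setup, assuming scikit-learn's `RandomForestRegressor` (the library is not named in this write-up). The data is synthetic, and the column names, the 80/20 chronological split, and the hyperparameters are all illustrative assumptions, so any scores from it say nothing about the real model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic daily data standing in for the aggregated dataset (column names assumed).
rng = np.random.default_rng(42)
days = pd.date_range("2024-01-01", periods=300, freq="D")
daily = pd.DataFrame(
    {
        "pm2_5": rng.uniform(20, 200, len(days)),
        "pm10": rng.uniform(40, 300, len(days)),
        "no2": rng.uniform(5, 80, len(days)),
    },
    index=days,
)
# Toy AQI that depends nonlinearly on the pollutants, plus noise.
daily["aqi"] = np.maximum(daily["pm2_5"] * 1.5, daily["pm10"]) + rng.normal(0, 5, len(days))

# Temporal attributes derived from the date index.
daily["month"] = daily.index.month
daily["weekday"] = daily.index.dayofweek  # Monday = 0

# Target is tomorrow's AQI; today's AQI stays in the frame as a feature.
daily["aqi_next_day"] = daily["aqi"].shift(-1)
daily = daily.dropna()

features = ["pm2_5", "pm10", "no2", "aqi", "month", "weekday"]
split = int(len(daily) * 0.8)  # chronological split, no shuffling
train, test = daily.iloc[:split], daily.iloc[split:]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(train[features], train["aqi_next_day"])
predictions = model.predict(test[features])
```

A chronological split is used here rather than a random one so that the model is always evaluated on days that come after its training data.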

💡 Motivation

With Diwali approaching, I wanted to observe how AQI levels deviate during the festival period, since air quality typically worsens significantly due to fireworks and increased emissions.

My blog post, 🔗 Effects of Diwali on AQI: Insights from my model, explores this question.

⚙️ Tech Stack

📊 Model Performance

✅ Pre-Festival (Seen Data)

Predictions before Diwali were highly accurate, as shown by standard regression metrics:

[Figure: Predicted vs. actual AQI]
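For reference, three commonly used regression metrics can be computed as in the sketch below. Which metrics the project actually reports is not stated here, so the choice of MAE, RMSE, and R² is an assumption, and the AQI values in the example are made up for illustration.

```python
import math

def regression_metrics(actual, predicted):
    """Return MAE, RMSE, and R^2 for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # R^2 is undefined when the actual values are all identical.
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}

# Made-up AQI values, purely illustrative.
actual = [150.0, 160.0, 170.0, 180.0]
predicted = [148.0, 163.0, 169.0, 184.0]
metrics = regression_metrics(actual, predicted)
```

MAE and RMSE are expressed in AQI points, which makes them directly comparable to statements such as "off by 20–25 AQI points", while R² is unitless.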

⚠️ Festival Period (Unseen Data)

Predictions during Diwali were less consistent, often off by 20–25 AQI points. This occurred because Diwali arrived earlier than usual, so the month and weekday features could not capture the sudden festival-related changes.

💢 Stress Test (Removing Previous Day AQI Feature)

In the feature importance graph, the previous day’s AQI was the most important feature. I ran a stress test to see how the model performs without this key feature.

[Figure: With AQI feature]

[Figure: Without AQI feature]
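A stress test of this kind can be sketched as training the same model twice, once with and once without the AQI feature, and comparing the error on held-out days. The example below assumes scikit-learn and uses a synthetic AQI series with day-to-day persistence; the column names, the toy data process, and the helper `fit_and_score` are hypothetical and only illustrate the procedure.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(7)
n_days = 500

# Synthetic AQI series with day-to-day persistence (an AR(1)-style toy process).
aqi = np.empty(n_days)
aqi[0] = 150.0
for t in range(1, n_days):
    aqi[t] = 150.0 + 0.9 * (aqi[t - 1] - 150.0) + rng.normal(0, 10)

daily = pd.DataFrame({"aqi": aqi}, index=pd.date_range("2024-01-01", periods=n_days, freq="D"))
# A pollutant only loosely tied to the same day's AQI.
daily["pm2_5"] = 0.5 * daily["aqi"] + rng.normal(0, 20, n_days)
daily["month"] = daily.index.month
daily["weekday"] = daily.index.dayofweek

# Relative to the target day, the "aqi" column is the previous day's AQI.
daily["aqi_next_day"] = daily["aqi"].shift(-1)
daily = daily.dropna()

split = int(len(daily) * 0.8)  # chronological split
train, test = daily.iloc[:split], daily.iloc[split:]

def fit_and_score(feature_names):
    """Train on the given features and return the MAE on the held-out days."""
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(train[feature_names], train["aqi_next_day"])
    return mean_absolute_error(test["aqi_next_day"], model.predict(test[feature_names]))

with_aqi = ["pm2_5", "month", "weekday", "aqi"]
without_aqi = [name for name in with_aqi if name != "aqi"]

mae_with = fit_and_score(with_aqi)
mae_without = fit_and_score(without_aqi)
print(f"MAE with previous-day AQI:    {mae_with:.1f}")
print(f"MAE without previous-day AQI: {mae_without:.1f}")
```

Keeping the split, the hyperparameters, and the random seed identical across both runs means any difference in error can be attributed to the removed feature.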

🧩 Summary

🔗 View on GitHub