
air quality index prediction

october 2025

overview

This project aims to predict the next day's Air Quality Index (AQI) using historical pollutant measurements. The dataset contains hourly readings of pollutants such as CO, NH₃, NO₂, O₃, PM10, PM2.5, and SO₂, aggregated into daily averages for modeling.

A Random Forest Regressor was used to capture nonlinear relationships between pollutant levels and AQI. Features include pollutant concentrations, the previous day's AQI, and temporal attributes such as month and weekday.
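The daily aggregation and feature construction described above could look roughly like the following. This is a minimal standard-library sketch. It assumes each hourly record is a dict with a `timestamp`, one key per pollutant and an `aqi` value, and that daily AQI is also a mean of hourly values. The key names, `daily_averages` and `build_examples` are hypothetical and are not taken from the project's code.

```python
from collections import defaultdict
from datetime import timedelta
from statistics import mean

# Hypothetical key names for the seven pollutants in the dataset.
POLLUTANTS = ["co", "nh3", "no2", "o3", "pm10", "pm2_5", "so2"]


def daily_averages(hourly_rows):
    """Average hourly readings (pollutants and AQI) per calendar day.

    Each row is assumed to be a dict with a datetime under "timestamp",
    one float per pollutant key, and an "aqi" value.
    """
    buckets = defaultdict(list)
    for row in hourly_rows:
        buckets[row["timestamp"].date()].append(row)
    return {
        day: {key: mean(r[key] for r in rows) for key in POLLUTANTS + ["aqi"]}
        for day, rows in sorted(buckets.items())
    }


def build_examples(daily):
    """Pair day d's features with day d + 1's AQI as the prediction target."""
    examples = []
    for day, values in daily.items():
        target_day = day + timedelta(days=1)
        if target_day not in daily:
            continue  # no next-day AQI (end of data or a gap), so no target
        features = {key: values[key] for key in POLLUTANTS}
        features["prev_aqi"] = values["aqi"]  # AQI of the day before the target
        features["month"] = target_day.month
        features["weekday"] = target_day.weekday()  # Monday = 0
        examples.append((features, daily[target_day]["aqi"]))
    return examples
```

Pairing day d's readings with day d + 1's AQI keeps every feature strictly in the past relative to the target. This avoids leaking the value being predicted into the inputs.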

motivation

With Diwali approaching, I wanted to observe how AQI levels deviate during the festival period, since air quality typically worsens significantly due to fireworks and increased emissions.

My blog post "effects of diwali on aqi: insights from my model" explores this in detail.

tech stack

model performance

pre-festival (seen data)

Predictions before Diwali were highly accurate, as shown by standard regression metrics:

predicted vs actual

Prediction Graph
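"Standard regression metrics" usually means mean absolute error (MAE), root mean squared error (RMSE) and R². Which of these were reported for this model is an assumption here. The sketch below is dependency-free, and the function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R² for paired actual/predicted AQI values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R² is undefined when every actual value is identical.
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}
```

MAE reads directly in AQI points. RMSE penalises occasional large misses more heavily. R² compares the model against always predicting the mean.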

festival period (unseen data)

Predictions during Diwali were less consistent, often off by 20–25 AQI points. This occurred because Diwali arrived earlier than usual and month/weekday features couldn't effectively capture the sudden festival-related changes.
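One way to quantify misses like the 20–25 point errors above is to compute the error separately for a pre-festival window and for the festival window. The sketch below does this. `window_errors` and the window boundaries used with it are hypothetical placeholders, not the dates used in the project.

```python
from datetime import date
from statistics import mean


def window_errors(
    days: list[date],
    y_true: list[float],
    y_pred: list[float],
    start: date,
    end: date,
) -> dict:
    """MAE and largest absolute miss for predictions dated within [start, end]."""
    misses = [
        abs(actual - predicted)
        for day, actual, predicted in zip(days, y_true, y_pred)
        if start <= day <= end
    ]
    if not misses:
        raise ValueError("no predictions fall inside the window")
    return {"mae": mean(misses), "max_abs_error": max(misses), "n_days": len(misses)}
```

Call it once for a pre-festival range and once for the festival range. This makes the gap between the two periods directly comparable in AQI points.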

  • These two Reddit posts were made just weeks apart, one on 21 October (before Lakshmi Pujan) and the other on 6 November (more than a week after the Diwali festival), a reminder that momentary fun can have a long-term impact.
  • stress test (removing the previous-day AQI feature)

    In the feature importance graph, the previous-day AQI was the most important feature. To see how the model performs without it, we ran a stress test with this key feature removed.

    with AQI feature

    With AQI Feature

    without AQI feature

    Without AQI Feature

    summary

    → view on github
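The stress test described above, which drops the previous-day AQI feature and retrains, can be sketched as follows. The sketch assumes scikit-learn's `RandomForestRegressor` and a chronological train/test split. The function name `stress_test`, the feature name `prev_aqi`, `n_estimators=200` and the 80/20 split are hypothetical choices, not the project's actual settings. The function also returns each fitted forest's `feature_importances_` (scikit-learn's impurity-based importance); whether the project's feature importance graph used this measure is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error


def stress_test(X, y, feature_names, drop, train_frac=0.8, seed=0):
    """Train a Random Forest with and without one feature and compare test MAE.

    Rows must be in chronological order: the first `train_frac` share is used
    for training and the remaining days act as the unseen test period.
    """
    if drop not in feature_names:
        raise ValueError(f"unknown feature: {drop}")
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cut = int(len(y) * train_frac)
    all_cols = list(range(len(feature_names)))
    kept_cols = [i for i in all_cols if feature_names[i] != drop]

    results = {}
    for label, cols in (("with", all_cols), ("without", kept_cols)):
        model = RandomForestRegressor(n_estimators=200, random_state=seed)
        model.fit(X[:cut, cols], y[:cut])
        preds = model.predict(X[cut:, cols])
        results[label] = {
            "mae": mean_absolute_error(y[cut:], preds),
            # Impurity-based importances, keyed by the columns actually used.
            "importances": dict(
                zip((feature_names[i] for i in cols), model.feature_importances_)
            ),
        }
    return results
```

The split is chronological rather than shuffled. Shuffling would let the model train on days on either side of each test day, which would flatter both runs and hide how much it leans on the previous day's AQI.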