Case Study

Forecasting Parking Availability in Aarhus: Live Data to 30-Minute Outlooks

[Dashboard screenshot: live feed + 30-minute forecast view for each city parking lot]

The Problem: Hunting for Spots Wastes Time and Fuel

On busy evenings in Aarhus, drivers circle lots hoping for a free space. The city exposes real-time parking availability, but the feed is reactive—you only know a lot is full after you are already there. I wanted to turn that feed into foresight: where will spaces open in the next 30 minutes?

Key insight: Parking demand follows strong weekly and hourly seasonality. Pairing the city feed with weather and trend features was enough to predict openings with single-digit MAPE for most lots.

Data Sources

  • Aarhus Open Data parking API — free/total spaces and timestamps for each public lot (pulled every 5 minutes).
  • Weather — precipitation and temperature signals from Open-Meteo to capture rainy-day surges.
  • Calendar signals — hour-of-day, day-of-week, public holidays, and a rolling trend of arrivals/departures.

Approach

1) Build a reliable ingestion loop

A lightweight scheduled job hits the parking endpoint, normalizes JSON to tabular format, and appends new rows to a parquet store. Each record keeps lot_id, free_spots, total_spots, and a UTC timestamp so downstream models can resample cleanly.

import requests, pandas as pd, pendulum
from pathlib import Path

API = "https://api.dataforsyningen.dk/parkering/aarhus?format=json"
STORE = Path("data/parking.parquet")

raw = requests.get(API, timeout=10).json()
df = pd.json_normalize(raw)
df["ts"] = pendulum.now("UTC")
df = df[["name", "free", "total", "ts"]].rename(
    columns={"name": "lot_id", "free": "free_spots", "total": "total_spots"}
)

# pandas' to_parquet has no append mode, so read, concat, and rewrite the store
if STORE.exists():
    df = pd.concat([pd.read_parquet(STORE), df], ignore_index=True)
df.to_parquet(STORE, index=False)

2) Engineer signals drivers care about

I resampled each lot to 5-minute intervals and built features: sin/cos encodings for hour-of-day, weekend flags, holiday flags, 30-minute rolling deltas, and weather joins. Missing readings were forward-filled to avoid fragmenting the time series.
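The resampling and encoding step can be sketched as follows. `add_time_features` is a hypothetical helper (not from the project repo); the column names `ts` and `free_spots` follow the ingestion step above, and the weather join is omitted for brevity.

```python
import numpy as np
import pandas as pd

def add_time_features(lot_df: pd.DataFrame) -> pd.DataFrame:
    """Resample one lot to 5-minute bins and add calendar features."""
    df = (lot_df.set_index("ts")
                 .resample("5min")["free_spots"]
                 .mean()
                 .ffill()          # forward-fill short gaps so the series stays contiguous
                 .to_frame())
    # sin/cos encodings keep hour-of-day circular (23:55 sits next to 00:00)
    hour = df.index.hour + df.index.minute / 60
    df["hour_sin"] = np.sin(2 * np.pi * hour / 24)
    df["hour_cos"] = np.cos(2 * np.pi * hour / 24)
    df["is_weekend"] = (df.index.dayofweek >= 5).astype(int)
    # 30-minute rolling delta: change over the last 6 five-minute intervals
    df["delta_30m"] = df["free_spots"].diff(6)
    return df
```

The circular encoding matters because a plain integer hour puts 23:00 and 00:00 at opposite ends of the feature range, even though demand at those times is nearly identical.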

3) Mix Prophet with tree models

Prophet handled the weekly and daily seasonality per lot, while gradient boosting captured nonlinear effects from weather and recent flow. I trained both and blended their predictions with a simple weighted average tuned on a validation window.

from prophet import Prophet
from xgboost import XGBRegressor

# Prophet for seasonality
prophet_df = lot_df.rename(columns={"ts":"ds", "free_spots":"y"})
prophet_model = Prophet(daily_seasonality=True, weekly_seasonality=True)
prophet_model.fit(prophet_df)
prophet_forecast = prophet_model.predict(future_horizon)[["ds", "yhat"]]

# XGBoost for residual patterns
features = lot_df[feature_cols]
xgb = XGBRegressor(n_estimators=400, max_depth=6, learning_rate=0.05, subsample=0.8, colsample_bytree=0.8)
xgb.fit(features, lot_df["free_spots"])
xgb_pred = xgb.predict(future_features)

# Blend
forecast = 0.6 * prophet_forecast["yhat"].values + 0.4 * xgb_pred
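The 0.6/0.4 weights above came from tuning on a validation window. A minimal grid-search sketch, assuming aligned arrays of validation-window actuals and per-model predictions (the names are illustrative, not from the project):

```python
import numpy as np

def tune_blend_weight(y_val, prophet_val, xgb_val, step=0.05):
    """Grid-search the Prophet weight w in [0, 1] minimising validation MAE
    for forecast = w * prophet + (1 - w) * xgb."""
    y_val, prophet_val, xgb_val = map(np.asarray, (y_val, prophet_val, xgb_val))
    best_w, best_mae = 0.0, float("inf")
    for w in np.arange(0.0, 1.0 + step, step):
        mae = np.abs(y_val - (w * prophet_val + (1 - w) * xgb_val)).mean()
        if mae < best_mae:
            best_w, best_mae = w, mae
    return best_w
```

A coarse grid is enough here: the blend is a one-parameter convex combination, so the MAE surface is well behaved and fancier optimisers add little.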

Data Quality Fixes

  • Outlier clipping: Spikes where free spots exceeded capacity were clipped to total_spots.
  • Gap filling: 5-15 minute API outages were forward-filled; longer gaps flagged that lot as “stale data” in the UI.
  • Lot renames: Occasional name changes were keyed to a canonical lot_id map so historical time series stayed intact.
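The first two rules reduce to a few pandas operations. A sketch, assuming per-lot frames with the `ts`, `free_spots`, and `total_spots` columns from ingestion (`clean_lot` is a hypothetical helper):

```python
import pandas as pd

STALE_AFTER = pd.Timedelta("15min")  # gaps longer than this flag the lot as stale

def clean_lot(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the outlier and gap rules to one lot's raw readings."""
    df = df.sort_values("ts").copy()
    # Outlier clipping: free spots can never exceed capacity or go negative
    df["free_spots"] = df["free_spots"].clip(lower=0, upper=df["total_spots"])
    # Gap detection: mark rows that follow an API outage longer than 15 minutes
    df["stale"] = df["ts"].diff() > STALE_AFTER
    return df
```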

Evaluation: Walk-Forward Backtests

To stay realistic, I used rolling-origin (walk-forward) validation: train on the past 14 days, test on the next day, slide forward. Metrics were tracked per lot so poorly performing locations could be handled differently (e.g., switch to naive persistence if unstable).

from sklearn.metrics import mean_absolute_percentage_error

def walk_forward(series, splits, horizon=6):  # 6 x 5 min = 30 minutes
    errors = []
    for split in splits:  # pre-computed rolling windows
        train = series[:split.train_end]
        test = series[split.train_end:split.test_end]
        model = fit_model(train)           # per-lot blend from the step above
        forecast = model.predict(horizon)
        errors.append(mean_absolute_percentage_error(test[:horizon], forecast))
    return sum(errors) / len(errors)

lot_mapes = {lot: walk_forward(df[df.lot_id == lot].free_spots, splits) for lot in lots}

Busy downtown lots (Dokk1, Navitas) landed in the 6–9% MAPE band; smaller lots with erratic closures performed worse, so the UI marks them with a “use with caution” badge instead of hiding the forecast.

Dashboard Experience

  • Lot picker with live free/total counts and a sparkline of the last 24 hours.
  • 30-minute forecast chart plus a traffic-light badge (green >40% free, amber 20-40%, red <20%).
  • Map view that highlights lots expected to open up soon, not just those currently free.
  • Downloadable CSV so the city team can audit historical demand patterns.
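The traffic-light badge reduces to a small threshold helper; `badge` is a hypothetical sketch with the cutoffs taken from the bullet above:

```python
def badge(free_spots: int, total_spots: int) -> str:
    """Traffic-light badge: green >40% free, amber 20-40%, red <20%."""
    if total_spots <= 0:
        return "grey"  # closed or unknown-capacity lots get a neutral badge
    free_ratio = free_spots / total_spots
    if free_ratio > 0.40:
        return "green"
    if free_ratio >= 0.20:
        return "amber"
    return "red"
```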

How It Gets Used

A typical workflow: a driver heading to the city center opens the app at 17:10. Dokk1 shows 4 free spots now but forecasted to drop to 0 in 10 minutes; Navitas shows 12 spots rising to 18. The app nudges the driver to Navitas, avoiding a loop around Dokk1. In user tests, this reduced “circling” time from ~9 minutes to ~3 minutes for evening visits.

Results

The blended approach outperformed a naive “use the last reading” baseline and single-model variants.

  • 6–9% MAPE on busy lots
  • 12% error reduction vs. baseline
  • 5-minute data refresh cadence
  • 30-minute forecast horizon
"Drivers do not need perfect precision—they need a reliable nudge toward the next best lot before they start circling."

Key Learnings

  1. Seasonality dominates. Hour-of-day and day-of-week explained most variance; weather mattered mainly during storms.
  2. Blends beat solo models. Prophet captured rhythms; trees captured abrupt swings (events, showers).
  3. Latency is a feature. A 5-minute ingestion loop kept predictions trustworthy; slower intervals hurt adoption in testing.

Operationalization

  • GitHub Actions cron pulls data every 5 minutes and rebuilds forecasts; failures fall back to the last successful run.
  • Streamlit front-end caches the most recent parquet snapshot to serve instantly while new forecasts load asynchronously.
  • Health checks ping the API and weather feed; alerts fire to email/Slack if either is down for >15 minutes.
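The >15-minute alerting rule is a one-line predicate once the last successful check per feed is tracked; `should_alert` is a hypothetical sketch of that decision, not the project's actual monitoring code:

```python
from datetime import datetime, timedelta, timezone

DOWN_THRESHOLD = timedelta(minutes=15)

def should_alert(last_ok: datetime, now=None) -> bool:
    """Fire an alert when a feed has had no successful check for >15 minutes."""
    now = now or datetime.now(timezone.utc)
    return now - last_ok > DOWN_THRESHOLD
```

Keeping the threshold above the 5-minute pull cadence means a single failed poll never pages anyone; only a sustained outage does.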

What I Would Improve Next

  • Add event calendars (concerts, football) to pre-empt demand spikes.
  • Promote best alternative lots when a favorite lot is forecasted to stay full.
  • Cache weather forecasts locally to decouple from API rate limits.

Technology Stack

Python Prophet XGBoost Pandas Streamlit Open-Meteo API Aarhus Open Data GitHub Actions