Case Study

Forecasting Parking Availability in Aarhus: Live Data to 30-Minute Outlooks

[Dashboard screenshot: live feed + 30-minute forecast view for each city parking lot]

The Problem: Hunting for Spots Wastes Time and Fuel

On busy evenings in Aarhus, drivers circle lots hoping for a free space. The city exposes real-time parking availability, but the feed is reactive—you only know a lot is full after you are already there. I wanted to turn that feed into foresight: where will spaces open in the next 30 minutes?

Key insight: Parking demand follows strong weekly and hourly seasonality. Pairing the city feed with weather and trend features was enough to predict openings with single-digit MAPE for most lots.

Data Sources

  • Aarhus Open Data parking API — free/total spaces and timestamps for each public lot (pulled every 5 minutes).
  • Weather — precipitation and temperature signals from Open-Meteo to capture rainy-day surges.
  • Calendar signals — hour-of-day, day-of-week, public holidays, and a rolling trend of arrivals/departures.

Approach

1) Build a reliable ingestion loop

A lightweight scheduled job hits the parking endpoint, normalizes JSON to tabular format, and appends new rows to a parquet store. Each record keeps lot_id, free_spots, total_spots, and a UTC timestamp so downstream models can resample cleanly.

import requests, pandas as pd, pendulum
from pathlib import Path

API = "https://api.dataforsyningen.dk/parkering/aarhus?format=json"
STORE = Path("data/parking.parquet")

raw = requests.get(API, timeout=10).json()
df = pd.json_normalize(raw)
df["ts"] = pendulum.now("UTC")
df = df[["name", "free", "total", "ts"]].rename(
    columns={"name": "lot_id", "free": "free_spots", "total": "total_spots"}
)

# pandas' to_parquet has no append mode, so read, concat, and rewrite the store
if STORE.exists():
    df = pd.concat([pd.read_parquet(STORE), df], ignore_index=True)
df.to_parquet(STORE, index=False)

2) Engineer signals drivers care about

I resampled each lot to 5-minute intervals and built features: sin/cos encodings for hour-of-day, weekend flags, holiday flags, 30-minute rolling deltas, and weather joins. Missing readings were forward-filled to avoid fragmenting the time series.
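The resampling and encoding step can be sketched as follows. `add_time_features` is a hypothetical helper (not from the project repo); the column names `ts` and `free_spots` follow the ingestion step above, and the weather join is omitted for brevity.

```python
import numpy as np
import pandas as pd

def add_time_features(lot_df: pd.DataFrame) -> pd.DataFrame:
    """Resample one lot to 5-minute bins and add calendar features."""
    df = (lot_df.set_index("ts")
                 .resample("5min")["free_spots"]
                 .mean()
                 .ffill()          # forward-fill short gaps so the series stays contiguous
                 .to_frame())
    # sin/cos encodings keep hour-of-day circular (23:55 sits next to 00:00)
    hour = df.index.hour + df.index.minute / 60
    df["hour_sin"] = np.sin(2 * np.pi * hour / 24)
    df["hour_cos"] = np.cos(2 * np.pi * hour / 24)
    df["is_weekend"] = (df.index.dayofweek >= 5).astype(int)
    # 30-minute rolling delta: change over the last 6 five-minute intervals
    df["delta_30m"] = df["free_spots"].diff(6)
    return df
```

The circular encoding matters because a plain integer hour puts 23:00 and 00:00 at opposite ends of the feature range, even though demand at those times is nearly identical.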

3) Mix Prophet with tree models

Prophet handled the weekly and daily seasonality per lot, while gradient boosting captured nonlinear effects from weather and recent flow. I trained both and blended their predictions with a simple weighted average tuned on a validation window.

from prophet import Prophet
from xgboost import XGBRegressor

# Prophet for seasonality
prophet_df = lot_df.rename(columns={"ts":"ds", "free_spots":"y"})
prophet_model = Prophet(daily_seasonality=True, weekly_seasonality=True)
prophet_model.fit(prophet_df)
prophet_forecast = prophet_model.predict(future_horizon)[["ds", "yhat"]]

# XGBoost for residual patterns
features = lot_df[feature_cols]
xgb = XGBRegressor(n_estimators=400, max_depth=6, learning_rate=0.05, subsample=0.8, colsample_bytree=0.8)
xgb.fit(features, lot_df["free_spots"])
xgb_pred = xgb.predict(future_features)

# Blend
forecast = 0.6 * prophet_forecast["yhat"].values + 0.4 * xgb_pred
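The 0.6/0.4 weights above came from tuning on a validation window. A minimal grid-search sketch, assuming aligned arrays of validation-window actuals and per-model predictions (the names are illustrative, not from the project):

```python
import numpy as np

def tune_blend_weight(y_val, prophet_val, xgb_val, step=0.05):
    """Grid-search the Prophet weight w in [0, 1] minimising validation MAE
    for forecast = w * prophet + (1 - w) * xgb."""
    y_val, prophet_val, xgb_val = map(np.asarray, (y_val, prophet_val, xgb_val))
    best_w, best_mae = 0.0, float("inf")
    for w in np.arange(0.0, 1.0 + step, step):
        mae = np.abs(y_val - (w * prophet_val + (1 - w) * xgb_val)).mean()
        if mae < best_mae:
            best_w, best_mae = w, mae
    return best_w
```

A coarse grid is enough here: the blend is a one-parameter convex combination, so the MAE surface is well behaved and fancier optimisers add little.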

Data Quality Fixes

  • Outlier clipping: Spikes where free spots exceeded capacity were clipped to total_spots.
  • Gap filling: 5-15 minute API outages were forward-filled; longer gaps flagged that lot as “stale data” in the UI.
  • Lot renames: Occasional name changes were keyed to a canonical lot_id map so historical time series stayed intact.
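The first two rules reduce to a few pandas operations. A sketch, assuming per-lot frames with the `ts`, `free_spots`, and `total_spots` columns from ingestion (`clean_lot` is a hypothetical helper):

```python
import pandas as pd

STALE_AFTER = pd.Timedelta("15min")  # gaps longer than this flag the lot as stale

def clean_lot(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the outlier and gap rules to one lot's raw readings."""
    df = df.sort_values("ts").copy()
    # Outlier clipping: free spots can never exceed capacity or go negative
    df["free_spots"] = df["free_spots"].clip(lower=0, upper=df["total_spots"])
    # Gap detection: mark rows that follow an API outage longer than 15 minutes
    df["stale"] = df["ts"].diff() > STALE_AFTER
    return df
```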

Evaluation: Walk-Forward Backtests

To stay realistic, I used rolling-origin (walk-forward) validation: train on the past 14 days, test on the next day, slide forward. Metrics were tracked per lot so poorly performing locations could be handled differently (e.g., switch to naive persistence if unstable).

from sklearn.metrics import mean_absolute_percentage_error

def walk_forward(series, splits, horizon=6):  # 6 x 5 min = 30 minutes
    errors = []
    for split in splits:  # pre-computed rolling windows
        train = series[:split.train_end]
        test = series[split.train_end:split.test_end]
        model = fit_model(train)           # per-lot blend from the step above
        forecast = model.predict(horizon)
        errors.append(mean_absolute_percentage_error(test[:horizon], forecast))
    return sum(errors) / len(errors)

lot_mapes = {lot: walk_forward(df[df.lot_id == lot].free_spots, splits) for lot in lots}

Busy downtown lots (Dokk1, Navitas) landed in the 6–9% MAPE band; smaller lots with erratic closures performed worse, so the UI marks them with a “use with caution” badge instead of hiding the forecast.

Dashboard Experience

  • Lot picker with live free/total counts and a sparkline of the last 24 hours.
  • 30-minute forecast chart plus a traffic-light badge (green >40% free, amber 20-40%, red <20%).
  • Map view that highlights lots expected to open up soon, not just those currently free.
  • Downloadable CSV so the city team can audit historical demand patterns.
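The traffic-light badge reduces to a small threshold helper; `badge` is a hypothetical sketch with the cutoffs taken from the bullet above:

```python
def badge(free_spots: int, total_spots: int) -> str:
    """Traffic-light badge: green >40% free, amber 20-40%, red <20%."""
    if total_spots <= 0:
        return "grey"  # closed or unknown-capacity lots get a neutral badge
    free_ratio = free_spots / total_spots
    if free_ratio > 0.40:
        return "green"
    if free_ratio >= 0.20:
        return "amber"
    return "red"
```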

How It Gets Used

A typical workflow: a driver heading to the city center opens the app at 17:10. Dokk1 shows 4 free spots now but forecasted to drop to 0 in 10 minutes; Navitas shows 12 spots rising to 18. The app nudges the driver to Navitas, avoiding a loop around Dokk1. In user tests, this reduced “circling” time from ~9 minutes to ~3 minutes for evening visits.

Results

The blended approach outperformed a naive “use the last reading” baseline and single-model variants.

  • 6–9% MAPE on busy lots
  • 12% error reduction vs. baseline
  • 5-minute data refresh cadence
  • 30-minute forecast horizon
"Drivers do not need perfect precision—they need a reliable nudge toward the next best lot before they start circling."

Key Learnings

  1. Seasonality dominates. Hour-of-day and day-of-week explained most variance; weather mattered mainly during storms.
  2. Blends beat solo models. Prophet captured rhythms; trees captured abrupt swings (events, showers).
  3. Latency is a feature. A 5-minute ingestion loop kept predictions trustworthy; slower intervals hurt adoption in testing.

Operationalization

  • GitHub Actions cron pulls data every 5 minutes and rebuilds forecasts; failures fall back to the last successful run.
  • Streamlit front-end caches the most recent parquet snapshot to serve instantly while new forecasts load asynchronously.
  • Health checks ping the API and weather feed; alerts fire to email/Slack if either is down for >15 minutes.
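The >15-minute alerting rule is a one-line predicate once the last successful check per feed is tracked; `should_alert` is a hypothetical sketch of that decision, not the project's actual monitoring code:

```python
from datetime import datetime, timedelta, timezone

DOWN_THRESHOLD = timedelta(minutes=15)

def should_alert(last_ok: datetime, now=None) -> bool:
    """Fire an alert when a feed has had no successful check for >15 minutes."""
    now = now or datetime.now(timezone.utc)
    return now - last_ok > DOWN_THRESHOLD
```

Keeping the threshold above the 5-minute pull cadence means a single failed poll never pages anyone; only a sustained outage does.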

What I Would Improve Next

  • Add event calendars (concerts, football) to pre-empt demand spikes.
  • Promote best alternative lots when a favorite lot is forecasted to stay full.
  • Cache weather forecasts locally to decouple from API rate limits.

Technology Stack

Python Prophet XGBoost Pandas Streamlit Open-Meteo API Aarhus Open Data GitHub Actions