Back to Data Stories

Case Study

Predicting Customer Churn Before It Happens: An AI-Powered Approach

Customer Churn Prediction Dashboard

The final dashboard showing churn predictions and SHAP explanations

The Problem: Silent Customer Exits

Every business knows the pain of customer churn. A customer who was active last month suddenly disappears. No complaints, no feedback—just gone. By the time you notice, it's too late. The cost of acquiring a new customer is 5-7x higher than retaining an existing one.

The real question isn't "Who churned?"—it's "Who is about to churn, and why?"

💡 Key Insight: Companies that proactively identify at-risk customers can reduce churn by 15-25% through targeted retention campaigns.

The Challenge

A telecom company approached me with a classic problem: they had thousands of customers, limited retention budget, and no systematic way to identify who needed attention. Their customer service team was reactive—waiting for complaints instead of preventing departures.

They needed a solution that could:

  • Predict which customers were likely to churn in the next 30 days
  • Explain why each customer was at risk (not just a black-box score)
  • Recommend personalized retention actions
  • Be easy for non-technical staff to use

My Approach

1. Data Understanding & Preparation

The dataset included customer demographics, account information, service usage, and billing history. I started with exploratory data analysis to understand the churn landscape:

7,043 Total Customers
26.5% Churn Rate
19 Features Used
3 Models Compared

Key findings from EDA revealed that customers with month-to-month contracts, electronic check payments, and fiber optic internet had significantly higher churn rates. These insights already hinted at potential intervention strategies.

2. Model Selection with AutoML

Rather than manually tuning hyperparameters, I used PyCaret's AutoML capabilities combined with Optuna for hyperparameter optimization. This approach automatically compared multiple algorithms—XGBoost, LightGBM, CatBoost, and more—to find the best performer.

# AutoML model comparison with PyCaret + Optuna
from pycaret.classification import *

setup(data, target='Churn', session_id=42)
best_model = compare_models(sort='AUC', n_select=1)

# Tune with Optuna for optimal hyperparameters
tuned_model = tune_model(best_model, optimize='AUC')

PyCaret tested Logistic Regression, Random Forest, XGBoost, LightGBM, and CatBoost. The gradient boosting models consistently outperformed others, with XGBoost achieving the best AUC score after Optuna optimization.

3. Making the Model Explainable with SHAP

A prediction is only useful if stakeholders trust and understand it. I integrated SHAP (SHapley Additive exPlanations) to provide feature-level explanations for every prediction.

"The most powerful prediction is useless if no one believes it or knows how to act on it."

SHAP values revealed that these factors had the strongest influence on churn:

  1. Contract type — Month-to-month customers are 3x more likely to churn
  2. Tenure — First 12 months are the danger zone
  3. Monthly charges — High bills without perceived value trigger exits
  4. Tech support usage — Customers who never contact support may be disengaged
  5. Payment method — Electronic check users churn more (possibly due to payment friction)

4. AI-Powered Retention Recommendations

Here's where it gets interesting. Instead of just flagging at-risk customers, I integrated OpenAI's API to generate personalized retention strategies and explain model insights in plain English. The AI analyzes each customer's profile and their specific churn drivers to suggest actionable interventions.

Example AI Recommendation: "Customer #4521 has a 78% churn probability. Primary driver: Month-to-month contract with high monthly charges ($89). Suggested action: Offer a 12-month contract upgrade with 15% discount, emphasizing the annual savings of $160. Include free premium tech support for 3 months."

5. Building the Interactive Dashboard

I built the entire solution as a Streamlit web application with a clean dark-themed interface, deployed live on DigitalOcean App Platform. The dashboard enables business users to:

  • Upload any CSV customer dataset directly in the app
  • Automatically train and optimize models using PyCaret AutoML
  • View dataset summary (rows, columns, missing values, data types) and model KPIs
  • Explore SHAP visual breakdowns of key churn risk features
  • Generate AI-powered retention suggestions in plain English

Technology Stack

Python 3.10+ Streamlit PyCaret 3.3.0 Optuna XGBoost LightGBM CatBoost SHAP OpenAI API DigitalOcean

Results & Impact

The deployed solution demonstrated measurable business value:

84% Model AUC
76% Precision
68% Recall
~20% Est. Churn Reduction

Key Learnings

  1. Explainability builds trust. SHAP visualizations helped business teams understand and accept model predictions.
  2. AutoML accelerates iteration. PyCaret let me test dozens of model configurations in minutes, not days.
  3. AI augments, not replaces. GPT-generated recommendations gave customer service teams a starting point, not a script.
  4. Simple UI drives adoption. Streamlit made the solution accessible to non-technical users immediately.

What's Next?

Future enhancements planned:

  • Persistent model storage with cloud database (PostgreSQL / S3)
  • Customer segmentation and advanced retention strategy generation
  • Multi-user authentication and dashboard access control
  • Customer lifetime value (CLV) prediction alongside churn
View on GitHub View in Portfolio Discuss This Project

Comments