Machine Learning Case Study

Helping Bob Price His Phones: A Classification Journey in R

Mobile Phone Price Classification

Classifying mobile phones into price ranges using machine learning

The Business Problem

Meet Bob—a young entrepreneur who just launched his mobile phone company. He's competing against giants like Apple, Samsung, and Xiaomi. Bob knows that accurate pricing is critical for market entry and customer appeal.

Bob has collected detailed specifications of 2,000 mobile phones from various manufacturers, along with their price categories. His goal isn't to predict exact prices, but to classify each phone into one of four price ranges: Low, Medium, High, or Very High.

The catch? Bob lacks the data science skills to build the model himself. That's where I come in.

🎯 Objective: Build a classification model that predicts price range from technical specs, and identify which features most influence pricing decisions.

Understanding the Data

The dataset contains 21 variables capturing everything from battery capacity to screen dimensions. Let's break down what we're working with:

2,000 Observations
21 Variables (20 features plus the price_range target)
4 Price Classes
0 Missing Values

Key Features

ram: RAM in MB
battery_power: Battery capacity (mAh)
px_height / px_width: Screen resolution
int_memory: Internal storage (GB)
mobile_wt: Weight in grams
four_g / three_g: Network support (binary)
pc / fc: Primary/Front camera (MP)
n_cores: Processor cores

Good news: the dataset is perfectly balanced—exactly 500 phones in each price category. This means we won't need to worry about class imbalance techniques.
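
Both properties, balance and completeness, are quick to verify. The sketch below runs the checks on a small simulated stand-in (the values, and the 0 to 3 class codes, are assumptions for illustration); with the real file you would start from read.csv("train.csv") instead.

```r
# Simulated stand-in for the phone data (hypothetical values, not the real file)
set.seed(42)
phones <- data.frame(
  ram           = sample(256:3998, 40, replace = TRUE),
  battery_power = sample(500:2000, 40, replace = TRUE),
  price_range   = rep(0:3, each = 10)
)

# Count phones per price class; equal counts mean the classes are balanced
class_counts <- table(phones$price_range)
print(class_counts)

# Total number of missing cells across all columns
n_missing <- sum(is.na(phones))
print(n_missing)

# TRUE when every class has the same number of rows
is_balanced <- length(unique(as.vector(class_counts))) == 1
print(is_balanced)
```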

Exploratory Data Analysis

Before diving into modeling, I explored the relationships between features and price ranges. Two variables stood out immediately:

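library(ggplot2)

# Make sure price_range is a factor; a numeric column would be treated as
# continuous and would not give one box per price class
data$price_range <- factor(data$price_range)

# RAM shows clear separation between price ranges
ggplot(data, aes(x = price_range, y = ram, fill = price_range)) +
  geom_boxplot() +
  labs(title = "RAM by Price Range")

# Battery power also correlates with price
ggplot(data, aes(x = price_range, y = battery_power, fill = price_range)) +
  geom_boxplot() +
  labs(title = "Battery Power by Price Range")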

The boxplots revealed that RAM has the clearest relationship with price—higher RAM phones consistently fall into higher price categories. Battery power shows a similar but weaker pattern.
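
A numeric summary can back up what the boxplots show. Here is a minimal sketch that compares per-class medians; the data frame is simulated so that RAM rises sharply with price tier and battery power only slightly (all values are assumptions for illustration, not the real dataset).

```r
set.seed(1)
tiers <- c("Low", "Medium", "High", "Very High")
price_range <- factor(rep(tiers, each = 50), levels = tiers)
tier <- as.integer(price_range)  # 1 = Low ... 4 = Very High

# Simulated specs: strong RAM trend, weak battery trend
phones <- data.frame(
  price_range   = price_range,
  ram           = rnorm(200, mean = 500 + 800 * tier, sd = 300),
  battery_power = rnorm(200, mean = 1100 + 60 * tier, sd = 400)
)

# Median of each spec within each price class
medians <- aggregate(cbind(ram, battery_power) ~ price_range,
                     data = phones, FUN = median)
print(medians)
```

A steady increase in the ram column from Low to Very High is the tabular counterpart of well-separated boxes.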

Model Building: Two Approaches

Approach 1: Decision Tree

I started with a Decision Tree classifier—intuitive, interpretable, and great for understanding feature importance. Using 5-fold cross-validation, I tuned the complexity parameter (cp) to avoid overfitting.

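library(caret)

# caret fits a regression if the outcome is numeric, so make it a factor
train_data$price_range <- factor(train_data$price_range)

# 5-fold cross-validation for hyperparameter tuning
ctrl <- trainControl(method = "cv", number = 5)

# Tune cp over 0.001, 0.006, ..., 0.046 (10 candidate values)
tuned_tree <- train(price_range ~ ., data = train_data,
    method = "rpart",
    trControl = ctrl,
    tuneGrid = expand.grid(cp = seq(0.001, 0.05, 0.005)))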
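
The tuning code relies on a train_data object whose construction isn't shown. One way to build it is a stratified 80/20 split, sketched here in base R on simulated data (the seed, the simulated ram column, and the 0 to 3 class codes are assumptions; caret's createDataPartition() is a common alternative).

```r
set.seed(123)  # arbitrary seed so the split is reproducible

# Simulated stand-in: 2,000 phones, 500 per price class
phones <- data.frame(
  ram         = runif(2000, 256, 4000),
  price_range = factor(rep(0:3, each = 500))
)

# Within each price class, sample 80% of the row indices for training
train_idx <- unlist(lapply(
  split(seq_len(nrow(phones)), phones$price_range),
  function(rows) sample(rows, size = round(0.8 * length(rows)))
))

train_data <- phones[train_idx, ]
test_data  <- phones[-train_idx, ]

# Both subsets keep the original class balance
print(table(train_data$price_range))
print(table(test_data$price_range))
```

Splitting within each class keeps both subsets balanced, so accuracy on the test set stays easy to interpret.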

Approach 2: Support Vector Machine (SVM)

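SVMs are powerful for multi-class classification, especially when features have complex relationships. I used a radial basis function (RBF) kernel and tuned the cost parameter (C) through cross-validation. Note that with tuneLength, caret's svmRadial method holds sigma fixed at a value estimated from the training data rather than searching over it.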

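# SVM with radial kernel (caret's svmRadial is backed by the kernlab package)
# tuneLength = 5 tries five values of the cost C; sigma is estimated once
svm_tuned <- train(price_range ~ ., data = train_data,
    method = "svmRadial",
    trControl = trainControl(method = "cv", number = 5),
    tuneLength = 5)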
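
If you want cross-validation to search over sigma as well as the cost, caret accepts an explicit grid in place of tuneLength. A sketch of such a grid follows; the candidate values are arbitrary assumptions, not the settings used in this project.

```r
# Candidate values are illustrative assumptions, not tuned recommendations
svm_grid <- expand.grid(
  sigma = c(0.005, 0.01, 0.05),  # RBF parameter (larger = narrower kernel)
  C     = 2^(-2:2)               # cost: 0.25, 0.5, 1, 2, 4
)

# Number of (sigma, C) combinations that would be cross-validated
print(nrow(svm_grid))

# Passed to caret in place of tuneLength (requires caret and kernlab):
# svm_tuned <- train(price_range ~ ., data = train_data,
#     method = "svmRadial",
#     trControl = trainControl(method = "cv", number = 5),
#     tuneGrid = svm_grid)
```

The trade-off is run time: every row of the grid is fitted once per fold.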

Model Comparison

After training both models on 80% of the data and testing on the held-out 20%, here's how they performed:

Model                   Accuracy   Kappa   Notes
Decision Tree (tuned)   82.5%      0.77    Interpretable, fast
SVM Radial (tuned) ✓    96.8%      0.96    Best performer

🏆 Winner: Tuned SVM with 96.8% accuracy on the held-out test set! The 14-point gap over the decision tree suggests the radial kernel captured relationships between features and price categories that a single tree's axis-aligned splits missed.
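
For readers less familiar with Kappa, here is how both metrics fall out of a confusion matrix. The counts below are made up for a balanced 400-phone test set (they are not the project's actual confusion matrix) and are only meant to show the arithmetic.

```r
# Hypothetical confusion matrix (rows = actual class, columns = predicted)
classes <- c("Low", "Medium", "High", "Very High")
cm <- matrix(
  c(90, 10,  0,  0,
    10, 78, 12,  0,
     0, 12, 76, 12,
     0,  0, 14, 86),
  nrow = 4, byrow = TRUE,
  dimnames = list(actual = classes, predicted = classes)
)

n <- sum(cm)
accuracy <- sum(diag(cm)) / n

# Agreement expected by chance, from the row and column totals
expected <- sum(rowSums(cm) * colSums(cm)) / n^2

# Cohen's kappa: how far accuracy exceeds chance, rescaled so 1 = perfect
cohen_kappa <- (accuracy - expected) / (1 - expected)

print(round(c(accuracy = accuracy, kappa = cohen_kappa), 3))
```

When the test set has four equally sized classes, chance agreement is 0.25 and kappa reduces to (accuracy - 0.25) / 0.75. That gives roughly 0.77 for 82.5% accuracy and 0.96 for 96.8%, in line with the table.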

What Drives Phone Pricing?

The most actionable insight for Bob: which features should he prioritize? Variable importance analysis from both models pointed to a clear hierarchy:

RAM: 100%
Battery Power: 78%
Pixel Height: 65%
Pixel Width: 58%
Mobile Weight: 42%

"RAM is the single most important predictor of mobile phone price range. If Bob wants to position a phone as premium, RAM should be his first consideration."
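
Importance scores like these are usually reported relative to the strongest predictor. The sketch below shows one such convention on simulated data, using rpart's variable.importance and dividing by the maximum. Whether the figures above were produced this way or with caret's varImp() is not stated, so treat the method as an assumption.

```r
library(rpart)

set.seed(7)
n <- 400
tier <- rep(1:4, each = n / 4)

# Simulated specs (hypothetical values, not the real dataset)
phones <- data.frame(
  price_range   = factor(tier),
  ram           = rnorm(n, mean = 800 * tier, sd = 300),          # strong signal
  battery_power = rnorm(n, mean = 1000 + 100 * tier, sd = 400),  # weak signal
  mobile_wt     = rnorm(n, mean = 140, sd = 30)                  # pure noise
)

tree <- rpart(price_range ~ ., data = phones, method = "class")

# Raw importance: split improvement credited to each variable,
# including its contribution as a surrogate split
raw_importance <- tree$variable.importance

# Rescale so the top variable reads 100
scaled_importance <- round(100 * raw_importance / max(raw_importance))
print(scaled_importance)
```

Scaled scores rank features within one model; they are not directly comparable across different model types.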

Technology Stack

R, caret, e1071 (SVM), rpart (Decision Tree), ggplot2, dplyr

Business Recommendations for Bob

  1. RAM First: When positioning a phone in a higher price tier, prioritize RAM above all else. It's the clearest signal of premium pricing.
  2. Battery Matters: A strong battery (3000+ mAh) is expected in mid-to-high tier phones. Don't skimp here.
  3. Screen Resolution: Higher resolution (px_height and px_width) goes with higher price tiers in this data. Invest in display quality for high-end models.
  4. Use the Model: Before launching a new phone, run its specs through the SVM model to predict its natural price category. Pricing outside this range may confuse customers.

Key Learnings

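  1. The SVM clearly beat the decision tree here (96.8% vs. 82.5% accuracy). The radial kernel appears to have captured feature interactions that the single tree missed.
  2. Cross-validation guards against overfitting during tuning. 5-fold CV selected the hyperparameters; the held-out 20% test set is what supports the 96.8% figure as an estimate of performance on new data.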
  3. Domain knowledge confirms models. RAM and battery being top predictors aligns with real-world consumer behavior.
  4. Balanced datasets are a gift. No need for SMOTE or class weighting when classes are already equal.
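
To make the cross-validation step concrete: each candidate hyperparameter value is scored on rows the model did not see while fitting, and the test set stays untouched until the end. A minimal base-R sketch of how 5 folds partition the training rows follows (caret's trainControl handles a similar partition internally; the seed is arbitrary and the row count assumes an 80% training split of 2,000 phones).

```r
set.seed(99)      # arbitrary seed for a reproducible shuffle
n_train <- 1600   # 80% of the 2,000 phones
k <- 5

# Assign every training row to one of k folds at random
fold_id <- sample(rep(seq_len(k), length.out = n_train))

for (fold in seq_len(k)) {
  validate_rows <- which(fold_id == fold)  # held out this round
  fit_rows      <- which(fold_id != fold)  # used to fit the model
  cat(sprintf("Fold %d: fit on %d rows, validate on %d rows\n",
              fold, length(fit_rows), length(validate_rows)))
}
```

Each round fits on 1,280 rows and validates on the remaining 320, so every training row is used for validation exactly once.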

Files & Reproducibility

  • train.csv — Training dataset (2,000 phones)
  • test.csv — Holdout test set for final predictions
  • Mobile_Case.R — Complete R script with all code
  • svm_tuned_model.rds — Saved model for production use