Project Type: Machine Learning | Classification
Tools Used: Python, Pandas, Matplotlib, Seaborn, scikit-learn, XGBoost
Dataset: Telco Customer Churn (Kaggle)

Objective

Customer churn directly impacts a company’s revenue and growth. This project builds a classification model that flags customers at high risk of churning, so retention efforts can reach them before they leave.

Key Steps

  • Exploratory Data Analysis (EDA): Examined churn rates across customer segments to identify key behavioral trends.
  • Data Preprocessing & Feature Engineering: Handled missing values, encoded categorical variables, and engineered new features to improve model performance.
  • Model Training & Evaluation: Compared Logistic Regression, Random Forest, Support Vector Machine (SVM), and XGBoost classifiers, optimizing for recall (to catch as many churners as possible) and ROC-AUC.
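
The modeling workflow above can be sketched with a scikit-learn pipeline. This is a minimal sketch, not the notebook's code: it runs on a small synthetic DataFrame whose column names (tenure, MonthlyCharges, InternetService, PaymentMethod) are assumed to mirror the Kaggle file, median imputation plus one-hot encoding stand in for the actual preprocessing, and class_weight="balanced" is one assumed way of favoring recall. XGBoost is left out so the sketch runs with scikit-learn alone; if the xgboost package is installed, an XGBClassifier can be added to the models dict the same way.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Hypothetical synthetic stand-in for the Telco data (column names assumed).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "tenure": rng.integers(0, 72, n),
    "MonthlyCharges": rng.uniform(20, 120, n).round(2),
    "InternetService": rng.choice(["DSL", "Fiber optic", "No"], n),
    "PaymentMethod": rng.choice(
        ["Electronic check", "Mailed check", "Credit card (automatic)"], n),
})
# Inject a few missing values so the imputer has something to fill.
df.loc[rng.choice(n, 10, replace=False), "MonthlyCharges"] = np.nan

# Synthetic 0/1 label: churn is more likely for fiber optic and short tenure.
logit = (-1.0 + 1.2 * (df["InternetService"] == "Fiber optic")
         - 0.03 * df["tenure"]).to_numpy()
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

numeric = ["tenure", "MonthlyCharges"]
categorical = ["InternetService", "PaymentMethod"]

# Impute + scale numeric columns; one-hot encode categorical columns.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# class_weight="balanced" upweights the minority (churn) class to help recall.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000, class_weight="balanced"),
    "Random Forest": RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                            random_state=0),
    "SVM": SVC(class_weight="balanced", random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    pipe = Pipeline([("prep", preprocess), ("model", model)])
    scores = cross_validate(pipe, df, y, cv=cv, scoring=["recall", "roc_auc"])
    results[name] = {
        "recall": scores["test_recall"].mean(),
        "roc_auc": scores["test_roc_auc"].mean(),
    }
    print(f"{name:20s} recall={results[name]['recall']:.3f} "
          f"roc_auc={results[name]['roc_auc']:.3f}")
```

Wrapping preprocessing and model in one Pipeline means the imputer and encoder are refit inside each cross-validation fold, so no statistics from the validation split leak into training.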

Results

  • Best Model: Logistic Regression (the most interpretable of the models compared, with strong predictive performance)
  • Key Insights: Customers with fiber optic internet and electronic check payments had higher churn rates.
  • Business Impact: Predicting churn allows companies to proactively retain customers, reducing revenue loss.
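
The key insights and the interpretability argument can be illustrated with two short checks: churn rate per category, and logistic regression coefficients on one-hot encoded features. This is a sketch on synthetic data with the fiber optic and electronic check effects deliberately built in, so the printed numbers are not the project's results; the Yes/No Churn labels and column names are assumed to match the Kaggle dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data; column names and values assumed to mirror the Kaggle file.
rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "tenure": rng.integers(0, 72, n),
    "InternetService": rng.choice(["DSL", "Fiber optic", "No"], n),
    "PaymentMethod": rng.choice(
        ["Bank transfer (automatic)", "Electronic check", "Mailed check"], n),
})
# Built-in (synthetic) effects: fiber optic and electronic check raise churn odds.
logit = (-1.5
         + 1.2 * (df["InternetService"] == "Fiber optic")
         + 1.0 * (df["PaymentMethod"] == "Electronic check")
         - 0.02 * df["tenure"]).to_numpy()
df["Churn"] = np.where(rng.uniform(size=n) < 1 / (1 + np.exp(-logit)), "Yes", "No")

# Encode the Yes/No target as 1/0.
churn = df["Churn"].map({"Yes": 1, "No": 0})

# 1) Churn rate per category, highest first.
rates = {col: churn.groupby(df[col]).mean().sort_values(ascending=False)
         for col in ["InternetService", "PaymentMethod"]}
for r in rates.values():
    print(r.round(3), "\n")

# 2) Logistic regression coefficients; drop_first makes the alphabetically
#    first category of each column the baseline.
X = pd.get_dummies(df[["tenure", "InternetService", "PaymentMethod"]],
                   drop_first=True, dtype=float)
X["tenure"] = StandardScaler().fit_transform(X[["tenure"]]).ravel()
model = LogisticRegression(max_iter=1000).fit(X, churn)
coefs = pd.Series(model.coef_[0], index=X.columns).sort_values(ascending=False)
print(coefs.round(3))
```

A positive coefficient on a one-hot column means that category raises the predicted log-odds of churn relative to the dropped baseline category, which is the kind of direct reading that makes logistic regression easy to explain to stakeholders.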

🔗 View my blog post: Link
🔗 View Notebook on GitHub: GitHub Link
🔗 View Interactive Notebook (nbviewer): nbviewer Link
