Bank Marketing Campaign Classification

Addressing Imbalanced Classification with Multiple Models and Feature Selection Strategies

Objective

This repository addresses imbalanced classification by implementing and evaluating several machine learning models combined with feature selection strategies. The workflow tunes hyperparameters, selects the best-performing subset of features, and calibrates the predicted probabilities.

Dataset Description

The dataset used in this repository comes from a bank marketing campaign. The goal is to predict whether a client will subscribe to a term deposit based on various features. The dataset includes both categorical and numerical features:

ID: Unique identifier for each client.
age: Age of the client.
job: Type of job.
marital: Marital status.
education: Level of education.
default: Whether the client has credit in default.
balance: Account balance.
housing: Whether the client has a housing loan.
loan: Whether the client has a personal loan.
contact: Contact communication type.
day: Last contact day of the month.
month: Last contact month of the year.
duration: Duration of the last contact in seconds.
campaign: Number of contacts performed during this campaign.
pdays: Number of days since the client was last contacted from a previous campaign.
previous: Number of contacts performed before this campaign.
poutcome: Outcome of the previous marketing campaign.
subscribed: Target variable indicating whether the client subscribed to a term deposit.

Methodology

1. Data Preprocessing

Encoding Categorical Features: Used Label Encoding.
Handling Numerical Features: Applied log transformation and Yeo-Johnson transformation.
Scaling Features: Employed StandardScaler and RobustScaler.
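
The preprocessing steps above can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: the sample values are made up, and the assignment of each transform and scaler to a particular column is an assumption, since the source does not say which columns received which treatment.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import (
    LabelEncoder,
    PowerTransformer,
    RobustScaler,
    StandardScaler,
)

# Made-up sample that borrows a few column names from the dataset.
df = pd.DataFrame({
    "job": ["admin.", "technician", "services", "admin.", "services", "technician"],
    "marital": ["married", "single", "married", "divorced", "single", "married"],
    "balance": [1500.0, -200.0, 320.0, 8700.0, 0.0, 45.0],
    "duration": [120.0, 45.0, 600.0, 210.0, 30.0, 95.0],
})

# Label Encoding: one encoder per column, categories mapped to integer codes
# in sorted order.
encoders = {}
for col in ["job", "marital"]:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

# Log transform for a non-negative, right-skewed column (log1p tolerates zeros).
df["duration"] = np.log1p(df["duration"])

# Yeo-Johnson also accepts zero and negative values (assumed here for balance).
# standardize=False leaves scaling to the explicit scalers below.
yeo_johnson = PowerTransformer(method="yeo-johnson", standardize=False)
df["balance"] = yeo_johnson.fit_transform(df[["balance"]]).ravel()

# StandardScaler centers on the mean and divides by the standard deviation;
# RobustScaler centers on the median and divides by the interquartile range.
df["duration"] = StandardScaler().fit_transform(df[["duration"]]).ravel()
df["balance"] = RobustScaler().fit_transform(df[["balance"]]).ravel()

print(df)
```

In a cross-validated workflow, the transformers and scalers would normally be fitted on the training folds only, for example by placing them inside a scikit-learn Pipeline.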

2. Initial Evaluation for Imbalanced Classification Approaches

Methods Evaluated:
- SMOTETomek
- Class Weights
- Calibrated Probability
Validation: Stratified 5-fold cross-validation, repeated 3 times.
Metric: AUPRC (Average Precision Score).
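
The validation scheme and metric can be sketched with scikit-learn as below. The data is synthetic, the 90/10 class ratio is an assumption, and Logistic Regression is used only as a stand-in model. Only the Class Weights approach is shown; SMOTETomek comes from the separate imbalanced-learn package and is left out to keep the sketch dependency-free.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in for the bank data: about 10% positives (assumed ratio).
X, y = make_classification(
    n_samples=1000, n_features=17, weights=[0.9, 0.1], random_state=0
)

# 5 stratified folds, repeated 3 times with different shuffles: 15 scores.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)

candidates = {
    "baseline": LogisticRegression(max_iter=1000),
    "class_weights": LogisticRegression(max_iter=1000, class_weight="balanced"),
}

results = {}
for name, model in candidates.items():
    # "average_precision" is scikit-learn's scorer name for AUPRC.
    scores = cross_val_score(model, X, y, cv=cv, scoring="average_precision")
    results[name] = scores
    print(f"{name}: AUPRC = {scores.mean():.3f} +/- {scores.std():.3f}")
```

Stratification keeps the minority-class share roughly constant across folds, which matters for AUPRC because its chance-level baseline equals the positive-class prevalence.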

3. Hyperparameter Tuning and Feature Selection

Models Implemented:
- Random Forest
- XGBoost
- KNN
- Logistic Regression
Feature Selection Strategies:
- Mutual Information
- Pearson Correlation
Optimization Techniques:
- Optuna
Calibration: Used CalibratedClassifierCV for probability calibration.
Evaluation Metrics:
- AUPRC (Average Precision Score)
- ROC AUC
- Confusion Matrix

Outcome

The project identified several key findings:

Label Encoding proved more effective than the other encoding methods tried.
The Calibrated Probability method outperformed SMOTETomek and Class Weights in addressing the class imbalance.
Mutual Information (MI) feature selection yielded a higher AUPRC than Pearson Correlation.
The number of optimization trials was set to 5 to limit execution time and resource use.
KNN with 10 of the 17 features, selected by MI, achieved a reported AUPRC of 100%, the highest among the models and feature selection strategies evaluated.

Taken together, handling the class imbalance, selecting features, and tuning model hyperparameters improved predictive performance on the bank marketing campaign dataset.
