Live Research Repository: PhD Thesis Implementation | Credit Scoring with ML & Fairness Evaluation
By Dr. Deepa Shukla, 2025
This repository presents the datasets, models, and fairness-aware algorithms developed as part of my PhD thesis.
Explore the live project site: ML-Credit-Scoring-ThinFile-Consumers GitHub Page
Final PhD Thesis Implementation: Chapter 5
This repository contains the complete implementation of Chapter 5 of my PhD thesis, titled:
"Machine Learning Algorithms for Credit Scoring of Thin-File Consumers: A Fairness-Aware Evaluation"
It addresses the challenge of evaluating the creditworthiness of thin-file consumers (individuals with limited or no traditional credit history) using machine learning models, alternative data, and fairness-aware methodologies.
Repository structure (`ML-Credit-Scoring-ThinFile-Consumers/`):

- `notebooks/`: Jupyter notebooks for model training, SHAP/LIME explanations, and fairness analysis
- `results/`: visualizations, model outputs, and fairness plots
- `tables/`: Tables 5.1–5.6 in CSV format
- `thesis_demo_colab.ipynb`: Google Colab-compatible demo
- `LICENSE`: MIT License
- `README.md`: this file
- `CITATION.cff`: citation metadata
A synthetic dataset was created and published as part of this thesis to support reproducibility and fairness research:
- Dataset title: Synthetic Credit Score of Thin-File Consumers
- Published: May 2024
- Hosted on: Harvard Dataverse
- Permanent DOI: https://doi.org/10.7910/DVN/6MLVVI
- Format: CSV with metadata following the FAIR principles
- Contents: credit scoring features (traditional + alternative), synthetic labels, and demographic proxies
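Because the CSV mixes model features, labels, and demographic proxies, one reasonable way to prepare it is to keep the proxy out of the model inputs while carrying it alongside the split for later fairness evaluation. The sketch below does this with pandas and scikit-learn. The helper `load_thin_file_data` and every column name (`default`, `gender`, `utility_on_time_ratio`, `months_of_rent_history`, `employment`) are hypothetical, and the in-memory rows are made-up toy values; check the Dataverse metadata for the real schema.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split


def load_thin_file_data(source, label_col, protected_col, test_size=0.2, seed=42):
    """Hypothetical helper: split a credit CSV into features, labels and a protected attribute.

    `label_col` and `protected_col` are assumed names, not the published schema.
    """
    df = pd.read_csv(source)
    y = df[label_col]
    group = df[protected_col]
    # Exclude the demographic proxy from model inputs; keep it only for fairness metrics.
    X = df.drop(columns=[label_col, protected_col])
    # One-hot encode categorical features (drop_first avoids a redundant column).
    X = pd.get_dummies(X, drop_first=True)
    # Stratify on the label so train and test keep the same default rate.
    return train_test_split(
        X, y, group, test_size=test_size, stratify=y, random_state=seed
    )


# Toy stand-in for the real CSV; columns and values are illustrative only.
demo_csv = io.StringIO(
    "utility_on_time_ratio,months_of_rent_history,employment,gender,default\n"
    "0.95,24,salaried,F,0\n"
    "0.60,6,gig,M,1\n"
    "0.88,18,self_employed,F,0\n"
    "0.40,3,gig,F,1\n"
    "0.91,30,salaried,M,0\n"
    "0.55,9,self_employed,M,1\n"
    "0.97,36,salaried,F,0\n"
    "0.35,2,gig,M,1\n"
    "0.80,12,self_employed,F,0\n"
    "0.50,4,gig,M,1\n"
)

X_train, X_test, y_train, y_test, g_train, g_test = load_thin_file_data(
    demo_csv, label_col="default", protected_col="gender"
)
print(X_train.shape, X_test.shape)
```

Dropping the proxy from `X` does not by itself guarantee fairness, since other features can correlate with it; that is why the group labels are still returned for evaluation.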
If you use this dataset in your research, please cite as:
```bibtex
@dataset{shukla_2024_synthetic,
  author = {Deepa Shukla},
  title  = {Synthetic Credit Score of Thin-File Consumers},
  year   = 2024,
  doi    = {10.7910/DVN/6MLVVI},
  url    = {https://doi.org/10.7910/DVN/6MLVVI}
}
```
| Model | Description |
|---|---|
| Logistic Regression | Linear baseline |
| Decision Tree | Interpretable tree-based learner |
| Random Forest | Ensemble method for robust performance |
| Gradient Boosting | Fine-grained boosting classifier |
| Support Vector Machine | Margin-based nonlinear classifier |
| Deep Neural Network (DNN) | Multi-layer perceptron-based classifier |
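A minimal scikit-learn sketch of how the six models above could be trained and compared on AUC-ROC and F1, the metrics named in Table 5.3. The hyperparameters, the `compare_models` helper, and the synthetic `make_classification` data (with an assumed 80/20 class imbalance) are illustrative assumptions, not the thesis configuration; the DNN is approximated with scikit-learn's `MLPClassifier`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Illustrative settings only; not the hyperparameters used in the thesis.
MODELS = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    # probability=True enables predict_proba, which AUC-ROC needs.
    "Support Vector Machine": make_pipeline(
        StandardScaler(), SVC(kernel="rbf", probability=True, random_state=0)
    ),
    # A multi-layer perceptron stands in for the DNN.
    "Deep Neural Network (DNN)": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
    ),
}


def compare_models(X_train, X_test, y_train, y_test):
    """Fit every model and return test-set AUC-ROC and F1 per model."""
    results = {}
    for name, model in MODELS.items():
        model.fit(X_train, y_train)
        scores = model.predict_proba(X_test)[:, 1]  # probability of class 1
        preds = model.predict(X_test)
        results[name] = {
            "AUC": roc_auc_score(y_test, scores),
            "F1": f1_score(y_test, preds),
        }
    return results


# Synthetic stand-in data with an assumed 20% minority (default) class.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

results = compare_models(X_train, X_test, y_train, y_test)
for name, m in results.items():
    print(f"{name:28s} AUC={m['AUC']:.3f} F1={m['F1']:.3f}")
```

Scale-sensitive models (logistic regression, SVM, MLP) are wrapped in a `StandardScaler` pipeline, while the tree-based models are used on raw features because their splits are insensitive to feature scaling.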
This work integrates fairness into machine learning model evaluation through:
| Table No. | Description |
|---|---|
| 5.1 | AUC-ROC Scores for Traditional Models |
| 5.2 | Performance on Alternative Data Features |
| 5.3 | Model Comparison (AUC, F1) |
| 5.4 | Fairness Metrics across Models |
| 5.5 | Bias Scores (Before vs After Mitigation) |
| 5.6 | Fairness-aware Model Performance (AUC, Bias %) |
All tables are reproducible with the linked notebooks and are available as CSV files in `tables/`.
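Tables 5.4 to 5.6 report fairness metrics and bias scores before and after mitigation. As an illustrative sketch only (the thesis may use different metrics or a different mitigation method), the code below computes three common group-fairness metrics (demographic parity difference, disparate impact ratio, equal opportunity difference) and the reweighing pre-processing weights of Kamiran and Calders, w(g, y) = P(g) P(y) / P(g, y). The helper names, the toy arrays, and the convention that `y_pred = 1` means approval and `y_true = 1` means a creditworthy applicant are assumptions.

```python
import numpy as np


def fairness_report(y_true, y_pred, group, privileged):
    """Group-fairness metrics for binary decisions (assumed: 1 = approve / creditworthy)."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    priv = group == privileged
    unpriv = ~priv
    # Approval (selection) rate in each group.
    sr_priv = y_pred[priv].mean()
    sr_unpriv = y_pred[unpriv].mean()
    # True positive rate among truly creditworthy applicants in each group.
    tpr_priv = y_pred[priv & (y_true == 1)].mean()
    tpr_unpriv = y_pred[unpriv & (y_true == 1)].mean()
    return {
        # 0 means equal approval rates across groups.
        "demographic_parity_diff": sr_unpriv - sr_priv,
        # 1 means parity; values below 0.8 are often flagged (four-fifths rule).
        "disparate_impact": sr_unpriv / sr_priv,
        # 0 means equal true positive rates across groups.
        "equal_opportunity_diff": tpr_unpriv - tpr_priv,
    }


def reweighing_weights(y, group):
    """Reweighing (Kamiran & Calders): w(g, y) = P(g) * P(y) / P(g, y)."""
    y, group = np.asarray(y), np.asarray(group)
    w = np.ones(len(y), dtype=float)
    for g in np.unique(group):
        for label in np.unique(y):
            cell = (group == g) & (y == label)
            if cell.any():
                w[cell] = (group == g).mean() * (y == label).mean() / cell.mean()
    return w


# Toy example; group "A" is treated as privileged (illustrative values only).
group = np.array(["A"] * 5 + ["B"] * 5)
y_true = np.array([1, 1, 1, 0, 0, 1, 1, 0, 0, 0])
y_pred = np.array([1, 1, 1, 1, 0, 1, 0, 0, 0, 0])

print(fairness_report(y_true, y_pred, group, privileged="A"))
# demographic parity diff ≈ -0.6, disparate impact 0.25, equal opportunity diff -0.5
print(reweighing_weights(y_true, group))
```

After reweighing, each group has the same weighted positive-label rate as the dataset as a whole, so a model trained with these weights (for example `LogisticRegression().fit(X, y, sample_weight=w)`) sees labels that are statistically independent of the group. Comparing the metrics before and after such a step is one way a before/after bias table could be produced.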
Please cite this repository and dataset using the following format:
```bibtex
@dataset{shukla_2024_synthetic,
  author = {Deepa Shukla},
  title  = {Synthetic Credit Score of Thin-File Consumers},
  year   = 2024,
  doi    = {10.7910/DVN/6MLVVI},
  url    = {https://doi.org/10.7910/DVN/6MLVVI}
}

@misc{shukla_2025_repo,
  author       = {Deepa Shukla},
  title        = {ML-Credit-Scoring-ThinFile-Consumers},
  year         = 2025,
  howpublished = {\url{https://github.com/Deezpa/ML-Credit-Scoring-ThinFile-Consumers}},
  note         = {Version 1.0, Final Thesis Implementation}
}
```