ML-Credit-Scoring-ThinFile-Consumers

🎓 Live Research Repository – PhD Thesis Implementation | Credit Scoring with ML & Fairness Evaluation
By Dr. Deepa Shukla, 2025

This repository presents the datasets, models, and fairness-aware algorithms developed as part of my PhD thesis.

📢 Explore the live project site:
👉 ML-Credit-Scoring-ThinFile-Consumers GitHub Page

View Documentation

📘 Machine Learning for Credit Scoring of Thin-File Consumers

Final PhD Thesis Implementation – Chapter 5

License: MIT


🎓 Project Overview

This repository contains the complete implementation of Chapter 5 of my PhD thesis, titled:
“Machine Learning Algorithms for Credit Scoring of Thin-File Consumers: A Fairness-Aware Evaluation”

It addresses the challenge of evaluating creditworthiness for thin-file consumers (individuals with limited or no traditional credit history) using machine learning models, alternative data, and fairness-aware methodologies.


ML-Credit-Scoring-ThinFile-Consumers/
│
├── notebooks/                 # Jupyter notebooks: training, SHAP/LIME, fairness
├── results/                   # Visualizations, model outputs, fairness plots
├── tables/                    # Tables 5.1–5.6 in CSV format
├── thesis_demo_colab.ipynb    # Google Colab-compatible demo
├── LICENSE                    # MIT License
├── README.md                  # You’re here
├── CITATION.cff               # Citation metadata


📊 Dataset

A synthetic dataset was created and published as part of this thesis to support reproducibility and fairness research:

📘 Dataset Title: Synthetic Credit Score of Thin-File Consumers
📅 Published: May 2024
📍 Hosted on: Harvard Dataverse
🔗 Permanent DOI: https://doi.org/10.7910/DVN/6MLVVI
📂 Format: CSV with metadata following FAIR principles
🔍 Contents: Credit scoring features (traditional + alternative), synthetic labels, demographic proxies
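
As a rough sketch of how a CSV of this kind might be read, the snippet below uses Python's standard `csv` module on a tiny in-memory sample. The column names (`credit_history_months`, `utility_on_time_ratio`, `age_group`, `default`) and the sample rows are hypothetical placeholders, not the dataset's actual schema; the Dataverse metadata is the authority on the real field names.

```python
import csv
import io

# Hypothetical sample standing in for the downloaded CSV; the real column
# names and values come from the Dataverse record, not from this sketch.
SAMPLE_CSV = """credit_history_months,utility_on_time_ratio,age_group,default
0,0.95,18-25,0
3,0.60,26-40,1
48,0.88,26-40,0
"""

FEATURE_COLUMNS = ["credit_history_months", "utility_on_time_ratio"]  # assumed names
PROXY_COLUMN = "age_group"  # assumed demographic proxy column
LABEL_COLUMN = "default"    # assumed synthetic label column


def load_records(handle):
    """Split each CSV row into numeric features, a demographic proxy, and a label."""
    features, proxies, labels = [], [], []
    for row in csv.DictReader(handle):
        features.append([float(row[name]) for name in FEATURE_COLUMNS])
        proxies.append(row[PROXY_COLUMN])
        labels.append(int(row[LABEL_COLUMN]))
    return features, proxies, labels


features, proxies, labels = load_records(io.StringIO(SAMPLE_CSV))
print(len(features), "rows;", len(FEATURE_COLUMNS), "features per row")
```

With a downloaded file, `open(path, newline="")` can take the place of the `io.StringIO` wrapper. Keeping the demographic proxy separate from the feature matrix makes it easier to exclude it from training while still using it for fairness checks.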

📚 Citation

If you use this dataset in your research, please cite it as:

```bibtex
@dataset{shukla_2024_synthetic,
  author = {Deepa Shukla},
  title  = {Synthetic Credit Score of Thin-File Consumers},
  year   = 2024,
  doi    = {10.7910/DVN/6MLVVI},
  url    = {https://doi.org/10.7910/DVN/6MLVVI}
}
```


🧠 Models Implemented

| Model | Description |
| --- | --- |
| Logistic Regression | Linear baseline |
| Decision Tree | Interpretable tree-based learner |
| Random Forest | Ensemble method for robust performance |
| Gradient Boosting | Fine-grained boosting classifier |
| Support Vector Machine | Margin-based nonlinear classifier |
| Deep Neural Network (DNN) | Multi-layer perceptron-based classifier |

⚖️ Fairness-Aware Evaluation

This work integrates fairness into machine learning model evaluation through:


🗂️ Repository Structure


📈 Results Summary (Thesis Chapter 5)

| Table No. | Description |
| --- | --- |
| 5.1 | AUC-ROC Scores for Traditional Models |
| 5.2 | Performance on Alternative Data Features |
| 5.3 | Model Comparison (AUC, F1) |
| 5.4 | Fairness Metrics across Models |
| 5.5 | Bias Scores (Before vs After Mitigation) |
| 5.6 | Fairness-aware Model Performance (AUC, Bias %) |

All tables are reproducible via linked notebooks and available in /tables/.
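
Table 5.6 pairs a predictive metric (AUC) with a bias score. A minimal pure-Python sketch of such a paired evaluation follows. It assumes the bias score is the demographic parity gap (the difference in approval rates between groups), which may differ from the definition used in the thesis. The toy labels, scores, group names, and the 0.5 decision threshold are illustrative placeholders, and the helper names are hypothetical.

```python
def auc_roc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def demographic_parity_gap(predictions, groups):
    """Largest difference in approval rate between any two groups."""
    totals, approved = {}, {}
    for pred, group in zip(predictions, groups):
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + pred
    rates = [approved[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


# Toy data (assumed encoding): 1 = creditworthy / approved, 0 = not.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.7, 0.3, 0.2, 0.6]
groups = ["A", "A", "A", "B", "B", "B"]
predictions = [1 if s >= 0.5 else 0 for s in scores]

auc = auc_roc(labels, scores)
bias_pct = 100 * demographic_parity_gap(predictions, groups)
print(f"AUC = {auc:.3f}, bias = {bias_pct:.1f}%")
```

The pairwise AUC loop is quadratic in the number of samples, which is fine for illustration; a rank-based formulation or a library routine would be the usual choice on the full dataset.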


📚 How to Cite

Please cite this repository and dataset using the following format:

```bibtex
@dataset{shukla_2024_synthetic,
  author = {Deepa Shukla},
  title  = {Synthetic Credit Score of Thin-File Consumers},
  year   = 2024,
  doi    = {10.7910/DVN/6MLVVI},
  url    = {https://doi.org/10.7910/DVN/6MLVVI}
}

@misc{shukla_2025_repo,
  author       = {Deepa Shukla},
  title        = {ML-Credit-Scoring-ThinFile-Consumers},
  year         = 2025,
  howpublished = {\url{https://github.com/Deezpa/ML-Credit-Scoring-ThinFile-Consumers}},
  note         = {Version 1.0 -- Final Thesis Implementation}
}
```