This is a personal MLOps project based on a Kaggle dataset for credit default predictions.
It was developed as part of this End-to-end MLOps with Databricks course, and you can walk through it together with this Medium publication.
Feel free to ⭐ and clone this repo 😉
The project has been structured with the following folders and files:
```
├── .github/workflows/              # CI/CD configuration files
│   ├── cd.yml
│   └── ci.yml
├── data/                           # raw data
│   └── data.csv
├── notebooks/                      # notebooks for the various stages of the project
│   ├── create_source_data/         # notebook for generating synthetic data
│   │   └── create_source_data_notebook.py
│   ├── feature_engineering/        # feature engineering and MLflow experiments
│   │   ├── basic_mlflow_experiment_notebook.py
│   │   ├── combined_mlflow_experiment_notebook.py
│   │   ├── custom_mlflow_experiment_notebook.py
│   │   └── prepare_data_notebook.py
│   ├── model_feature_serving/      # notebooks for serving models and features
│   │   ├── AB_test_model_serving_notebbok.py
│   │   ├── feature_serving_notebook.py
│   │   ├── model_serving_feat_lookup_notebook.py
│   │   └── model_serving_notebook.py
│   └── monitoring/                 # monitoring and alerts setup
│       ├── create_alert.py
│       ├── create_inference_data.py
│       ├── lakehouse_monitoring.py
│       └── send_request_to_endpoint.py
├── src/credit_default/             # source code for the project
│   ├── data_cleaning.py
│   ├── data_cleaning_spark.py
│   ├── data_preprocessing.py
│   ├── data_preprocessing_spark.py
│   └── utils.py
├── tests/                          # unit tests for the project
│   ├── test_data_cleaning.py
│   └── test_data_preprocessor.py
├── workflows/                      # workflows for the Databricks asset bundle
│   ├── deploy_model.py
│   ├── evaluate_model.py
│   ├── preprocess.py
│   ├── refresh_monitor.py
│   └── train_model.py
├── .pre-commit-config.yaml         # configuration for pre-commit hooks
├── Makefile                        # helper commands for installing requirements, formatting, testing, linting, and cleaning
├── project_config.yml              # configuration settings for the project
├── databricks.yml                  # Databricks asset bundle configuration
└── bundle_monitoring.yml           # monitoring settings for the Databricks asset bundle
```
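The project_config.yml file centralizes the configuration values that the notebooks and workflows read. Its exact contents are not reproduced here; a hypothetical minimal sketch, reusing only the catalog and schema names from the volume setup further below, could look like this:

```yaml
# Hypothetical sketch of project_config.yml -- key names are illustrative;
# only the catalog and schema values come from the volume setup in this README
catalog_name: credit
schema_name: default
```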
The Python version used for this project is Python 3.11.
- Clone the repo:

  ```bash
  git clone https://github.com/benitomartin/mlops-databricks-credit-default.git
  ```
- Create the virtual environment using uv with Python 3.11 and install the requirements:

  ```bash
  uv venv -p 3.11.0 .venv
  source .venv/bin/activate
  uv pip install -r pyproject.toml --all-extras
  uv lock
  ```
- Build the wheel package:

  ```bash
  # Build
  uv build
  ```
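  The wheel lands in the dist/ folder (e.g. dist/credit_default_databricks-0.0.1-py3-none-any.whl) and is uploaded to a Databricks volume in a later step.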
- Install the Databricks extension for VS Code and the Databricks CLI:

  ```bash
  curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
  ```
- Authenticate on Databricks:

  ```bash
  # Authentication
  databricks auth login --configure-cluster --host <workspace-url>

  # Profiles
  databricks auth profiles
  cat ~/.databrickscfg
  ```
After entering your information, the CLI will prompt you to save it as a Databricks configuration profile in ~/.databrickscfg.
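The saved profile looks roughly like the sketch below; the exact fields depend on the authentication method and the values you enter, so treat it as illustrative only.

```ini
# Illustrative ~/.databrickscfg profile -- values are placeholders
[DEFAULT]
host       = https://<workspace-url>
cluster_id = <cluster-id>
auth_type  = databricks-cli
```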
Once the project is set up, you need to create the volumes that store the data and the wheel package, which you will then have to install on the cluster:
- catalog name: credit
- schema name: default
- volume names: data and packages
```bash
# Create volumes
databricks volumes create credit default data MANAGED
databricks volumes create credit default packages MANAGED

# Push volumes
databricks fs cp data/data.csv dbfs:/Volumes/credit/default/data/data.csv
databricks fs cp dist/credit_default_databricks-0.0.1-py3-none-any.whl dbfs:/Volumes/credit/default/packages

# Show volumes
databricks fs ls dbfs:/Volumes/credit/default/data
databricks fs ls dbfs:/Volumes/credit/default/packages
```
Some project files require a Databricks authentication token. This token allows secure access to Databricks resources and APIs:
- Create a token in the Databricks UI:

  - Navigate to Settings --> User --> Developer --> Access tokens
  - Generate a new personal access token

- Create a secret scope for securely storing the token:
  ```bash
  # Create Scope
  databricks secrets create-scope secret-scope

  # Add secret after running command
  databricks secrets put-secret secret-scope databricks-token

  # List secrets
  databricks secrets list-secrets secret-scope
  ```
Note: For GitHub Actions (in cd.yml), the token must also be added as a GitHub Secret in your repository settings.
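As an illustration only (not the actual contents of cd.yml), a workflow step can expose such a secret to the Databricks CLI through environment variables; the secret and step names below are assumptions:

```yaml
# Illustrative snippet -- secret and step names are assumptions, not taken from cd.yml
- name: Deploy Databricks asset bundle
  env:
    DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
    DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
  run: databricks bundle deploy
```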
Now you can follow the code along with the Medium publication or use it as supporting material if you enroll in the course. The blog does not explain every file, only the main ones used for the final deployment, but you can try out the other files as well 🙂.