parcheesime/Databricks-Portfolio


Databricks Notebooks Portfolio

Overview

This repository contains a collection of production-level Databricks notebooks that I developed as part of various data analysis projects. They are provided as sample code to showcase my coding skills, problem-solving abilities, and familiarity with Databricks and related technologies. Some data has been removed.

Contents

Each notebook is documented with comments explaining the purpose of the code and the approach taken. Here is a brief overview of what each notebook demonstrates:

1. MC-API: NB_Pipeline_MailChimp_data_collection.ipynb

  • Purpose: This notebook outlines the process for collecting data from MailChimp and processing it to feed an interactive analysis dashboard.
  • Technologies used: SQL, PySpark, Databricks, MailChimp API
  • Key Concepts: Data collection, API integration, data transformation.
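As a rough illustration of the collection and transformation steps, the sketch below pages through campaign reports and flattens each one into a table-ready row. It is not taken from the notebook: `fetch_page` is a hypothetical callable standing in for the authenticated HTTP request, and the field names (`reports`, `total_items`, `opens.unique_opens`, `clicks.unique_clicks`) are assumptions about the shape of the MailChimp reports response.

```python
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical signature: fetch_page(offset, count) returns one decoded JSON page.
FetchPage = Callable[[int, int], Dict[str, Any]]


def iter_reports(fetch_page: FetchPage, page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every report by walking offset/count pages."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        reports = page.get("reports", [])
        yield from reports
        offset += len(reports)
        total = page.get("total_items")
        # Stop on an empty page, or once the reported total has been reached.
        if not reports or (total is not None and offset >= total):
            break


def flatten_report(report: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one nested report into a single-level row (assumed field names)."""
    opens = report.get("opens", {})
    clicks = report.get("clicks", {})
    return {
        "campaign_id": report.get("id"),
        "campaign_title": report.get("campaign_title"),
        "emails_sent": report.get("emails_sent", 0),
        "unique_opens": opens.get("unique_opens", 0),
        "unique_clicks": clicks.get("unique_clicks", 0),
    }


def collect_rows(fetch_page: FetchPage, page_size: int = 100) -> List[Dict[str, Any]]:
    """Collect all pages and return flat rows ready to load into a table."""
    return [flatten_report(r) for r in iter_reports(fetch_page, page_size)]
```

Passing the request function in as an argument keeps the paging and flattening logic testable without network access. In a notebook, the resulting rows could then be turned into a Spark DataFrame and saved for the dashboard.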

2. Event-Stream: NB_Pipeline_event_stream.ipynb

  • Purpose: This notebook demonstrates how to stream data from an S3 bucket, process it in real time, and load it into a Delta table for subsequent dashboard analysis.
  • Technologies used: SQL, PySpark, Databricks Delta Lake, AWS S3
  • Key Concepts: Real-time data streaming, Delta Lake integration, data visualization for client engagement analysis.
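One common way to build this kind of S3-to-Delta stream on Databricks is Auto Loader (the `cloudFiles` source) with Structured Streaming. The sketch below assumes that approach, which may differ from what the notebook actually does. The function names, the JSON file format, and the choice to keep the schema location under the checkpoint path are all illustrative assumptions.

```python
from typing import Any, Dict


def autoloader_options(schema_location: str, file_format: str = "json") -> Dict[str, str]:
    """Build reader options for the Databricks Auto Loader (cloudFiles) source."""
    return {
        "cloudFiles.format": file_format,              # format of the files landing in S3 (assumed JSON)
        "cloudFiles.schemaLocation": schema_location,  # where the inferred schema is tracked
    }


def start_event_stream(spark: Any, source_path: str, checkpoint_path: str, target_table: str) -> Any:
    """Incrementally read new files from source_path and append them to a Delta table.

    `spark` is the SparkSession that a Databricks notebook already provides.
    """
    events = (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(schema_location=checkpoint_path + "/schema"))
        .load(source_path)
    )
    # toTable starts the streaming query; the checkpoint lets it resume after a restart.
    return (
        events.writeStream
        .option("checkpointLocation", checkpoint_path)
        .outputMode("append")
        .toTable(target_table)
    )
```

In a notebook this might be called as `start_event_stream(spark, "s3://<bucket>/events/", "/tmp/checkpoints/events", "events_bronze")`, where the bucket, checkpoint path, and table name are placeholders.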

3. Backup and Recovery System: backup_recovery_system.ipynb

  • Purpose: This system automates the backup of data queries from Databricks to Amazon S3 and supports reliable data recovery, keeping data safe and accessible for analytics and operational continuity.
  • Technologies used: Python, Pandas, Databricks API, Amazon S3, boto3
  • Key Concepts: Automated data backups, cloud-based storage solutions, data integrity and recovery, secure and scalable data handling.
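The backup and recovery round trip can be sketched roughly as follows, assuming query results are held as Pandas DataFrames and stored in S3 as CSV files under timestamped keys. The key layout and the function names are illustrative assumptions, not the notebook's actual design. `s3_client` is expected to be a client created with `boto3.client("s3")`.

```python
import io
from datetime import datetime, timezone
from typing import Any


def backup_key(prefix: str, query_name: str, run_time: datetime) -> str:
    """Build a timestamped S3 key (hypothetical layout); run_time is assumed to be UTC."""
    stamp = run_time.strftime("%Y%m%dT%H%M%SZ")
    return f"{prefix.strip('/')}/{query_name}/{stamp}.csv"


def backup_dataframe(s3_client: Any, bucket: str, key: str, df: Any) -> str:
    """Serialize a DataFrame to CSV and upload it to S3; returns the key written."""
    body = df.to_csv(index=False).encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return key


def restore_dataframe(s3_client: Any, bucket: str, key: str) -> Any:
    """Download a CSV backup from S3 and load it back into a Pandas DataFrame."""
    import pandas as pd  # imported lazily so the helpers above work without pandas

    obj = s3_client.get_object(Bucket=bucket, Key=key)
    return pd.read_csv(io.BytesIO(obj["Body"].read()))


# Example key for a backup taken now:
example_key = backup_key("backups", "daily_sales", datetime.now(timezone.utc))
```

Timestamped keys keep every backup as a separate object, so recovery can target a specific run instead of only the most recent one.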

Usage

These notebooks are intended for display purposes only. They are not configured to run outside the original Databricks platform where they were developed, and running them elsewhere may require specific environment configuration.

Contributing

While this repository is primarily for showcasing purposes, feedback and suggestions are welcome.

License

This project is licensed under the MIT License.

Contact

If you have any questions about the notebooks or would like to contact me regarding job opportunities or collaborations, please email me at [aletia.trepte@gmail.com].
