Skip to content

Databricks & Blueprint Hackathon - using databricks, spark structured streaming, delta, and azure devops to build automated deployment of notebooks and jobs.

Notifications You must be signed in to change notification settings

magrathj/Wild-West-Hackathon-Databricks-Structured-Streaming

Repository files navigation

Wild-West-Hackathon

Databricks & Blueprint Hackathon

Challenge Details

You are a data engineer for a sporting goods store. They want to send the data from their sales system to THE CLOUD. A developer on the sales system has written an application to send real time data in addition to the initial load from the sales system to an Azure Event Hub. Utilizing the Capture feature within Azure Event Hub, that data is being delivered directly to Azure Data Lake Storage. Unfortunately, the raw data is not very useable. It will be your job to munge the data into a useable format and land the data into Databricks Delta tables. Once you have figured out how to work with the data, you will need to go a step further and make sure that the data in Delta is always kept up to date with something called Structured Streaming. The streaming data is all either Inserts or Updates. You won't have to worry about deletes, but you will need to merge updated records into the Delta tables to represent current state.

You have the choice of working in Python or Scala for this challenge. One thing to keep in mind is that this is a canned hackathon. We provide you with the starting point and also have a specific expectation of the ending point. You are responsible for everything in between. The starting point is the "Hackathon Start Up" notebook in either the Python or Scala folder. The expectation for the ending point is streaming data into Delta tables created in the "Build Delta Tables" notebook. In that notebook you will create the Database for your team, and the code for creating one of the tables is provided. You will need to figure out the schemas for the remaining tables as part of the discovery process and populate the appropriate cells.

Set up databricks cli

    databricks configure --token --profile hackathon

View workspace

    databricks workspace list --profile hackathon

Architecture

Architecture

Setup

Stream

Stream

Application

Stream

Video

Architecture - https://youtu.be/6ouUu8God90
Application - https://youtu.be/EoeCTR36eKI

About

Databricks & Blueprint Hackathon - using databricks, spark structured streaming, delta, and azure devops to build automated deployment of notebooks and jobs.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published