Avalanche

Avalanche is an automatic materialization process designed to transform raw, semi-structured data into structured tables. It ensures data completeness while providing a streamlined approach to managing Snowflake Kafka connector-based schemas. It is a Python-based solution that integrates with Snowflake, Kafka, and other data sources to transform and load data into structured formats. While not strictly required, Avalanche was conceptualized at WW Tech as part of a broader stack like the ones below.

For a Change Data Capture (CDC) stack:

[CDC Stack diagram]

For an Application Events stack:

[Application Events diagram]

The recommendation for a production setup is similar to the above, so this document assumes familiarity with Kafka and the Snowflake Kafka connector. For a non-standard setup, refer to config/sample_nyc_taxi_data.yaml for guidance.
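As a purely illustrative sketch, a per-source configuration like config/sample_nyc_taxi_data.yaml can be loaded with PyYAML. The keys referenced below are hypothetical; the sample file in the repo is the source of truth for the actual schema.

```python
# Illustrative only: loading a per-source Avalanche config such as
# config/sample_nyc_taxi_data.yaml. Key names here are assumptions made
# for the sake of the example, not Avalanche's real config schema.
import yaml  # pip install pyyaml

with open("config/sample_nyc_taxi_data.yaml") as f:
    cfg = yaml.safe_load(f)

# A config of this kind typically maps RAW tables to structured targets.
for table in cfg.get("tables", []):      # hypothetical key
    print(table.get("raw_table"), "->", table.get("target_table"))
```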

Purpose:

The purpose of Avalanche is to transform raw, semi-structured data into relational, structured data. While doing so, it also checks for data completeness.
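To make the idea concrete, here is a minimal sketch (not Avalanche's actual code) of the kind of materialization it automates, assuming a RAW table produced by the Snowflake Kafka connector with RECORD_METADATA and RECORD_CONTENT VARIANT columns. All table, column, and credential names are hypothetical.

```python
# Minimal sketch of materializing a Kafka-connector RAW table into a typed
# table with snowflake-connector-python. Names are hypothetical; Avalanche
# generates and manages this kind of SQL automatically.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",      # assumption: real values come from env/config
    user="my_user",
    password="my_password",
    warehouse="my_wh",
    database="my_db",
)

MATERIALIZE_SQL = """
INSERT INTO analytics.orders (order_id, status, updated_at)
SELECT
    RECORD_CONTENT:payload:order_id::NUMBER,
    RECORD_CONTENT:payload:status::STRING,
    RECORD_CONTENT:payload:updated_at::TIMESTAMP_NTZ
FROM raw.kafka_orders
WHERE RECORD_METADATA:CreateTime::NUMBER > %(watermark)s
"""

cur = conn.cursor()
try:
    # Completeness checking in practice means tracking how far each RAW table
    # has been materialized; a stored watermark stands in for that here.
    cur.execute(MATERIALIZE_SQL, {"watermark": 0})
finally:
    cur.close()
    conn.close()
```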

Diagram:

[Architecture diagram]

Getting Started

System Initialization:

This is a one-time setup where the base Avalanche system tables are created. These tables serve as the foundation for all Avalanche deployments. This step is executed using the initialize_system.py module and is required only once per new system setup.

Refer to docs/system_initialization.md for details on how to run this script.
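For orientation only, the snippet below sketches what system initialization amounts to conceptually: creating Avalanche's bookkeeping tables once in Snowflake. The actual DDL lives in initialize_system.py; the schema, table, and column names here are hypothetical.

```python
# Conceptual sketch of system initialization; not the real initialize_system.py.
# Object names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",   # assumption: credentials supplied via env/config
    user="my_user",
    password="my_password",
)

cur = conn.cursor()
try:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS avalanche_system.materialization_state (
            deployment    STRING,
            raw_table     STRING,
            target_table  STRING,
            last_offset   NUMBER,
            updated_at    TIMESTAMP_NTZ
        )
    """)
finally:
    cur.close()
    conn.close()
```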

Deployment

Once the system is initialized, you are ready to deploy the Avalanche service. Deployments are the core of Avalanche's functionality, allowing it to process data from various sources and materialize it into structured tables in Snowflake. Avalanche deployments are designed to materialize RAW tables (Snowflake Kafka connector-based schemas) into structured, queryable data tables. Each deployment is containerized and can be grouped by source (e.g., replicating an order transactions database).

Refer to docs/avalanche_service_deployment.md for details on how to deploy the Avalanche service.
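As a hypothetical illustration of the shape of a deployment (not how Avalanche is actually implemented), a single deployment can be thought of as a loop over the RAW tables of one source, materializing each into its structured target on an interval:

```python
# Hypothetical shape of one Avalanche deployment: poll the RAW tables of a
# single source and materialize each into its structured target. Illustrative
# only; see docs/avalanche_service_deployment.md for real deployments.
import time

def materialize(raw_table: str, target_table: str) -> int:
    """Flatten newly arrived rows from raw_table into target_table.

    Stubbed here; in practice this would issue Snowflake SQL like the sketch
    in the Purpose section and record how far it has progressed.
    """
    return 0

# Hypothetical grouping of one source's tables (e.g. an order transactions DB).
TABLES = {
    "raw.kafka_orders": "analytics.orders",
    "raw.kafka_order_items": "analytics.order_items",
}

while True:
    for raw_table, target_table in TABLES.items():
        rows = materialize(raw_table, target_table)
        print(f"{raw_table} -> {target_table}: materialized {rows} rows")
    time.sleep(300)  # hypothetical polling interval
```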

Local Development Environment Setup

This section provides instructions for setting up a local development environment for Avalanche, designed to help developers get started quickly with development and testing. The setup relies heavily on the make command to automate dependency installation, environment variable generation, and configuration. Refer to docs/local_development_environment_setup.md for details on how to set up a local development environment.

Avalanche in the wild

Avalanche is currently used to ingest terabytes of data at WW, supporting over 1,400 topics spanning multiple data sources: Postgres, MySQL, Oracle, MongoDB, and schematized application events.

Additional Resources

This section provides references to additional components in the recommended stack.

Contributors

Thanks to all the people who have contributed to this project!

Maintainers:

Star Contributors:

Want to contribute? For the time being, the best way is to open an issue in the repo, and we will get back to you.
