This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

DESIGN.rst

Agogosml Design

Data Pipeline Building Block

The following is a basic data pipeline building block consisting of an <input, app, output> sequence. The input and output containers implement connectors to various messaging services and act as message proxies between those services and the app container. This sequence removes the need to implement messaging connectors inside the app, so developers can focus on building the business logic in the app container while keeping the services loosely coupled.

  • Input Container - Receives events from Azure Event Hub / Kafka.
  • Init Container - Loads artifacts before the customer application is initialized. The Init container (Artifact Loader) loads the ML module into temporary storage that the Customer Application can then access.
  • Customer App - Runs models implemented in PySpark, TensorFlow, scikit-learn, and R.
  • Output Container - Sends the results of the models to Azure Event Hub / Kafka or another data sink.
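
The <input, app, output> sequence can be illustrated with a minimal in-process sketch. It is not the agogosml implementation: the in-memory queues stand in for Azure Event Hub / Kafka, plain function calls stand in for the communication between containers, and every name (`input_container`, `make_app`, `make_output_container`, `run_building_block`) is hypothetical.

```python
from queue import Queue
from typing import Callable, List


def input_container(source: Queue, to_app: Callable[[str], None]) -> None:
    """Drain the stand-in messaging service and forward each event to the app."""
    while not source.empty():
        to_app(source.get())


def make_app(model: Callable[[str], str],
             to_output: Callable[[str], None]) -> Callable[[str], None]:
    """Build the app handler: business logic only, no messaging code."""
    def handle(event: str) -> None:
        to_output(model(event))
    return handle


def make_output_container(sink: Queue) -> Callable[[str], None]:
    """Return a callable that publishes results to the stand-in sink."""
    return sink.put


def run_building_block(events: List[str], model: Callable[[str], str]) -> List[str]:
    """Wire input -> app -> output and return whatever reaches the sink."""
    source: Queue = Queue()
    sink: Queue = Queue()
    for event in events:
        source.put(event)

    app = make_app(model, make_output_container(sink))
    input_container(source, app)

    results = []
    while not sink.empty():
        results.append(sink.get())
    return results
```

In this sketch the model passed to `run_building_block` never touches a queue, which mirrors the point of the building block: the app only sees events and emits results, while the input and output sides own the messaging details.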

Architecture Diagram - Basic Pipeline Building Block

CI/CD Pipeline

The CI/CD pipeline runs in Azure DevOps Pipelines. It consists of two separate pipelines: one for the customer application and one for the input/output applications.

Each pipeline is triggered on a code commit or PR and performs the following steps:

  • Clone the latest (or a specific) branch of a GitHub repository.
  • Build the repo.
  • Run linting and unit tests.
  • If all the tests pass, push the new Docker images to an Azure Container Registry.
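
The gating behaviour of these steps (images are pushed only when every earlier step succeeds) can be sketched as follows. This is an illustration of the control flow only, not an Azure DevOps pipeline definition; `run_ci`, the step names, and the callables are all hypothetical.

```python
from typing import Callable, List, Tuple

# A step is a (name, callable) pair; the callable returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_ci(steps: List[Step], push_images: Callable[[], None]) -> List[str]:
    """Run steps in order and push images only if every step succeeds."""
    log = []
    for name, step in steps:
        ok = step()
        log.append(f"{name}: {'ok' if ok else 'failed'}")
        if not ok:
            # Stop at the first failure; nothing is pushed to the registry.
            return log
    push_images()
    log.append("push: ok")
    return log
```

For example, passing steps named `clone`, `build`, `lint`, and `unit-tests` where `unit-tests` returns `False` yields a log ending in `unit-tests: failed`, and `push_images` is never called.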

Architecture Diagram - CI/CD

Data Pipeline - Production Architecture

The production architecture uses Kubernetes pods to host and connect the input/output containers and the customer container. Separating the input, application, and output containers enables rolling upgrades and Blue/Green deployment without downtime.

Architecture Diagram - production architecture

Data Pipeline - Test Architecture

The test architecture runs both unit tests and integration tests locally on the test machine, in our case the Azure DevOps machine. It runs the containers under test and the testing helper containers locally. The testing helper containers are responsible for pushing data into the pipeline and checking that the data coming out of the other end is as expected.
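
The push-and-check pattern used by the testing helper containers can be sketched in-process. Everything here is an assumption made for illustration: the queues replace the real messaging services, `pipeline_under_test` is a hypothetical stand-in that doubles each number, and `run_integration_check` is not part of agogosml.

```python
from queue import Queue
from typing import List


def pipeline_under_test(inbound: Queue, outbound: Queue) -> None:
    """Hypothetical stand-in for the tested containers: doubles each number."""
    while not inbound.empty():
        outbound.put(inbound.get() * 2)


def run_integration_check(payloads: List[int], expected: List[int]) -> bool:
    """Push payloads in one end and compare what comes out of the other end."""
    inbound: Queue = Queue()
    outbound: Queue = Queue()

    # Test generator role: feed data into the pipeline.
    for payload in payloads:
        inbound.put(payload)

    pipeline_under_test(inbound, outbound)

    # Test validator role: collect the output and compare with expectations.
    received = []
    while not outbound.empty():
        received.append(outbound.get())
    return received == expected
```

The helper treats the pipeline as a black box: it knows only what goes in and what should come out, which is what allows the same check to run against the real containers on the build machine.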

Architecture Diagram - test architecture

Data Pipeline Building Block Description

The following is a detailed description of the Data Pipeline Building Block and of the elements that connect the containers in this block.

Architecture Diagram - Pipeline Building Block Description

Release Pipeline

The Azure DevOps CI/CD pipeline builds, validates, and creates new Docker images of the data pipeline. These images are then deployed by another dedicated pipeline, the release pipeline.

The release pipeline deploys the input, customer app, and output containers, and implements Blue/Green (B/G) deployment for the customer app only.

B/G deployment enables the data scientist to verify the new model (Blue) without tearing down the old model (Green). Once the new model is approved, the old model is deleted and the new model is marked "Green".
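
The Blue-to-Green promotion can be sketched as a small state object. This models only the bookkeeping described here, not the Helm or Azure DevOps mechanics; the class `BlueGreenState` and its method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlueGreenState:
    """Tracks which model version is live (Green) and which is a candidate (Blue)."""
    green: Optional[str] = None  # old model, currently serving
    blue: Optional[str] = None   # new model, under verification

    def deploy_candidate(self, version: str) -> None:
        """Deploy the new model next to the old one; Green keeps serving."""
        self.blue = version

    def approve(self) -> None:
        """Delete the old model and mark the approved candidate as Green."""
        if self.blue is None:
            raise ValueError("no candidate to approve")
        self.green, self.blue = self.blue, None
```

Starting from `BlueGreenState(green="v1")`, calling `deploy_candidate("v2")` leaves `v1` serving while `v2` is verified, and `approve()` then makes `v2` the Green version with no Blue candidate left.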

The deployment is done with a Helm chart and an Azure DevOps release pipeline, with a manual approval step for each version.

Architecture Diagram - production architecture-bg