
Adludio Data Engineering and Machine Learning Challenge

The Data Pipeline

image

Table of contents

  • Overview
  • About Adludio business
  • Data
  • Technology Used
  • Prerequisites
  • Installation
  • Usage
  • Shortcomings and Future Upgrades
  • Contributing
  • License
  • Author
  • Acknowledgement
  • Show your support

Overview

The goal of this challenge is to provide an implementation that satisfies the requirements for the tasks listed below and demonstrates coding style, software design, and engineering skills. All submissions are evaluated based on the following criteria:

  • Understanding of the problem being asked (you can always ask by email if something is not clear in this description)
  • Attempting as many of the tasks as possible in the time given, and answering the questions asked. This is our main indicator of hard work.
  • Creative and innovative analysis for informative insights
  • Your coding style as well as your software design and engineering skills
  • Your communication of the findings or results

About Adludio business

Adludio is an online mobile ad business. It provides the following services to its clients: designing an interactive ad, also called a "creative", and serving these creatives to audiences on behalf of the client. In order to do that, Adludio buys impressions from an open market through bidding.

Data

  • Please find the data sources in the attached dataset folder:
      • Design data (global_design_data.json): produced by analyzing the advertisements with computer vision; it describes the ad-unit components. Note that the unique identifier in this data is game_key.
      • Campaigns data (campaigns_inventory.csv): the campaign historical performance dataset. It contains the historical inventories of the campaign placements created, along with the KPI events associated with them; the KPI events are found in the type column.
      • Briefs data (briefing.csv): campaign and creative plan data.
      • Creative assets (Creative_assets_ zipped file): images for particular game keys. Use computer vision to extract features that enrich the features already present in the design data.

Please check this dbt doc to get a better insight into the data.

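For a quick first look at these sources, a minimal loading sketch is shown below. It assumes the files live under a dataset/ folder as described above; the archive name creative_assets.zip is a placeholder, since the exact zip file name isn't given.

```python
import json
import zipfile

import pandas as pd

# Design data: one record per game_key, extracted via computer vision.
with open("dataset/global_design_data.json") as f:
    design_data = json.load(f)

# Campaign inventory (KPI events live in the `type` column) and briefs.
campaigns = pd.read_csv("dataset/campaigns_inventory.csv")
briefs = pd.read_csv("dataset/briefing.csv")

# Which KPI events are recorded, and how often?
print(campaigns["type"].value_counts())

# List the creative images inside the zipped assets.
# "creative_assets.zip" is a placeholder for the actual archive name.
with zipfile.ZipFile("dataset/creative_assets.zip") as archive:
    images = [n for n in archive.namelist()
              if n.lower().endswith((".png", ".jpg", ".jpeg"))]
print(f"{len(images)} creative images found")
```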

Technology Used

  • DBT
  • Docker
  • Redash
  • Airflow
  • Postgres

Prerequisites

  • Python 3.8
  • Docker
  • Docker Compose

Installation

  1. Clone the repository and navigate into it
    git clone https://github.com/benbel376/adludio_challenge.git
    cd adludio_challenge
  2. Run the Docker containers in the following order:
    ./setup
    cd docker
    docker-compose -f docker-compose-postgres.yml up --build
    docker-compose -f docker-compose-airflow.yml up --build
    docker-compose -f docker-compose-redash.yml run --rm server create_db
    docker-compose -f docker-compose-redash.yml up --build
  3. Access the running applications
    Navigate to `http://localhost:8087/` in the browser for Airflow
    Navigate to `http://localhost:16534/` in the browser for pgAdmin
    Navigate to `http://localhost:11111/` in the browser for Redash

Usage

For development, please refer to the image below to understand the folder structure.

image

For normal usage, follow the steps below.

  • Start by running the "workflow" DAG from within Airflow.
  • Once all tasks finish executing, you can verify that the data was successfully transferred to the warehouse using either the pgAdmin application or Redash (a script-based check is sketched below).

You can quickly run the queries found in redash_visual.sql to generate a dashboard in Redash.
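If you prefer verifying from a script rather than pgAdmin or Redash, the sketch below lists the tables the DAG loaded into the warehouse. The connection parameters are assumptions; take the real values from docker-compose-postgres.yml.

```python
import psycopg2

# Assumed connection details -- match them to docker-compose-postgres.yml.
conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="warehouse",
    user="postgres",
    password="postgres",
)

with conn, conn.cursor() as cur:
    # List every non-system table the pipeline created.
    cur.execute(
        """
        SELECT table_schema, table_name
        FROM information_schema.tables
        WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
        ORDER BY table_schema, table_name
        """
    )
    for schema, table in cur.fetchall():
        print(f"{schema}.{table}")

conn.close()
```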

Shortcomings and Future Upgrades

  • Create a better model for the data. For the JSON design data, I have noticed that a better structure would be a wide one such as [game_key, color_eng, color_click, text_eng, text_click, video_eng, ...]; organizing the data this way makes it easy to use for machine learning (see the sketch after this list).
  • Integrate the machine learning training code into the Airflow pipeline for better automation.
  • Use stronger tests in dbt to make sure the data is the way it is expected to be.
  • Integrate the data extracted from the images into the pipeline so that it can be used during training and EDA.
  • Use DVC to orchestrate the machine learning pipeline, while Airflow controls DVC.
  • Use MLflow as a server for the models.
  • Given the limited time, I wasn't able to extract a large number of features and rows of data from the images; I only extracted a few as a sample.
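As a sketch of the wide layout proposed in the first item, the snippet below flattens the design JSON into one row per game_key with one column per component/metric pair. The nested shape (each game_key mapping to per-component metric dicts) is an assumption; adjust the key handling to the real file.

```python
import json

import pandas as pd

with open("dataset/global_design_data.json") as f:
    design_data = json.load(f)

rows = []
for game_key, components in design_data.items():
    row = {"game_key": game_key}
    for component, metrics in components.items():
        if isinstance(metrics, dict):
            # e.g. {"color": {"eng": 0.4, "click": 0.1}} becomes
            # columns color_eng and color_click.
            for metric, value in metrics.items():
                row[f"{component}_{metric}"] = value
        else:
            row[component] = metrics
    rows.append(row)

wide = pd.DataFrame(rows)
print(wide.head())
```

A table in this shape can then be fed to a machine learning model directly, or joined with the other sources on game_key.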

Contributing

Any contributions you decide to make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch
  3. Commit your Changes
  4. Push to the Branch
  5. Open a Pull Request

License

Distributed under the MIT License.

Author

👤 Biniyam Belayneh

Acknowledgement

  • Thank you Adludio for this wonderful project.
  • Thank you 10 Academy for preparing us for challenges like this.

Show your support

Give a ⭐ if you like this project!
