
Adludio Data Engineering and Machine Learning Challenge

The Data Pipeline

image

Table of contents

  • Overview
  • About Adludio business
  • Data
  • Technology Used
  • Prerequisites
  • Installation
  • Usage
  • Shortcomings and Future Upgrades
  • Contributing
  • License
  • Author
  • Acknowledgement
  • Show your support

Overview

The goal of this challenge is to provide an implementation that satisfies the requirements for the tasks listed below and demonstrates coding style, software design, and engineering skills. All submissions are evaluated based on the following criteria:

  • Understanding of the problem being asked (you can always ask by email if something is not clear in this description)
  • Attempting as many of the tasks as possible in the time given, and answering the questions asked. This is our main indicator of hard work.
  • Creative and innovative analysis for informative insights
  • Your coding style as well as your software design and engineering skills
  • Your communication of the findings or results

About Adludio business

Adludio is an online mobile ad business. It provides the following services to its clients: designing an interactive ad, also called a "creative", and serving these creatives to audiences on behalf of the client. In order to do that, Adludio buys impressions from an open market through bidding.

Data

  • Please find the data sources in the attached dataset folder:
      • Design data (global_design_data.json): produced by analyzing the advertisements with computer vision; it describes the ad-unit components. Note that the unique identifier in this data is game_key.
      • Campaigns data (campaigns_inventory.csv): the campaign historical performance dataset. It contains the historical inventories of the campaign placements created, along with the KPI events associated with them; the KPI events are found in the type column.
      • Briefs data (briefing.csv): campaign and creative plan data.
      • Creative assets (Creative_assets_ zipped file): images for particular game keys. Use computer vision to extract features that enrich the features already present in the design data.

Please check this dbt doc to get a better insight into the data.

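For a quick first look at these sources, a minimal loading sketch is shown below. It assumes the files live under a dataset/ folder as described above; the archive name creative_assets.zip is a placeholder, since the exact zip file name isn't given.

```python
import json
import zipfile

import pandas as pd

# Design data: one record per game_key, extracted via computer vision.
with open("dataset/global_design_data.json") as f:
    design_data = json.load(f)

# Campaign inventory (KPI events live in the `type` column) and briefs.
campaigns = pd.read_csv("dataset/campaigns_inventory.csv")
briefs = pd.read_csv("dataset/briefing.csv")

# Which KPI events are recorded, and how often?
print(campaigns["type"].value_counts())

# List the creative images inside the zipped assets.
# "creative_assets.zip" is a placeholder for the actual archive name.
with zipfile.ZipFile("dataset/creative_assets.zip") as archive:
    images = [n for n in archive.namelist()
              if n.lower().endswith((".png", ".jpg", ".jpeg"))]
print(f"{len(images)} creative images found")
```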

Technology Used

  • DBT
  • Docker
  • Redash
  • Airflow
  • Postgres

Prerequisites

  • Python 3.8
  • Docker
  • Docker Compose

Installation

  1. Clone the repository and navigate into it
    git clone https://github.com/benbel376/adludio_challenge.git
    cd adludio_challenge
  2. Run the Docker containers in the following order:
    ./setup
    cd docker
    docker-compose -f docker-compose-postgres.yml up --build
    docker-compose -f docker-compose-airflow.yml up --build
    docker-compose -f docker-compose-redash.yml run --rm server create_db
    docker-compose -f docker-compose-redash.yml up --build
  3. Access the running applications
    Navigate to `http://localhost:8087/` in the browser for Airflow
    Navigate to `http://localhost:16534/` in the browser for pgAdmin
    Navigate to `http://localhost:11111/` in the browser for Redash

Usage

For development, please refer to the image below to understand the folder structure.

image

For normal usage, follow the steps below.

  • Start by running the "workflow" DAG from within Airflow.
  • Once all tasks finish executing, you can verify that the data was successfully transferred to the warehouse using either the pgAdmin application or Redash (a script-based check is sketched below).

You can quickly run the queries found in redash_visual.sql to generate a dashboard in Redash.
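If you prefer verifying from a script rather than pgAdmin or Redash, the sketch below lists the tables the DAG loaded into the warehouse. The connection parameters are assumptions; take the real values from docker-compose-postgres.yml.

```python
import psycopg2

# Assumed connection details -- match them to docker-compose-postgres.yml.
conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="warehouse",
    user="postgres",
    password="postgres",
)

with conn, conn.cursor() as cur:
    # List every non-system table the pipeline created.
    cur.execute(
        """
        SELECT table_schema, table_name
        FROM information_schema.tables
        WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
        ORDER BY table_schema, table_name
        """
    )
    for schema, table in cur.fetchall():
        print(f"{schema}.{table}")

conn.close()
```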

Shortcomings and Future Upgrades

  • Create a better model for the data. For the JSON design data, I have noticed that a better structure would be a wide one such as [game_key, color_eng, color_click, text_eng, text_click, video_eng, ...]; organizing the data this way makes it easy to use for machine learning (see the sketch after this list).
  • Integrate the machine learning training code into the Airflow pipeline for better automation.
  • Use stronger tests in dbt to make sure the data is the way it is expected to be.
  • Integrate the data extracted from the images into the pipeline so that it can be used during training and EDA.
  • Use DVC to orchestrate the machine learning pipeline, while Airflow controls DVC.
  • Use MLflow as a server for the models.
  • Given the limited time, I wasn't able to extract a large number of features and rows of data from the images; I only extracted a few as a sample.
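As a sketch of the wide layout proposed in the first item, the snippet below flattens the design JSON into one row per game_key with one column per component/metric pair. The nested shape (each game_key mapping to per-component metric dicts) is an assumption; adjust the key handling to the real file.

```python
import json

import pandas as pd

with open("dataset/global_design_data.json") as f:
    design_data = json.load(f)

rows = []
for game_key, components in design_data.items():
    row = {"game_key": game_key}
    for component, metrics in components.items():
        if isinstance(metrics, dict):
            # e.g. {"color": {"eng": 0.4, "click": 0.1}} becomes
            # columns color_eng and color_click.
            for metric, value in metrics.items():
                row[f"{component}_{metric}"] = value
        else:
            row[component] = metrics
    rows.append(row)

wide = pd.DataFrame(rows)
print(wide.head())
```

A table in this shape can then be fed to a machine learning model directly, or joined with the other sources on game_key.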

Contributing

Any contributions you decide to make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch
  3. Commit your Changes
  4. Push to the Branch
  5. Open a Pull Request

License

Distributed under the MIT License.

Author

👤 Biniyam Belayneh

Acknowledgement

  • Thank you Adludio for this wonderful project.
  • Thank you 10 Academy for preparing us for challenges like this.

Show your support

Give a ⭐ if you like this project!
