
My Setup Development Environment as Data Engineer


longbuivan/dotfile


How to Set Up a Development Environment as a DataDev

This is a data tech stack for building OpenData platforms and applications.

Use this setup as a starting point for your projects, and customize the configuration as your application requires.


Tutorial and Demo

Watch this video for a hands-on walkthrough: https://youtu.be/2nKRHoWemDQ

MacBook M1

General Settings

As a developer, you spend a lot of time copying setups from the internet and customizing them into your own personalized environment.

Development

macOS Package Management: Homebrew

Windows and Linux users should use the package manager for their OS or distro instead: choco, pacman, apt, ...

I normally remove all unused applications from the Dock, toolbar, and Desktop to keep things lean.

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Note: Run python3 --version to check whether Python 3 is installed on your machine. If it is not, run brew install python3
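
The check in the note above can be extended to several tools at once. The following is a minimal sketch, not part of this repository: `check_tools` is a hypothetical helper name, and the list of tools you pass to it is up to you.

```shell
#!/bin/sh
# Hypothetical helper: print every command from the argument list that is
# not found on PATH, one per line. Returns 1 if anything is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_tools brew python3 git` prints nothing when all three are installed, and otherwise lists the ones you still need to install.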

Python Package Management

Most data development tools are installed and built with Python. Please use venv and pip (one virtual environment per project) instead of global settings.

This is an example of the libraries I use most in data projects:

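# venv ships with Python 3, so there is nothing to install with pip
python3 -m venv --help

# Create and activate a virtual environment (.venv is an example name)
python3 -m venv .venv
source .venv/bin/activate

pip3 install -r requirements.txt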
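
To avoid recreating the environment every time you return to a project, the creation step can be made idempotent. This is a sketch under assumptions: `ensure_venv` is a hypothetical helper name and `.venv` is just a default directory name, neither comes from this repository.

```shell
#!/bin/sh
# Hypothetical helper: create a virtual environment in $1 (default: .venv)
# unless one already exists there.
ensure_venv() {
  venv_dir=${1:-.venv}
  if [ -f "$venv_dir/bin/activate" ]; then
    echo "using existing venv: $venv_dir"
  else
    python3 -m venv "$venv_dir" && echo "created venv: $venv_dir"
  fi
}
```

A typical session would be `ensure_venv .venv`, then `. .venv/bin/activate`, then `pip install -r requirements.txt`.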

Git

brew install git

Terminal

Use Oh My Zsh to build a comfortable terminal setup, because you will likely spend a lot of time in the CLI.

sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)"

Change theme settings by:

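# ZSH_THEME must be set before oh-my-zsh.sh is sourced, so edit the existing line instead of appending (BSD sed syntax, as on macOS)
sed -i '' 's/^ZSH_THEME=.*/ZSH_THEME="agnoster"/' ~/.zshrc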

Or browse the available themes and pick whichever you like: https://github.com/ohmyzsh/ohmyzsh/wiki/Theme

I recommend iTerm2 as the terminal emulator.

Install it with brew install --cask iterm2

Plugins

To make working in the CLI faster, check these out:

  1. Install plugins
git clone https://github.com/zsh-users/zsh-autosuggestions.git ${ZSH_CUSTOM:-$HOME/.oh-my-zsh/custom}/plugins/zsh-autosuggestions

git clone https://github.com/zsh-users/zsh-syntax-highlighting.git ${ZSH_CUSTOM:-$HOME/.oh-my-zsh/custom}/plugins/zsh-syntax-highlighting

git clone https://github.com/zdharma-continuum/fast-syntax-highlighting.git ${ZSH_CUSTOM:-$HOME/.oh-my-zsh/custom}/plugins/fast-syntax-highlighting

git clone --depth 1 -- https://github.com/marlonrichert/zsh-autocomplete.git ${ZSH_CUSTOM:-$HOME/.oh-my-zsh/custom}/plugins/zsh-autocomplete
  2. Add the plugins to the plugins list in ~/.zshrc
 plugins=(
  git
  zsh-autosuggestions
  zsh-syntax-highlighting
  fast-syntax-highlighting
  zsh-autocomplete
 )
  3. Run source ~/.zshrc to activate the changes
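
If you rerun the setup on a machine that already has some plugins, git clone fails on the existing directories. A minimal sketch of a guard for that case follows; `install_plugin` is a hypothetical helper name, and the fallback path assumes the default Oh My Zsh install location.

```shell
#!/bin/sh
# Hypothetical helper: clone an Oh My Zsh plugin only if it is not present.
# $1: git URL of the plugin repository
install_plugin() {
  url=$1
  # Fall back to the default custom directory when ZSH_CUSTOM is unset
  custom_dir=${ZSH_CUSTOM:-$HOME/.oh-my-zsh/custom}
  name=$(basename "$url" .git)
  target="$custom_dir/plugins/$name"
  if [ -d "$target" ]; then
    echo "skip: $name already installed"
  else
    git clone --depth 1 "$url" "$target"
  fi
}
```

Calling `install_plugin https://github.com/zsh-users/zsh-autosuggestions.git` a second time then prints a skip message instead of an error.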

Browser

Lightweight, Secure, Private with Min Browser

KeyCastr (Optional)

Displays your keystrokes on screen, which is useful for demos and recordings.

brew install keycastr --cask

Container Setup for Development

Navigate to ./containers/ and run docker-compose up
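
Newer Docker installations ship Compose as the `docker compose` subcommand, while older ones only have the standalone `docker-compose` binary. The sketch below picks whichever is available; `compose_cmd` is a hypothetical helper name, not something this repository provides.

```shell
#!/bin/sh
# Hypothetical helper: print the Compose command available on this machine,
# or return 1 if neither variant is installed.
compose_cmd() {
  if docker compose version >/dev/null 2>&1; then
    echo "docker compose"
  elif command -v docker-compose >/dev/null 2>&1; then
    echo "docker-compose"
  else
    return 1
  fi
}
```

From inside ./containers/ you could then run `$(compose_cmd) up -d` to start the services in the background and `$(compose_cmd) down` to stop them.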

Container Setup for Data Engineering

Keep an eye on resource usage so that the containers do not overwhelm your machine.

Devbox

  • Update on 2025-01-19: switched to devbox for development.

Deployment

  • Vercel
  • Railway

Data Applications

For when you join a project that builds a Data SaaS

1. Core Backend

2. Frontend

3. Additional Tools

Data Platforms

For when you join a project that builds a Data PaaS

1. Storage

  • File/object storage for the data lake (used for partitioning): MinIO
  • Unstructured data: MongoDB

2. Processing

3. Warehousing

Register for a free account, build a mockup, and install the CLI

4. Programming

  • OOP and functional programming: Scala
  • Backend and infrastructure: Go
  • Fundamentals of Data Engineering and Software Development

5. Protocol

6. Semantic

7. Cloud Providers

Any cloud provider: AWS, Azure, GCP

8. Infrastructure

Development

Supporting Development Tools

Monitoring Platforms

Project Structure

Create the folders as you need them

.
├── LICENSE
├── README.md : information about the project
├── app : contains data application
├── devops : contains cicd, infrastructure
├── docs : documentation
├── docusaurus : docs generator
├── run-stringx-platform.sh : master script to run the project
├── servers : contains data servers
└── venv-stringx : py virtual environment
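
The folder part of this layout can be scaffolded in one step. This is a sketch, assuming you only want the plain directories plus an empty README; `scaffold_project` is a hypothetical helper name, and the docs generator, run script, and virtual environment are left for you to add.

```shell
#!/bin/sh
# Hypothetical helper: create the top-level folders from the layout above.
# $1: project root directory (default: current directory, created if missing)
scaffold_project() {
  root=${1:-.}
  for dir in app devops docs servers; do
    mkdir -p "$root/$dir"
  done
  # Create an empty README only if one does not exist yet
  [ -f "$root/README.md" ] || : > "$root/README.md"
}
```

Running `scaffold_project my-data-project` is safe to repeat, because mkdir -p leaves existing folders untouched.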

If you want to understand what data engineering is and build fundamental knowledge of it, check out this YouTube series:

https://youtu.be/5DEFgEBAuTA?si=KymSgjY-foD8q5I3

DataPods OSS

If you have spent late nights on data ingestion and data migration:

  • This data project includes lightweight k8s YAML files for creating a development environment, a testing environment, a Proof of Concept, a Proof of Service, or even support for a small business.

  • With DataPods (DPs), you can provision services for data transformations as many times as you need.

  • Scalability and resiliency are supported through Kubernetes features.

  • Check out DataPods


I created these settings for my own development setup. To contribute, please open a PR with your preferred changes.
