Airflow

Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies.

References

Installation

Prerequisites: Allocate at least 4 GB of memory to the Docker Engine (ideally 8 GB).

Local

  • Docker Desktop installed and running

Cloud

Tutorial

  1. Create a new directory

    mkdir -p ~/app
    cd ~/app
  2. Run Airflow in Docker (refer to the official Airflow guide on running Airflow in Docker)

    a. Check that Docker has enough memory by running this command

    docker run --rm "debian:bullseye-slim" bash -c 'numfmt --to iec $(echo $(($(getconf _PHYS_PAGES) * $(getconf PAGE_SIZE))))'
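
The one-liner above multiplies the physical page count by the page size and prints the result with a binary suffix. A minimal Python sketch of the same arithmetic follows, assuming a Linux or macOS host; the names `total_memory_bytes`, `to_iec`, and `MIN_BYTES` are illustrative and not part of Airflow or Docker. Run directly on the host, it reports host memory, which on Docker Desktop can differ from what the Docker Engine has been allocated.

```python
import os

MIN_BYTES = 4 * 1024**3  # 4 GiB minimum from the prerequisites (illustrative constant)


def total_memory_bytes() -> int:
    """Physical memory = page count * page size, as in the shell one-liner."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def to_iec(num_bytes: int) -> str:
    """Format a byte count with binary suffixes, roughly like `numfmt --to iec`."""
    value = float(num_bytes)
    for suffix in ("", "K", "M", "G", "T"):
        # Stop at the first suffix that keeps the value below 1024 (or at T).
        if value < 1024 or suffix == "T":
            return f"{value:.1f}{suffix}"
        value /= 1024


total = total_memory_bytes()
print(f"Memory visible to this process: {to_iec(total)}")
if total < MIN_BYTES:
    print("Less than 4G available; Airflow containers may fail to start.")
```

To measure the Docker Engine's allocation rather than the host's, the docker run command above remains the more direct check.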

    b. Fetch docker-compose.yaml

    curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.5.1/docker-compose.yaml'

    c. Create the mounted directories and set the right Airflow user

    mkdir -p ./dags ./logs ./plugins ./working_data
    echo -e "AIRFLOW_UID=$(id -u)" > .env

    d. Update the following in docker-compose.yaml

    # Do not load the example DAGs
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    
    # Additional Python packages
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-pandas}
    
    # Output directory (add to the volumes list)
    - ${AIRFLOW_PROJ_DIR:-.}/working_data:/opt/airflow/working_data
    
    # Change the default admin credentials
    _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow2}
    _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow2}

    e. Initialize the database

    docker compose up airflow-init

    f. Run Airflow

    docker compose up

    Wait until the terminal shows a successful health check, for example

    app-airflow-webserver-1 | 127.0.0.1 - - [17/Feb/2023:09:34:29 +0000] "GET /health HTTP/1.1" 200 141 "-" "curl/7.74.0"
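
Instead of watching the log for the health check line, the same `/health` endpoint can be polled from a script. The following is a minimal sketch, assuming the default port mapping to localhost:8080 and that the endpoint returns JSON with a `status` field for `metadatabase` and `scheduler`; the function names and timeout values are illustrative.

```python
import json
import time
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:8080/health"  # assumes the default port mapping


def is_healthy(payload: dict) -> bool:
    """True when both the metadatabase and the scheduler report 'healthy'."""
    return all(
        payload.get(component, {}).get("status") == "healthy"
        for component in ("metadatabase", "scheduler")
    )


def wait_for_airflow(url: str = HEALTH_URL, timeout_s: float = 300,
                     interval_s: float = 5) -> bool:
    """Poll the health endpoint until it reports healthy or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if is_healthy(json.load(response)):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # webserver not accepting connections yet, or a non-JSON reply
        time.sleep(interval_s)
    return False


# Usage (run after `docker compose up` in another terminal):
#     print("ready" if wait_for_airflow() else "timed out")
```

Checking the scheduler status as well as the webserver avoids logging in before DAGs can actually be scheduled.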

    g. Enable port forwarding for port 8080 (only needed when Airflow runs on a remote or cloud machine)

    h. Visit localhost:8080 and log in with the credentials set in step 2.d

  3. Explore the UI and add a user under Security > List Users

  4. Create a Python script dags/sandbox.py that covers:

    • BashOperator
    • PythonOperator
    • Task Dependencies
    • Params
    • Crontab schedules

    You can place any number of scripts inside the dags directory

  5. Stop the Docker containers

    docker compose down