This tutorial will guide users through creating automated, scalable, and distributed ML pipelines using DVC (Data Version Control) integrated with Ray.
- Run Distributed ML Pipeline with DVC DVC.
- Design distributed ML pipeliens with Ray Ray.
- Introduce Ray Tune for hyperparameter optimization
- Run distributed ML experiment with DVC and Ray on AWS
Branch | Details | Set up instructions |
---|---|---|
main |
For local Ray cluster setups. | Follow the Run DVC pipeline on Ray Cluster (local) section. |
cloud |
For cloud-based Ray cluster on AWS. | Follow the Run DVC pipeline on Ray Cluster (AWS) section. |
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Locally: cpu, single-node
export RAY_ENABLE_WINDOWS_OR_OSX_CLUSTER=1
ray start --head
To test running cluster you may run a test script
# Run a Python script with ray.init()
python src/test_scripts/hello_cluster.py
# Submit a script as a Job
ray job submit -- python src/test_scripts/hello_cluster.py
export PYTHONPATH=$PWD
dvc exp run
Remeber to stop remote cluster after work is done.
ray stop
ray up cluster.yaml
ray dashboard cluster.yaml
There are few options to interact with the Ray Cluster on AWS:
- Submit jobs (Ray scripts)
- Attach to an interactive screen session on the cluster
- Execute shell commands on the cluster
export RAY_ADDRESS='http://localhost:8265'
ray job submit --working-dir $PWD -- python src/test_scripts/mnist.py
If you would like to run an application interactively and see the output in real time (for example, during development or debugging), you can кun your script directly on a cluster node (e.g. after SSHing into the node using ray attach
ray attach cluster.yaml
Also, you may want to execute shell
commands on a cluster
ray exec cluster.yaml 'echo "hello world"'
Connect to the session on the cluster
ray attach cluster.yaml
Run ML pipeline with DVC on the cluster
# Navigate to the repo & run
cd tutorial-mnist-dvc-ray
# Run DVC experiment
export PYTHONPATH=/home/ray/tutorial-mnist-dvc-ray
dvc exp run
Execute dvc exp run
command on the cluster
ray exec cluster.yaml "cd tutorial-mnist-dvc-ray && \
export PYTHONPATH=/home/ray/tutorial-mnist-dvc-ray && \
dvc exp run"
Download (sync) the repository to results
ray rsync_down cluster.yaml '/home/ray/tutorial-mnist-dvc-ray/' $PWD
Commit results if needed.
Remember to stop remote cluster after work is done. Save money and the planet!
ray down cluster.yaml