-
Set up a Python 3.10 virtual environment:
conda create -n myenv python=3.10
conda activate myenv
-
Install dependencies:
pip install -r requirements.txt
-
Ray must be installed. If it is not, install it with pip:
pip install ray
-
Start the Ray cluster:
ray up ray-cluster-config.yaml --no-config-cache --log-color true -v -y
-
Forward port 10001 for submitting jobs:
ray attach ray-cluster-config.yaml -p 10001
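Before submitting jobs, it can help to confirm that the forwarded port is actually reachable. The following is a minimal sketch using only the Python standard library; the helper name `port_is_open` and the use of `localhost` are assumptions for illustration, not part of this project.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 10001 is the port forwarded by `ray attach ... -p 10001` above.
    status = "reachable" if port_is_open("localhost", 10001) else "not reachable"
    print(f"localhost:10001 is {status}")
```

If the port is reachable, a Ray Client connection would typically use an address such as `ray://localhost:10001`, assuming the Ray Client server is running on the head node.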
-
Forward the dashboard for the cluster:
ray dashboard /home/joe/Projects/PlayPokemonRed/ray-cluster-config.yaml
You can monitor the system logs using the following commands:
-
Monitor output logs:
ray exec /home/joe/Projects/PlayPokemonRed/ray-cluster-config.yaml 'tail -n 100 -f /tmp/ray/session_latest/logs/monitor.out'
-
Monitor error logs:
ray exec /home/joe/Projects/PlayPokemonRed/ray-cluster-config.yaml 'tail -n 100 -f /tmp/ray/session_latest/logs/monitor.err'
-
Monitor general logs:
ray exec /home/joe/Projects/PlayPokemonRed/ray-cluster-config.yaml 'tail -n 100 -f /tmp/ray/session_latest/logs/monitor.log'
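The three log commands above differ only in the log file name, so they can be generated from a single template. This is a small sketch, not part of the repository: the function name `tail_command` is hypothetical, and the config path is whatever you pass in.

```python
import shlex

# Log directory used by the commands above.
LOG_DIR = "/tmp/ray/session_latest/logs"


def tail_command(config_path: str, log_name: str, lines: int = 100) -> list[str]:
    """Build the `ray exec` argument list that tails one monitor log."""
    remote = f"tail -n {lines} -f {LOG_DIR}/{log_name}"
    return ["ray", "exec", config_path, remote]


if __name__ == "__main__":
    # Print a shell-ready command for each of the three monitor logs.
    for name in ("monitor.out", "monitor.err", "monitor.log"):
        print(shlex.join(tail_command("ray-cluster-config.yaml", name)))
```

The returned list can be passed directly to `subprocess.run` if you prefer to launch the command from Python rather than copy it into a shell.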
When you're done, you can tear down the cluster using the following command:
ray down ray-cluster-config.yaml -y
The cluster must be running. Note that the S3 bucket is hard-coded on line 60 of train_ray.py; replace it with one you have access to.
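One way to avoid editing a hard-coded bucket name is to read it from an environment variable. The sketch below is an assumption about how this could be done, not what train_ray.py currently does; the variable name `S3_BUCKET` and the helper `resolve_bucket` are hypothetical.

```python
import os


def resolve_bucket(env_var: str = "S3_BUCKET") -> str:
    """Return the bucket name from the environment, without any s3:// prefix."""
    bucket = os.environ.get(env_var, "").strip()
    if not bucket:
        raise RuntimeError(f"Set {env_var} to an S3 bucket you have access to.")
    # Accept either "my-bucket" or "s3://my-bucket/".
    return bucket.removeprefix("s3://").rstrip("/")
```

With this in place, the script could be run as `S3_BUCKET=my-bucket python train_ray.py`, where `my-bucket` stands in for a bucket you own.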
- Run the training script:
python train_ray.py
-
Resources and logs can be monitored via the Ray dashboard, available here if you used the setup above.
-
To monitor checkpoints, track which part of the game the AI is exploring, and get organized access to metadata, you can run the custom web app with the commands below:
cd src/monitor_app
pip install -r requirements.txt
python app.py
The app will be deployed here. Note that the S3 bucket is hard-coded on line 27 of app.py; replace it with one you have access to. This web app is extremely basic and not representative of my webdev skills.