Docker

Follow these instructions to set up and run our provided Docker image.

Set Up Docker Engine and Docker Compose

You'll need to install Docker Engine on your development system. Note that while Docker Engine is free to use, Docker Desktop may require you to purchase a license. See the Docker Engine Server installation instructions for details.

To build and run this workload inside a Docker container, you also need Docker Compose. If it is not installed, consult the official Docker Compose installation documentation. The following commands install the Compose v2.7.0 CLI plugin for Linux x86_64 and confirm that it works:

DOCKER_CONFIG=${DOCKER_CONFIG:-$HOME/.docker}
mkdir -p "$DOCKER_CONFIG/cli-plugins"
curl -SL https://github.com/docker/compose/releases/download/v2.7.0/docker-compose-linux-x86_64 -o "$DOCKER_CONFIG/cli-plugins/docker-compose"
chmod +x "$DOCKER_CONFIG/cli-plugins/docker-compose"
docker compose version
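
Before moving on, it can help to confirm that both prerequisites are in place. The following is a minimal sketch, not part of the repository: the function names `have_cmd` and `check_docker_prereqs` are hypothetical, and the only Docker command it relies on is the `docker compose version` call shown above.

```shell
# Hypothetical helper: succeed when a command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: report which prerequisite from this section is missing.
check_docker_prereqs() {
  if ! have_cmd docker; then
    echo "missing: docker (install Docker Engine first)"
    return 1
  fi
  # The compose plugin is present when this subcommand exits successfully.
  if ! docker compose version >/dev/null 2>&1; then
    echo "missing: docker compose plugin"
    return 1
  fi
  echo "ok: docker and docker compose are available"
}
```

Calling `check_docker_prereqs` after sourcing the snippet prints one status line and returns a non-zero exit code if something is missing.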

Set Up Docker Image

Build the Docker images locally or pull the prebuilt ones.

cd docker
docker compose build

OR

docker pull intel/ai-tools:tlt-0.5.0
docker pull intel/ai-tools:tlt-devel-0.5.0
docker pull intel/ai-tools:tlt-dist-0.5.0
docker pull intel/ai-tools:tlt-dist-devel-0.5.0
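
The four pull commands differ only in the image variant, so they can be expressed as a loop. This is a sketch under the assumption that the tags follow the `<variant>-<version>` pattern seen above. The function names and the `TLT_VERSION` variable are hypothetical, and versions other than 0.5.0 may not exist in the registry.

```shell
# Hypothetical variable: version suffix shared by all four tags above.
TLT_VERSION="${TLT_VERSION:-0.5.0}"

# Print the four image references for the version given as $1.
tlt_images() {
  for variant in tlt tlt-devel tlt-dist tlt-dist-devel; do
    echo "intel/ai-tools:${variant}-$1"
  done
}

# Pull every image, stopping at the first failure.
pull_tlt_images() {
  for image in $(tlt_images "$1"); do
    docker pull "$image" || return 1
  done
}
```

Run `pull_tlt_images "$TLT_VERSION"` to pull all four images, or `tlt_images "$TLT_VERSION"` to only list them.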

Use Docker Image

Use the TLT CLI without installing it by running the provided Docker image with Docker Compose.

docker compose run tlt-prod
# OR
docker compose run tlt-prod tlt --help
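
If you run the CLI often, a small wrapper function can make the containerized `tlt` feel like a local command. This is a sketch, not something shipped with the repository: the `tlt` function and the `TLT_DOCKER_DIR` variable are hypothetical, and it assumes the compose file lives in the `docker` directory used in the build step. The `--rm` flag removes the container after each run.

```shell
# Hypothetical variable: directory containing the compose file
# ("docker" matches the `cd docker` step above).
TLT_DOCKER_DIR="${TLT_DOCKER_DIR:-docker}"

# Forward all arguments to the tlt CLI inside a throwaway tlt-prod container.
# The subshell keeps the caller's working directory unchanged.
tlt() {
  (cd "$TLT_DOCKER_DIR" && docker compose run --rm tlt-prod tlt "$@")
}
```

After sourcing this from the repository root, `tlt --help` behaves like the second command above. Set `TLT_DOCKER_DIR` to an absolute path to use the wrapper from any directory.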

Kubernetes

1. Install Helm

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 && \
chmod 700 get_helm.sh && \
./get_helm.sh

2. Set Up Training Operator

Install the standalone operator from GitHub or use a pre-existing Kubeflow configuration.

kubectl apply -k "github.com/kubeflow/training-operator/manifests/overlays/standalone"

OR

helm repo add cowboysysop https://cowboysysop.github.io/charts/
helm install <release name> cowboysysop/training-operator
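
Whichever route you choose, you may want to confirm that the operator registered its custom resources before deploying a job. The sketch below assumes the MPIJob custom resource definition is named `mpijobs.kubeflow.org`, which is the usual name for the Kubeflow training operator but is not stated in this README. The function name is hypothetical.

```shell
# Hypothetical helper: check that the MPIJob CRD exists in the cluster.
# The CRD name is an assumption; CRDs are cluster-scoped, so no namespace is needed.
check_training_operator() {
  if kubectl get crd mpijobs.kubeflow.org >/dev/null 2>&1; then
    echo "ok: MPIJob CRD is registered"
  else
    echo "missing: MPIJob CRD (training operator not installed or not ready)"
    return 1
  fi
}
```

Call `check_training_operator` after the install command finishes. If it reports the CRD as missing, inspect the operator pods in the namespace you installed into.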

3. Deploy TLT Distributed Job

For more information on customizing the deployment, see the chart README.

export NAMESPACE=kubeflow
helm install --namespace ${NAMESPACE} --set ... tlt-distributed ./docker/chart
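
The `helm install` command above expects the target namespace to exist. If you installed the operator through a route that did not create it, a helper like the following can create it on demand. This is a sketch: `ensure_namespace` is a hypothetical name, and it uses only the standard `kubectl get namespace` and `kubectl create namespace` commands.

```shell
# Hypothetical helper: create the namespace given as $1 only if it does not exist yet.
ensure_namespace() {
  kubectl get namespace "$1" >/dev/null 2>&1 || kubectl create namespace "$1"
}
```

Run `ensure_namespace "${NAMESPACE}"` before `helm install`. Helm's own `--create-namespace` flag is an alternative if you prefer to keep everything in one command.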

4. View Progress

To check the status of the job or read the launcher logs:

kubectl get -o yaml mpijob tf-tlt-distributed -n ${NAMESPACE}

OR

kubectl logs tf-tlt-distributed-launcher -n ${NAMESPACE}
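
To block until the job finishes instead of polling by hand, `kubectl wait` can watch the job's status conditions. This sketch assumes the MPIJob reports a `Succeeded` condition on completion, as Kubeflow training jobs typically do. The function name and the 30-minute default timeout are hypothetical choices.

```shell
# Hypothetical helper: wait for an MPIJob to report the Succeeded condition.
#   $1 = job name, $2 = namespace, $3 = optional timeout (default 30m, an arbitrary choice)
wait_for_mpijob() {
  kubectl wait --for=condition=Succeeded "mpijob/$1" -n "$2" --timeout="${3:-30m}"
}
```

For example, `wait_for_mpijob tf-tlt-distributed "${NAMESPACE}"` returns successfully once the job completes and fails if the timeout expires first. Adding `-f` to the `kubectl logs` command above streams the launcher output while you wait.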