Pre-build Legion library (#1042)
* add optional flag for building legion only

* added build path and legion-only flag

* bug fix

* pass new variable with config file

* move nccl

* bug fix

* add cuda_arch list

* export position move

* cd into legion

* quick fix

* retrieve os version and cd directory

* using ubuntu

* directory fix

* bug fix

* add touch

* create the release to flexflow-third-party

* bug fix

* bug fix

* fix indentation

* fix

* bash launching

* bug fix

* bug fix

* extract tar file

* bug fix

* add parameter

* bash fix

* python version

* bug fix

* bug fix

* bug fix

* bug fix

* bug fix

* build bash

* bug fix

* bug fix

* bug fix

* bug fix

* bug fix

* auto running docker container

* renew bash script

* bug fix

* bug fix

* bug fix

* non-running container

* bug fix

* make it easier to switch between inference and master branch

* multiple fixes

* bug fix

* bug fix

* add python version

* bug fix

* restore

* enable building docker images for different hip versions

* ignore shellcheck error code

* support hip compilation in inference cmake files

* fix

* workflow and hardcode

* bug fix

* fix

* cmake fix

* python versions

* cmake fixes

* cmake fixes

* move install

* order

* bug fix

* nested if condition fix

* update docker workflow and config scripts

* update scripts

* fix

* fix

* cleanup

* rocm 5.6 by default in workflow

* move outside

* update workflow

* incorporate install.sh

* bug fix

* fix

* fix

* fix

* bug fix

* fix permissions

* bug fix

* bug fix

* bug fix

* bug fix

* updated

* bug fix

* fix workflow

* check

* check

* bug fix

* fix

* add python env

* fix

* cleanup

* update workflow

* newline

* added runner

* added endif

* Code Cleanup

* restore to self-hosted

* bug fix

* fix

* fix

* update workflow

* fixes

* fix cmake for hip rocm

---------

Co-authored-by: Gabriele Oliaro <goliaro@cs.cmu.edu>
DerrickYLJ and goliaro authored Oct 23, 2023
1 parent caf5d61 commit dd9f62d
Showing 9 changed files with 452 additions and 242 deletions.
75 changes: 75 additions & 0 deletions .github/workflows/helpers/prebuild_legion.sh
@@ -0,0 +1,75 @@
#! /usr/bin/env bash
set -euo pipefail

# Parse input params
python_version=${python_version:-"empty"}
gpu_backend=${gpu_backend:-"empty"}
gpu_backend_version=${gpu_backend_version:-"empty"}

if [[ "${gpu_backend}" != @(cuda|hip_cuda|hip_rocm|intel) ]]; then
  echo "Error: value of gpu_backend (${gpu_backend}) is invalid. Pick one of 'cuda', 'hip_cuda', 'hip_rocm' or 'intel'."
  exit 1
else
  echo "Pre-building Legion with GPU backend: ${gpu_backend}"
fi

if [[ "${gpu_backend}" == "cuda" || "${gpu_backend}" == "hip_cuda" ]]; then
  # Check that the CUDA version is supported. Versions above 12.0 are not supported because we don't publish Docker images for them yet.
  if [[ "$gpu_backend_version" != @(11.1|11.2|11.3|11.4|11.5|11.6|11.7|11.8|12.0) ]]; then
    echo "cuda_version is not supported, please choose among {11.1, 11.2, 11.3, 11.4, 11.5, 11.6, 11.7, 11.8, 12.0}"
    exit 1
  fi
  export cuda_version="$gpu_backend_version"
elif [[ "${gpu_backend}" == "hip_rocm" ]]; then
  # Check that the HIP version is supported
  if [[ "$gpu_backend_version" != @(5.3|5.4|5.5|5.6) ]]; then
    echo "hip_version is not supported, please choose among {5.3, 5.4, 5.5, 5.6}"
    exit 1
  fi
  export hip_version="$gpu_backend_version"
else
  echo "gpu_backend: ${gpu_backend} and gpu_backend_version: ${gpu_backend_version} are not yet supported."
  exit 1
fi

# cd into the directory holding this script
cd "${BASH_SOURCE[0]%/*}"

export FF_GPU_BACKEND="${gpu_backend}"
export FF_CUDA_ARCH=all
export FF_HIP_ARCH=all
export BUILD_LEGION_ONLY=ON
export INSTALL_DIR="/usr/legion"
export python_version="${python_version}"

# Build Docker Flexflow Container
echo "building docker"
../../../docker/build.sh flexflow

# Cleanup any existing container with the same name
docker rm prelegion || true

# Create container to be able to copy data from the image
docker create --name prelegion flexflow-"${gpu_backend}"-"${gpu_backend_version}":latest

# Copy legion libraries to host
echo "extract legion library assets"
mkdir -p ../../../prebuilt_legion_assets
rm -rf ../../../prebuilt_legion_assets/tmp || true
docker cp "prelegion:${INSTALL_DIR}" ../../../prebuilt_legion_assets/tmp


# Create the tarball file
cd ../../../prebuilt_legion_assets/tmp
export LEGION_TARBALL="legion_ubuntu-20.04_${gpu_backend}-${gpu_backend_version}_py${python_version}.tar.gz"

echo "Creating archive $LEGION_TARBALL"
tar -zcvf "../$LEGION_TARBALL" ./
cd ..
echo "Checking the size of the Legion tarball..."
du -h "$LEGION_TARBALL"


# Cleanup
rm -rf tmp/*
docker rm prelegion
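The backend/version validation and the tarball naming in the script above can be exercised without Docker. The following is a minimal sketch, assuming the same supported-version lists and the same `legion_ubuntu-20.04_<backend>-<version>_py<python>.tar.gz` naming scheme; the function names `check_backend_version` and `legion_tarball_name` are hypothetical and are not part of this commit.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in the commit): prints the name of the variable the
# script would export (cuda_version or hip_version) for a valid
# backend/version pair; prints an error to stderr and returns 1 otherwise.
check_backend_version() {
  local backend="$1" version="$2"
  case "$backend" in
    cuda|hip_cuda)
      case "$version" in
        11.[1-8]|12.0) echo "cuda_version" ;;
        *) echo "unsupported CUDA version: $version" >&2; return 1 ;;
      esac ;;
    hip_rocm)
      case "$version" in
        5.[3-6]) echo "hip_version" ;;
        *) echo "unsupported HIP version: $version" >&2; return 1 ;;
      esac ;;
    *)
      # 'intel' is accepted by the first check in the script but then rejected
      # as "not yet supported", so it fails here as well.
      echo "unsupported backend: $backend" >&2; return 1 ;;
  esac
}

# Hypothetical helper (not in the commit): reproduces the tarball name built
# in LEGION_TARBALL above.
legion_tarball_name() {
  local backend="$1" version="$2" python="$3"
  echo "legion_ubuntu-20.04_${backend}-${version}_py${python}.tar.gz"
}

check_backend_version cuda 11.8        # prints cuda_version
check_backend_version hip_rocm 5.6     # prints hip_version
legion_tarball_name hip_rocm 5.6 3.11  # prints legion_ubuntu-20.04_hip_rocm-5.6_py3.11.tar.gz
```

A `case` statement with glob patterns such as `11.[1-8]` is used here instead of the script's `@(...)` extended patterns only to keep the sketch short; both reject versions outside the listed ranges (for example `12.1` or `5.7`).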
84 changes: 84 additions & 0 deletions .github/workflows/prebuild-legion.yml
@@ -0,0 +1,84 @@
name: "prebuild-legion"
on:
  push:
    branches:
      - "inference"
    paths:
      - "cmake/**"
      - "config/**"
      - "deps/legion/**"
      - ".github/workflows/helpers/install_dependencies.sh"
  workflow_dispatch:
concurrency:
  group: prebuild-legion-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

jobs:
  prebuild-legion:
    name: Prebuild Legion with CMake
    runs-on: ubuntu-20.04
    defaults:
      run:
        shell: bash -l {0} # required to use an activated conda environment
    strategy:
      matrix:
        gpu_backend: ["cuda", "hip_rocm"]
        gpu_backend_version: ["11.8", "5.6"]
        python_version: ["3.11"]
        exclude:
          - gpu_backend: "cuda"
            gpu_backend_version: "5.6"
          - gpu_backend: "hip_rocm"
            gpu_backend_version: "11.8"
      fail-fast: false
    steps:
      - name: Checkout Git Repository
        uses: actions/checkout@v3
        with:
          submodules: recursive

      - name: Free additional space on runner
        run: .github/workflows/helpers/free_space_on_runner.sh

      - name: Build Legion
        env:
          FF_GPU_BACKEND: ${{ matrix.gpu_backend }}
          gpu_backend: ${{ matrix.gpu_backend }}
          gpu_backend_version: ${{ matrix.gpu_backend_version }}
          python_version: ${{ matrix.python_version }}
        run: .github/workflows/helpers/prebuild_legion.sh

      - name: Archive compiled Legion library
        uses: actions/upload-artifact@v3
        with:
          name: legion_ubuntu-20.04_${{ matrix.gpu_backend }}-${{ matrix.gpu_backend_version }}_py${{ matrix.python_version }}
          path: prebuilt_legion_assets/legion_ubuntu-20.04_${{ matrix.gpu_backend }}-${{ matrix.gpu_backend_version }}_py${{ matrix.python_version }}.tar.gz

  create-release:
    name: Create new release
    runs-on: ubuntu-20.04
    needs: prebuild-legion
    steps:
      - name: Checkout Git Repository
        uses: actions/checkout@v3
      - name: Free additional space on runner
        run: .github/workflows/helpers/free_space_on_runner.sh
      - name: Create folder for artifacts
        run: mkdir artifacts unwrapped_artifacts
      - name: Download artifacts
        uses: actions/download-artifact@v3
        with:
          path: ./artifacts
      - name: Display structure of downloaded files
        working-directory: ./artifacts
        run: ls -R
      - name: Unwrap all artifacts
        working-directory: ./artifacts
        run: find . -maxdepth 2 -mindepth 2 -type f -name "*.tar.gz" -exec mv {} ../unwrapped_artifacts/ \;
      - name: Get datetime
        run: echo "RELEASE_DATETIME=$(date '+%Y-%m-%dT%H-%M-%S')" >> "$GITHUB_ENV"
      - name: Release
        env:
          NAME: ${{ env.RELEASE_DATETIME }}
          TAG_NAME: ${{ env.RELEASE_DATETIME }}
          GITHUB_TOKEN: ${{ secrets.FLEXFLOW_TOKEN }}
        run: gh release create "$TAG_NAME" ./unwrapped_artifacts/*.tar.gz --repo flexflow/flexflow-third-party
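The matrix expands to 2 × 2 = 4 combinations, and the two `exclude` entries leave exactly two jobs: `cuda`/`11.8` and `hip_rocm`/`5.6`. When `actions/download-artifact@v3` is called without a `name`, it downloads every artifact into its own subdirectory named after the artifact, which is why the "Unwrap all artifacts" step searches exactly one level below `./artifacts`. The following is a minimal local sketch of that layout and of the same `find` command, assuming the two artifact names produced by the upload step; a temporary directory (`workdir`, a name introduced here for illustration) stands in for the runner workspace, and empty files stand in for the real tarballs.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Temporary workspace standing in for the runner's checkout directory.
workdir="$(mktemp -d)"
cd "$workdir"
mkdir artifacts unwrapped_artifacts

# Recreate the layout download-artifact produces: artifacts/<name>/<name>.tar.gz
for name in legion_ubuntu-20.04_cuda-11.8_py3.11 legion_ubuntu-20.04_hip_rocm-5.6_py3.11; do
  mkdir -p "artifacts/$name"
  touch "artifacts/$name/$name.tar.gz"
done

# Same command as the "Unwrap all artifacts" step, run from ./artifacts:
# only regular files exactly one directory below ./artifacts are moved.
(cd artifacts && find . -maxdepth 2 -mindepth 2 -type f -name "*.tar.gz" -exec mv {} ../unwrapped_artifacts/ \;)

# Both tarballs now sit side by side, ready for `gh release create`.
ls unwrapped_artifacts
```

The temporary directory is left in place so the result can be inspected; remove it with `rm -rf "$workdir"` when done.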