Merge branch 'inference' into legion_workflow
DerrickYLJ authored Sep 10, 2023
2 parents 345aeb9 + 4adad7d commit 8daa0f1
Showing 160 changed files with 5,786 additions and 1,079 deletions.
29 changes: 22 additions & 7 deletions .github/README.md
@@ -6,8 +6,9 @@

## News🔥:

* [09/02/2023] Adding AMD GPU support, released Docker images for ROCM 5.3->5.6
* [08/16/2023] Adding StarCoder model support
* [08/14/2023] Released Dockerfile for different CUDA versions
* [08/14/2023] Released Docker images for different CUDA versions

## What is FlexFlow Serve

@@ -42,13 +43,13 @@ pip install flexflow
```

### Try it in Docker
If you run into any issue during the install, or if you would like to use the C++ API without needing to install from source, you can also use our pre-built Docker package for different CUDA versions and the `hip_rocm` backend. To download and run our pre-built Docker container:
If you run into any issue during the install, or if you would like to use the C++ API without needing to install from source, you can also use our pre-built Docker package for different CUDA versions (NVIDIA backend) and multiple ROCM versions (AMD backend). To download and run our pre-built Docker container:

```bash
docker run --gpus all -it --rm --shm-size=8g ghcr.io/flexflow/flexflow-cuda-11.8:latest
docker run --gpus all -it --rm --shm-size=8g ghcr.io/flexflow/flexflow-cuda-12.0:latest
```

To download a Docker container for a backend other than CUDA v11.8, you can replace the `cuda-11.8` suffix with any of the following backends: `cuda-11.1`, `cuda-11.2`, `cuda-11.3`, `cuda-11.5`, `cuda-11.6`, `cuda-11.7`, or `hip_rocm`. More info on the Docker images, with instructions to build a new image from source or run with additional configurations, can be found [here](../docker/README.md).
To download a Docker container for a backend other than CUDA v12.0, you can replace the `cuda-12.0` suffix with any of the following backends: `cuda-11.1`, `cuda-11.2`, `cuda-11.3`, `cuda-11.4`, `cuda-11.5`, `cuda-11.6`, `cuda-11.7`, `cuda-11.8`, `hip_rocm-5.3`, `hip_rocm-5.4`, `hip_rocm-5.5`, or `hip_rocm-5.6`. More info on the Docker images, with instructions to build a new image from source or run with additional configurations, can be found [here](../docker/README.md).
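For example, a hedged pull of the AMD image instead (the `hip_rocm-5.6` tag is taken from the list above; note that `--gpus all` is NVIDIA-specific, so the GPU passthrough flags for AMD devices may differ):

```bash
docker run -it --rm --shm-size=8g ghcr.io/flexflow/flexflow-hip_rocm-5.6:latest
```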

### Build from source

@@ -209,7 +210,7 @@ Below is a list of models that we have explicitly tested and for which a SSM may
| StarCoder-15.5B | bigcode/starcoder | |

### CPU Offloading
FlexFlow Serve also offers offloading-based inference for running large models (e.g., llama-7B) on a single GPU. CPU offloading keeps selected tensors in CPU memory and copies them to the GPU only when they are needed for a computation. Currently, we selectively offload only the largest weight tensors (the weight tensors of the Linear and Attention layers). Since the small model occupies considerably less space and does not pose a bottleneck for GPU memory, while offloading would add extra transfer and computation overhead, we only offload the large model. You can run the offloading example by enabling the `-offload` and `-offload-reserve-space-size` flags.
FlexFlow Serve also offers offloading-based inference for running large models (e.g., llama-7B) on a single GPU. CPU offloading keeps selected tensors in CPU memory and copies them to the GPU only when they are needed for a computation. Currently, we selectively offload only the largest weight tensors (the weight tensors of the Linear and Attention layers). Since the small model occupies considerably less space and does not pose a bottleneck for GPU memory, while offloading would add extra transfer and computation overhead, we only offload the large model. [TODO: update instructions] You can run the offloading example by enabling the `-offload` and `-offload-reserve-space-size` flags.
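A minimal sketch of enabling offloading (the executable path and the reserve-space value below are illustrative assumptions; only the two flags themselves come from this section):

```bash
# Illustrative only: the binary path and the reserve size are assumptions,
# not values documented in this README.
./inference/incr_decoding/incr_decoding \
    -offload -offload-reserve-space-size 8192
```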

### Quantization
FlexFlow Serve supports int4 and int8 quantization. The compressed tensors are stored on the CPU side. Once copied to the GPU, these tensors undergo decompression and conversion back to their original precision. Please find the compressed weight files in our s3 bucket, or use [this script](../inference/utils/compress_llama_weights.py) from [FlexGen](https://github.com/FMInference/FlexGen) project to do the compression manually.
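For instance, a hypothetical manual compression run (the script path comes from the link above; its actual command-line arguments are not documented here, so consult the script before running):

```bash
# Hypothetical invocation: check inference/utils/compress_llama_weights.py
# for its real arguments before running.
python inference/utils/compress_llama_weights.py
```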
@@ -221,10 +222,24 @@ We provide five prompt datasets for evaluating FlexFlow Serve: [Chatbot instruct

FlexFlow Serve is under active development. We currently focus on the following tasks and strongly welcome all contributions from bug fixes to new features and extensions.

* AMD support. We are actively working on supporting FlexFlow Serve on AMD GPUs and welcome any contributions to this effort.
* AMD benchmarking. We are actively working on benchmarking FlexFlow Serve on AMD GPUs and comparing it with the performance on NVIDIA GPUs.
* Chatbot prompt templates and Multi-round conversations
* Support for FastAPI server
* Integration with LangChain for document question answering

## Acknowledgements
This project was initiated by members from CMU, Stanford, and UCSD. We will continue developing and supporting FlexFlow Serve.
This project was initiated by members from CMU, Stanford, and UCSD. We will continue developing and supporting FlexFlow Serve. Please cite FlexFlow Serve as:

``` bibtex
@misc{miao2023specinfer,
title={SpecInfer: Accelerating Generative Large Language Model Serving with Speculative Inference and Token Tree Verification},
author={Xupeng Miao and Gabriele Oliaro and Zhihao Zhang and Xinhao Cheng and Zeyu Wang and Rae Ying Yee Wong and Alan Zhu and Lijie Yang and Xiaoxiang Shi and Chunan Shi and Zhuoming Chen and Daiyaan Arfeen and Reyna Abhyankar and Zhihao Jia},
year={2023},
eprint={2305.09781},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```

## License
FlexFlow uses Apache License 2.0.
8 changes: 3 additions & 5 deletions .github/workflows/build.yml
@@ -67,7 +67,7 @@ jobs:
uses: conda-incubator/setup-miniconda@v2
with:
activate-environment: flexflow
environment-file: conda/environment.yml
environment-file: conda/flexflow.yml
auto-activate-base: false

- name: Build FlexFlow
@@ -131,15 +131,14 @@ jobs:
cd build
./tests/unit/unit-test
- name: Check availability of Python flexflow.core module
- name: Check availability of flexflow modules in Python
run: |
if [[ "${FF_GPU_BACKEND}" == "cuda" ]]; then
export LD_LIBRARY_PATH="$CUDA_PATH/lib64/stubs:$LD_LIBRARY_PATH"
fi
# Remove build folder to check that the installed version can run independently of the build files
rm -rf build
export CPU_ONLY_TEST=1
python -c "import flexflow.core; exit()"
python -c "import flexflow.core; import flexflow.serve as ff; exit()"
makefile-build:
name: Build FlexFlow with the Makefile
@@ -186,5 +185,4 @@ jobs:
cd python
make -j $n_build_cores
export CPU_ONLY_TEST=1
python -c 'import flexflow.core'
7 changes: 4 additions & 3 deletions .github/workflows/docker-build.yml
@@ -63,6 +63,7 @@ jobs:
cuda_version: ${{ matrix.gpu_backend_version }}
hip_version: ${{ matrix.gpu_backend_version }}
branch_name: ${{ github.head_ref || github.ref_name }}
timeout-minutes: 480
steps:
- name: Checkout Git Repository
uses: actions/checkout@v3
@@ -100,17 +101,17 @@ jobs:
echo "Skipping build to save time"
fi
- name: Check availability of Python flexflow.core module
- name: Check availability of flexflow modules in Python
if: ${{ matrix.gpu_backend == 'cuda' }}
env:
deploy_needed: ${{ ( github.event_name == 'push' || github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' ) && env.branch_name == 'inference' }}
build_needed: ${{ ( matrix.gpu_backend == 'hip_rocm' && matrix.gpu_backend_version == '5.6' ) || ( matrix.gpu_backend == 'cuda' && matrix.gpu_backend_version == '11.8' ) }}
run: |
if [[ $deploy_needed == "true" || $build_needed == "true" ]]; then
if [[ $FF_GPU_BACKEND == "cuda" ]]; then
docker run --env CPU_ONLY_TEST=1 --entrypoint /bin/bash flexflow-${FF_GPU_BACKEND}-${gpu_backend_version}:latest -c "export LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH; sudo ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/stubs/libcuda.so.1; python -c 'import flexflow.core; exit()'"
docker run --entrypoint /bin/bash flexflow-${FF_GPU_BACKEND}-${gpu_backend_version}:latest -c "export LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH; sudo ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/stubs/libcuda.so.1; python -c 'import flexflow.core; import flexflow.serve as ff; exit()'"
else
docker run --env CPU_ONLY_TEST=1 --entrypoint /bin/bash flexflow-${FF_GPU_BACKEND}-${gpu_backend_version}:latest -c "python -c 'import flexflow.core; exit()'"
docker run --entrypoint /bin/bash flexflow-${FF_GPU_BACKEND}-${gpu_backend_version}:latest -c "python -c 'import flexflow.core; import flexflow.serve as ff; exit()'"
fi
else
echo "Skipping test to save time"
2 changes: 1 addition & 1 deletion .github/workflows/helpers/install_cudnn.sh
@@ -44,7 +44,7 @@ elif [[ "$cuda_version" == "11.7" ]]; then
elif [[ "$cuda_version" == "11.8" ]]; then
CUDNN_LINK=https://developer.download.nvidia.com/compute/redist/cudnn/v8.7.0/local_installers/11.8/cudnn-linux-x86_64-8.7.0.84_cuda11-archive.tar.xz
CUDNN_TARBALL_NAME=cudnn-linux-x86_64-8.7.0.84_cuda11-archive.tar.xz
elif [[ "$cuda_version" == "11.8" ]]; then
elif [[ "$cuda_version" == "12.0" ]]; then
echo "CUDNN support for CUDA version 12.0 not yet added"
exit 1
fi
4 changes: 3 additions & 1 deletion .github/workflows/helpers/install_dependencies.sh
@@ -7,7 +7,7 @@ cd "${BASH_SOURCE[0]%/*}"

# General dependencies
echo "Installing apt dependencies..."
sudo apt-get update && sudo apt-get install -y --no-install-recommends wget binutils git zlib1g-dev libhdf5-dev && \
sudo apt-get update && sudo apt-get install -y --no-install-recommends wget binutils git zlib1g-dev libhdf5-dev jq && \
sudo rm -rf /var/lib/apt/lists/*

FF_GPU_BACKEND=${FF_GPU_BACKEND:-"cuda"}
@@ -20,6 +20,8 @@ fi
if [[ "$FF_GPU_BACKEND" == "cuda" || "$FF_GPU_BACKEND" = "hip_cuda" ]]; then
# Install CUDNN
./install_cudnn.sh
# Install NCCL
./install_nccl.sh
fi
# Install HIP dependencies if needed
if [[ "$FF_GPU_BACKEND" == "hip_cuda" || "$FF_GPU_BACKEND" = "hip_rocm" ]]; then
51 changes: 51 additions & 0 deletions .github/workflows/helpers/install_nccl.sh
@@ -0,0 +1,51 @@
#!/bin/bash
set -euo pipefail
set -x

# Cd into directory holding this script
cd "${BASH_SOURCE[0]%/*}"

# Add NCCL key ring
ubuntu_version=$(lsb_release -rs)
ubuntu_version=${ubuntu_version//./}
wget "https://developer.download.nvidia.com/compute/cuda/repos/ubuntu${ubuntu_version}/x86_64/cuda-keyring_1.0-1_all.deb"
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt update -y
rm -f cuda-keyring_1.0-1_all.deb

# Install NCCL
cuda_version=${1:-11.8.0}
cuda_version=$(echo "${cuda_version}" | cut -f1,2 -d'.')
echo "Installing NCCL for CUDA version: ${cuda_version} ..."

# We need to run a different install command based on the CUDA version, otherwise running `sudo apt install libnccl2 libnccl-dev`
# will automatically upgrade CUDA to the latest version.

if [[ "$cuda_version" == "11.0" ]]; then
sudo apt install libnccl2=2.15.5-1+cuda11.0 libnccl-dev=2.15.5-1+cuda11.0
elif [[ "$cuda_version" == "11.1" ]]; then
sudo apt install libnccl2=2.8.4-1+cuda11.1 libnccl-dev=2.8.4-1+cuda11.1
elif [[ "$cuda_version" == "11.2" ]]; then
sudo apt install libnccl2=2.8.4-1+cuda11.2 libnccl-dev=2.8.4-1+cuda11.2
elif [[ "$cuda_version" == "11.3" ]]; then
sudo apt install libnccl2=2.9.9-1+cuda11.3 libnccl-dev=2.9.9-1+cuda11.3
elif [[ "$cuda_version" == "11.4" ]]; then
sudo apt install libnccl2=2.11.4-1+cuda11.4 libnccl-dev=2.11.4-1+cuda11.4
elif [[ "$cuda_version" == "11.5" ]]; then
sudo apt install libnccl2=2.11.4-1+cuda11.5 libnccl-dev=2.11.4-1+cuda11.5
elif [[ "$cuda_version" == "11.6" ]]; then
sudo apt install libnccl2=2.12.12-1+cuda11.6 libnccl-dev=2.12.12-1+cuda11.6
elif [[ "$cuda_version" == "11.7" ]]; then
sudo apt install libnccl2=2.14.3-1+cuda11.7 libnccl-dev=2.14.3-1+cuda11.7
elif [[ "$cuda_version" == "11.8" ]]; then
sudo apt install libnccl2=2.16.5-1+cuda11.8 libnccl-dev=2.16.5-1+cuda11.8
elif [[ "$cuda_version" == "12.0" ]]; then
sudo apt install libnccl2=2.18.3-1+cuda12.0 libnccl-dev=2.18.3-1+cuda12.0
elif [[ "$cuda_version" == "12.1" ]]; then
sudo apt install libnccl2=2.18.3-1+cuda12.1 libnccl-dev=2.18.3-1+cuda12.1
elif [[ "$cuda_version" == "12.2" ]]; then
sudo apt install libnccl2=2.18.3-1+cuda12.2 libnccl-dev=2.18.3-1+cuda12.2
else
echo "Installing NCCL for CUDA version ${cuda_version} is not supported"
exit 1
fi
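For reference, a hedged usage sketch of the new helper (per the `cuda_version=${1:-11.8.0}` line above, the first argument selects the CUDA version and defaults to 11.8.0):

```bash
# Pin NCCL to the CUDA 12.0 packages; the script cd's into its own directory,
# so it can be invoked from the repository root.
bash .github/workflows/helpers/install_nccl.sh 12.0.0
```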
5 changes: 2 additions & 3 deletions .github/workflows/pip-install.yml
@@ -69,9 +69,8 @@ jobs:
# Remove build folder to check that the installed version can run independently of the build files
rm -rf build
- name: Check availability of Python flexflow.core module
- name: Check availability of flexflow modules in Python
run: |
export LD_LIBRARY_PATH="$CUDA_PATH/lib64/stubs:$LD_LIBRARY_PATH"
sudo ln -s "$CUDA_PATH/lib64/stubs/libcuda.so" "$CUDA_PATH/lib64/stubs/libcuda.so.1"
export CPU_ONLY_TEST=1
python -c "import flexflow.core; exit()"
python -c 'import flexflow.core; import flexflow.serve as ff; exit()'
77 changes: 61 additions & 16 deletions CMakeLists.txt
@@ -1,6 +1,7 @@
cmake_minimum_required(VERSION 3.10)
project(FlexFlow)


include(ExternalProject)

# Set policy CMP0074 to eliminate cmake warnings
@@ -175,10 +176,6 @@ endif()
# option for nccl
option(FF_USE_NCCL "Run FlexFlow with NCCL" OFF)

if (FF_GPU_BACKEND STREQUAL "hip_rocm" AND FF_USE_NCCL STREQUAL "ON")
message(FATAL_ERROR "NCCL: ON for FF_GPU_BACKEND: hip_rocm. hip_rocm backend must have NCCL disabled.")
endif()

# option for avx2
option(FF_USE_AVX2 "Run FlexFlow with AVX2" OFF)

@@ -226,9 +223,6 @@ if(FF_GPU_BACKEND STREQUAL "hip_cuda" OR FF_GPU_BACKEND STREQUAL "hip_rocm")
set(ROCM_PATH "/opt/rocm" CACHE STRING "Default ROCM installation directory.")
endif()

# ZLIB
include(zlib)

# CUDA
if (FF_GPU_BACKEND STREQUAL "cuda" OR FF_GPU_BACKEND STREQUAL "hip_cuda")
include(cuda)
@@ -244,6 +238,18 @@ if (FF_GPU_BACKEND STREQUAL "cuda" OR FF_GPU_BACKEND STREQUAL "hip_cuda")
include(cudnn)
endif()


# NCCL
if(FF_USE_NCCL)
if(FF_GPU_BACKEND STREQUAL "hip_cuda" OR FF_GPU_BACKEND STREQUAL "cuda")
include(nccl)
endif()
list(APPEND FF_CC_FLAGS
-DFF_USE_NCCL)
list(APPEND FF_NVCC_FLAGS
-DFF_USE_NCCL)
endif()

# Inference tests
if(INFERENCE_TESTS)
list(APPEND FF_CC_FLAGS
@@ -376,11 +382,26 @@ if(NOT BUILD_LEGION_ONLY)
LIST_DIRECTORIES False
${FLEXFLOW_ROOT}/src/*.cpp)

if(BUILD_SHARED_LIBS)
add_library(flexflow SHARED ${FLEXFLOW_GPU_SRC} ${FLEXFLOW_SRC})
else()
add_library(flexflow STATIC ${FLEXFLOW_GPU_SRC} ${FLEXFLOW_SRC})
target_include_directories(hip_device_nvidia SYSTEM INTERFACE ${HIP_INCLUDE_DIRS} ${ROCM_PATH}/include)
target_include_directories(hip_device_nvidia INTERFACE ${HIP_INCLUDE_DIRS} ${ROCM_PATH}/include)

add_compile_definitions(FF_USE_HIP_CUDA)

# Linking cuda:
# We do not explicitly link cuda. hipcc when targeting nvidia will
# use nvcc under the hood. nvcc when used for linking will handle
# linking cuda dependencies
target_link_libraries(flexflow hip_device_nvidia)
elseif(FF_GPU_BACKEND STREQUAL "hip_rocm")
find_package(hipblas REQUIRED)
find_package(miopen REQUIRED)
if(FF_USE_NCCL)
find_package(rccl REQUIRED)
endif()
# find_package(rocrand REQUIRED)
find_library(HIP_RAND_LIBRARY hiprand REQUIRED)

add_compile_definitions(FF_USE_HIP_ROCM)

list(APPEND CMAKE_PREFIX_PATH ${ROCM_PATH}/hip ${ROCM_PATH})

@@ -440,14 +461,38 @@ if(NOT BUILD_LEGION_ONLY)
# https://rocmdocs.amd.com/en/latest/Installation_Guide/Using-CMake-with-AMD-ROCm.html
target_link_libraries(flexflow hip::device roc::hipblas MIOpen ${HIP_RAND_LIBRARY})
endif()
else()
message(FATAL_ERROR "Unsupported FF_GPU_BACKEND for cmake: ${FF_GPU_BACKEND}")
endif()

if(FF_USE_NCCL)
add_dependencies(flexflow ${NCCL_NAME})
set_property(TARGET flexflow PROPERTY HIP_ARCHITECTURES "${HIP_ARCH_LIST}")

message(STATUS "FF_GPU_BACKEND: ${FF_GPU_BACKEND}")
message(STATUS "FF_HIP_ARCH: ${FF_HIP_ARCH}")
message(STATUS "HIP_ARCH_LIST: ${HIP_ARCH_LIST}")
get_property(CHECK_HIP_ARCHS TARGET flexflow PROPERTY HIP_ARCHITECTURES)
message(STATUS "CHECK_HIP_ARCHS: ${CHECK_HIP_ARCHS}")
message(STATUS "HIP_CLANG_PATH: ${HIP_CLANG_PATH}")

# The hip cmake config module defines three targets,
# hip::amdhip64, hip::host, and hip::device.
#
# hip::host and hip::device are interface targets. hip::amdhip64 is an
# imported target for libamdhip.
#
# You do not directly link to hip::amdhip64. hip::host links to hip::amdhip64
# and hip::device links to hip::host. Link to hip::host to just use hip without
# compiling any GPU code. Link to hip::device to compile the GPU device code.
#
# Docs (outdated):
# https://rocmdocs.amd.com/en/latest/Installation_Guide/Using-CMake-with-AMD-ROCm.html
target_link_libraries(flexflow hip::device roc::hipblas MIOpen ${HIP_RAND_LIBRARY})
if(FF_USE_NCCL)
target_link_libraries(flexflow rccl)
endif()
endif()

if(FF_USE_NCCL AND (FF_GPU_BACKEND STREQUAL "hip_cuda" OR FF_GPU_BACKEND STREQUAL "cuda"))
add_dependencies(flexflow ${NCCL_NAME})
endif()

target_include_directories(flexflow PUBLIC ${FLEXFLOW_INCLUDE_DIRS})
# LEGION_URL is defined if we found a precompiled Legion library to download
if(LEGION_URL)
4 changes: 2 additions & 2 deletions INSTALL.md
@@ -30,7 +30,7 @@ If you are planning to build the Python interface, you will need to install seve

The `conda` environment can be created and activated as:
```
conda env create -f conda/environment.yml
conda env create -f conda/flexflow.yml
conda activate flexflow
```

@@ -42,7 +42,7 @@ You can configure a FlexFlow build by running the `config/config.linux` file in
3. `FF_CUDA_ARCH` is used to set the architecture of the targeted GPUs; for example, the value can be 60 if the GPU architecture is Pascal. To build for more than one architecture, pass a list of comma-separated values (e.g. `FF_CUDA_ARCH=70,75`). To compile FlexFlow for all GPU architectures that are detected on the machine, pass `FF_CUDA_ARCH=autodetect` (this is the default value, so you can also leave `FF_CUDA_ARCH` unset). If you want to build for all GPU architectures compatible with FlexFlow, pass `FF_CUDA_ARCH=all`. **If your machine does not have any GPU, you have to set FF_CUDA_ARCH to at least one valid architecture code (or `all`)**, since the compiler won't be able to detect the architecture(s) automatically.
4. `FF_USE_PYTHON` controls whether to build the FlexFlow Python interface.
5. `FF_USE_NCCL` controls whether to build FlexFlow with NCCL support. By default, it is set to ON.
6. `FF_LEGION_NETWORKS` is used to enable distributed runs of FlexFlow. If you want to run FlexFlow on multiple nodes, follow the instructions in [MULTI-NODE.md](MULTI-NODE.md) and set the corresponding parameters as follows:
6. `FF_LEGION_NETWORKS` is used to enable distributed runs of FlexFlow. If you want to run FlexFlow on multiple nodes, follow the instructions in the [Multinode tutorial](https://flexflow.readthedocs.io/en/latest/multinode.html) and set the corresponding parameters as follows:
* To build FlexFlow with GASNet, set `FF_LEGION_NETWORKS=gasnet` and `FF_GASNET_CONDUIT` as a specific conduit (e.g. `ibv`, `mpi`, `udp`, `ucx`) in `config/config.linux` when configuring the FlexFlow build. Set `FF_UCX_URL` when you want to customize the URL to download UCX.
* To build FlexFlow with native UCX, set `FF_LEGION_NETWORKS=ucx` in `config/config.linux` when configuring the FlexFlow build. Set `FF_UCX_URL` when you want to customize the URL to download UCX.
8. `FF_BUILD_EXAMPLES` controls whether to build all C++ example programs.
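A hedged sketch of how these variables might be set when invoking `config/config.linux` (the variable names come from the list above; the values and the environment-variable style of invocation are illustrative assumptions):

```bash
# Example configuration for a CUDA build with NCCL and GASNet networking
# (all values are illustrative only):
FF_CUDA_ARCH=70,75 \
FF_USE_PYTHON=ON \
FF_USE_NCCL=ON \
FF_LEGION_NETWORKS=gasnet \
FF_GASNET_CONDUIT=ibv \
./config/config.linux
```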
