[CI] add a big GPU marker to run memory-intensive tests separately on CI #9691

Merged: 38 commits, merged on Oct 31, 2024 (the file changes shown below reflect 12 of the 38 commits)

Commits (38)
32e23d8  add a marker for big gpu tests (sayakpaul, Oct 16, 2024)
da92ca0  update (sayakpaul, Oct 16, 2024)
219a3cc  trigger on PRs temporarily. (sayakpaul, Oct 16, 2024)
c679563  onnx (sayakpaul, Oct 16, 2024)
a0bae4b  fix (sayakpaul, Oct 16, 2024)
95f396e  total memory (sayakpaul, Oct 16, 2024)
02f0aa3  fixes (sayakpaul, Oct 16, 2024)
9441016  reduce memory threshold. (sayakpaul, Oct 16, 2024)
15d1127  bigger gpu (sayakpaul, Oct 16, 2024)
6c82fd4  Merge branch 'main' into big-model-marker (sayakpaul, Oct 16, 2024)
676b8a5  empty (sayakpaul, Oct 16, 2024)
3b50732  g6e (sayakpaul, Oct 16, 2024)
9ef5435  Apply suggestions from code review (sayakpaul, Oct 16, 2024)
4ff06b4  address comments. (sayakpaul, Oct 17, 2024)
46cab82  fix (sayakpaul, Oct 17, 2024)
2b25688  fix (sayakpaul, Oct 17, 2024)
b0568da  fix (sayakpaul, Oct 17, 2024)
928dd73  fix (sayakpaul, Oct 17, 2024)
9020d8f  fix (sayakpaul, Oct 17, 2024)
2732720  okay (sayakpaul, Oct 17, 2024)
f265f7d  further reduce. (sayakpaul, Oct 17, 2024)
1755305  updates (sayakpaul, Oct 17, 2024)
fcb57ae  remove (sayakpaul, Oct 17, 2024)
6f477ac  updates (sayakpaul, Oct 17, 2024)
ff47576  updates (sayakpaul, Oct 17, 2024)
1ad8c64  updates (sayakpaul, Oct 17, 2024)
605a21d  updates (sayakpaul, Oct 17, 2024)
9e1cacb  fixes (sayakpaul, Oct 17, 2024)
0704d9a  fixes (sayakpaul, Oct 17, 2024)
c9fd1ab  updates. (sayakpaul, Oct 17, 2024)
f8086f6  Merge branch 'main' into big-model-marker (sayakpaul, Oct 17, 2024)
e31b0bd  Merge branch 'main' into big-model-marker (sayakpaul, Oct 18, 2024)
cf280ba  fix (sayakpaul, Oct 18, 2024)
5b9c771  Merge branch 'main' into big-model-marker (a-r-r-o-w, Oct 19, 2024)
0e07597  Merge branch 'main' into big-model-marker (sayakpaul, Oct 22, 2024)
4fcd223  Merge branch 'main' into big-model-marker (sayakpaul, Oct 31, 2024)
1302ecd  Merge branch 'main' into big-model-marker (sayakpaul, Oct 31, 2024)
2084be0  workflow fixes. (sayakpaul, Oct 31, 2024)
60 changes: 60 additions & 0 deletions .github/workflows/nightly_tests.yml
@@ -2,6 +2,7 @@ name: Nightly and release tests on main/release branch

on:
workflow_dispatch:
pull_request:
sayakpaul (Member Author): This is temporary.

schedule:
- cron: "0 0 * * *" # every day at midnight

@@ -18,6 +19,7 @@ env:

jobs:
setup_torch_cuda_pipeline_matrix:
if: github.event_name == 'schedule'
sayakpaul (Member Author): Temporary.

name: Setup Torch Pipelines CUDA Slow Tests Matrix
runs-on:
group: aws-general-8-plus
@@ -49,6 +51,7 @@ jobs:
path: reports

run_nightly_tests_for_torch_pipelines:
if: github.event_name == 'schedule'
name: Nightly Torch Pipelines CUDA Tests
needs: setup_torch_cuda_pipeline_matrix
strategy:
@@ -106,6 +109,7 @@ jobs:
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

run_nightly_tests_for_other_torch_modules:
if: github.event_name == 'schedule'
name: Nightly Torch CUDA Tests
runs-on:
group: aws-g4dn-2xlarge
@@ -180,6 +184,61 @@ jobs:
pip install slack_sdk tabulate
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

run_big_gpu_torch_tests:
  name: Torch tests on big GPU (24GB)
  strategy:
    fail-fast: false
    max-parallel: 8
  runs-on:
    group: aws-g6e-xlarge-plus
  container:
    image: diffusers/diffusers-pytorch-cuda
    options: --shm-size "16gb" --ipc host --gpus 0
  steps:
    - name: Checkout diffusers
      uses: actions/checkout@v3
      with:
        fetch-depth: 2
    - name: NVIDIA-SMI
      run: nvidia-smi
    - name: Install dependencies
      run: |
        python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
        python -m uv pip install -e [quality,test]
        python -m uv pip install peft@git+https://github.com/huggingface/peft.git
        pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
        python -m uv pip install pytest-reportlog
    - name: Environment
      run: |
        python utils/print_env.py
    - name: Selected Torch CUDA Test on big GPU
      env:
        HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
        # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
        CUBLAS_WORKSPACE_CONFIG: :16:8
      run: |
        python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
          -m "big_gpu_with_torch_cuda" \
          --make-reports=tests_big_gpu_torch_cuda \
          --report-log=tests_big_gpu_torch_cuda.log \
          tests/
    - name: Failure short reports
      if: ${{ failure() }}
      run: |
        cat reports/tests_big_gpu_torch_cuda_stats.txt
        cat reports/tests_big_gpu_torch_cuda_failures_short.txt
    - name: Test suite reports artifacts
      if: ${{ always() }}
      uses: actions/upload-artifact@v4
      with:
        name: torch_cuda_big_gpu_test_reports
        path: reports
    - name: Generate Report and Notify Channel
      if: always()
      run: |
        pip install slack_sdk tabulate
        python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
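
The -m "big_gpu_with_torch_cuda" filter above is what keeps the memory-intensive tests out of the regular nightly jobs. For local debugging, roughly the same selection can be reproduced with a small helper script; this is a sketch that assumes pytest-xdist is installed and a sufficiently large CUDA GPU is available, and the file name is made up:

# run_big_gpu_tests.py (hypothetical helper mirroring the CI invocation above)
import sys

import pytest

if __name__ == "__main__":
    # Run only tests carrying the big-GPU marker, with a single worker, like the CI job.
    sys.exit(pytest.main(["-n", "1", "-m", "big_gpu_with_torch_cuda", "tests/"]))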

run_flax_tpu_tests:
name: Nightly Flax TPU Tests
runs-on: docker-tpu
@@ -237,6 +296,7 @@ jobs:
python utils/log_reports.py >> $GITHUB_STEP_SUMMARY

run_nightly_onnx_tests:
if: github.event_name == 'schedule'
name: Nightly ONNXRuntime CUDA tests on Ubuntu
runs-on:
group: aws-g4dn-2xlarge
21 changes: 21 additions & 0 deletions src/diffusers/utils/testing_utils.py
@@ -54,6 +54,7 @@
) > version.parse("4.33")

USE_PEFT_BACKEND = _required_peft_version and _required_transformers_version
BIG_GPU_MEMORY = 40

if is_torch_available():
    import torch
@@ -307,6 +308,26 @@ def require_torch_accelerator_with_fp64(test_case):
)


def require_big_gpu_with_torch_cuda(test_case):
    """
    Decorator marking a test that requires a bigger GPU (24GB) for execution. Some example pipelines: Flux, SD3, Cog,
    etc.
    """
    if not is_torch_available():
        return unittest.skip("test requires PyTorch")(test_case)

    import torch

    if not torch.cuda.is_available():
        return unittest.skip("test requires PyTorch CUDA")(test_case)

    device_properties = torch.cuda.get_device_properties(0)
    total_memory = device_properties.total_memory / (1024**3)
    return unittest.skipUnless(
        total_memory >= BIG_GPU_MEMORY, f"test requires a GPU with at least {BIG_GPU_MEMORY} GB memory"
    )(test_case)


def require_torch_accelerator_with_training(test_case):
    """Decorator marking a test that requires an accelerator with support for training."""
    return unittest.skipUnless(
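
Combined with the pytest marker applied in the test files below, a memory-intensive integration test opts in roughly as follows. This is an illustrative sketch (the class name is made up), not a test from this diff:

import unittest

import pytest

from diffusers.utils.testing_utils import require_big_gpu_with_torch_cuda, slow


@slow
@require_big_gpu_with_torch_cuda  # skips unless the GPU reports at least BIG_GPU_MEMORY GB
@pytest.mark.big_gpu_with_torch_cuda  # lets the CI job select the test via -m
class MyMemoryHungryPipelineSlowTests(unittest.TestCase):
    def test_inference(self):
        # Heavy checkpoint loading and inference would go here.
        ...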
6 changes: 4 additions & 2 deletions tests/pipelines/controlnet_flux/test_controlnet_flux.py
@@ -17,6 +17,7 @@
import unittest

import numpy as np
import pytest
import torch
from transformers import CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5TokenizerFast

@@ -30,7 +31,7 @@
from diffusers.utils import load_image
from diffusers.utils.testing_utils import (
enable_full_determinism,
require_torch_gpu,
require_big_gpu_with_torch_cuda,
slow,
torch_device,
)
@@ -180,7 +181,8 @@ def test_xformers_attention_forwardGenerator_pass(self):


@slow
@require_torch_gpu
@require_big_gpu_with_torch_cuda
@pytest.mark.big_gpu_with_torch_cuda
class FluxControlNetPipelineSlowTests(unittest.TestCase):
pipeline_class = FluxControlNetPipeline

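
The @pytest.mark.big_gpu_with_torch_cuda marker added above is what the new CI job's -m filter selects. If the marker is not already declared in the repository's pytest configuration, it would typically be registered along these lines; this conftest.py snippet is an illustrative sketch, not part of this diff:

# conftest.py (hypothetical) -- registers the custom marker so pytest does not
# warn about an unknown mark on tests using @pytest.mark.big_gpu_with_torch_cuda.
def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "big_gpu_with_torch_cuda: tests that need a large-memory CUDA GPU",
    )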
@@ -2,6 +2,7 @@
import unittest

import numpy as np
import pytest
import torch
from transformers import AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel

@@ -14,7 +15,7 @@
)
from diffusers.utils.testing_utils import (
numpy_cosine_similarity_distance,
require_torch_gpu,
require_big_gpu_with_torch_cuda,
slow,
torch_device,
)
@@ -225,7 +226,8 @@ def test_fused_qkv_projections(self):


@slow
@require_torch_gpu
@require_big_gpu_with_torch_cuda
@pytest.mark.big_gpu_with_torch_cuda
class FluxControlNetImg2ImgPipelineSlowTests(unittest.TestCase):
sayakpaul (Member Author): I don't think this test was done correctly: it doesn't pass the controlnet module to the pipeline, and it also uses dummy inputs, which I think should be avoided in an integration test. Let me know if you think otherwise.

pipeline_class = FluxControlNetImg2ImgPipeline
repo_id = "black-forest-labs/FLUX.1-schnell"
@@ -261,7 +263,6 @@ def get_inputs(self, device, seed=0):
"generator": generator,
}

@unittest.skip("We cannot run inference on this model with the current CI hardware")
def test_flux_controlnet_img2img_inference(self):
pipe = self.pipeline_class.from_pretrained(self.repo_id, torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()
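
Regarding the reviewer comment above about the missing controlnet wiring: a corrected setup would pass a Flux controlnet to the pipeline explicitly, roughly as below. This is a hedged sketch; the controlnet checkpoint name is illustrative and not taken from this PR:

import torch

from diffusers import FluxControlNetImg2ImgPipeline, FluxControlNetModel

# Load a dedicated Flux controlnet and hand it to the pipeline instead of
# instantiating the pipeline from the base checkpoint alone.
controlnet = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Canny",  # illustrative checkpoint name
    torch_dtype=torch.bfloat16,
)
pipe = FluxControlNetImg2ImgPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    controlnet=controlnet,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()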
6 changes: 4 additions & 2 deletions tests/pipelines/controlnet_sd3/test_controlnet_sd3.py
@@ -17,6 +17,7 @@
import unittest

import numpy as np
import pytest
import torch
from transformers import AutoTokenizer, CLIPTextConfig, CLIPTextModelWithProjection, CLIPTokenizer, T5EncoderModel

@@ -30,7 +31,7 @@
from diffusers.utils import load_image
from diffusers.utils.testing_utils import (
enable_full_determinism,
require_torch_gpu,
require_big_gpu_with_torch_cuda,
slow,
torch_device,
)
@@ -195,7 +196,8 @@ def test_xformers_attention_forwardGenerator_pass(self):


@slow
@require_torch_gpu
@require_big_gpu_with_torch_cuda
@pytest.mark.big_gpu_with_torch_cuda
class StableDiffusion3ControlNetPipelineSlowTests(unittest.TestCase):
pipeline_class = StableDiffusion3ControlNetPipeline

8 changes: 4 additions & 4 deletions tests/pipelines/flux/test_pipeline_flux.py
@@ -2,13 +2,14 @@
import unittest

import numpy as np
import pytest
import torch
from transformers import AutoTokenizer, CLIPTextConfig, CLIPTextModel, CLIPTokenizer, T5EncoderModel

from diffusers import AutoencoderKL, FlowMatchEulerDiscreteScheduler, FluxPipeline, FluxTransformer2DModel
from diffusers.utils.testing_utils import (
numpy_cosine_similarity_distance,
require_torch_gpu,
require_big_gpu_with_torch_cuda,
slow,
torch_device,
)
@@ -191,7 +192,8 @@ def test_fused_qkv_projections(self):


@slow
@require_torch_gpu
@require_big_gpu_with_torch_cuda
@pytest.mark.big_gpu_with_torch_cuda
class FluxPipelineSlowTests(unittest.TestCase):
pipeline_class = FluxPipeline
repo_id = "black-forest-labs/FLUX.1-schnell"
@@ -220,8 +222,6 @@ def get_inputs(self, device, seed=0):
"generator": generator,
}

# TODO: Dhruv. Move large model tests to a dedicated runner)
@unittest.skip("We cannot run inference on this model with the current CI hardware")
def test_flux_inference(self):
pipe = self.pipeline_class.from_pretrained(self.repo_id, torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()
@@ -2,13 +2,14 @@
import unittest

import numpy as np
import pytest
import torch
from transformers import AutoTokenizer, CLIPTextConfig, CLIPTextModelWithProjection, CLIPTokenizer, T5EncoderModel

from diffusers import AutoencoderKL, FlowMatchEulerDiscreteScheduler, SD3Transformer2DModel, StableDiffusion3Pipeline
from diffusers.utils.testing_utils import (
numpy_cosine_similarity_distance,
require_torch_gpu,
require_big_gpu_with_torch_cuda,
slow,
torch_device,
)
@@ -226,7 +227,8 @@ def test_fused_qkv_projections(self):


@slow
@require_torch_gpu
@require_big_gpu_with_torch_cuda
@pytest.mark.big_gpu_with_torch_cuda
class StableDiffusion3PipelineSlowTests(unittest.TestCase):
pipeline_class = StableDiffusion3Pipeline
repo_id = "stabilityai/stable-diffusion-3-medium-diffusers"
@@ -3,6 +3,7 @@
import unittest

import numpy as np
import pytest
import torch
from transformers import AutoTokenizer, CLIPTextConfig, CLIPTextModelWithProjection, CLIPTokenizer, T5EncoderModel

@@ -16,7 +17,7 @@
from diffusers.utils.testing_utils import (
floats_tensor,
numpy_cosine_similarity_distance,
require_torch_gpu,
require_big_gpu_with_torch_cuda,
slow,
torch_device,
)
@@ -194,7 +195,8 @@ def test_multi_vae(self):


@slow
@require_torch_gpu
@require_big_gpu_with_torch_cuda
@pytest.mark.big_gpu_with_torch_cuda
class StableDiffusion3Img2ImgPipelineSlowTests(unittest.TestCase):
pipeline_class = StableDiffusion3Img2ImgPipeline
repo_id = "stabilityai/stable-diffusion-3-medium-diffusers"
4 changes: 4 additions & 0 deletions utils/print_env.py
@@ -37,6 +37,10 @@
print("Cuda version:", torch.version.cuda)
print("CuDNN version:", torch.backends.cudnn.version())
print("Number of GPUs available:", torch.cuda.device_count())
if torch.cuda.is_available():
device_properties = torch.cuda.get_device_properties(0)
total_memory = device_properties.total_memory / (1024**3)
print(f"CUDA memory: {total_memory} GB")
except ImportError:
print("Torch version:", None)
