
Add tensor parallel inference unit tests #2232

Merged 34 commits on Sep 9, 2022

Commits
770c59a
added mp test
mrwyattii Aug 17, 2022
706617f
Merge branch 'master' into add-mp-inf-tests
mrwyattii Aug 17, 2022
c0925a1
make text-gen compare DS output to baseline
mrwyattii Aug 18, 2022
36bb2c5
Merge branch 'master' into add-mp-inf-tests
mrwyattii Aug 18, 2022
f942b5a
Merge branch 'master' into add-mp-inf-tests
RezaYazdaniAminabadi Aug 23, 2022
adaabe0
remove non-gpt models from multigpu test
mrwyattii Aug 25, 2022
f9a7eae
add more models and fixes for OOM
mrwyattii Aug 26, 2022
606a669
added job ID to environment printout
mrwyattii Aug 26, 2022
aef7603
made multi-GPU tests sequential
mrwyattii Aug 26, 2022
27a9825
Merge branch 'master' into add-mp-inf-tests
mrwyattii Aug 26, 2022
7cb4136
update inference workflow with sequential tests
mrwyattii Aug 26, 2022
409af10
fix for multi-gpu
mrwyattii Aug 26, 2022
28c8102
fix for selecting inference sequential tests
mrwyattii Aug 26, 2022
185d4b9
change default to not include inference tests
mrwyattii Aug 26, 2022
5c84dca
restrict gptneox to half precision
mrwyattii Aug 26, 2022
6edb13b
increase world_size to 4 for multigpu
mrwyattii Aug 26, 2022
a75d5fe
Merge branch 'master' into add-mp-inf-tests
jeffra Aug 29, 2022
a79bb15
Merge branch 'master' into add-mp-inf-tests
jeffra Sep 1, 2022
2ab79f0
Merge branch 'master' into add-mp-inf-tests
samadejacobs Sep 1, 2022
19839cb
use fp16 revision of gpt-j
jeffra Sep 2, 2022
776be03
revert non-gptj
jeffra Sep 2, 2022
429c88d
use gpt-j fp16 revision
jeffra Sep 3, 2022
b6d2b3b
Merge branch 'master' into add-mp-inf-tests
jeffra Sep 3, 2022
af8d94d
Merge branch 'master' into add-mp-inf-tests
jeffra Sep 7, 2022
335406a
fix for broken tests
mrwyattii Sep 8, 2022
d0bb608
fix invalid model/task check
mrwyattii Sep 8, 2022
4678f05
correct gpt-neo test that should have been skipped
mrwyattii Sep 8, 2022
5b5e65a
add exception for bloom models failing to match text-gen output
mrwyattii Sep 8, 2022
19f2bf0
restrict number of tokens generated
mrwyattii Sep 8, 2022
7feac82
fix for opt models
mrwyattii Sep 8, 2022
cfdf629
Update test_inference.py
mrwyattii Sep 9, 2022
3e87d74
Merge branch 'master' into add-mp-inf-tests
mrwyattii Sep 9, 2022
42ccea5
formatting
mrwyattii Sep 9, 2022
db48d24
Merge branch 'master' into add-mp-inf-tests
mrwyattii Sep 9, 2022
1 change: 1 addition & 0 deletions .github/workflows/amd.yml
@@ -28,6 +28,7 @@ jobs:
# Runs a single command using the runners shell
- name: environment
run: |
echo "JobID: $AISC_NODE_INSTANCE_ID"
rocm-smi --showhw
which python
python --version
1 change: 1 addition & 0 deletions .github/workflows/nv-accelerate-v100.yml
@@ -24,6 +24,7 @@ jobs:

- name: environment
run: |
echo "JobID: $AISC_NODE_INSTANCE_ID"
nvidia-smi
which python
python --version
4 changes: 3 additions & 1 deletion .github/workflows/nv-inference.yml
@@ -24,6 +24,7 @@ jobs:

- name: environment
run: |
echo "JobID: $AISC_NODE_INSTANCE_ID"
nvidia-smi
which python
python --version
@@ -40,7 +41,7 @@ jobs:
git clone https://github.com/huggingface/transformers
cd transformers
# if needed switch to the last known good SHA until transformers@master is fixed
# git checkout 1cc453d33
git checkout v4.21.2
Comment from a collaborator on lines -43 to +44: I think we can change this to be the latest master from HF after we merge #2291, but I want to re-test #2291 after this PR is merged.
git rev-parse --short HEAD
pip uninstall --yes transformers
pip install .
@@ -61,4 +62,5 @@ jobs:
if [[ -d ./torch-extensions ]]; then rm -rf ./torch-extensions; fi
cd tests
EXPECTED_TORCH=$(pip index versions torch | grep -oP -m1 "^\s*LATEST.*\s\K\d+\.\d+")
TRANSFORMERS_CACHE=/blob/transformers_cache/ TORCH_EXTENSIONS_DIR=./torch-extensions pytest --color=yes --durations=0 --verbose -m 'seq_inference' unit/ --torch_ver=$EXPECTED_TORCH --cuda_ver="11.3"
TRANSFORMERS_CACHE=/blob/transformers_cache/ TORCH_EXTENSIONS_DIR=./torch-extensions pytest --color=yes --durations=0 -n 4 --verbose -m 'inference' unit/ --torch_ver=$EXPECTED_TORCH --cuda_ver="11.3"
1 change: 1 addition & 0 deletions .github/workflows/nv-lightning-v100.yml
@@ -24,6 +24,7 @@ jobs:

- name: environment
run: |
echo "JobID: $AISC_NODE_INSTANCE_ID"
nvidia-smi
which python
python --version
1 change: 1 addition & 0 deletions .github/workflows/nv-nightly.yml
@@ -17,6 +17,7 @@ jobs:

- name: environment
run: |
echo "JobID: $AISC_NODE_INSTANCE_ID"
nvidia-smi
which python
python --version
1 change: 1 addition & 0 deletions .github/workflows/nv-torch-latest-v100.yml
@@ -24,6 +24,7 @@ jobs:

- name: environment
run: |
echo "JobID: $AISC_NODE_INSTANCE_ID"
nvidia-smi
which python
python --version
1 change: 1 addition & 0 deletions .github/workflows/nv-torch-nightly-v100.yml
@@ -17,6 +17,7 @@ jobs:

- name: environment
run: |
echo "JobID: $AISC_NODE_INSTANCE_ID"
nvidia-smi
which python
python --version
1 change: 1 addition & 0 deletions .github/workflows/nv-torch12-p40.yml
@@ -24,6 +24,7 @@ jobs:

- name: environment
run: |
echo "JobID: $AISC_NODE_INSTANCE_ID"
nvidia-smi
which python
python --version
1 change: 1 addition & 0 deletions .github/workflows/nv-torch18-v100.yml
@@ -24,6 +24,7 @@ jobs:

- name: environment
run: |
echo "JobID: $AISC_NODE_INSTANCE_ID"
nvidia-smi
which python
python --version
1 change: 1 addition & 0 deletions .github/workflows/nv-transformers-v100.yml
@@ -24,6 +24,7 @@ jobs:

- name: environment
run: |
echo "JobID: $AISC_NODE_INSTANCE_ID"
nvidia-smi
which python
python --version
3 changes: 2 additions & 1 deletion tests/pytest.ini
@@ -1,6 +1,7 @@
[pytest]
addopts = -m "not sequential and not nightly and not inference"
addopts = -m "not sequential and not nightly and not inference and not seq_inference"
markers =
sequential:Tests that need to be run sequentially
inference:Inference model tests
seq_inference:Inference model tests to run sequentially
nightly:Tests that should be run nightly
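
The new seq_inference marker keeps the multi-GPU inference tests out of the default run (the addopts line above excludes it) and lets the CI workflow select them explicitly with pytest -m seq_inference. A minimal sketch of how a test opts into the marker; the test body here is a hypothetical placeholder, not part of this PR:

import pytest

@pytest.mark.seq_inference
def test_only_runs_when_selected():
    # Excluded by the default addopts; selected explicitly with: pytest -m "seq_inference"
    assert True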
78 changes: 71 additions & 7 deletions tests/unit/inference/test_inference.py
@@ -52,7 +52,7 @@ def lm_eval_imports():
"distilgpt2",
"Norod78/hebrew-bad_wiki-gpt_neo-tiny",
"EleutherAI/gpt-j-6B",
"bigscience/bloom-350m",
"bigscience/bloom-560m",
]
_opt_models = [
"facebook/opt-125m", # 125m, 1.7B, ..., 175B variants have the same model architecture.
@@ -111,6 +111,7 @@ def enable_cuda_graph(request):
@pytest.fixture()
def invalid_model_task_config(model_w_task, dtype, enable_cuda_graph):
model, task = model_w_task
msg = ""
if pkg_version.parse(torch.__version__) <= pkg_version.parse("1.2"):
msg = "DS inference injection doesn't work well on older torch versions"
elif model not in pytest.all_models[task]:
@@ -120,10 +121,17 @@ def invalid_model_task_config(model_w_task, dtype, enable_cuda_graph):
elif enable_cuda_graph and pkg_version.parse(
torch.__version__) < pkg_version.parse("1.10"):
msg = "CUDA Graph is only available in torch versions >= 1.10"
elif ("gpt-j-6B" in model) and (dtype == torch.float):
elif "gpt-j-6B" in model:
if dtype != torch.half:
msg = f"Not enough GPU memory to run {model} with dtype {dtype}"
elif enable_cuda_graph:
msg = f"Not enough GPU memory to run {model} with CUDA Graph enabled"
elif "gpt-neox-20b" in model: # TODO: remove this when neox issues resolved
msg = "Skipping gpt-neox-20b for now"
elif ("gpt-neox-20b" in model) and (dtype != torch.half):
msg = f"Not enough GPU memory to run {model} with dtype {dtype}"
else:
msg = ""
elif ("bloom" in model) and (dtype != torch.half):
msg = f"Bloom models only support half precision, cannot use dtype {dtype}"
return msg
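
The fixture above returns a human-readable reason string, empty when the model/dtype/CUDA-graph combination is runnable, and each test skips itself when the string is non-empty. A stripped-down sketch of that pattern, with a hypothetical fixture body standing in for the real checks:

import pytest

@pytest.fixture()
def invalid_model_task_config():
    # Hypothetical stand-in: the real fixture builds the reason from the model,
    # dtype, and enable_cuda_graph; an empty string means the combination is valid.
    return ""

def test_example(invalid_model_task_config):
    if invalid_model_task_config:
        pytest.skip(invalid_model_task_config)
    assert True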


@@ -160,7 +168,7 @@ def query(model_w_task):
def inf_kwargs(model_w_task):
model, task = model_w_task
if task == "text-generation":
return {"do_sample": False}
return {"do_sample": False, "max_length": 20}
else:
return {}
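
Capping max_length bounds the runtime of each text-generation test, since these kwargs are forwarded directly to the Hugging Face pipeline call. A rough illustration of the effect; the model name and prompt below are assumptions, not taken from this PR:

from transformers import pipeline

pipe = pipeline("text-generation", model="distilgpt2")
# Greedy decoding, with prompt plus generated tokens capped at 20 tokens total.
output = pipe("DeepSpeed is", do_sample=False, max_length=20)
print(output[0]["generated_text"])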

@@ -228,7 +236,9 @@ def test(
local_rank = int(os.getenv("LOCAL_RANK", "0"))

if "gpt-j-6B" in model and dtype == torch.half:
_model = AutoModelForCausalLM.from_pretrained(model)
_model = AutoModelForCausalLM.from_pretrained(model,
revision="float16",
torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model)
_model.half()
pipe = pipeline(
@@ -269,7 +279,9 @@ def test(
torch.cuda.synchronize()
ds_time = time.time() - start

if task == "text-generation":
# facebook/opt* and some bigscience/bloom* models do not match the
# baseline exactly; adding an exception for them for now
if ("opt" in model) or ("bloom" in model):
bs_output = pipe(query, **inf_kwargs)

# These performance tests are only measuring the time for a single
@@ -278,6 +290,58 @@ def test(
assert assert_fn(bs_output, ds_output)


@pytest.mark.seq_inference
@pytest.mark.parametrize("model_w_task",
[("gpt2",
"text-generation"),
("EleutherAI/gpt-neox-20b",
"text-generation"),
("bigscience/bloom-3b",
"text-generation")],
ids=["gpt2",
"gpt-neox",
"bloom"])
class TestMPSize(DistributedTest):
world_size = 4

def test(
self,
model_w_task,
dtype,
enable_cuda_graph,
query,
inf_kwargs,
assert_fn,
invalid_model_task_config,
):
if invalid_model_task_config:
pytest.skip(invalid_model_task_config)

model, task = model_w_task
local_rank = int(os.getenv("LOCAL_RANK", "0"))

# We have to load these large models on CPU with the pipeline because there is
# not enough GPU memory on a single device
pipe = pipeline(task, model=model, device=-1, framework="pt")
bs_output = pipe(query, **inf_kwargs)

pipe.model = deepspeed.init_inference(
pipe.model,
mp_size=self.world_size,
dtype=dtype,
replace_method="auto",
replace_with_kernel_inject=True,
enable_cuda_graph=enable_cuda_graph,
)
# Switch device to GPU so that input tensors are not on CPU
pipe.device = torch.device(f"cuda:{local_rank}")
ds_output = pipe(query, **inf_kwargs)

print(local_rank, "baseline", bs_output)
print(local_rank, "deepspeed", ds_output)
assert assert_fn(bs_output, ds_output)
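
The same tensor-parallel pattern can be exercised outside the test harness. Below is a minimal standalone sketch that mirrors what TestMPSize does; the model choice, GPU count, prompt, and launch command are assumptions, not taken from this PR, and it would be started with the DeepSpeed launcher, e.g. deepspeed --num_gpus 2 tp_inference_example.py:

import os

import torch
import deepspeed
from transformers import pipeline

local_rank = int(os.getenv("LOCAL_RANK", "0"))
world_size = int(os.getenv("WORLD_SIZE", "1"))

# Build the baseline pipeline on CPU so a single GPU is not exhausted while loading.
pipe = pipeline("text-generation", model="gpt2", device=-1, framework="pt")
baseline = pipe("DeepSpeed is", do_sample=False, max_length=20)

# Shard the model across all ranks with kernel injection (tensor parallelism).
pipe.model = deepspeed.init_inference(
    pipe.model,
    mp_size=world_size,
    dtype=torch.half,
    replace_method="auto",
    replace_with_kernel_inject=True,
)

# Point the pipeline at this rank's GPU so input tensors are moved off the CPU.
pipe.device = torch.device(f"cuda:{local_rank}")
ds_output = pipe("DeepSpeed is", do_sample=False, max_length=20)

print(local_rank, "baseline", baseline)
print(local_rank, "deepspeed", ds_output)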


@pytest.mark.nightly
@pytest.mark.parametrize(
"model_family, model_name",