Commit

Update on "Add the option to turn on async-TP"
This PR adds an option to turn on async-TP (`--experimental.enable_async_tensor_parallel`). The feature is implemented as compiler passes over the relevant patterns, so the option currently takes effect only when compile is enabled.
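
Because the option only takes effect under compile, a launcher script might want to check the two settings together. The sketch below is illustrative only: `JobFlags` and `async_tp_is_effective` are hypothetical names, not torchtitan APIs, and the `compile` field stands in for whichever setting enables compilation in the job config. The rule that a TP degree of 1 leaves nothing to overlap is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class JobFlags:
    # Hypothetical stand-in for the relevant job config fields.
    enable_async_tensor_parallel: bool = False  # mirrors --experimental.enable_async_tensor_parallel
    compile: bool = False  # assumed name for the "compile is enabled" setting
    tensor_parallel_degree: int = 1


def async_tp_is_effective(flags: JobFlags) -> bool:
    """Return True only when async-TP is requested and can actually apply."""
    if not flags.enable_async_tensor_parallel:
        return False
    # Async-TP is implemented as compiler passes, so it does nothing without compile.
    if not flags.compile:
        return False
    # Assumption: with TP degree 1 there is no TP communication to overlap.
    return flags.tensor_parallel_degree > 1
```

A script could call `async_tp_is_effective(...)` before launch and log a warning when the flag is set but would be ignored.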

Some trace samples from llama3_70b with tp degree=8:

**all-gather -> qkv projection**
Baseline:
<img width="420" alt="image" src="https://github.com/pytorch/torchtitan/assets/4156752/df6980c3-4a2f-4455-bdd3-9079b538123f">

With async-TP:
<img width="513" alt="image" src="https://github.com/pytorch/torchtitan/assets/4156752/635c3dee-660d-4452-809b-32620343080a">

**ffn -> reduce-scatter**
Baseline:
<img width="537" alt="image" src="https://github.com/pytorch/torchtitan/assets/4156752/6b045c84-48df-4798-a786-4f57e3f4345a">

With async-TP:
<img width="451" alt="image" src="https://github.com/pytorch/torchtitan/assets/4156752/63f13859-97f7-48ea-aef6-4e8861b207ac">

**all-gather -> ffn**
Baseline:
<img width="494" alt="image" src="https://github.com/pytorch/torchtitan/assets/4156752/b1636055-9b5b-43b1-b98e-b91f06af995e">

With async-TP:
<img width="536" alt="image" src="https://github.com/pytorch/torchtitan/assets/4156752/3edaedf4-3780-423d-ba86-5aa1cc5e69df">
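
The PR description does not spell out how the compiler passes rewrite these patterns. As a rough intuition only, overlap in the all-gather -> matmul cases is possible because a matmul over row-wise gathered shards equals the concatenation of per-shard matmuls, so each shard can be multiplied as soon as it arrives. The pure-Python sketch below checks that identity on toy data. `matmul`, `gathered_matmul`, and `per_shard_matmul` are illustrative helpers, and no real communication is involved.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matmul on nested lists."""
    return [
        [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
        for row in a
    ]


def gathered_matmul(shards: List[Matrix], weight: Matrix) -> Matrix:
    """Baseline: 'all-gather' every shard first, then do one big matmul."""
    gathered = [row for shard in shards for row in shard]
    return matmul(gathered, weight)


def per_shard_matmul(shards: List[Matrix], weight: Matrix) -> Matrix:
    """Decomposed form: multiply each shard on its own and concatenate.

    In a real system each iteration could start as soon as that shard
    arrives, which is what would let communication overlap with compute.
    """
    out: Matrix = []
    for shard in shards:
        out.extend(matmul(shard, weight))
    return out


shards = [[[1.0, 2.0]], [[3.0, 4.0]]]  # two ranks, one row each
weight = [[1.0, 0.0], [0.0, 1.0]]      # toy identity weight
assert gathered_matmul(shards, weight) == per_shard_matmul(shards, weight)
```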

[ghstack-poisoned]
yifuwang committed Jun 26, 2024
1 parent 236aa92 commit 6fde13b
Showing 1 changed file with 0 additions and 5 deletions.
5 changes: 0 additions & 5 deletions train.py
@@ -252,11 +252,6 @@ def loss_fn(pred, labels):
             for m in model_parts
         ]

-    # for ease of testing TP in lieu of FSDP
-    if job_config.training.tensor_parallel_degree == world_size:
-        for model in model_parts:
-            model.to(torch.bfloat16)
-
     init_device = "cpu" if job_config.checkpoint.create_seed_checkpoint else "cuda"
     for model in model_parts:
         model.to_empty(device=init_device)