Update on "Reordered TP parallel plan to follow execution order"
- Llama uses pre-norm (the norm runs before attention and before the FFN), so the norm entries can be moved up in the plan.
- The root norm runs before the output projection, so that order can be swapped too.



[ghstack-poisoned]
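For context, here is a minimal sketch (assumed module names and parallel styles, not the actual torchtitan plan) of a per-TransformerBlock TP plan whose entries follow Llama's pre-norm execution order, with the root-level norm listed before the output projection:

from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    SequenceParallel,
)

# Per-TransformerBlock plan, listed in execution order for a pre-norm block:
# attention_norm -> attention -> ffn_norm -> feed_forward.
layer_plan = {
    "attention_norm": SequenceParallel(),   # pre-norm runs before attention
    "attention.wq": ColwiseParallel(),
    "attention.wk": ColwiseParallel(),
    "attention.wv": ColwiseParallel(),
    "attention.wo": RowwiseParallel(),
    "ffn_norm": SequenceParallel(),         # pre-norm runs before the FFN
    "feed_forward.w1": ColwiseParallel(),
    "feed_forward.w2": RowwiseParallel(),
    "feed_forward.w3": ColwiseParallel(),
}

# At the model root, the final norm runs before the output projection,
# so "norm" is listed before "output".
root_plan = {
    "norm": SequenceParallel(),
    "output": ColwiseParallel(),
}

Reordering the plan is presumably a readability change: parallelize_module matches plan keys against submodule names, so the dict order itself should not change behavior, but listing entries in forward-pass order makes the plan easier to check against the model.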
awgu committed Jul 10, 2024
1 parent 6165d3d commit 74304ba
Showing 1 changed file with 1 addition and 0 deletions.
torchtitan/parallelisms/parallelize_llama.py
@@ -332,6 +332,7 @@ def apply_tp(model, world_mesh, parallel_dims, job_config: JobConfig):
     """
     Apply tensor parallelism.
     """
+
     tp_mesh = world_mesh["tp"]
     (
         row_parallel_strategy,
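As a rough usage sketch (mesh sizes and the driver loop are assumed, not torchtitan's exact code), the world mesh is built with named dimensions, the "tp" submesh is sliced out exactly as in the hunk above, and parallelize_module applies the plan to each block:

# Run under torchrun with dp * tp ranks, e.g. torchrun --nproc_per_node=8 ...
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import parallelize_module

world_mesh = init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))  # assumed 2x4 layout
tp_mesh = world_mesh["tp"]  # same slicing as in apply_tp above

# `layer_plan` as in the earlier sketch; keys are matched against submodule
# names relative to the module passed in, e.g. each TransformerBlock:
# for block in model.layers:
#     parallelize_module(block, tp_mesh, layer_plan)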
