Add Dynamic Model Import and ModelSpec Definition #814

fegin · 2025-01-31T18:40:37Z

Stack from ghstack (oldest at bottom):

-> Add Dynamic Model Import and ModelSpec Definition #814

What does this PR do?

1. Introduces `ModelSpec` to describe a model and how to parallelize it.
2. Requires every model to define `build_model_spec()` or `model_spec`, which is imported by the `model` module.
3. Calls `build_model_specs()` in the trainer to obtain `model_specs`, which are then used to retrieve the corresponding model spec.
4. Allows users to dynamically import a model not implemented by TorchTitan using `--experimental.model_module_path`.
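The flow above can be sketched as follows, assuming a dataclass-based `ModelSpec` whose fields mirror the components this PR lists (model class, configurations, tokenizer, `parallelize_fn`, `pipelining_fn`). The field types, the `_model_specs` dict, and the helpers `register_model_spec()` and `get_model_spec()` are hypothetical, and `ToyModel` stands in for an `nn.Module` so the sketch runs without PyTorch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Type

# Hypothetical signatures: real parallelize/pipelining functions would also
# take device meshes, parallel dims, and the job config.
ParallelizeFunction = Callable[..., Any]
PipeliningFunction = Callable[..., Any]


@dataclass
class ModelSpec:
    name: str
    cls: Type  # the model class (an nn.Module subclass in practice)
    config: Dict[str, Any]  # flavor name -> model args
    tokenizer: str
    parallelize_fn: ParallelizeFunction
    pipelining_fn: PipeliningFunction


# Hypothetical registry that build_model_specs() would fill.
_model_specs: Dict[str, ModelSpec] = {}


def register_model_spec(spec: ModelSpec) -> None:
    if spec.name in _model_specs:
        raise ValueError(f"Model spec '{spec.name}' is already registered")
    _model_specs[spec.name] = spec


def get_model_spec(name: str) -> ModelSpec:
    try:
        return _model_specs[name]
    except KeyError:
        raise ValueError(
            f"Unknown model '{name}'; registered: {sorted(_model_specs)}"
        ) from None


class ToyModel:  # stand-in for an nn.Module subclass
    def __init__(self, dim: int) -> None:
        self.dim = dim


def build_model_spec() -> ModelSpec:
    return ModelSpec(
        name="toy",
        cls=ToyModel,
        config={"debugmodel": {"dim": 16}},
        tokenizer="tiktoken",
        parallelize_fn=lambda model, *args, **kwargs: model,
        pipelining_fn=lambda model, *args, **kwargs: [model],
    )


# Trainer side: look up the spec by name and build the model from a flavor.
register_model_spec(build_model_spec())
spec = get_model_spec("toy")
model = spec.cls(**spec.config["debugmodel"])
```

Keeping the spec as plain data lets the trainer find everything it needs by model name instead of hard-coding per-model imports.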

Why do we need this PR?
This PR enables users to integrate new models with TorchTitan without making intrusive changes to the TorchTitan codebase.

Next steps

1. This PR includes only the model definitions, configurations, tokenizer, `parallelize_fn`, and `pipelining_fn`. We may also want to extend `ModelSpec` to include the optimizer and learning rate scheduler.
2. The current TorchTitan parallelize and pipelining functions import `ModelArgs`, which can lead to circular imports. This issue needs to be addressed.
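A common way to break such a cycle, and what the `# Avoid circular import` comment in `build_model_spec()` points to, is to move the import into the function body so it runs at call time. The sketch below reproduces the failure with two throwaway modules written to a temp directory; `toy_model_*` and `toy_par_*` are hypothetical stand-ins for the model and parallelize files, not TorchTitan code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The model module needs parallelize_fn for its spec...
MODEL_SRC = """\
from {par} import parallelize_fn


class ModelArgs:
    dim = 8


def build_model_spec():
    return {{"args": ModelArgs, "parallelize_fn": parallelize_fn}}
"""

# ...and the parallelize module needs ModelArgs at import time (the cycle).
PAR_EAGER = """\
from {model} import ModelArgs  # runs while the model module is half-initialized


def parallelize_fn(args):
    return isinstance(args, ModelArgs)
"""

# Deferred variant: the import only runs when parallelize_fn is called.
PAR_LAZY = """\
def parallelize_fn(args):
    from {model} import ModelArgs  # resolved at call time
    return isinstance(args, ModelArgs)
"""


def import_pair(par_template: str, tag: str):
    """Write both modules to a temp dir and import the model module."""
    model_name, par_name = f"toy_model_{tag}", f"toy_par_{tag}"
    tmp = tempfile.mkdtemp()
    Path(tmp, f"{model_name}.py").write_text(MODEL_SRC.format(par=par_name))
    Path(tmp, f"{par_name}.py").write_text(par_template.format(model=model_name))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        return importlib.import_module(model_name)
    finally:
        sys.path.remove(tmp)


try:
    import_pair(PAR_EAGER, "eager")
    eager_failed = False
except ImportError:  # "cannot import name 'ModelArgs' from partially initialized module"
    eager_failed = True

lazy_module = import_pair(PAR_LAZY, "lazy")
spec = lazy_module.build_model_spec()
lazy_ok = spec["parallelize_fn"](spec["args"]())
```

Deferred imports fix the symptom locally; the repo restructuring proposed in review aims to remove the cycle at the module level instead.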

[ghstack-poisoned]

ghstack-source-id: f0847f5efebfdf8c6619f58c1b0131a233502eaf
Pull Request resolved: #814

[ghstack-poisoned]

ghstack-source-id: 28259eb74975eeb7ad790a774b6e719f3aa19a31
Pull Request resolved: #814

[ghstack-poisoned]

ghstack-source-id: ba1389f57808b1c6b309f554a675523d09395b42
Pull Request resolved: #814

[ghstack-poisoned]

ghstack-source-id: a88ff3ebe5c869055dd3314fb1b791855fd0e0b2
Pull Request resolved: #814

[ghstack-poisoned]

ghstack-source-id: 362df77a3f6a2b9f3cff514938a415bfe25e2100
Pull Request resolved: #814

tianyu-l

Initial pass looks great. Had some suggestions on restructuring.

tianyu-l · 2025-01-31T20:03:05Z

torchtitan/models/llama/model.py

@@ -258,7 +259,7 @@ def init_weights(self, init_std: float):
            nn.init.trunc_normal_(linear.weight, mean=0.0, std=init_std)


-class TransformerBlock(nn.Module):
+class TransformerBlock(nn.Module, ModelProtocol):


`ModelProtocol` should be on `Transformer`, not `TransformerBlock`.
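To illustrate the reviewer's point, a minimal sketch in which only the top-level `Transformer` satisfies the protocol the trainer relies on, while `TransformerBlock` stays an internal building block. The `from_model_args` classmethod and the empty `BaseModelArgs` base are assumptions about what `ModelProtocol` requires (this diff does not show its members), and plain classes stand in for `nn.Module`.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class BaseModelArgs:
    """Hypothetical common base for per-model args."""


@runtime_checkable
class ModelProtocol(Protocol):
    """Assumed interface: the trainer builds the top-level model from its args."""

    @classmethod
    def from_model_args(cls, model_args: BaseModelArgs) -> "ModelProtocol": ...


@dataclass
class ModelArgs(BaseModelArgs):
    dim: int = 16
    n_layers: int = 2


class TransformerBlock:  # inner building block: never built directly by the trainer
    def __init__(self, layer_id: int, model_args: ModelArgs) -> None:
        self.layer_id = layer_id
        self.dim = model_args.dim


class Transformer(ModelProtocol):  # top-level model: the trainer-facing entry point
    def __init__(self, model_args: ModelArgs) -> None:
        self.layers = [
            TransformerBlock(i, model_args) for i in range(model_args.n_layers)
        ]

    @classmethod
    def from_model_args(cls, model_args: ModelArgs) -> "Transformer":
        return cls(model_args)


model = Transformer.from_model_args(ModelArgs(n_layers=3))
```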

tianyu-l · 2025-01-31T20:10:52Z

torchtitan/models/__init__.py

-    "llama3": "tiktoken",
-}
+
+_model_specs_path: Set[str] = set()


nit

Suggested change

-_model_specs_path: Set[str] = set()
+_model_spec_paths: Set[str] = set()

tianyu-l · 2025-01-31T20:12:41Z

torchtitan/models/__init__.py

+        if os.path.isdir(path):
+            init_file = os.path.join(path, "__init__.py")
+            if os.path.isfile(init_file):
+                return _load_module_from_init(path)


Sorry if this is a noob question: don't we need to define this function before `_load_module`?
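Python resolves a global name inside a function body when the function runs, not when the `def` statement executes, so definition order only matters if the call happens before the callee is defined (for example, at module import time). A minimal sketch with hypothetical function bodies:

```python
def _load_module(path: str) -> str:
    # `_load_module_from_init` is looked up here at call time, not at def time.
    return _load_module_from_init(path)


def _load_module_from_init(path: str) -> str:
    return f"loaded {path}"


# Both functions exist by the time this call runs, so the lookup succeeds.
result = _load_module("my_model")


# Counter-example: calling before the callee's `def` has executed fails.
def call_before_defined() -> str:
    return not_yet_defined()


try:
    call_before_defined()
    early_call_failed = False
except NameError:
    early_call_failed = True


def not_yet_defined() -> str:
    return "ok"


late_result = call_before_defined()  # now resolves
```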

tianyu-l · 2025-01-31T20:19:40Z

torchtitan/config_manager.py

+        ):
+            from torchtitan.models import add_model_spec_path
+
+            add_model_spec_path(args_dict["experimental"]["model_module_path"])


May I ask why you put it here instead of in `build_model_specs`? Is it because you think it's better to fail early? In general, torchtitan follows the idea of putting failure checks close to where something is used; the main benefit is that checking and functionality are less scattered.
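A sketch of the alternative the reviewer describes, with the path registered and validated inside `build_model_specs` where it is consumed. `add_model_spec_path` and `_model_spec_paths` appear in this PR's diff, but the bodies here, the `FileNotFoundError` check, and the return value are assumptions for illustration.

```python
import os
from typing import Any, Dict, Set

# Hypothetical module-level registry of user-supplied model paths.
_model_spec_paths: Set[str] = set()


def add_model_spec_path(path: str) -> None:
    # Validation lives next to the code that consumes the path.
    if not os.path.exists(path):
        raise FileNotFoundError(f"model_module_path '{path}' does not exist")
    _model_spec_paths.add(path)


def build_model_specs(args_dict: Dict[str, Any]) -> Set[str]:
    """Register the optional user path at the point of use.

    A real implementation would then import each path and collect its
    ModelSpec; this sketch only returns the paths it would import.
    """
    path = args_dict.get("experimental", {}).get("model_module_path", "")
    if path:
        add_model_spec_path(path)
    return set(_model_spec_paths)
```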

tianyu-l · 2025-01-31T20:32:06Z

torchtitan/models/llama/__init__.py

+
+
+def build_model_spec() -> ModelSpec:
+    # Avoid circular import


I suggest we restructure the repo to make it more logical. Below is one example (names can change):
torchtitan/examples includes llama, llama_multimodal, etc. as folders.
torchtitan/examples/llama includes the model folder and a parallelize_llama folder/file.

The original parallelisms folder can stay, containing parallel_dims.py and common utils.

We would avoid the circular import if we put `ModelSpec` in the llama folder instead of in the model folder here.

Please add unit tests and documentation with examples in docs/extension.md.

fduwjj · 2025-01-31T21:34:33Z

torchtitan/config_manager.py

+            default="",
+            help="""
+                The --custom_model_path option allows to specify a custom path to a model module
+


nit: is this expected?
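For reference, a sketch of declaring the option with `argparse` and folding the dotted name into the nested `args_dict["experimental"]["model_module_path"]` shape used in the `config_manager.py` diff. The help wording and the `parse_to_nested_dict` helper are hypothetical, not the PR's code.

```python
import argparse
from collections import defaultdict
from typing import Any, Dict, List, Optional


def parse_to_nested_dict(argv: Optional[List[str]] = None) -> Dict[str, Dict[str, Any]]:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--experimental.model_module_path",
        type=str,
        default="",
        help=(
            "Path to a model module (a .py file or a package directory with "
            "__init__.py) that defines build_model_spec() or model_spec."
        ),
    )
    args = parser.parse_args(argv)
    nested: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for key, value in vars(args).items():
        # "experimental.model_module_path" -> nested["experimental"]["model_module_path"]
        section, _, option = key.partition(".")
        nested[section][option] = value
    return dict(nested)


args_dict = parse_to_nested_dict(["--experimental.model_module_path", "/tmp/my_model"])
```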

fduwjj · 2025-01-31T22:37:15Z

torchtitan/models/llama/model.py

 from torchtitan.models.norms import build_norm


 @dataclass
-class ModelArgs:
+class ModelArgs(BaseModelArgs):


Down the road we will have many models, such as MM models. Do we want all model args to inherit from this? Currently we use different model args for different model architectures.
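A sketch of how a shared base could coexist with per-architecture args: the base carries no fields, so each model keeps its own dataclass, and the base only gives a spec a common type for its configs. `LlamaModelArgs` and `MultimodalModelArgs` are hypothetical names, and the field values are illustrative, not TorchTitan's real configurations.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BaseModelArgs:
    """Hypothetical shared base: no fields, only a common type."""


@dataclass
class LlamaModelArgs(BaseModelArgs):
    dim: int = 4096
    n_layers: int = 32


@dataclass
class MultimodalModelArgs(BaseModelArgs):  # hypothetical MM model
    encoder_dim: int = 1024
    decoder_layers: int = 8
    image_sizes: List[int] = field(default_factory=lambda: [224])


# A spec's config can be typed against the base while each entry keeps
# its own architecture-specific fields.
configs: Dict[str, BaseModelArgs] = {
    "llama3_debug": LlamaModelArgs(dim=256, n_layers=2),
    "mm_debug": MultimodalModelArgs(encoder_dim=64),
}
```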

fduwjj · 2025-01-31T22:40:11Z

torchtitan/models/__init__.py

+    if os.path.exists(path):
+        if os.path.isdir(path):
+            init_file = os.path.join(path, "__init__.py")
+            if os.path.isfile(init_file):
+                return _load_module_from_init(path)


maybe let's do the assert first to avoid a nested if for better readability?

fduwjj · 2025-01-31T22:40:33Z

torchtitan/models/__init__.py

+    if spec is None:
+        raise ImportError(f"Could not create spec from '{init_file}'")


ditto: directly assert?

fduwjj · 2025-01-31T22:45:47Z

torchtitan/models/__init__.py

+    return module
+
+
+for _, name, _ in pkgutil.iter_modules(models.__path__):


why do we have a global for loop here?

I think this is for importing the models in the models folder, i.e., the "official" models in torchtitan.
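A sketch of what such a discovery loop does, wrapped in a function so the side effect is explicit. It imports each submodule of a package found by `pkgutil.iter_modules` and collects `build_model_spec()` or `model_spec` from it. `discover_model_specs` and the throwaway `demo_models_pkg` package built in a temp directory are hypothetical.

```python
import importlib
import os
import pkgutil
import sys
import tempfile
from typing import Any, Dict


def discover_model_specs(package_name: str) -> Dict[str, Any]:
    """Import every direct submodule of `package_name` and collect its spec."""
    package = importlib.import_module(package_name)
    specs: Dict[str, Any] = {}
    for _, name, _ in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package_name}.{name}")
        if hasattr(module, "build_model_spec"):
            specs[name] = module.build_model_spec()
        elif hasattr(module, "model_spec"):
            specs[name] = module.model_spec
        # Modules with neither (e.g. shared utils) are skipped.
    return specs


# Throwaway package: one subpackage and one module with specs, one module without.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "demo_models_pkg")
os.makedirs(os.path.join(pkg, "llama"))
files = {
    "__init__.py": "",
    os.path.join("llama", "__init__.py"): "def build_model_spec():\n    return {'name': 'llama'}\n",
    "other.py": "model_spec = 'other-spec'\n",
    "utils.py": "HELPER = 1\n",
}
for rel_path, source in files.items():
    with open(os.path.join(pkg, rel_path), "w") as f:
        f.write(source)

sys.path.insert(0, root)
importlib.invalidate_caches()
specs = discover_model_specs("demo_models_pkg")
```

Running the loop at module level instead means the official models are discovered as a side effect of importing the package, which is convenient but less visible than an explicit call.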

Update

df1bc6a

[ghstack-poisoned]

facebook-github-bot added the CLA Signed label Jan 31, 2025

fegin requested review from tianyu-l, wconstab and fduwjj January 31, 2025 18:40

fegin changed the title ~~Allow users to use the customized model~~ Add Dynamic Model Import and ModelSpec Definition Jan 31, 2025

Update

dfc1649

[ghstack-poisoned]

Update

720f12a

[ghstack-poisoned]

Update

225bfcc

[ghstack-poisoned]

Update

650152e

[ghstack-poisoned]

tianyu-l reviewed Jan 31, 2025


fduwjj reviewed Jan 31, 2025


xffxff mentioned this pull request Feb 3, 2025

[RFC] Implement model-specific 4d parallelism fla-org/flash-linear-attention#148

Open
