add contributing guidelines #525
Conversation
CONTRIBUTING.md
- 2D FSDP + TP – if 1D FSDP does not suffice to make comparisons due to limited scalability. For example, this should be the baseline when experimenting with 3D parallelisms on the Llama 3.1 405B model.

#### Performance
- Memory and WPS / MFU, which are available from logging, should meet expectations. If necessary, verify the numbers on jobs spanning multiple nodes (e.g. on 64 GPUs).
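To make the 2D FSDP + TP baseline above concrete, here is a minimal sketch using PyTorch-native APIs. It is illustrative only and is not torchtitan's actual parallelization code; it assumes torch >= 2.4 with the composable FSDP2 `fully_shard` API, a toy two-layer model, and that the process group is already initialized (e.g. launched via torchrun).

```python
# Minimal 2D FSDP + TP sketch on 8 GPUs (2-way data parallel x 4-way tensor
# parallel). Illustrative only; torchtitan composes these in its own code.
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)
from torch.distributed._composable.fsdp import fully_shard  # FSDP2

dp_size, tp_size = 2, 4
mesh = init_device_mesh("cuda", (dp_size, tp_size), mesh_dim_names=("dp", "tp"))

# Toy FFN block standing in for a transformer layer.
model = nn.Sequential(nn.Linear(4096, 11008), nn.Linear(11008, 4096)).cuda()

# Apply tensor parallelism over the "tp" mesh dimension first...
parallelize_module(
    model,
    mesh["tp"],
    {"0": ColwiseParallel(), "1": RowwiseParallel()},
)
# ...then shard parameters with FSDP over the "dp" mesh dimension on top.
fully_shard(model, mesh=mesh["dp"])
```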
What are the expectations for performance? Accuracy is easy to justify (the same, or the loss curve is comparable). What's the WPS / MFU expectation? For example, if a technique can reduce memory with no better or worse WPS / MFU, the feature should still be acceptable because it allows training larger models.
My suggestion is to clarify the expectation for WPS / MFU. If a feature is intended to address a memory issue, it may not be best to compare against the best MFU.
> What's the WPS / MFU expectation? For example, if a technique can reduce memory with no better or worse WPS / MFU, the feature should still be acceptable because it allows training larger models.
Agree that for different techniques, the expectation should be different. This doesn't mean we can accept an arbitrary WPS / MFU regression as long as there is any memory reduction. E.g., what if a technique is "supposed" to show 80% memory reduction with 10% MFU regression, but the PR only shows 40% memory reduction with 5% MFU regression? Should we accept the PR? Clearly not.
IMO the bar might need to be set case by case, and as long as the PR owner can justify it (whether by comparing with other similar implementations, achieving theoretically optimal performance, etc.), we should accept. It's like how a research paper should have a solid experiments section, and reviewers can make the judgment.
> My suggestion is to clarify the expectation for WPS / MFU. If a feature is intended to address a memory issue, it may not be best to compare against the best MFU.
Sounds good, let me add something.
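For context on what these numbers mean: WPS is the logged tokens-per-second throughput, and MFU compares achieved FLOPs against hardware peak. A minimal sketch of the usual estimate, assuming a dense decoder-only model and the common ~6 * num_params FLOPs-per-token approximation (torchtitan's logged value may be computed differently):

```python
# Rough MFU estimate from logged throughput (illustrative; the exact FLOPs
# accounting in torchtitan's logs may differ, e.g. it may include attention).
def estimate_mfu(num_params: float, wps: float, num_gpus: int,
                 peak_flops_per_gpu: float) -> float:
    # ~6 FLOPs per parameter per token (forward + backward) for a dense model.
    achieved_flops_per_sec = 6 * num_params * wps
    return achieved_flops_per_sec / (num_gpus * peak_flops_per_gpu)

# Example: an 8B-parameter model, 24k tokens/s total on 8 GPUs,
# each with ~989 TFLOPS dense BF16 peak (H100).
print(f"MFU: {estimate_mfu(8e9, 24_000, 8, 989e12):.1%}")
```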
* [torchtitan/checkpoint.py](https://github.com/pytorch/torchtitan/blob/main/torchtitan/checkpoint.py) - utils for saving/loading distributed checkpoints
* [torchtitan/float8.py](https://github.com/pytorch/torchtitan/blob/main/torchtitan/float8.py) - utils for applying Float8 techniques
* [torchtitan/models/llama/model.py](https://github.com/pytorch/torchtitan/blob/main/torchtitan/models/llama/model.py) - the Llama model definition (shared for Llama2 and Llama3 variants)
* [train.py](train.py) - the main training loop and high-level setup code
oh, nice, didn't know you could link this way :)
### Principles of contribution
- Apply PyTorch-native training techniques. |
I wonder if we should justify more why we have this principle, or if it's enough to just state it...
because the repo is sitting under pytorch lol?
I think this is beyond the scope of this PR. E.g., in the README's first sentence, we set the tone: "torchtitan is a proof-of-concept for Large-scale LLM training using native PyTorch." Maybe we can rethink that.
overall LGTM!
As titled. Hope these guidelines could help clarify what & how to contribute to torchtitan, and make the repo more self-service.
This PR also updates some other docs in terms of formatting and removing outdated info.