
add contributing guidelines #525

Merged (5 commits) on Aug 23, 2024
Conversation

@tianyu-l (Contributor) commented Aug 17, 2024

Stack from ghstack (oldest at bottom):

As titled. We hope these guidelines help clarify what and how to contribute to torchtitan, and make the repo more self-service.

This PR also updates some other docs, improving formatting and removing outdated info.

tianyu-l added a commit that referenced this pull request Aug 17, 2024
ghstack-source-id: 4c48f7af54500e4593550680c1b0688687e33e55
Pull Request resolved: #525
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Aug 17, 2024
CONTRIBUTING.md Outdated
- 2D FSDP + TP – If 1D FSDP does not suffice to make comparisons due to limited scalability. For example, this should be the baseline when experimenting with 3D parallelisms on the Llama 3.1 405B model.

#### Performance
- Memory and WPS / MFU, which are available from logging, should meet expectations. If necessary, verify the numbers on jobs spanning multiple nodes (e.g. on 64 GPUs).
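The WPS and MFU numbers referenced above can be estimated from basic training statistics. The sketch below is purely illustrative (it is not torchtitan's actual logging code), and uses the common ~6N FLOPs-per-token approximation for a dense transformer's forward plus backward pass; the model size, throughput, and peak-FLOPs figures in the example are assumptions:

```python
# Rough sketch of how WPS (words/tokens per second) and MFU
# (model FLOPs utilization) can be estimated from training stats.
# Illustrative only -- not torchtitan's actual metrics code.

def estimate_wps(tokens_processed: int, elapsed_seconds: float) -> float:
    """Tokens (words) processed per second across the whole job."""
    return tokens_processed / elapsed_seconds

def estimate_mfu(num_params: float, tokens_per_second: float,
                 peak_flops_per_gpu: float, num_gpus: int) -> float:
    """MFU = achieved FLOPs/s divided by aggregate hardware peak FLOPs/s.

    Uses the common ~6 * N FLOPs-per-token approximation for a dense
    transformer's forward + backward pass (N = parameter count).
    """
    achieved_flops_per_s = 6 * num_params * tokens_per_second
    return achieved_flops_per_s / (peak_flops_per_gpu * num_gpus)

# Hypothetical example: an 8B-parameter model at 500k tokens/s on
# 64 GPUs, each with ~989 TFLOP/s of BF16 peak compute.
mfu = estimate_mfu(8e9, 500_000, 989e12, 64)
print(f"MFU: {mfu:.1%}")  # roughly 38%
```

A feature's impact on these metrics, relative to the baselines listed above, is what the guideline asks contributors to report.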
A reviewer (Contributor) commented:
What are the expectations for performance? Accuracy is easy to justify (the loss curve is the same or comparable), but what is the WPS / MFU expectation? For example, if a technique can reduce memory with no better or worse WPS / MFU, the feature should still be acceptable because it allows training larger models.

My suggestion is to clarify the expectation for WPS / MFU. If a feature is intended to address a memory issue, it may not be best to compare against the best MFU.

tianyu-l (Author) replied:

> What is the WPS / MFU expectation? For example, if a technique can reduce memory with no better or worse WPS / MFU, the feature should still be acceptable because it allows training larger models.

Agree that the expectation should differ across techniques. That doesn't mean we can accept an arbitrary WPS/MFU regression as long as there is any memory reduction. E.g., what if a technique is "supposed" to show an 80% memory reduction with a 10% MFU regression, but the PR only shows a 40% memory reduction with a 5% MFU regression? Should we accept the PR? Clearly not.

IMO the bar might need to be set case by case, and as long as the PR owner can justify it (whether by comparing with other similar implementations, achieving theoretically optimal performance, etc.), we should accept. It's like how a research paper should have a solid experiments section on which reviewers can make the judgement.

> My suggestion is to clarify the expectation for WPS / MFU. If a feature is intended to address a memory issue, it may not be best to compare against the best MFU.

Sounds good, let me add something.
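The case-by-case reasoning above can be sketched as a simple check of reported savings against what a technique is "supposed" to achieve. All numbers and the tolerance here are hypothetical, purely to illustrate the argument (a 40%/5% result against an 80%/10% expectation should not pass):

```python
# Hypothetical sketch of the tradeoff reasoning in the discussion:
# a memory-saving PR should deliver savings close to the technique's
# expected tradeoff, not merely *any* savings at any MFU cost.

def tradeoff_acceptable(mem_reduction: float, mfu_regression: float,
                        expected_mem_reduction: float,
                        expected_mfu_regression: float,
                        tolerance: float = 0.1) -> bool:
    """Accept if memory savings are near expectations and the MFU
    cost is not materially worse than expected.  The tolerance is a
    made-up slack factor, for illustration only."""
    mem_ok = mem_reduction >= expected_mem_reduction * (1 - tolerance)
    mfu_ok = mfu_regression <= expected_mfu_regression * (1 + tolerance)
    return mem_ok and mfu_ok

# The example from the discussion: 40% memory reduction with a 5% MFU
# regression, when 80% / 10% was the expected tradeoff.
print(tradeoff_acceptable(0.40, 0.05, 0.80, 0.10))  # False: savings fall short
```

This is just a caricature of the judgement call; as the comment says, the real bar is set per technique and justified by the PR owner.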

tianyu-l added a commit that referenced this pull request Aug 19, 2024
ghstack-source-id: 1d1212ad0c9bedb813e640bab853df2433fcd415
Pull Request resolved: #525
@tianyu-l tianyu-l requested a review from fegin August 20, 2024 18:23

tianyu-l added a commit that referenced this pull request Aug 23, 2024
ghstack-source-id: ab3631ecd961afcef64234ef74e965b3ffd4ce67
Pull Request resolved: #525
tianyu-l added a commit that referenced this pull request Aug 23, 2024
ghstack-source-id: 11610f286e9cf7b17ac781f1b02291bcc59254c1
Pull Request resolved: #525
* [torchtitan/checkpoint.py](https://github.com/pytorch/torchtitan/blob/main/torchtitan/checkpoint.py) - utils for saving/loading distributed checkpoints
* [torchtitan/float8.py](https://github.com/pytorch/torchtitan/blob/main/torchtitan/float8.py) - utils for applying Float8 techniques
* [torchtitan/models/llama/model.py](https://github.com/pytorch/torchtitan/blob/main/torchtitan/models/llama/model.py) - the Llama model definition (shared for Llama2 and Llama3 variants)
* [train.py](train.py) - the main training loop and high-level setup code
A reviewer (Contributor) commented:
oh, nice! didn't know you could link this way :)


### Principles of contribution

- Apply PyTorch-native training techniques.
A reviewer (Contributor) commented:
I wonder if we should justify further why we have this principle, or if it's enough to just state it...

tianyu-l (Author) replied:
Because the repo is sitting under pytorch, lol?
I think this is beyond the scope of this PR. E.g., in the README's first sentence we set the tone: "torchtitan is a proof-of-concept for Large-scale LLM training using native PyTorch." Maybe we can rethink that.

@wconstab (Contributor) left a comment:

overall LGTM!

tianyu-l added a commit that referenced this pull request Aug 23, 2024
ghstack-source-id: 3ece57ae6d8dbf7ff66e3c41f1804ddb08078ba4
Pull Request resolved: #525
@tianyu-l tianyu-l merged commit 2e646e3 into gh/tianyu-l/20/base Aug 23, 2024
5 checks passed
tianyu-l added a commit that referenced this pull request Aug 23, 2024
ghstack-source-id: 3ece57ae6d8dbf7ff66e3c41f1804ddb08078ba4
Pull Request resolved: #525
@tianyu-l tianyu-l deleted the gh/tianyu-l/20/head branch August 23, 2024 23:29