Question about Project Status and Potential Contributions #1
@Hannibal046 Hi, yes, all of your understandings are correct.
Regarding post-training, I don't have extensive experience in this field yet. However, I'd be very glad if you could contribute to this area. I also plan to add support for post-training features in the future.
@yzhangcs If you're planning to implement support for online data tokenization with shuffling, I'd like to share an elegant implementation from Meta Lingua for your reference. Their approach:
I'm not sure which specific features you need to implement, but relying solely on "2. online tokenization and reshuffling with a buffer" might not be sufficient for large-scale training. This is because some datasets from Hugging Face are chronologically ordered, and even with a large online buffer, the data would still be biased. I'm happy to help if you need any assistance!
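To make the limitation concrete, here is a minimal sketch of a shuffle buffer over a token stream. This is an illustrative implementation, not Meta Lingua's actual code; the function name and signature are assumptions.

```python
import random

def shuffle_buffer(stream, buffer_size, seed=0):
    """Yield items from `stream` in approximately shuffled order.

    Fill a fixed-size buffer first; then, for each incoming item, emit a
    randomly chosen buffered element and store the new item in its place.
    Note the limitation discussed above: items are only mixed within
    roughly `buffer_size` positions, so a chronologically ordered dataset
    remains globally ordered (and thus biased) at larger scales.
    """
    rng = random.Random(seed)
    buf = []
    for item in stream:
        if len(buf) < buffer_size:
            buf.append(item)
        else:
            i = rng.randrange(buffer_size)
            buf[i], item = item, buf[i]  # swap in the new item, emit the old one
            yield item
    rng.shuffle(buf)  # drain the remaining buffer in random order
    yield from buf
```

Because the first element can only be emitted once the buffer is full, any output position is at most about `buffer_size` away from its input position, which is exactly why a buffer alone cannot fix a chronologically sorted dataset.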
Thank you! I will be taking a look at it.
I'd also like to ask whether your experience extending TorchTitan was smooth, or if you have any suggestions/requests for extensibility features (e.g. making TorchTitan easier to build on). (I'm a TorchTitan developer.)
Hey @wconstab, thank you for developing this fantastic framework!
The goal of …
@yzhangcs Thank you for the nice feedback!
torchtitan currently depends on HF
In fact, DCP in general supports resharding (a varying number of GPUs, or varying parallelisms) pretty well. It's the data loader that doesn't, which makes it non-trivial to support resuming after resharding. However, if you don't need to load data in the same way (e.g. continuing training on a new dataset), we currently have a PR to optionally not load the data loader checkpoint. See pytorch/torchtitan#819
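The pattern behind that option can be sketched in a few lines. This is a simplified illustration of selectively skipping the data loader state on resume, not torchtitan's actual DCP API; the function and key names here are hypothetical.

```python
def load_checkpoint(state, checkpoint, skip_dataloader=False):
    """Restore components from a saved checkpoint dict.

    When resuming after resharding (e.g. a different number of GPUs),
    the data loader's saved per-rank position is no longer meaningful,
    so it can be skipped and training continues with a fresh loader,
    while model and optimizer state are still restored.
    """
    for key, value in checkpoint.items():
        if skip_dataloader and key == "dataloader":
            continue  # keep the freshly initialized loader state
        state[key] = value
    return state

# Illustrative usage with toy state dicts:
ckpt = {"model": {"w": 1.0}, "optimizer": {"step": 100}, "dataloader": {"offset": 5000}}
fresh = {"model": {}, "optimizer": {}, "dataloader": {"offset": 0}}
resumed = load_checkpoint(dict(fresh), ckpt, skip_dataloader=True)
```

Here `resumed` carries the checkpointed model and optimizer state, but the loader offset stays at its fresh value of 0.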
Any time.
Hi Team,
First, I want to express my appreciation for maintaining this repository and fla. I'm finding both projects very valuable.
I have several questions about the project:
Project Status
Development Direction
Technical Architecture
From my understanding:

- `fla` is used for model definition
- Given `fla`'s HuggingFace compatibility, it should work with `lm-eval-harness` for evaluation

Could you confirm if this understanding is correct?
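If that understanding holds, evaluation would follow the standard `lm-eval-harness` CLI flow for HF-compatible models. The model path and task list below are placeholders, not values confirmed by this project:

```shell
pip install lm-eval

# Evaluate an HF-compatible model checkpoint with lm-eval-harness.
# Replace the pretrained path and tasks with your own.
lm_eval --model hf \
    --model_args pretrained=/path/to/your/model \
    --tasks lambada_openai,hellaswag \
    --batch_size 8
```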
Future Plans
Looking forward to your response and potentially contributing to the project.
Best regards