How to avoid compilation in a section of code? #7622
Great question. I have a couple of questions and a couple of suggestions.

Question

Suggestion

For nightly you can try to use

since last night's nightly seems to be broken. Eager mode pretty much just compiles op by op. It will compile each op once per input shape, so the overall compile time is usually lower. Let me know how the two suggestions above work for you.
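The op-by-op compile behavior described above can be sketched in plain Python. This is a toy shape-keyed cache, not the torch_xla implementation; the names (`maybe_compile`, `eager_add`) are illustrative only:

```python
# Toy model of eager mode: each op is "compiled" separately, keyed by its
# input shapes, so an op seen again with the same shape reuses the cached
# executable instead of rebuilding a whole graph.

compiled = set()   # (op_name, shape) pairs already compiled
num_compiles = 0

def maybe_compile(op_name, shape):
    """Pay the expensive compile step only once per (op, shape) pair."""
    global num_compiles
    if (op_name, shape) not in compiled:
        compiled.add((op_name, shape))
        num_compiles += 1  # stands in for the expensive XLA compilation

def eager_add(a, b):
    # op-by-op execution: look up / compile "add" for this shape only
    maybe_compile("add", (len(a),))
    return [x + y for x, y in zip(a, b)]

# Two calls with the same shape: "add" is compiled once, reused once.
eager_add([1, 2, 3], [4, 5, 6])
eager_add([7, 8, 9], [1, 1, 1])
# A new shape triggers exactly one new op compile, not a whole-graph rebuild.
eager_add([1, 2], [3, 4])
print(num_compiles)  # 2
```

The point of the sketch: with eager mode, a shape change costs one small per-op compilation rather than a full retrace-and-compile of the training graph.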
Thank you for the instructions! Eager mode looks very promising; however, I'm unable to install the nightly
Hmm, that's weird. Can you access
I can access it, and it's 3.10. But the issue is still there
Hmm, I can't repro this, which is a bit weird. Maybe the whl was manually renamed? Something like
Some updates. On reproducing the installation issue
Given a clean
On Eager Mode
Unfortunately, the code hangs and never reaches

Can you run with
As I was facing the same issue with |
❓ Questions and Help
We are using PyTorch/XLA with TPUs to train multi-modal language models.
We can keep most of the code, such as image encoding and the forward pass in the LLM backbone, static in shape, which XLA handles well. However, making the part that fuses image and text embeddings into the input embeddings static is extremely challenging.
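To make the fusion problem concrete, here is a minimal pure-Python sketch of one common way to keep that step shape-static: pad every sample to a fixed token budget and scatter image embeddings into known slots. `MAX_LEN`, `fuse_static`, and the scalar "embeddings" are all illustrative assumptions, not code from the thread:

```python
# Why fusion is shape-dynamic: each sample has a different number of image
# tokens. One static-shape workaround is to pad to a fixed budget so
# downstream graphs always see the same input shape.

MAX_LEN = 8  # fixed sequence budget (assumed hyperparameter)

def fuse_static(text_emb, image_emb, image_pos):
    """Scatter image embeddings into fixed positions, pad to MAX_LEN.

    The output length is always MAX_LEN, so the backbone sees one shape
    regardless of how many image tokens this sample has.
    """
    out = list(text_emb)
    for pos, emb in zip(image_pos, image_emb):
        out[pos] = emb                        # fill placeholder slots
    out += [0.0] * (MAX_LEN - len(out))       # pad to the static length
    mask = [1] * len(text_emb) + [0] * (MAX_LEN - len(text_emb))
    return out, mask

# Samples with different image-token counts still yield length-8 inputs.
a, _ = fuse_static([0.1, 0.0, 0.0, 0.2], [9.0, 9.0], [1, 2])
b, _ = fuse_static([0.1, 0.0, 0.3], [9.0], [1])
print(len(a), len(b))  # 8 8
```

With real tensors this would be a masked scatter plus padding, but the shape argument is the same: a fixed output length avoids per-sample recompilation.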
Currently, we use mark_step to isolate that section from the rest of the code, allowing it to recompile each time. Although this part is very computationally light, the recompilation is extremely slow and often consumes the majority of training time.

We find documentation on this issue very hard to find, and we are exploring better solutions, such as running that part on the CPU, running it in eager mode, or not saving that part of the graph to avoid OOM errors during long training runs. We wonder if you have any suggestions/pointers on how to work around this inefficiency?
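The cost trade-off of isolating the dynamic section can be sketched with a toy graph cache. This is an illustration of lazy-tensor caching keyed by input shapes, not the torch_xla implementation; the graph names are made up:

```python
# Toy model of a lazy-tensor runtime: compiled graphs are cached, keyed by
# the shapes of all their inputs. If a dynamic-shape section lives inside
# one big graph, every new shape signature recompiles the whole graph.

graph_compiles = 0

def run_graph(cache, graph_name, input_shapes):
    """Run a traced graph, compiling it if this shape signature is new."""
    global graph_compiles
    key = (graph_name, tuple(input_shapes))
    if key not in cache:
        cache[key] = True
        graph_compiles += 1  # stands in for the expensive XLA compile

# One big graph: a varying image-token count changes the signature, so
# three distinct signatures mean three full-graph compiles.
cache = {}
for n_img_tokens in [5, 7, 3, 7]:
    run_graph(cache, "whole_graph", [(512,), (n_img_tokens,)])
print(graph_compiles)  # 3

# Isolating the dynamic part (e.g. behind a mark_step boundary): the big
# static backbone compiles once; only the tiny fusion graph recompiles,
# and each of those compiles is cheap because the graph is small.
graph_compiles = 0
cache = {}
for n_img_tokens in [5, 7, 3, 7]:
    run_graph(cache, "static_backbone", [(512,)])        # compiled once
    run_graph(cache, "fusion_section", [(n_img_tokens,)])  # once per shape
print(graph_compiles)  # 4
```

The compile count is higher in the isolated case (1 backbone + 3 fusion shapes), but each fusion compile covers only the small dynamic sub-graph, which is exactly the trade-off the question describes; the pain is that even those small compiles can dominate step time.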
Following is pseudo code to illustrate our problem: