Disable llama_v2_7b_16h on optim benchmarks as they currently OOM #1792
Conversation
@janeyx99 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
I am wondering how much value this adds, since we already have llama_v2_7b_16h. My understanding is that neither llama_v2_7b_16h nor llama_v2_7b_8h has a train test, so how can it be useful in the optimizer userbenchmark?
    super().__init__(name="llama_v2_7b_8h", test=test, device=device, batch_size=batch_size, extra_args=extra_args)

    def train(self):
        raise NotImplementedError("7b LLAMA model will OOM on CI GPU machines")
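Note that the snippet originally used `return NotImplementedError(...)`, which hands the caller an exception *object* instead of raising it, so the harness would never see a failure or skip. A minimal standalone sketch of the difference (the `Model` class here is an illustration, not the actual TorchBench class):

```python
class Model:
    def train_return(self):
        # BUG: returns the exception instance; nothing is raised, the call "succeeds"
        return NotImplementedError("7b LLAMA model will OOM on CI GPU machines")

    def train_raise(self):
        # Correct: raising lets the harness catch it and mark train as unsupported
        raise NotImplementedError("7b LLAMA model will OOM on CI GPU machines")


m = Model()
result = m.train_return()
print(isinstance(result, NotImplementedError))  # True, and no error was raised

try:
    m.train_raise()
except NotImplementedError as e:
    print("train skipped:", e)
```

This is why a benchmark runner that only catches `NotImplementedError` would still attempt to train the model under the `return` form.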
Currently our criterion for accepting a model is that the eager-mode test does not OOM on CI GPU machines (A10G and A100, respectively). We cannot guarantee that it won't OOM on the optim userbenchmark.
Yeah, this makes sense to me, which is why this needed more of a review. I think the right course of action, then, is to disable llama_v2_7b_16h on the optim benchmarks until train is supported.
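One common way to disable a model on a specific benchmark is a deny-list filter applied before the run. The names below are hypothetical, a sketch of the idea rather than the actual optim userbenchmark code:

```python
# Hypothetical deny-list filter; not the real TorchBench optim userbenchmark API.
DENY_LIST = {"llama_v2_7b_16h"}  # models known to OOM on CI GPUs

def models_to_run(all_models):
    """Drop deny-listed models so the benchmark never instantiates them."""
    return [name for name in all_models if name not in DENY_LIST]

print(models_to_run(["resnet50", "llama_v2_7b_16h", "BERT_pytorch"]))
# ['resnet50', 'BERT_pytorch']
```

Filtering by name up front avoids even constructing the 7B model, which matters here because the OOM happens before `train()` would get a chance to raise.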
Force-pushed from dc9a444 to 4f19a13.
@janeyx99 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
LGTM. We can add the test back when train is supported on this model.
Fixes #1791
Test plan: a run of the optim benchmarks: https://github.com/pytorch/benchmark/actions/runs/5693564120