
Disable llama_v2_7b_16h on optim benchmarks as they currently OOM #1792

Closed
wants to merge 2 commits into from

Conversation

janeyx99
Contributor

@janeyx99 janeyx99 commented Jul 28, 2023

Fixes #1791

Test plan:
https://github.com/pytorch/benchmark/actions/runs/5693564120 a run of the optim benchmarks

@facebook-github-bot
Contributor

@janeyx99 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@xuzhao9 xuzhao9 left a comment


I am wondering how much value this adds, since we already have llama_v2_7b_16h. My understanding is that neither llama_v2_7b_16h nor llama_v2_7b_8h has a train test, so how can it be useful in the optimizer userbenchmark?

super().__init__(name="llama_v2_7b_8h", test=test, device=device, batch_size=batch_size, extra_args=extra_args)

def train(self):
    # Exceptions should be raised, not returned, so callers can catch them.
    raise NotImplementedError("7b LLAMA model will OOM on CI GPU machines")
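For context, the pattern discussed above lets a benchmark harness catch the unsupported train test and skip the model instead of OOMing mid-run. A minimal, self-contained sketch (the class and helper names here are illustrative, not the actual torchbenchmark API):

```python
# Minimal sketch: a model whose train test is unsupported raises
# NotImplementedError so a harness can catch it and skip the model.
class LlamaV2Benchmark:
    def __init__(self, name="llama_v2_7b_8h"):
        self.name = name

    def train(self):
        # Raised (not returned) so the harness's except clause fires.
        raise NotImplementedError("7b LLAMA model will OOM on CI GPU machines")


def run_train_or_skip(model):
    """Run the train test, or report a skip if it is unsupported."""
    try:
        model.train()
        return "ran"
    except NotImplementedError as e:
        return f"skipped: {e}"
```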
Contributor


Currently, our criterion for accepting a model is that its eager mode tests do not OOM on CI GPU machines (A10G and A100, respectively). We cannot guarantee that they won't OOM on the optim userbenchmark.

Contributor Author


Yeah, this makes sense to me, which is why this needed more of a review. I think the right course of action, then, is to disable llama_v2_7b_16h on the optim benchmarks until train is supported.
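One common way to implement this kind of per-benchmark disable is a deny-list that the userbenchmark filters against when collecting models. The names below (`DENYLIST`, `models_to_benchmark`) are assumptions for illustration, not the optim userbenchmark's real API:

```python
# Hypothetical sketch: exclude models known to OOM from the optim
# userbenchmark's model collection. Re-enable once train is supported.
DENYLIST = {"llama_v2_7b_16h"}  # OOMs on CI GPU machines


def models_to_benchmark(all_models):
    """Return the models to run, skipping any deny-listed names."""
    return [m for m in all_models if m not in DENYLIST]
```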

@janeyx99 janeyx99 changed the title Add llama_v2_7b_8h smaller model to avoid OOMs Disable llama_v2_7b_16h on optim benchmarks as they currently OOM Jul 28, 2023
@facebook-github-bot
Contributor

@janeyx99 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@xuzhao9 xuzhao9 left a comment


LGTM. We can add the test back when train is supported on this model.

@facebook-github-bot
Contributor

@janeyx99 merged this pull request in 561abe3.

Successfully merging this pull request may close these issues.

Optim Perf Signal Detected by TorchBench CI on '8a24a912a5f545d18059b59629aa3598f3783f25'