
Allow for larger scale submissions in Inference when moving from Preview to Available #176

Open
psyhtest opened this issue May 14, 2024 · 9 comments

Comments

@psyhtest

A number of Preview systems in MLPerf Inference v4.0 used fewer cards than would be typical in production, due to limited card availability at the time. Rather than benchmarking these systems with exactly the same, atypical number of cards as in Preview, it would be desirable to benchmark them in a more typical configuration with a higher number of cards. Of course, Available submissions would still need to demonstrate per-accelerator performance equal to or better than that of the Preview submissions.

We have a similar provision in the submission policies, but at the moment it only covers Training:

On each of the benchmarks that are previewed and are Compatible, the Available submission must show equal or better performance (allowing for noise, for any changes to the benchmark definition) on all systems for Inference and across at least the smallest and the largest scale of the systems used for Preview submission on that benchmark for Training (e.g. Available Training submissions can be on scales smaller than the smallest and larger than the largest scale used for Preview submission).

@psyhtest psyhtest added the Next Meeting Item to be discussed in the next Working Group label May 14, 2024
@psyhtest psyhtest assigned mrmhodak and mrasquinha-g and unassigned mrmhodak May 14, 2024
@arjunsuresh
Contributor

@psyhtest Previously we had similar discussions, and issues were raised (in a different context) about cases where a single model is split across multiple GPUs, so the performance per accelerator might be better on a larger-scale system. A rule change for this might therefore be tricky, but maybe the WG can agree on the "similarity" of the Available and Preview systems?

@psyhtest
Author

when a single model is split across multiple GPUs and hence the performance per accelerator might be better on a larger scale system

I agree that Offline may be affected, but could the Server latency constraints counterbalance that?

@arjunsuresh
Contributor

The same issue can happen for the Server scenario too, right? But if the model is not split across multiple GPUs, maybe we can do a rule proposal.

@arjunsuresh
Contributor

If the Preview submission was on, say, 4 accelerators and the Available submissions are on, say, 6 accelerators as well as 2 accelerators, and in both cases the performance per accelerator is greater than that of the Preview system, then I think the 4-accelerator submission may not be needed.

Also, another proposal could be to require just the Offline scenario (maybe even Open) for the same number of accelerators if a larger-scale system is submitted as Available.

@nv-ananjappa

Instead of an amendment asking for permission before submission, it might be worth changing the main rules themselves to permit an Available submission if both of these conditions are satisfied (a sketch of the check follows the list):

  1. The number of accelerators used in the Available system is equal to or greater than in the Preview system,
    AND
  2. The scaling of system performance from Preview to Available is linear or better, given the number of accelerators in both.
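
To make condition 2 concrete, here is a minimal sketch (not part of any MLPerf tooling; the function name, noise margin, and all numbers are hypothetical) of how the proposed check could be expressed:

```python
# Hypothetical illustration of the proposed Preview -> Available check.

def available_submission_allowed(preview_accels, preview_qps,
                                 available_accels, available_qps,
                                 noise_margin=0.0):
    """Return True if the Available result satisfies the two proposed
    conditions: (1) at least as many accelerators as Preview, and
    (2) linear-or-better scaling, i.e. per-accelerator throughput is
    equal or better than in Preview (allowing for noise)."""
    if available_accels < preview_accels:
        return False  # condition 1: accelerator count
    preview_per_accel = preview_qps / preview_accels
    available_per_accel = available_qps / available_accels
    return available_per_accel >= preview_per_accel * (1.0 - noise_margin)

# Example: Preview on 4 accelerators at 40,000 QPS (10,000 QPS/accel);
# an Available system on 8 accelerators would need roughly 80,000 QPS.
print(available_submission_allowed(4, 40_000, 8, 82_000))  # True
print(available_submission_allowed(4, 40_000, 8, 70_000))  # False
```

Under this formulation, conditions 1 and 2 together amount to requiring per-accelerator performance equal to or better than Preview at an equal or larger accelerator count.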

@psyhtest
Author

Another scenario to consider: having made a Preview submission with an old Available server (e.g. v5) equipped with new Preview accelerators, a submitter may want to make an Available submission with a newer server (e.g. v6) equipped with the now-Available accelerators.

@mrmhodak
Contributor

WG notes:
  - Not permitted for power.
  - Seek WG pre-approval for any other HW/system changes.

@psyhtest to draft PR

@ashwin

ashwin commented Jul 10, 2024

@psyhtest @mrmhodak @mrasquinha-g The deadline is close. Can Inference submitters assume that this rule change applies to v4.1, and thus save some effort in their Preview-to-Available submissions? What is the conclusion?

@arjunsuresh
Contributor

@ashwin I think it is better if the submitter requests and gets a waiver from the WG for v4.1.
