
Allow for larger scale submissions in Inference when moving from Preview to Available #176

Open
psyhtest opened this issue May 14, 2024 · 9 comments

Comments

@psyhtest

A number of Preview systems in MLPerf Inference v4.0 used fewer cards than would be typical in production, due to limited card availability at the time. Rather than benchmarking these systems with exactly the same, atypical number of cards as in Preview, it would be desirable to benchmark them in a more typical configuration with a higher number of cards. Of course, Available submissions would still need to demonstrate per-accelerator performance equal to or better than that of the Preview submissions.

We have a similar provision in the submission policies, but at the moment it only covers Training:

On each of the benchmarks that are previewed and are Compatible, the Available submission must show equal or better performance (allowing for noise, for any changes to the benchmark definition) on all systems for Inference and across at least the smallest and the largest scale of the systems used for Preview submission on that benchmark for Training (e.g. Available Training submissions can be on scales smaller than the smallest and larger than the largest scale used for Preview submission).

@psyhtest psyhtest added the Next Meeting Item to be discussed in the next Working Group label May 14, 2024
@psyhtest psyhtest assigned mrmhodak and mrasquinha-g and unassigned mrmhodak May 14, 2024
@arjunsuresh
Contributor

@psyhtest Previously we had similar discussions, and issues were raised (in a different context) about cases where a single model is split across multiple GPUs, so the performance per accelerator might be better on a larger-scale system. A rule change for this might therefore be tricky, but maybe the WG can agree on the "similarity" of the Available and Preview systems?

@psyhtest
Author

when a single model is split across multiple GPUs and hence the performance per accelerator might be better on a larger scale system

I agree that Offline may be affected, but could the Server latency constraints counterbalance that?

@arjunsuresh
Contributor

The same issue can happen for the Server scenario too, right? But if the model is not split across multiple GPUs, maybe we can do a rule proposal.

@arjunsuresh
Contributor

If the Preview submission was on, say, 4 accelerators and the Available submissions are on, say, 6 accelerators as well as 2 accelerators, and in both cases the performance per accelerator is greater than that of the Preview system, then I think the 4-accelerator submission may not be needed.

Also, another proposal could be to require just the Offline scenario (maybe even Open) for the same number of accelerators if a larger-scale system is submitted as Available.

@nv-ananjappa

Instead of an amendment asking for permission before submission, it might be worth changing the main rules themselves to permit an Available submission if both of these conditions are satisfied (a sketch of the check follows the list):

  1. The number of accelerators used in the Available system is equal to or greater than in the Preview system,
    AND
  2. The scaling of system performance from Preview to Available is linear or better, given the number of accelerators in both.
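
To make condition 2 concrete, here is a minimal sketch (not part of any MLPerf tooling; the function name, noise margin, and all numbers are hypothetical) of how the proposed check could be expressed:

```python
# Hypothetical illustration of the proposed Preview -> Available check.

def available_submission_allowed(preview_accels, preview_qps,
                                 available_accels, available_qps,
                                 noise_margin=0.0):
    """Return True if the Available result satisfies the two proposed
    conditions: (1) at least as many accelerators as Preview, and
    (2) linear-or-better scaling, i.e. per-accelerator throughput is
    equal or better than in Preview (allowing for noise)."""
    if available_accels < preview_accels:
        return False  # condition 1: accelerator count
    preview_per_accel = preview_qps / preview_accels
    available_per_accel = available_qps / available_accels
    return available_per_accel >= preview_per_accel * (1.0 - noise_margin)

# Example: Preview on 4 accelerators at 40,000 QPS (10,000 QPS/accel);
# an Available system on 8 accelerators would need roughly 80,000 QPS.
print(available_submission_allowed(4, 40_000, 8, 82_000))  # True
print(available_submission_allowed(4, 40_000, 8, 70_000))  # False
```

Under this formulation, conditions 1 and 2 together amount to requiring per-accelerator performance equal to or better than Preview at an equal or larger accelerator count.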

@psyhtest
Author

Another scenario to consider: having made a Preview submission with an old Available server (e.g. v5) equipped with new Preview accelerators, a submitter may want to make an Available submission with a newer server (e.g. v6) equipped with the now-Available accelerators.

@mrmhodak
Contributor

WG notes:
  - Not permitted for power.
  - Seek WG pre-approval for any other HW/system changes.

@psyhtest to draft PR

@ashwin

ashwin commented Jul 10, 2024

@psyhtest @mrmhodak @mrasquinha-g The deadline is close. Can Inference submitters assume that this rule change applies to v4.1, and thus save some effort in their Preview-to-Available submissions? What is the conclusion?

@arjunsuresh
Contributor

@ashwin I think it is better if the submitter requests and gets a waiver from the WG for v4.1.
