
Drop 'not evaluated' placeholder from dask.py #4393

Closed
ffineis opened this issue Jun 20, 2021 · 1 comment

Comments

@ffineis
Contributor

ffineis commented Jun 20, 2021

Summary

Drop the 'not evaluated' placeholder string used when the rank 0 Dask worker happens not to have received data for particular components of eval_set.

Motivation

This is a request to improve the handling of multiple workers' eval_set data attributes (e.g. evals_result_) once #4392 is resolved, as opposed to using 'not evaluated' as a placeholder for the missing training history of unevaluated eval_sets.

Description

When a user is evaluating model training progress on multiple validation sets (meaning len(eval_set) > 1) and those validation sets comprise Dask collections with varying numbers of partitions, it is possible for some worker(s) to be distributed chunks of only particular eval_sets. Put another way, when the individual eval_sets are not the same size, there is no guarantee that every worker will be allocated parts from every individual eval_set contained within eval_set.

The implementation of eval_set support in #4101 breaks up eval_set into smaller eval_sets that each worker reconstructs from its allocated list_of_parts. To ensure that each worker is aware that there are len(eval_set)-many original individual eval_sets, we use a None-padding technique. Here is an illustration:

[Illustration: lightgbm_eval_sets — distribution of eval_set chunks across workers, with None padding for eval_sets a worker did not receive]
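In code, the padding idea looks roughly like the following (a minimal sketch; the helper name and signature are illustrative, not the actual lightgbm.dask internals):

```python
# Hypothetical sketch of the None-padding described above; `pad_eval_set_parts`
# is illustrative and is not a function in lightgbm.dask.
def pad_eval_set_parts(worker_parts_by_eval_set, n_eval_sets):
    """Return a list of length n_eval_sets where slot i holds this worker's
    chunks for eval_set i, or None if the worker received no chunks for it."""
    padded = [None] * n_eval_sets
    for i, parts in worker_parts_by_eval_set.items():
        if parts:  # the worker actually holds chunks for eval_set i
            padded[i] = parts
    return padded

# Example: a worker that only received chunks of the second of two eval_sets.
padded = pad_eval_set_parts({1: [("X_chunk", "y_chunk")]}, n_eval_sets=2)
print(padded)  # [None, [('X_chunk', 'y_chunk')]]
```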

Therefore, when there is variance in the list [X[0].npartitions for X, y in eval_set], it is possible that an individual eval_set is missing on a particular worker. When a worker receives all Nones for a particular component of eval_set, the corresponding value for this eval_set within best_score_ and evals_result_ is the string 'not evaluated'.
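For concreteness, here is a minimal setup (shapes and chunk sizes are made up for illustration) in which the two validation sets have different partition counts and therefore cannot be co-located on every worker:

```python
import dask.array as da

# Two validation sets with different chunking, so their partition counts differ.
X_valid_0 = da.random.random((1_000, 10), chunks=(250, 10))  # 4 partitions
y_valid_0 = da.random.random((1_000,), chunks=(250,))
X_valid_1 = da.random.random((1_000, 10), chunks=(500, 10))  # 2 partitions
y_valid_1 = da.random.random((1_000,), chunks=(500,))

eval_set = [(X_valid_0, y_valid_0), (X_valid_1, y_valid_1)]
print([X.npartitions for X, _ in eval_set])  # [4, 2]
```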

This informs the user that the "rank 0" worker whose LightGBM estimator is selected during _train was not distributed any chunks corresponding to that particular eval_set. In the illustration, mdl.best_score_['valid_0'] == 'not evaluated'.
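Because of that, user code that inspects the per-validation-set results currently has to special-case the placeholder string, roughly like this (a sketch; it assumes `mdl` is a fitted Dask estimator whose fit received eval_set, and 'valid_0' is LightGBM's default validation-set naming):

```python
# Sketch of how the placeholder surfaces to users today.
for name, score in mdl.best_score_.items():
    if score == 'not evaluated':
        # The rank 0 worker held no chunks of this eval_set, so no history exists.
        print(f"{name}: not evaluated on the selected worker")
    else:
        print(f"{name}: {dict(score)}")
```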

This issue is closely related to #4392; if #4392 is resolved, then 'not evaluated' may no longer apply (depending on the path of resolution), in which case it can (and should!) be dropped from both dask.py and test_dask.py.

References

#4101
#4392

@StrikerRUS
Collaborator

Closed in favor of being in #2302. We decided to keep all feature requests in one place.

You are welcome to contribute this feature! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing it.
