Summary
Drop the 'not evaluated' placeholder string that is used when the rank 0 Dask worker happens not to have received data for particular components of eval_set.
Motivation
This is a request for an improvement to the handling of multiple workers' eval_set data attributes (e.g. evals_result_) once #4392 is resolved, as opposed to using 'not evaluated' as a placeholder for the missing training history of unevaluated eval_sets.
Description
When a user is evaluating model training progress on multiple validation sets (meaning len(eval_set) > 1), and those validation sets are Dask collections with varying numbers of partitions, it is possible for some worker(s) to be distributed chunks of only some of the eval_sets. Put another way, when the individual eval_sets are not the same size, there is no guarantee that every worker will be allocated parts from every individual eval_set contained within eval_set.
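For concreteness, here is a rough sketch of such an eval_set; the shapes and chunk sizes are made up, chosen only so that the two validation sets end up with different partition counts:

```python
import dask.array as da

# Illustrative data only: two validation sets with different numbers of partitions.
# With few partitions spread across several workers, some workers may receive
# chunks of valid_0 but no chunks of valid_1 (or vice versa).
X_train = da.random.random((1_000, 10), chunks=(100, 10))
y_train = da.random.randint(0, 2, size=1_000, chunks=100)

X_valid_0 = da.random.random((500, 10), chunks=(100, 10))   # 5 partitions
y_valid_0 = da.random.randint(0, 2, size=500, chunks=100)

X_valid_1 = da.random.random((100, 10), chunks=(100, 10))   # 1 partition
y_valid_1 = da.random.randint(0, 2, size=100, chunks=100)

eval_set = [(X_valid_0, y_valid_0), (X_valid_1, y_valid_1)]
print([X.npartitions for X, _ in eval_set])  # [5, 1]
```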
The implementation of eval_set support in #4101 breaks eval_set up into smaller eval_sets that each worker reconstructs from its allocated list_of_parts. To ensure that each worker is aware that there are len(eval_set)-many original individual eval_sets, we use a None-padding technique. Here is an illustration:
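The original illustration is not reproduced here; the following is only a rough sketch of the padding idea, with made-up part layouts and variable names, not the actual dask.py code:

```python
# Rough sketch of the None-padding idea (not the actual dask.py implementation).
# Suppose eval_set = [valid_0, valid_1] and this worker only received chunks of
# valid_0. Padding with None keeps index i pointing at the i-th original
# eval_set on every worker.
n_eval_sets = 2

# parts this worker happened to receive, keyed by eval_set index (made up)
local_parts = {0: [("X_chunk_a", "y_chunk_a"), ("X_chunk_b", "y_chunk_b")]}

local_eval_set = []
for i in range(n_eval_sets):
    parts = local_parts.get(i)
    if parts is None:
        # no chunks of eval_set i landed on this worker -> placeholder
        local_eval_set.append(None)
    else:
        X_parts = [X for X, _ in parts]
        y_parts = [y for _, y in parts]
        local_eval_set.append((X_parts, y_parts))

print(local_eval_set)
# [(['X_chunk_a', 'X_chunk_b'], ['y_chunk_a', 'y_chunk_b']), None]
```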
Therefore, when there is variance in the list [X[0].npartitions for X, y in eval_set], it is possible that an individual eval_set is missing on a particular worker. When a worker receives all Nones for a particular component of eval_set, the corresponding value for this eval_set within best_score_ and evals_result_ is the string 'not evaluated'.
This informs the user that the "rank 0" worker whose LightGBM estimator is selected during _train was not distributed any chunks of the corresponding eval_set. In the illustration, mdl.best_score_['valid_0'] == 'not evaluated'.
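For illustration, this is roughly what a user might see when inspecting the fitted model; the metric name and numbers below are invented, and only the 'not evaluated' string comes from the behaviour described above:

```python
# Invented values, for illustration only: valid_0 was never evaluated on the
# rank 0 worker, so its entry is the placeholder string rather than a dict of
# metric values.
best_score = {
    "valid_0": "not evaluated",
    "valid_1": {"binary_logloss": 0.4213},
}

for name, scores in best_score.items():
    if scores == "not evaluated":
        print(f"{name}: no chunks reached the rank 0 worker, no history recorded")
    else:
        print(f"{name}: {scores}")
```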
This issue is closely related to #4392; if #4392 is resolved, then 'not evaluated' may no longer apply (depending on the path of resolution), in which case it can (and should!) be dropped from both dask.py and test_dask.py.
Closed in favor of tracking this in #2302. We decided to keep all feature requests in one place.
You are welcome to contribute this feature! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing this feature.
References
#4101
#4392