Setting the range of the dataset to consider, e.g. to the first 1,000 samples. Sample indices are computed by concatenating the webdataset shards (in the order given by the `split.yaml`).

This feature only works at the innermost level, i.e. not for nested metadatasets (how to count samples there is unclear, because the nested datasets would need to be concatenated, which is not the case for training).

Currently, `get_val_dataset` already has a `limit` option, but it is different: it only limits the number of iterations (i.e. batches), which yields different samples than setting a sample range (the range is based on the sample index, whereas the iterations may interleave different portions of the dataset).
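A toy illustration of that difference, assuming two shards read round-robin by parallel workers (the shard/worker setup here is invented for the example, not the actual loader internals): limiting iterations mixes samples from both shards, while a sample range selects a contiguous prefix of the concatenated index space.

```python
from itertools import islice

# Hypothetical toy setup: two shards whose concatenation defines
# global sample indices 0..7.
shard_a = [0, 1, 2, 3]  # global indices 0..3
shard_b = [4, 5, 6, 7]  # global indices 4..7

def interleaved(*shards):
    """Round-robin over shards, as parallel workers would iterate them."""
    iters = [iter(s) for s in shards]
    while iters:
        for it in list(iters):
            try:
                yield next(it)
            except StopIteration:
                iters.remove(it)

# "limit"-style: take the first 4 *iterated* samples -> mixes both shards
limited = list(islice(interleaved(shard_a, shard_b), 4))

# range-style: take samples with global index < 4 -> only the first shard
ranged = [s for s in shard_a + shard_b if s < 4]

print(limited)  # [0, 4, 1, 5]
print(ranged)   # [0, 1, 2, 3]
```

So with a `limit`, the first 4 emitted samples come from both shards, whereas the proposed range selects exactly the first 4 samples of the concatenated dataset.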
This feature could offer multiple use-cases, e.g. debugging or quick evaluation on a small, deterministic subset.
This would be implemented like:
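The original implementation snippet is missing here; as a minimal sketch, the core of it would be mapping a global sample range onto per-shard slices of the concatenated shard list. All names below (`shard_slices_for_range`, the shard tuples) are illustrative, not the actual Energon API:

```python
from typing import List, Tuple

def shard_slices_for_range(
    shard_sizes: List[Tuple[str, int]],
    start: int,
    stop: int,
) -> List[Tuple[str, int, int]]:
    """Map a global sample range [start, stop) onto per-shard slices.

    Global sample indices are formed by concatenating the shards in the
    given order, mirroring how the issue describes index computation
    from the split.yaml shard list. Returns (shard_name, local_start,
    local_stop) for every shard that overlaps the range.
    """
    slices = []
    offset = 0  # global index of the first sample in the current shard
    for name, size in shard_sizes:
        lo = max(start, offset)
        hi = min(stop, offset + size)
        if lo < hi:
            # keep only the overlap, expressed in shard-local indices
            slices.append((name, lo - offset, hi - offset))
        offset += size
    return slices

# Example: three shards of 400 samples each; keep the first 1,000.
print(shard_slices_for_range(
    [("shard_000", 400), ("shard_001", 400), ("shard_002", 400)], 0, 1000
))
# → [('shard_000', 0, 400), ('shard_001', 0, 400), ('shard_002', 0, 200)]
```

The loader would then only open the shards returned here and skip to the local start offset within the last partially-covered shard.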