Fix incorrect `ex_iterable` used with multiple dataloader workers (#6582)
Corrects an issue where `self._ex_iterable` was erroneously used instead of `ex_iterable` when Distributed Data Parallel (DDP) and multiple dataloader workers are used concurrently. Because `self._ex_iterable` has not yet been split across nodes, it yielded incorrect `shards_indices`, which in turn broke the control flow that decides whether a dataloader worker has shards to iterate. The fix uses the node-local iterable, so each worker correctly determines whether it should start iterating or stay idle.
kq-chen authored Mar 1, 2024
1 parent e5406f9 commit 31ae21f
Showing 1 changed file with 1 addition and 1 deletion.
src/datasets/iterable_dataset.py (1 addition, 1 deletion):

```diff
@@ -1275,7 +1275,7 @@ def _iter_pytorch(self):
         )
         # split workload
         _log_prefix = f"node#{self._distributed.rank} " if self._distributed else ""
-        shards_indices = self._ex_iterable.split_shard_indices_by_worker(worker_info.id, worker_info.num_workers)
+        shards_indices = ex_iterable.split_shard_indices_by_worker(worker_info.id, worker_info.num_workers)
         if shards_indices:
             logger.debug(
                 f"{_log_prefix}dataloader worker#{worker_info.id}, ': Starting to iterate over {len(shards_indices)}/{ex_iterable.n_shards} shards."
```
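For intuition, here is a minimal sketch (not the library's actual implementation) of how using the wrong iterable corrupts the shard split. It assumes a round-robin assignment of shard indices to dataloader workers, and the counts (8 total shards, world size 2, 6 workers per node) are purely illustrative:

```python
# Hypothetical stand-in for split_shard_indices_by_worker: round-robin
# assignment of shard indices across dataloader workers (an assumed
# strategy for illustration, not necessarily the library's exact one).
def split_shard_indices_by_worker(n_shards, worker_id, num_workers):
    return list(range(worker_id, n_shards, num_workers))

total_shards = 8  # shards in self._ex_iterable (before the DDP node split)
node_shards = 4   # shards in ex_iterable (this node's share, world size 2)
num_workers = 6   # dataloader workers on this node

for worker_id in range(num_workers):
    buggy = split_shard_indices_by_worker(total_shards, worker_id, num_workers)
    fixed = split_shard_indices_by_worker(node_shards, worker_id, num_workers)
    print(f"worker {worker_id}: buggy={buggy} fixed={fixed}")

# worker 4: buggy=[4] fixed=[]  <- under the bug, `if shards_indices:` is
# worker 5: buggy=[5] fixed=[]     truthy, so these workers start iterating
#                                  shards that belong to the other node;
#                                  with the fix they correctly stay idle.
```

The `if shards_indices:` branch visible in the diff is the control flow the commit message refers to: with the node-local `ex_iterable`, an empty list correctly signals that a worker has nothing to do.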
