IterableDataset and Dataset return different batch sizes when using Trainer with multiple GPUs #5506
Comments
Hi ! Also we recently released …

```python
if use_iterable_dataset:
    num_shards = 100
    dataset = dataset.to_iterable_dataset(num_shards=num_shards)
```
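For context, a rough sketch of what the quoted conversion does (the dataset name and loader settings below are illustrative, not from the report): `to_iterable_dataset(num_shards=...)` turns a map-style `Dataset` into an `IterableDataset` backed by contiguous shards, which `DataLoader` workers can then read in parallel.

```python
from datasets import load_dataset
from torch.utils.data import DataLoader

# Illustrative dataset; any map-style Hugging Face Dataset works the same way.
dataset = load_dataset("imdb", split="train")

# Expose the map-style Dataset as an IterableDataset made of 100 contiguous shards.
iterable_ds = dataset.to_iterable_dataset(num_shards=100)

# Shards are distributed over the DataLoader workers, so loading happens in parallel.
loader = DataLoader(iterable_ds, batch_size=32, num_workers=4)
for batch in loader:
    break  # `batch` is a dict collated from 32 examples
```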
This is the full set of training args passed. No training args were changed when switching dataset types.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./checkpoints",
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=256,
    save_steps=2000,
    save_total_limit=4,
    prediction_loss_only=True,
    report_to='none',
    gradient_accumulation_steps=6,
    fp16=True,
    max_steps=60000,
    lr_scheduler_type='linear',
    warmup_ratio=0.1,
    logging_steps=100,
    weight_decay=0.01,
    adam_beta1=0.9,
    adam_beta2=0.98,
    adam_epsilon=1e-6,
    learning_rate=1e-4,
)
```
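For reference, a minimal sketch of how arguments like these are typically wired into the `Trainer`; the `model`, `data_collator`, and `train_dataset` names here are assumed, not taken from the report.

```python
from transformers import Trainer

# `model`, `data_collator`, and `train_dataset` are assumed to be defined elsewhere
# (e.g. a Roberta model, an MLM collator, and the dataset under discussion).
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=data_collator,
)
trainer.train()
```

With 2 GPUs, `per_device_train_batch_size=256` and `gradient_accumulation_steps=6`, the total train batch size works out to 256 × 2 × 6 = 3072, which matches the report's point that the `Trainer` prints the same total for both dataset types.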
I think the issue comes from …
Makes sense. Given that it's a …
Describe the bug

I am training a Roberta model using 2 GPUs and the `Trainer` API with a batch size of 256. Initially I used a standard `Dataset`, but had issues with slow data loading. After reading this issue, I swapped to loading my dataset as contiguous shards and passing those to an `IterableDataset`. I observed an unexpected drop in GPU memory utilization, and found the batch size returned from the model had been cut in half.

When using `Trainer` with 2 GPUs and a batch size of 256, `Dataset` returns a batch of size 512 (256 per GPU), while `IterableDataset` returns a batch size of 256 (256 total). My guess is `IterableDataset` isn't accounting for multiple cards.

Steps to reproduce the bug
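The report does not include reproduction code; the following is a minimal sketch of the comparison described above, not the reporter's actual script. Names such as `model`, `tokenizer`, `dataset`, `training_args`, and `use_iterable_dataset` are assumed (the last two echo the snippets quoted earlier). Logging the collated batch size is one way to observe the difference:

```python
from transformers import Trainer, DataCollatorForLanguageModeling

# `tokenizer`, `model`, `training_args`, and a tokenized map-style `dataset`
# are assumed to exist, matching the setup described in the report.
base_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer)

def logging_collator(features):
    # Print how many examples the DataLoader hands to each training step.
    batch = base_collator(features)
    print("collated batch size:", batch["input_ids"].shape[0])
    return batch

if use_iterable_dataset:
    num_shards = 100
    dataset = dataset.to_iterable_dataset(num_shards=num_shards)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    data_collator=logging_collator,
)
trainer.train()
```

Per the description above, on 2 GPUs the map-style `Dataset` path collates 512 examples per step (split to 256 per GPU), while the `IterableDataset` path collates only 256 in total.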
Expected behavior

Expected `Dataset` and `IterableDataset` to have the same batch size behavior. If the current behavior is intentional, the batch size printout at the start of training should be updated. Currently, both dataset classes result in `Trainer` printing the same total batch size, even though the batch sizes sent to the GPUs are different.

Environment info
- `datasets` 2.7.1
- `transformers` 4.25.1