AttributeError: 'Accelerator' object has no attribute 'gather_for_metrics' #854

Closed
Arij-Aladel opened this issue Nov 15, 2022 · 9 comments

Comments


Arij-Aladel commented Nov 15, 2022

System Info

python 3.9


`accelerate.yaml` file:

> compute_environment: LOCAL_MACHINE
> deepspeed_config: {}
> distributed_type: MULTI_GPU
> fsdp_config: {}
> machine_rank: 0
> main_process_ip: null
> main_process_port: null
> main_training_function: main
> mixed_precision: 'no'
> num_machines: 1
> num_processes: 10
> use_cpu: false

The rest is as in the [requirements](https://github.com/huggingface/transformers/blob/main/examples/pytorch/summarization/requirements.txt) file of the example.

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

Go to run_summarization_no_trainer.py

Run:

accelerate launch --config_file='./accelerate.yaml' run_summarization_no_trainer.py --seed=42 --preprocessing_num_workers=1 --weight_decay='0.001' --output_dir="draft/" --per_device_train_batch_size=4 --per_device_eval_batch_size=8 --dataset_name="cnn_dailymail" --dataset_config "3.0.0" --num_train_epochs=10 --model_name_or_path 't5-small'

Expected behavior

The script runs normally.

The error I got is explained in this [issue](https://github.com/huggingface/transformers/issues/18189).
@muellerzr
Collaborator

@Arij-Aladel what version of accelerate do you have (`pip show accelerate`)? It may be quite old by the looks of your accelerate env report. I'd recommend `pip install accelerate -U`, as `gather_for_metrics` has been around since v0.12.0 in August.
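
If it helps, the same check can be done from Python (a small illustrative sketch, equivalent to reading the version out of `pip show accelerate`):

```python
# Sketch: confirm the installed accelerate build actually exposes
# gather_for_metrics, which was added in the v0.12.0 release.
import accelerate
from accelerate import Accelerator

print(accelerate.__version__)
print(hasattr(Accelerator, "gather_for_metrics"))  # False on builds predating v0.12.0
```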

@Arij-Aladel
Author

The latest one, I guess; it is installed exactly as in the requirements file of the example.

@muellerzr
Collaborator

The requirements file doesn't have a version specified, so it won't install a new version if one already exists on the system. Can you please show the output of `pip show accelerate`? 😃
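
(For illustration only: a minimum-version pin in that requirements.txt, such as the hypothetical line below, would force pip to upgrade an older install.)

> accelerate >= 0.12.0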

@Arij-Aladel
Author

  • Accelerate version: 0.12.0.dev0
  • Platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.31
  • Python version: 3.9.13
  • Numpy version: 1.22.3
  • PyTorch version (GPU?): 1.12.0 (True)
  • Accelerate default config:
    - compute_environment: LOCAL_MACHINE
    - distributed_type: FSDP
    - mixed_precision: no
    - use_cpu: False
    - num_processes: 1
    - machine_rank: 0
    - num_machines: 1
    - main_process_ip: None
    - main_process_port: None
    - main_training_function: main
    - deepspeed_config: {}
    - fsdp_config: {'fsdp_auto_wrap_policy': 'TRANSFORMER_BASED_WRAP', 'fsdp_backward_prefetch_policy': 'BACKWARD_PRE', 'offload_params': False, 'sharding_strategy': 1, 'transformer_layer_cls_to_wrap': ''}

@muellerzr
Collaborator

Since it's running a dev build, it may be from a commit before `gather_for_metrics` was added. I'd recommend `pip install accelerate==0.12.0 --force-reinstall --no-deps` so you get the fully released v0.12.0, or `-U` to upgrade 😃
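
In case it is useful to see what the method does once a release that has it is installed, here is a minimal sketch; the toy tensor dataset and identity "prediction" are illustrative and not taken from run_summarization_no_trainer.py:

```python
# Minimal sketch of gather_for_metrics (assumes accelerate >= 0.12.0).
# It gathers per-process outputs and drops the samples that were duplicated
# to make the last batch divisible, so metrics see exactly len(dataset) rows.
import torch
from torch.utils.data import DataLoader
from accelerate import Accelerator

accelerator = Accelerator()

dataset = torch.arange(10).unsqueeze(1).float()          # 10 toy "samples"
dataloader = accelerator.prepare(DataLoader(dataset, batch_size=3))

all_preds = []
for batch in dataloader:
    preds = batch  # stand-in for model output
    all_preds.append(accelerator.gather_for_metrics(preds))

if accelerator.is_main_process:
    print(torch.cat(all_preds).shape)  # torch.Size([10, 1]) regardless of process count
```

Launched with `accelerate launch --num_processes 2 script.py` (the filename is just a placeholder), a plain `gather` would also return the duplicated padding samples, while `gather_for_metrics` trims the result back to the 10 real ones.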

@Arij-Aladel
Author

OK, now there is another issue: at this line I am getting the error

10%|███████████████████████ | 7178/71780 [32:54<4:56:14, 3.63it/s]
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 9826) of binary: /home/arij/anaconda3/envs/sum/bin/python
Traceback (most recent call last):
  File "/home/arij/anaconda3/envs/sum/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.12.0', 'console_scripts', 'torchrun')())
  File "/home/arij/anaconda3/envs/sum/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/home/arij/anaconda3/envs/sum/lib/python3.9/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/home/arij/anaconda3/envs/sum/lib/python3.9/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/home/arij/anaconda3/envs/sum/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/arij/anaconda3/envs/sum/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

This is because after gathering on this line we get empty lists.

@muellerzr
Collaborator

Can you open a separate issue on the transformers repo for this and @ mention me? Thanks!

@Arij-Aladel
Author

Arij-Aladel commented Nov 16, 2022

@muellerzr I have already opened it and mentioned this issue; I will mention you there too.

@muellerzr
Collaborator

@Arij-Aladel is it safe to say this can be closed now? 😄
