Releases: huggingface/accelerate
v0.20.0: MPS and fp4 support on Big Model Inference, 4-bit QLoRA, Intel GPU, Distributed Inference, and much more!
Big model inference
Support has been added to run `device_map="auto"` on the MPS device. Big model inference also works with models loaded in 4-bit in Transformers. A minimal usage sketch follows the list below.
- Add mps support to big inference modeling by @SunMarc in #1545
- Adds fp4 support for model dispatching by @younesbelkada in #1505
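As a rough, hedged sketch of how these two features could be exercised from Transformers; the checkpoint id is just a placeholder, and the 4-bit path assumes bitsandbytes and a supported CUDA GPU (drop `load_in_4bit` on MPS-only machines):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; any causal-LM checkpoint should behave similarly.
model_id = "facebook/opt-350m"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" lets big model inference dispatch weights automatically,
# including onto the MPS device on Apple silicon.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    load_in_4bit=True,  # 4-bit loading via bitsandbytes; requires a supported CUDA GPU
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```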
4-bit QLoRA Support
- 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by @TimDettmers in #1458
Distributed Inference Utilities
This version introduces a new `Accelerator.split_between_processes` utility to help with performing distributed inference with non-tensorized or non-dataloader workflows. Read more in the documentation; a minimal sketch is shown below.
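A minimal sketch of the utility; the prompts and the per-shard work are placeholders:

```python
from accelerate import Accelerator

accelerator = Accelerator()

# Each process receives its own slice of the inputs; with 2 processes,
# process 0 gets ["A", "B"] and process 1 gets ["C"].
with accelerator.split_between_processes(["A", "B", "C"]) as prompts:
    for prompt in prompts:
        # Run whatever non-tensorized inference you need on this shard.
        print(f"Process {accelerator.process_index} handles {prompt}")
```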
Introduce XPU support for Intel GPU
- Intel GPU support initialization by @abhilash1910 in #1118
Add support for the new PyTorch XLA TPU runtime
A new optimizer method: LocalSGD
- This is a new wrapper around SGD which enables efficient multi-GPU training in cases where no fast interconnect is available, by @searchivarius in #1378 (a minimal sketch is shown below)
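A minimal sketch of the intended usage, assuming the context-manager interface in `accelerate.local_sgd`; the tiny model, optimizer, and data are placeholders:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator
from accelerate.local_sgd import LocalSGD

accelerator = Accelerator()

# Tiny placeholder model and data just to make the sketch self-contained.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
dataloader = DataLoader(TensorDataset(torch.randn(64, 10), torch.randn(64, 1)), batch_size=8)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

# Gradients are only synchronized every `local_sgd_steps` optimizer steps,
# which helps when no fast interconnect between GPUs is available.
with LocalSGD(accelerator=accelerator, model=model, local_sgd_steps=8, enabled=True) as local_sgd:
    for inputs, targets in dataloader:
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
        local_sgd.step()  # advances the local step counter and triggers the periodic sync
```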
Papers with 🤗 Accelerate
- We now have an entire section of the docs dedicated to official paper implementations and citations using the framework #1399, see it live in the documentation
Breaking changes
`logging_dir` has been fully deprecated, please use `project_dir` or a `ProjectConfiguration`. A minimal sketch of the replacement is shown below.
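For example, a minimal sketch of the replacement; the directory names and the tensorboard tracker are placeholders:

```python
from accelerate import Accelerator
from accelerate.utils import ProjectConfiguration

# project_dir replaces the deprecated logging_dir argument on the Accelerator;
# a separate logging_dir can still be set on the ProjectConfiguration if needed.
project_config = ProjectConfiguration(project_dir="my_project", logging_dir="my_project/logs")
accelerator = Accelerator(log_with="tensorboard", project_config=project_config)
```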
What's new?
- use existing mlflow experiment if exists by @Rusteam in #1403
- changes required for DS integration by @pacman100 in #1406
- fix deepspeed failing tests by @pacman100 in #1411
- Make mlflow logging dir optional by @mattplo-decath in #1413
- Fix bug on ipex for diffusers by @abhilash1910 in #1426
- Improve Slack Updater by @muellerzr in #1433
- Let quality yell at the user if it's a version difference by @muellerzr in #1438
- Ensure that it gets installed by @muellerzr in #1439
- [`core`] Introducing `CustomDtype` enum for custom dtypes by @younesbelkada in #1434
- Fix XPU by @muellerzr in #1440
- Make sure torch compiled model can also be unwrapped by @patrickvonplaten in #1437
- fixed: ZeroDivisionError: division by zero by @sreio in #1436
- fix potential OOM when resuming with multi-GPU training by @exhyy in #1444
- Fixes in infer_auto_device_map by @sgugger in #1441
- Raise error when logging improperly by @muellerzr in #1446
- Fix ci by @muellerzr in #1447
- Distributed prompting/inference utility by @muellerzr in #1410
- Add to by @muellerzr in #1448
- split_between_processes by @stevhliu in #1449
- [docs] Replace `state.rank` -> `process_index` by @pcuenca in #1450
- Auto multigpu logic by @muellerzr in #1452
- Update with cli instructions by @muellerzr in #1453
- Adds `in_order` argument that defaults to False, to log in order by @JulesGM in #1262
- fix error for CPU DDP using trainer api by @sywangyi in #1455
- Refactor and simplify xpu device in state by @abhilash1910 in #1456
- Document how to use commands with python module instead of argparse by @muellerzr in #1457
- 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by @TimDettmers in #1458
- Fix skip first batch being perminant by @muellerzr in #1466
- update conversion of layers to retain original data type. by @avisinghal6 in #1467
- Check for xpu specifically by @muellerzr in #1472
- update `register_empty_buffer` to match torch args by @NouamaneTazi in #1465
- Update gradient accumulation docs, and remove redundant example by @iantbutler01 in #1461
- Imrpove sagemaker by @muellerzr in #1470
- Split tensors as part of `split_between_processes` by @muellerzr in #1477
- Move to device by @muellerzr in #1478
- Fix gradient state bugs in multiple dataloader by @Ethan-yt in #1483
- Add rdzv-backend by @muellerzr in #1490
- Only use IPEX if available by @muellerzr in #1495
- Update README.md by @lyhue1991 in #1493
- Let gather_for_metrics always run by @muellerzr in #1496
- Use empty like when we only need to create buffers by @thomasw21 in #1497
- Allow key skipping in big model inference by @sgugger in #1491
- fix crash when ipex is installed and torch has no xpu by @sywangyi in #1502
- [`bnb`] Add fp4 support for dispatch by @younesbelkada in #1505
- Fix 4bit model on multiple devices by @SunMarc in #1506
- adjust overriding of model's forward function by @prathikr in #1492
- Add assertion when call prepare with deepspeed config. by @tensimiku in #1468
- NVME path support for deepspeed by @abhilash1910 in #1484
- should set correct dtype to ipex optimize and use amp logic in native… by @sywangyi in #1511
- Swap env vars for XPU and IPEX + CLI by @muellerzr in #1513
- Fix a bug when parameters tied belong to the same module by @sgugger in #1514
- Fixup deepspeed/cli tests by @muellerzr in #1526
- Refactor mp into its own wrapper by @muellerzr in #1527
- Check tied parameters by @SunMarc in #1529
- Raise ValueError on iterable dataset if we've hit the end and attempting to go beyond it by @muellerzr in #1531
- Officially support naive PP for quantized models + PEFT by @younesbelkada in #1523
- remove ipexplugin, let ACCELERATE_USE_IPEX/ACCELERATE_USE_XPU control the ipex and xpu by @sywangyi in #1503
- Prevent using extra VRAM for static device_map by @LSerranoPEReN in #1536
- Update deepspeed.mdx by @LiamSwayne in #1541
- Update performance.mdx by @LiamSwayne in #1543
- Update deferring_execution.mdx by @LiamSwayne in #1544
- Apply deprecations by @muellerzr in #1537
- Add mps support to big inference modeling by @SunMarc in #1545
- [documentation] grammar fixes in gradient_synchronization.mdx by @LiamSwayne in #1547
- Eval mode by @muellerzr in #1540
- Update migration.mdx by @LiamSwayne in #1549
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @will-cromar
- @searchivarius
- Adding support for local SGD. (#1378)
- @abhilash1910
- @sywangyi
- @Ethan-yt
- Fix gradient state bugs in multiple dataloader (#1483)
v0.19.0: IPEX Support, Foundations for Transformers Integration, FP8 for Ada Lovelace GPUs, and Squashed Bugs
What's New
- Support for Intel IPEX has been added, check out the how-to guide now!
- Various modifications have been added to begin work on having 🤗 Accelerate be the foundation for the `Trainer`, keep an eye on the repos to see how our progress is coming along!
- FP8 training is now supported on Ada Lovelace GPUs
- The `wandb` integration now supports logging of images and tables through `tracker.log_images` and `tracker.log_tables` respectively (a rough sketch follows this list)
- Many, many squashed bugs! (see the full detailed report for just what they were)
- 17 new contributors to the framework, congratulations to all who took their first step! 🚀
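As a rough, hedged sketch of the new logging hooks; the project name, images, and step value are placeholders, and wandb must be installed and logged in:

```python
import numpy as np
from accelerate import Accelerator

accelerator = Accelerator(log_with="wandb")
accelerator.init_trackers("my-project")  # placeholder project name

# Fetch the underlying wandb tracker and log a few placeholder images.
tracker = accelerator.get_tracker("wandb")
images = [np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8) for _ in range(4)]
tracker.log_images({"samples": images}, step=0)

accelerator.end_training()
```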
What's Changed
- Fix pypi image by @muellerzr in #1249
- raise error when dataloader with None as batch_size when using DS by @pacman100 in #1250
- Handle missing deepspeed config by @HeyangQin in #1251
- [`core`] Add Quantization support for `dispatch_model` by @younesbelkada in #1237
- Check attribute 'overflow' exists in optimizer. by @tensimiku in #1259
- ipex intel extension for pytorch integration by @sywangyi in #1255
- fix issue template by @stas00 in #1264
- Change error raised to ValueError by @sgugger in #1267
- Fix reduce operation by @xyfJASON in #1268
- Raise import error if fp8 not available in `has_transfomer_engine_layers` by @muellerzr in #1283
- Add missing FP8 options to CLI by @muellerzr in #1284
- Update quicktour.mdx by @StandardAI in #1273
- Minor fix whitespace colon by @guspan-tanadi in #1272
- fix attribute error in DataloaderShared by @ZhiyuanChen in #1278
- Fix TypeError bug in honor_type by @muellerzr in #1285
- Raise more explicit error when transformer_engine isn't installed by @muellerzr in #1287
- Expound error on `recursively_apply` by @muellerzr in #1286
- Only check for dtype if it has it in get_state_dict by @muellerzr in #1288
- [`bnb`] fix bnb slow test by @younesbelkada in #1292
- Raise better error on `notebook_launcher` by @muellerzr in #1293
- Make note about grad accum and prec in performance documentation by @muellerzr in #1296
- fix for load_checkpoint_and_dispatch(device_map=None) by @anentropic in #1297
- Set the state device dependant to Accelerator on multigpu by @muellerzr in #1220
- add usage guide for ipex plugin by @sywangyi in #1270
- Simplify MPS implementation by @sgugger in #1308
- Bug fix in setattr by @aashiqmuhamed in #1312
- Allow xpu backend by @muellerzr in #1313
- Default to nccl by @muellerzr in #1314
- offload the previous module hook before the current module is moved to… by @williamberman in #1315
- Ensure that dynamo is compatible with mixed precision by @muellerzr in #1318
- Upgrade torch version on main tests by @muellerzr in #1323
- Add test flag and import check for dynamo by @muellerzr in #1322
- ensure module prefixes only match that module by @xloem in #1319
- remove repetitive entries from device lists by @xloem in #1321
- Fix failing test on main by @muellerzr in #1332
- Verbosity, Progress Bar for Loading by @xloem in #1329
- Skip failing torch 2.0+ test by @muellerzr in #1339
- Remove unused amp import util by @muellerzr in #1340
- Fix nested context manager for main_process_first() by @flukeskywalker in #1304
- Small progress bar fix by @xloem in #1341
- Pop more backend options by @muellerzr in #1342
- Support FP8 mixed precision training for Ada Lovelace GPUs by @Dango233 in #1348
- using deepspeed.comm for distrbiuted init by @pacman100 in #1352
- [`bnb`] Fix bnb slow test by @younesbelkada in #1355
- Better check for packages availability by @apbard in #1356
- fix: typing issues, and replace deprecated python typing (Optional, Union) to `|` by @kiyoon in #1363
- Fix default FSDP_MIN_NUM_PARAMS so it's an int by @sam-hieken in #1367
- Special transformers case from args by @muellerzr in #1364
- Improve `accelerate env` reporting by @muellerzr in #1376
- Seperate out contextmanager generation by @muellerzr in #1379
- delete textfile after tests are done by @muellerzr in #1381
- Fix flakey thread issue by @muellerzr in #1387
- fix config bug for 'mixed_precision' from 'yaml.safe_load()' by @ys-eric-choi in #1386
- Log Images and other types to wandb by @tcapelle in #962
- Bump torch version by @muellerzr in #1392
- Fix gather_obj by @muellerzr in #1391
- Update training_zoo.mdx by @yuvalkirstain in #1397
New Contributors
- @HeyangQin made their first contribution in #1251
- @tensimiku made their first contribution in #1259
- @xyfJASON made their first contribution in #1268
- @StandardAI made their first contribution in #1273
- @guspan-tanadi made their first contribution in #1272
- @anentropic made their first contribution in #1297
- @aashiqmuhamed made their first contribution in #1312
- @williamberman made their first contribution in #1315
- @xloem made their first contribution in #1319
- @flukeskywalker made their first contribution in #1304
- @Dango233 made their first contribution in #1348
- @apbard made their first contribution in #1356
- @kiyoon made their first contribution in #1363
- @sam-hieken made their first contribution in #1367
- @ys-eric-choi made their first contribution in #1386
- @tcapelle made their first contribution in #962
- @yuvalkirstain made their first contribution in #1397
Full Changelog: v0.18.0...v0.19.0
v0.18.0: GradientState enhancements and Big Model Inference Fixes
What's Changed
- A new `GradientAccumulationPlugin` has been added to handle more configurations with the `GradientState`. Specifically, you can optionally disable having Accelerate automatically adjust the length of the scheduler relative to gradient accumulation steps through it. Otherwise, Accelerate will now automatically ensure that schedulers built without gradient accumulation in mind still work during gradient accumulation (a minimal sketch follows this list)
- Some fixes related to the launch configuration and TPU launches were adjusted, and the `dynamo_backend` warning has been silenced
- Big model inference saw a number of fixes related to linear layers, `drop_last` on linear layers, tied weight loading, and handling of multiple tied parameters
- A new integration example with RunhouseML has been added, read more here: https://github.com/huggingface/accelerate/tree/main/examples#simple-multi-gpu-hardware-launcher
Breaking Changes
`find_tied_parameters` now deals with groups of tied parameters (instead of only pairs of them). As a result, it now returns a list of lists of strings instead of a dictionary. A short sketch of the new format is shown below.
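For illustration, a hedged sketch of the new return format; the exact parameter names depend on the model:

```python
from transformers import AutoModelForCausalLM
from accelerate.utils import find_tied_parameters

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder checkpoint

# Previously a dict pairing tied weights; now a list of groups, for example
# [["lm_head.weight", "transformer.wte.weight"]]
print(find_tied_parameters(model))
```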
What's New?
- Add documentation around FSDP state dict save behavior by @VikParuchuri in #1181
- add `use_orig_params` to FullyShardedDataParallelPlugin by @pacman100 in #1184
- Only convert linear layers with weights multiple of 16 by @sgugger in #1188
- Set drop last to ensure modulo16 restriction for fp8 by @ksivaman in #1189
- Accelerator should not call `to` on modules that wraps `accelerate` loaded models by @younesbelkada in #1172
- Fixup passing overlapping args to the script by @muellerzr in #1198
- Make the Scheduler adjust the steps taken relative to the gradient accumulation steps by @muellerzr in #1187
- Fix tied weights load by @sgugger in #1204
- Better error message when using multi-GPU and Accelerate on torch <1.9.1 by @muellerzr in #1203
- Fix typo in TPU config by @muellerzr in #1202
- Fix example in accumulate method documentation by @VikParuchuri in #1211
- ds offload optim fix to use CPUAdam by @pacman100 in #1208
- Move when the GradientState test is performed so that it is not None by @muellerzr in #1219
- Fix bug in loading launch config by @neumyor in #1218
- Fix get_logger kwarg documentation issue by @bcol23 in #1222
- docs: add finetuner to ppl who use accelerate by @bwanglzu in #1224
- Silence dynamo_backend by @muellerzr in #1226
- Add additional check when patching env by @Chris-hughes10 in #1229
- Make grad accum steps mutable on the Accelerator object by @muellerzr in #1233
- devcontainer: "extensions" has been removed and replaced by customizations by @dbpprt in #1075
- remove empty dicts while saving accelerate config by @pacman100 in #1236
- backfill ds plugin attributes when using ds_config by @pacman100 in #1235
- Change multinode to multigpu in notebook tutorial by @muellerzr in #1247
- Hardware Auto-Setup Example/Tutorial for Distributed Launch by @carolineechen in #1227
- Handle multiple tied parameters by @sgugger in #1241
New Contributors
- @hackpert made their first contribution in #1180
- @VikParuchuri made their first contribution in #1181
- @ksivaman made their first contribution in #1189
- @neumyor made their first contribution in #1218
- @bcol23 made their first contribution in #1222
- @bwanglzu made their first contribution in #1224
- @carolineechen made their first contribution in #1227
Full Changelog: v0.17.1...v0.18.0
v0.17.1: Patch release
v0.17.0: PyTorch 2.0 support, Process Control Enhancements, TPU pod support and FP8 mixed precision training
PyTorch 2.0 support
This release fully supports the upcoming PyTorch 2.0 release. You can choose whether or not to use `torch.compile` and then customize the options in `accelerate config` or via a `TorchDynamoPlugin`. A minimal sketch follows the PR reference below.
- update support for torch dynamo compile by @pacman100 in #1150
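A minimal, hedged sketch of enabling this in code; the backend name and the tiny model are placeholders, and `accelerate config` can set the same options persistently:

```python
import torch
from accelerate import Accelerator

# "inductor" is the default TorchDynamo backend in PyTorch 2.0.
accelerator = Accelerator(dynamo_backend="inductor")

model = torch.nn.Linear(10, 10)  # placeholder model
model = accelerator.prepare(model)  # the prepared model is compiled with the chosen backend
```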
Process Control Enhancements
This release adds a new `PartialState`, which contains most of the capabilities of the `AcceleratorState`; however, it is designed to be used by the user to assist in any process control mechanisms around it. With this, users also no longer need to have `if accelerator.state.is_main_process` when utilizing classes such as the Tracking API, as these will now automatically use only the main process for their work by default. A minimal sketch follows the PR reference below.
- Refactor process executors to be in AcceleratorState by @muellerzr in #1039
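A minimal sketch of how `PartialState` might be used for process control; the work inside the blocks is a placeholder:

```python
from accelerate import PartialState

state = PartialState()

# Runs on every process.
print(f"Hello from process {state.process_index} of {state.num_processes}")

# Runs only on the main process (e.g. downloading data, writing logs).
if state.is_main_process:
    print("Doing main-process-only work")

state.wait_for_everyone()
```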
TPU Pod Support (Experimental)
Launching from TPU pods is now supported; please see this issue for more information.
- Introduce TPU Pod launching to `accelerate launch` by @muellerzr in #1049
FP8 mixed precision training (Experimental)
This release adds experimental support for FP8 mixed precision training, which requires the transformer-engine library as well as a Hopper GPU (or higher).
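A hedged, minimal sketch of turning it on; it assumes transformer-engine is installed and a Hopper-class GPU is available:

```python
from accelerate import Accelerator

# Experimental: requires the transformer-engine library and an H100-class GPU.
accelerator = Accelerator(mixed_precision="fp8")
```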
What's new?
- v0.17.0.dev0 by @sgugger (direct commit on main)
- Deepspeed param check by @dhar174 in #1015
- enabling `mps` device by default and removing related config by @pacman100 in #1030
- fix: links to gradient synchronization by @prassanna-ravishankar in #1035
- do not scale gradient in bf16 mode by @kashif in #1036
- Pass keywords arguments of backward function deeper to DeepSpeed by @DistinctVision in #1037
- Add daily slack notifier for nightlies by @muellerzr in #1042
- Make sure direct parameters are properly set on device by @sgugger in #1043
- Add `cpu_offload_with_hook` by @sgugger in #1045
- Update quality tools to 2023 by @sgugger in #1046
- Load tensors directly on device by @sgugger in #1028
- Fix cpu_offload_with_hook code snippet by @pcuenca in #1047
- Use create_task by @muellerzr in #1052
- Fix args by adding in the defaults by @muellerzr in #1053
- deepspeed `hidden_size` auto value default fixes by @pacman100 in #1060
- Introduce PartialState by @muellerzr in #1055
- Flag for deprecation by @muellerzr in #1061
- Try with this by @muellerzr in #1062
- Update integrations by @muellerzr in #1063
- Swap utils over to use PartialState by @muellerzr in #1065
- update fsdp docs and removing deepspeed version pinning by @pacman100 in #1059
- Fix/implement process-execution decorators on the Accelerator by @muellerzr in #1070
- Refactor state and make `PartialState` first class citizen by @muellerzr in #1071
- Add error if passed --config_file does not exist by @muellerzr in #1074
- SageMaker image_uri is now optional by @ in #1077
- Allow custom SageMaker Estimator arguments by @ in #1080
- Fix tpu_cluster arg by @muellerzr in #1081
- Update complete_cv_example.py by @fcossio in #1082
- Added SageMaker local mode config section by @ in #1084
- Fix config by @muellerzr in #1090
- adds missing "lfs" in pull by @CSchoel in #1091
- add multi_cpu support to reduce by @alex-hh in #1094
- Update README.md by @BM-K in #1100
- Tracker rewrite and lazy process checker by @muellerzr in #1079
- Update performance.mdx by @fcossio in #1107
- Attempt to unwrap tracker. by @pcuenca in #1109
- TensorBoardTracker: wrong arg def by @stas00 in #1111
- Actually raise if exception by @muellerzr in #1124
- Add test for ops and fix reduce by @muellerzr in #1122
- Deep merge SageMaker `additional_args`, allowing more flexible configuration and `env` variable support by @dbpprt in #1113
- Move dynamo.optimize to the end of model preparation by @ymwangg in #1128
- Refactor `launch` for greater extensibility by @Yard1 in #1123
- [Big model loading] Correct GPU only loading by @patrickvonplaten in #1121
- Add tee and role to launch by @muellerzr in #1132
- Expand warning and grab all GPUs available by default by @muellerzr in #1134
- Fix multinode with GPU ids when each node has 1 by @muellerzr in #1127
- deepspeed dataloader prepare fix by @pacman100 in #1126
- fix ds dist init kwargs issue by @pacman100 in #1138
- fix lr scheduler issue by @pacman100 in #1140
- fsdp bf16 enable autocast by @pacman100 in #1125
- Fix notebook_launcher by @muellerzr in #1141
- fix partial state by @pacman100 in #1144
- FSDP enhancements and fixes by @pacman100 in #1145
- Fixed typos in notebook by @SamuelLarkin in #1146
- Include a note in the gradient synchronization docs on "what can go wrong" and show the timings by @muellerzr in #1153
- [Safetensors] Relax missing metadata constraint by @patrickvonplaten in #1151
- Solve arrow keys being environment dependant for accelerate config by @p1atdev (direct commit on main)
- Load custom state to cpu by @Guangxuan-Xiao in #1156
- 📝 add a couple more trackers to the docs by @nateraw in #1158
- Let GradientState know active dataloaders and reset the remainder by @muellerzr in #1162
- Attempt to fix import error when PyTorch is build without `torch.distributed` module by @mfuntowicz in #1108
- [`Accelerator`] Fix issue with 8bit models by @younesbelkada in #1155
- Document skip_first_batches in the checkpoint usage guides by @muellerzr in #1164
- Fix what files get deleted through `total_limit` by @muellerzr in #1165
- Remove outdated command directions and use in tests by @muellerzr in #1166
Significant community contributions
The following contributors have made significant changes to the library over the last release:
v0.16.0: Improved and Interactive Documentation, DataLoader Improvements
New code exploration doc tool
A new interactive tool has been introduced to the documentation to help users quickly learn how to utilize features of the framework before providing more details on them.
Not only does it provide a code diff, but it also includes an explanation and links to more resources the user should check out to learn more.
Try it out today in the docs
- Add in code exploration tool to docs by @muellerzr in #1014
- Light vs dark theme based on pick by @muellerzr in #1023
Skip batches in dataloaders
When resuming training, you can more efficiently skip batches in your dataloader with the new `skip_first_batches` function (also available as a method on your `Accelerator`).
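A minimal sketch assuming training is resumed mid-epoch; the dataloader and the number of already-seen batches are placeholders:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

dataloader = DataLoader(TensorDataset(torch.randn(100, 10)), batch_size=10)
dataloader = accelerator.prepare(dataloader)

# After restoring a checkpoint taken mid-epoch, skip the batches already seen.
skipped_dataloader = accelerator.skip_first_batches(dataloader, num_batches=4)
for batch in skipped_dataloader:
    ...  # training resumes from the fifth batch of the epoch
```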
DeepSpeed integration enhancements
A new ZeRO-3 init context manager is added to provide granular control to users in situations involving nested/multiple models. DeepSpeed config file support has been refactored to remove ambiguity between it and the Accelerate config.
Adding support for `auto` entries in the DeepSpeed config file to be filled via the `accelerate launch` command. Try it out today by referring to the section "Things to note when using DeepSpeed Config File".
- ds zero-3 init context manager by @pacman100 in #932
- raise error for duplicate accelerate config values when using `deepspeed_config_file` by @pacman100 in #941
What's new?
- Flag to silence subprocess.CalledProcessError in launch by @Cyberes in #902
- Add usage examples by @muellerzr in #904
- Expand sanity checks by @muellerzr in #905
- Fix conditional by @muellerzr in #907
- fix issue that amp bf16 does not work for cpu in env with cuda. by @sywangyi in #906
- fsdp enhancements by @pacman100 in #911
- Fix typos accelerate -> accelerator by @pcuenca in #915
- 🚨🚨🚨 Act on deprecations 🚨🚨🚨 by @muellerzr in #917
- fix accelerate test failure with cpu config by @sywangyi in #909
- Introduce `project_dir` and limit the number of saved checkpoints by @muellerzr in #916
- Specify inference by @muellerzr in #921
- Support `init_on_device` by @thomasw21 in #926
- ds-z3-init and prepending ds env variables with `ACCELERATE_` by @pacman100 in #928
- Honor model dtype in `load_checkpoint` by @sgugger in #920
- ds zero-3 init context manager by @pacman100 in #932
- Fix silly typo by @tornikeo in #939
- add `mixed_precision_type` property to `AcceleratorState` by @pacman100 in #935
- fix batch size in prepare_dataloader for iterable datasets by @sanderland in #937
- fix mp related test fails by @pacman100 in #943
- Fix tracker by @muellerzr in #942
- Fix offload when weights are on the GPU by @sgugger in #945
- raise error for duplicate accelerate config values when using `deepspeed_config_file` by @pacman100 in #941
- Add is_initialized method and refactor by @muellerzr in #949
- Fix DeepSpeed tests by @muellerzr in #950
- Don't automatically offload buffers when loading checkpoints by @sgugger in #951
- Typo fix in src/accelerate/utils/modeling.py by @ryderwishart in #955
- support master port when using ds multi-node launcher by @pacman100 in #959
- Allowing encoded configuration for DeepSpeed by @cli99 in #895
- Update README.md by @Don9wanKim in #968
- Raise minimum version for distrib launch by @muellerzr in #978
- Fix tied parameters test in big model inference by @sgugger in #979
- Fix type error on line 36 by @dhar174 in #981
- Ensure that last batch doesn't get dropped if perfectly even in gather_for_metrics by @muellerzr in #982
- Skip wandb test for now by @muellerzr in #984
- Fix test for converting tensor to proper dtype by @sgugger in #983
- in sync with trfs, removing style_doc utils and using doc-builder instead by @pacman100 in #988
- Add new release_memory util by @muellerzr in #990
- adding support for kwargs in `load_state` by @pacman100 in #989
- Fix scheduler incorrect steps when gradient accumulation enabled by @markovalexander in #999
- Fix parameters tying in dispatch_model by @sgugger in #1000
- improve deepspeed notes by @stas00 in #1003
- Update toctree by @muellerzr in #1008
- Add styleguide by @muellerzr in #1007
- Maintain accumulation steps by @muellerzr in #1011
- Saving and loading state hooks by @patrickvonplaten in #991
- Fix test introduced in PR and introduce AcceleratorTestCase by @muellerzr in #1016
- Allow the torch device to be set with an env var by @Yard1 in #1009
- Fix import of LrScheduler by @sgugger in #1017
- Don't force mixed precision as no in examples by @sgugger in #1018
- Include steppage in performance docs by @muellerzr in #1013
- Fix env var by @muellerzr in #1024
- Change default for keep_fp32_wrapper by @muellerzr in #1025
- Fix slow test by keeping tied weights on the same GPU by @sgugger in #1026
- Start of adding examples by @muellerzr in #1001
- More improvements to docstrings + examples by @muellerzr in #1010
- With example by @muellerzr in #1027
- sagemaker launcher fixes by @pacman100 in #1031
v0.15.0: Pytorch 2.0 stack support
PyTorch 2.0 stack support
We are very excited by the newly announced PyTorch 2.0 stack, and you can try it using Accelerate on any model by using the `dynamo_backend` argument of the `Accelerator`, or when filling your config with `accelerate config`.
Note that to get the best performance, we recommend:
- using an Ampere GPU (or more recent)
- sticking to fixed shapes for now
New CLI commands
- Added two new commands, `accelerate config update` and `accelerate config default`. The first will update a config file to have the latest keys added from later releases of Accelerate, and the second will create a default configuration file automatically, mimicking `write_default_config()` introduced in #851 and #853 by @muellerzr
- Also introduced a filterable help for `accelerate launch` which will show options relevant to the choices shown; for example, `accelerate launch --multi_gpu` will show launch parameters relevant to multi-GPU training.
What's new?
- fix 🐛 by @pacman100 in #836
- Deepspeed example should use gather_for_metrics by @HammadB in #821
- Highlight selection with pretty colors by @muellerzr in #839
- Add `join_uneven_inputs` context manager to Accelerator by @Chris-hughes10 in #820
- Introduce `default-config` command by @muellerzr in #840
- Fix log error and add log level to get_logger by @muellerzr in #842
- Fix if/else by @muellerzr in #849
- Fix complete_cv example by @muellerzr in #848
- Refactor Accelerate config and introduce a multi-argument CLI interface by @muellerzr in #851
- Clean up, add update command by @muellerzr in #853
- Revert "Update pr docs actions" by @mishig25 in #827
- Switch default log to warn by @muellerzr in #859
- Remove mixed precision hook as part of the unwrap_model by @muellerzr in #860
- update deepspeed error message wrt `batch_size` by @pacman100 in #861
- fix failing deepspeed test by @pacman100 in #868
- Even more log level refined, leave alone if not explicitly set by @muellerzr in #871
- Solve pickling issues by @muellerzr in #872
- Spring cleaning by @muellerzr in #865
- fixing lr_scheduler prepare issue when using pytorch nightly by @pacman100 in #878
- fix fsdp state_dict_config because of PyTorch changes by @pacman100 in #877
- Update deprecated logging warn by @SHi-ON in #881
- fix a bug by @xiaohu2015 in #887
- Allow safetensors offload by @sgugger in #873
- fixing lr scheduler for pytorch nightly by @pacman100 in #884
- Prefix all accelerate env vars with ACCELERATE by @muellerzr in #890
- fix prefix issues in tests by @pacman100 in #891
- Fix windows cli selector by @muellerzr in #893
- Better description for improper kwargs by @muellerzr in #894
- Support bfloat16 in load_offloaded_weight by @sgugger in #892
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @Chris-hughes10
- Add `join_uneven_inputs` context manager to Accelerator (#820)
- Add
v0.14.0: Megatron-LM integration and support for PyTorch 1.13
Megatron LM integration
Accelerate now supports Megatron-LM for the three model classes (BERT, GPT-2 and T5). You can learn more in the documentation.
- Megatron-LM integration by @pacman100 in #667
- ensure megatron is 2.2.0+ by @jeffra in #755
- updating docs to use fork of megatron-lm and minor example/docs fix by @pacman100 in #766
- adding support to return logits and generate for Megatron-LM GPT models by @pacman100 in #819
PyTorch 1.13 support
Fixes a bug that returned SIGKILL errors on Windows.
- Isolate distrib_run by @muellerzr in #828
Kaggle support with the notebook_launcher
With Kaggle now giving instances with two T4 GPUs, Accelerate can leverage this to do multi-GPU training from the notebook. A minimal sketch follows the PR reference below.
- Work in kaggle! by @muellerzr in #783
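A minimal sketch from inside a notebook; the training function body is a placeholder:

```python
from accelerate import notebook_launcher

def training_function():
    # Placeholder: build the Accelerator, model, and training loop here.
    print("Training...")

# On a Kaggle instance with two T4s this spawns one process per GPU.
notebook_launcher(training_function, num_processes=2)
```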
What's new?
- Add `non_blocking` kwarg to `send_to_device()` by @NouamaneTazi in #607
- [ds launcher] un-hijack PYTHONPATH by @stas00 in #741
- Fix num_processes is not defined by @muellerzr in #746
- [Device map] nn.Parameter don't have children by @patrickvonplaten in #747
- Use HTML relative paths for tiles by @lewtun in #749
- Add gpu_ids to SageMakerConfig though it should never be set by @muellerzr in #751
- Change num_cpu_threads_per_process default by @muellerzr in #753
- Return unclipped gradient from grad_clip_norm_ by @samuelstevens in #756
- refactor by @pacman100 in #758
- update docs by @pacman100 in #759
- Only wrap modules in DDP if they require grad by @samuelstevens in #761
- Move io_same_device hook to before attach_align_device hook on cpu_offload and disk_offload. by @piEsposito in #768
- Regression cli tests by @muellerzr in #772
- Fix number of devices in get_balanced_memory by @sgugger in #774
- Fix all github actions issues + depreciations by @muellerzr in #773
- Fix flakey wandb test by @muellerzr in #775
- Add defaults for launchers by @muellerzr in #778
- Allow BatchSamplerShard to not even out batches by @sgugger in #776
- Make rich toggleable and seperate out a new environment utility file by @muellerzr in #779
- Add same_network + docs by @muellerzr in #780
- fix transformers tests by @ArthurZucker in #777
- Add Dev Container configuration by @Chris-hughes10 in #782
- separate dataloader generator from sampler generator by @pacman100 in #789
- Consider top-level buffers when computing `infer_auto_device_map` by @younesbelkada in #792
- Add `even_batches` keyword to Accelerator by @Chris-hughes10 in #781
- Fix device_map="auto" on CPU-only envs by @sgugger in #797
- Fix extraction of state dict in offload by @sgugger in #795
- fix: add pdsh as default launcher by @zanussbaum in #800
- Deal with optimizer.differentiable in PyTorch 1.13.0 by @comaniac in #803
- Introduce a pod-config command by @muellerzr in #802
- Refactor CLI to improve readability by @muellerzr in #810
- adding support to pickle and unpickle `AcceleratedOptimizer` by @pacman100 in #811
- add `recurse` argument in `remove_hook_from_module` by @younesbelkada in #812
- Act on deprecations by @muellerzr in #813
- Mlflow-tracker-v2 🔥 by @nbroad1881 in #794
- Update CLI docs and use mps rather than mps_device by @muellerzr in #814
- Rename pod-config to tpu-config + docs by @muellerzr in #818
- Update docs by @muellerzr in #823
- rename sklearn to proper dep by @muellerzr in #825
- Rename by @muellerzr in #824
- Update pr docs actions by @mishig25 in #827
Significant community contributions
The following contributors have made significant changes to the library over the last release:
v0.13.2 Patch release
- [Device map] nn.Parameter don't have children in #747 by @patrickvonplaten
v0.13.1 Patch release
- Fix num_processes is not defined #746 by @muellerzr