Releases: huggingface/accelerate
v0.20.0: MPS and fp4 support on Big Model Inference, 4-bit QLoRA, Intel GPU, Distributed Inference, and much more!
Big model inference
Support has been added to run `device_map="auto"` on the MPS device. Big model inference also works with models loaded in 4-bit in Transformers. A minimal usage sketch follows the list below.
- Add mps support to big inference modeling by @SunMarc in #1545
- Adds fp4 support for model dispatching by @younesbelkada in #1505
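As a rough, hedged sketch of how these two features could be exercised from Transformers; the checkpoint id is just a placeholder, and the 4-bit path assumes bitsandbytes and a supported CUDA GPU (drop `load_in_4bit` on MPS-only machines):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; any causal-LM checkpoint should behave similarly.
model_id = "facebook/opt-350m"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" lets big model inference dispatch weights automatically,
# including onto the MPS device on Apple silicon.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    load_in_4bit=True,  # 4-bit loading via bitsandbytes; requires a supported CUDA GPU
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```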
4-bit QLoRA Support
- 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by @TimDettmers in #1458
Distributed Inference Utilities
This version introduces a new `Accelerator.split_between_processes` utility to help with performing distributed inference with non-tensorized or non-dataloader workflows. Read more in the documentation; a minimal sketch is shown below.
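A minimal sketch of the utility; the prompts and the per-shard work are placeholders:

```python
from accelerate import Accelerator

accelerator = Accelerator()

# Each process receives its own slice of the inputs; with 2 processes,
# process 0 gets ["A", "B"] and process 1 gets ["C"].
with accelerator.split_between_processes(["A", "B", "C"]) as prompts:
    for prompt in prompts:
        # Run whatever non-tensorized inference you need on this shard.
        print(f"Process {accelerator.process_index} handles {prompt}")
```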
Introduce XPU support for Intel GPU
- Intel GPU support initialization by @abhilash1910 in #1118
Add support for the new PyTorch XLA TPU runtime
A new optimizer method: LocalSGD
- This is a new wrapper around SGD which enables efficient multi-GPU training in cases where no fast interconnect is available, by @searchivarius in #1378 (a minimal sketch is shown below)
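A minimal sketch of the intended usage, assuming the context-manager interface in `accelerate.local_sgd`; the tiny model, optimizer, and data are placeholders:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator
from accelerate.local_sgd import LocalSGD

accelerator = Accelerator()

# Tiny placeholder model and data just to make the sketch self-contained.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
dataloader = DataLoader(TensorDataset(torch.randn(64, 10), torch.randn(64, 1)), batch_size=8)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

# Gradients are only synchronized every `local_sgd_steps` optimizer steps,
# which helps when no fast interconnect between GPUs is available.
with LocalSGD(accelerator=accelerator, model=model, local_sgd_steps=8, enabled=True) as local_sgd:
    for inputs, targets in dataloader:
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
        local_sgd.step()  # advances the local step counter and triggers the periodic sync
```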
Papers with 🤗 Accelerate
- We now have an entire section of the docs dedicated to official paper implementations and citations using the framework #1399, see it live in the documentation
Breaking changes
`logging_dir` has been fully deprecated, please use `project_dir` or a `ProjectConfiguration`. A minimal sketch of the replacement is shown below.
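For example, a minimal sketch of the replacement; the directory names and the tensorboard tracker are placeholders:

```python
from accelerate import Accelerator
from accelerate.utils import ProjectConfiguration

# project_dir replaces the deprecated logging_dir argument on the Accelerator;
# a separate logging_dir can still be set on the ProjectConfiguration if needed.
project_config = ProjectConfiguration(project_dir="my_project", logging_dir="my_project/logs")
accelerator = Accelerator(log_with="tensorboard", project_config=project_config)
```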
What's new?
- use existing mlflow experiment if exists by @Rusteam in #1403
- changes required for DS integration by @pacman100 in #1406
- fix deepspeed failing tests by @pacman100 in #1411
- Make mlflow logging dir optional by @mattplo-decath in #1413
- Fix bug on ipex for diffusers by @abhilash1910 in #1426
- Improve Slack Updater by @muellerzr in #1433
- Let quality yell at the user if it's a version difference by @muellerzr in #1438
- Ensure that it gets installed by @muellerzr in #1439
- [`core`] Introducing `CustomDtype` enum for custom dtypes by @younesbelkada in #1434
- Fix XPU by @muellerzr in #1440
- Make sure torch compiled model can also be unwrapped by @patrickvonplaten in #1437
- fixed: ZeroDivisionError: division by zero by @sreio in #1436
- fix potential OOM when resuming with multi-GPU training by @exhyy in #1444
- Fixes in infer_auto_device_map by @sgugger in #1441
- Raise error when logging improperly by @muellerzr in #1446
- Fix ci by @muellerzr in #1447
- Distributed prompting/inference utility by @muellerzr in #1410
- Add to by @muellerzr in #1448
- split_between_processes by @stevhliu in #1449
- [docs] Replace `state.rank` -> `process_index` by @pcuenca in #1450
- Auto multigpu logic by @muellerzr in #1452
- Update with cli instructions by @muellerzr in #1453
- Adds `in_order` argument that defaults to False, to log in order by @JulesGM in #1262
- fix error for CPU DDP using trainer api by @sywangyi in #1455
- Refactor and simplify xpu device in state by @abhilash1910 in #1456
- Document how to use commands with python module instead of argparse by @muellerzr in #1457
- 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by @TimDettmers in #1458
- Fix skip first batch being perminant by @muellerzr in #1466
- update conversion of layers to retain original data type. by @avisinghal6 in #1467
- Check for xpu specifically by @muellerzr in #1472
- update `register_empty_buffer` to match torch args by @NouamaneTazi in #1465
- Update gradient accumulation docs, and remove redundant example by @iantbutler01 in #1461
- Imrpove sagemaker by @muellerzr in #1470
- Split tensors as part of `split_between_processes` by @muellerzr in #1477
- Move to device by @muellerzr in #1478
- Fix gradient state bugs in multiple dataloader by @Ethan-yt in #1483
- Add rdzv-backend by @muellerzr in #1490
- Only use IPEX if available by @muellerzr in #1495
- Update README.md by @lyhue1991 in #1493
- Let gather_for_metrics always run by @muellerzr in #1496
- Use empty like when we only need to create buffers by @thomasw21 in #1497
- Allow key skipping in big model inference by @sgugger in #1491
- fix crash when ipex is installed and torch has no xpu by @sywangyi in #1502
- [`bnb`] Add fp4 support for dispatch by @younesbelkada in #1505
- Fix 4bit model on multiple devices by @SunMarc in #1506
- adjust overriding of model's forward function by @prathikr in #1492
- Add assertion when call prepare with deepspeed config. by @tensimiku in #1468
- NVME path support for deepspeed by @abhilash1910 in #1484
- should set correct dtype to ipex optimize and use amp logic in native… by @sywangyi in #1511
- Swap env vars for XPU and IPEX + CLI by @muellerzr in #1513
- Fix a bug when parameters tied belong to the same module by @sgugger in #1514
- Fixup deepspeed/cli tests by @muellerzr in #1526
- Refactor mp into its own wrapper by @muellerzr in #1527
- Check tied parameters by @SunMarc in #1529
- Raise ValueError on iterable dataset if we've hit the end and attempting to go beyond it by @muellerzr in #1531
- Officially support naive PP for quantized models + PEFT by @younesbelkada in #1523
- remove ipexplugin, let ACCELERATE_USE_IPEX/ACCELERATE_USE_XPU control the ipex and xpu by @sywangyi in #1503
- Prevent using extra VRAM for static device_map by @LSerranoPEReN in #1536
- Update deepspeed.mdx by @LiamSwayne in #1541
- Update performance.mdx by @LiamSwayne in #1543
- Update deferring_execution.mdx by @LiamSwayne in #1544
- Apply deprecations by @muellerzr in #1537
- Add mps support to big inference modeling by @SunMarc in #1545
- [documentation] grammar fixes in gradient_synchronization.mdx by @LiamSwayne in #1547
- Eval mode by @muellerzr in #1540
- Update migration.mdx by @LiamSwayne in #1549
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @will-cromar
- @searchivarius
- Adding support for local SGD. (#1378)
- @abhilash1910
- @sywangyi
- @Ethan-yt
- Fix gradient state bugs in multiple dataloader (#1483)
v0.19.0: IPEX Support, Foundations for Transformers Integration, FP8 for Ada Lovelace GPUs, and Squashed Bugs
What's New
- Support for Intel IPEX has been added, check out the how-to guide now!
- Various modifications have been added to begin work on having 🤗 Accelerate be the foundation for the `Trainer`, keep an eye on the repos to see how our progress is coming along!
- FP8 training is now supported on Ada Lovelace GPUs
- The `wandb` integration now supports logging of images and tables through `tracker.log_images` and `tracker.log_tables` respectively (a rough sketch follows this list)
- Many, many squashed bugs! (see the full detailed report for just what they were)
- 17 new contributors to the framework, congratulations to all who took their first step! 🚀
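As a rough, hedged sketch of the new logging hooks; the project name, images, and step value are placeholders, and wandb must be installed and logged in:

```python
import numpy as np
from accelerate import Accelerator

accelerator = Accelerator(log_with="wandb")
accelerator.init_trackers("my-project")  # placeholder project name

# Fetch the underlying wandb tracker and log a few placeholder images.
tracker = accelerator.get_tracker("wandb")
images = [np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8) for _ in range(4)]
tracker.log_images({"samples": images}, step=0)

accelerator.end_training()
```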
What's Changed
- Fix pypi image by @muellerzr in #1249
- raise error when dataloader with None as batch_size when using DS by @pacman100 in #1250
- Handle missing deepspeed config by @HeyangQin in #1251
- [`core`] Add Quantization support for `dispatch_model` by @younesbelkada in #1237
- Check attribute 'overflow' exists in optimizer. by @tensimiku in #1259
- ipex intel extension for pytorch integration by @sywangyi in #1255
- fix issue template by @stas00 in #1264
- Change error raised to ValueError by @sgugger in #1267
- Fix reduce operation by @xyfJASON in #1268
- Raise import error if fp8 not available in `has_transfomer_engine_layers` by @muellerzr in #1283
- Add missing FP8 options to CLI by @muellerzr in #1284
- Update quicktour.mdx by @StandardAI in #1273
- Minor fix whitespace colon by @guspan-tanadi in #1272
- fix attribute error in DataloaderShared by @ZhiyuanChen in #1278
- Fix TypeError bug in honor_type by @muellerzr in #1285
- Raise more explicit error when transformer_engine isn't installed by @muellerzr in #1287
- Expound error on `recursively_apply` by @muellerzr in #1286
- Only check for dtype if it has it in get_state_dict by @muellerzr in #1288
- [`bnb`] fix bnb slow test by @younesbelkada in #1292
- Raise better error on `notebook_launcher` by @muellerzr in #1293
- Make note about grad accum and prec in performance documentation by @muellerzr in #1296
- fix for load_checkpoint_and_dispatch(device_map=None) by @anentropic in #1297
- Set the state device dependant to Accelerator on multigpu by @muellerzr in #1220
- add usage guide for ipex plugin by @sywangyi in #1270
- Simplify MPS implementation by @sgugger in #1308
- Bug fix in setattr by @aashiqmuhamed in #1312
- Allow xpu backend by @muellerzr in #1313
- Default to nccl by @muellerzr in #1314
- offload the previous module hook before the current module is moved to… by @williamberman in #1315
- Ensure that dynamo is compatible with mixed precision by @muellerzr in #1318
- Upgrade torch version on main tests by @muellerzr in #1323
- Add test flag and import check for dynamo by @muellerzr in #1322
- ensure module prefixes only match that module by @xloem in #1319
- remove repetitive entries from device lists by @xloem in #1321
- Fix failing test on main by @muellerzr in #1332
- Verbosity, Progress Bar for Loading by @xloem in #1329
- Skip failing torch 2.0+ test by @muellerzr in #1339
- Remove unused amp import util by @muellerzr in #1340
- Fix nested context manager for main_process_first() by @flukeskywalker in #1304
- Small progress bar fix by @xloem in #1341
- Pop more backend options by @muellerzr in #1342
- Support FP8 mixed precision training for Ada Lovelace GPUs by @Dango233 in #1348
- using deepspeed.comm for distrbiuted init by @pacman100 in #1352
- [`bnb`] Fix bnb slow test by @younesbelkada in #1355
- Better check for packages availability by @apbard in #1356
- fix: typing issues, and replace deprecated python typing (Optional, Union) to `|` by @kiyoon in #1363
- Fix default FSDP_MIN_NUM_PARAMS so it's an int by @sam-hieken in #1367
- Special transformers case from args by @muellerzr in #1364
- Improve `accelerate env` reporting by @muellerzr in #1376
- Seperate out contextmanager generation by @muellerzr in #1379
- delete textfile after tests are done by @muellerzr in #1381
- Fix flakey thread issue by @muellerzr in #1387
- fix config bug for 'mixed_precision' from 'yaml.safe_load()' by @ys-eric-choi in #1386
- Log Images and other types to wandb by @tcapelle in #962
- Bump torch version by @muellerzr in #1392
- Fix gather_obj by @muellerzr in #1391
- Update training_zoo.mdx by @yuvalkirstain in #1397
New Contributors
- @HeyangQin made their first contribution in #1251
- @tensimiku made their first contribution in #1259
- @xyfJASON made their first contribution in #1268
- @StandardAI made their first contribution in #1273
- @guspan-tanadi made their first contribution in #1272
- @anentropic made their first contribution in #1297
- @aashiqmuhamed made their first contribution in #1312
- @williamberman made their first contribution in #1315
- @xloem made their first contribution in #1319
- @flukeskywalker made their first contribution in #1304
- @Dango233 made their first contribution in #1348
- @apbard made their first contribution in #1356
- @kiyoon made their first contribution in #1363
- @sam-hieken made their first contribution in #1367
- @ys-eric-choi made their first contribution in #1386
- @tcapelle made their first contribution in #962
- @yuvalkirstain made their first contribution in #1397
Full Changelog: v0.18.0...v0.19.0
v0.18.0: GradientState enhancements and Big Model Inference Fixes
What's Changed
- A new `GradientAccumulationPlugin` has been added to handle more configurations with the `GradientState`. Specifically, you can optionally disable having Accelerate automatically adjust the length of the scheduler relative to gradient accumulation steps through it. Otherwise, Accelerate will now automatically ensure that schedulers built without gradient accumulation in mind still work during gradient accumulation (a minimal sketch follows this list)
- Some fixes related to the launch configuration and TPU launches were adjusted, and the `dynamo_backend` warning has been silenced
- Big model inference saw a number of fixes related to linear layers, `drop_last` on linear layers, tied weight loading, and handling of multiple tied parameters
- A new integration example with RunhouseML has been added, read more here: https://github.com/huggingface/accelerate/tree/main/examples#simple-multi-gpu-hardware-launcher
Breaking Changes
`find_tied_parameters` now deals with groups of tied parameters (instead of only pairs of them). As a result, it now returns a list of lists of strings instead of a dictionary. A short sketch of the new format is shown below.
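For illustration, a hedged sketch of the new return format; the exact parameter names depend on the model:

```python
from transformers import AutoModelForCausalLM
from accelerate.utils import find_tied_parameters

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder checkpoint

# Previously a dict pairing tied weights; now a list of groups, for example
# [["lm_head.weight", "transformer.wte.weight"]]
print(find_tied_parameters(model))
```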
What's New?
- Add documentation around FSDP state dict save behavior by @VikParuchuri in #1181
- add `use_orig_params` to FullyShardedDataParallelPlugin by @pacman100 in #1184
- Only convert linear layers with weights multiple of 16 by @sgugger in #1188
- Set drop last to ensure modulo16 restriction for fp8 by @ksivaman in #1189
- Accelerator should not call `to` on modules that wraps `accelerate` loaded models by @younesbelkada in #1172
- Fixup passing overlapping args to the script by @muellerzr in #1198
- Make the Scheduler adjust the steps taken relative to the gradient accumulation steps by @muellerzr in #1187
- Fix tied weights load by @sgugger in #1204
- Better error message when using multi-GPU and Accelerate on torch <1.9.1 by @muellerzr in #1203
- Fix typo in TPU config by @muellerzr in #1202
- Fix example in accumulate method documentation by @VikParuchuri in #1211
- ds offload optim fix to use CPUAdam by @pacman100 in #1208
- Move when the GradientState test is performed so that it is not None by @muellerzr in #1219
- Fix bug in loading launch config by @neumyor in #1218
- Fix get_logger kwarg documentation issue by @bcol23 in #1222
- docs: add finetuner to ppl who use accelerate by @bwanglzu in #1224
- Silence dynamo_backend by @muellerzr in #1226
- Add additional check when patching env by @Chris-hughes10 in #1229
- Make grad accum steps mutable on the Accelerator object by @muellerzr in #1233
- devcontainer: "extensions" has been removed and replaced by customizations by @dbpprt in #1075
- remove empty dicts while saving accelerate config by @pacman100 in #1236
- backfill ds plugin attributes when using ds_config by @pacman100 in #1235
- Change multinode to multigpu in notebook tutorial by @muellerzr in #1247
- Hardware Auto-Setup Example/Tutorial for Distributed Launch by @carolineechen in #1227
- Handle multiple tied parameters by @sgugger in #1241
New Contributors
- @hackpert made their first contribution in #1180
- @VikParuchuri made their first contribution in #1181
- @ksivaman made their first contribution in #1189
- @neumyor made their first contribution in #1218
- @bcol23 made their first contribution in #1222
- @bwanglzu made their first contribution in #1224
- @carolineechen made their first contribution in #1227
Full Changelog: v0.17.1...v0.18.0
v0.17.1: Patch release
v0.17.0: PyTorch 2.0 support, Process Control Enhancements, TPU pod support and FP8 mixed precision training
PyTorch 2.0 support
This release fully supports the upcoming PyTorch 2.0 release. You can choose whether or not to use `torch.compile` and then customize the options in `accelerate config` or via a `TorchDynamoPlugin`. A minimal sketch follows the PR reference below.
- update support for torch dynamo compile by @pacman100 in #1150
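A minimal, hedged sketch of enabling this in code; the backend name and the tiny model are placeholders, and `accelerate config` can set the same options persistently:

```python
import torch
from accelerate import Accelerator

# "inductor" is the default TorchDynamo backend in PyTorch 2.0.
accelerator = Accelerator(dynamo_backend="inductor")

model = torch.nn.Linear(10, 10)  # placeholder model
model = accelerator.prepare(model)  # the prepared model is compiled with the chosen backend
```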
Process Control Enhancements
This release adds a new `PartialState`, which contains most of the capabilities of the `AcceleratorState`; however, it is designed to be used by the user to assist in any process control mechanisms around it. With this, users also no longer need to have `if accelerator.state.is_main_process` when utilizing classes such as the Tracking API, as these will now automatically use only the main process for their work by default. A minimal sketch follows the PR reference below.
- Refactor process executors to be in AcceleratorState by @muellerzr in #1039
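A minimal sketch of how `PartialState` might be used for process control; the work inside the blocks is a placeholder:

```python
from accelerate import PartialState

state = PartialState()

# Runs on every process.
print(f"Hello from process {state.process_index} of {state.num_processes}")

# Runs only on the main process (e.g. downloading data, writing logs).
if state.is_main_process:
    print("Doing main-process-only work")

state.wait_for_everyone()
```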
TPU Pod Support (Experimental)
Launching from TPU pods is now supported; please see this issue for more information.
- Introduce TPU Pod launching to `accelerate launch` by @muellerzr in #1049
FP8 mixed precision training (Experimental)
This release adds experimental support for FP8 mixed precision training, which requires the transformer-engine library as well as a Hopper GPU (or higher).
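A hedged, minimal sketch of turning it on; it assumes transformer-engine is installed and a Hopper-class GPU is available:

```python
from accelerate import Accelerator

# Experimental: requires the transformer-engine library and an H100-class GPU.
accelerator = Accelerator(mixed_precision="fp8")
```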
What's new?
- v0.17.0.dev0 by @sgugger (direct commit on main)
- Deepspeed param check by @dhar174 in #1015
- enabling `mps` device by default and removing related config by @pacman100 in #1030
- fix: links to gradient synchronization by @prassanna-ravishankar in #1035
- do not scale gradient in bf16 mode by @kashif in #1036
- Pass keywords arguments of backward function deeper to DeepSpeed by @DistinctVision in #1037
- Add daily slack notifier for nightlies by @muellerzr in #1042
- Make sure direct parameters are properly set on device by @sgugger in #1043
- Add `cpu_offload_with_hook` by @sgugger in #1045
- Update quality tools to 2023 by @sgugger in #1046
- Load tensors directly on device by @sgugger in #1028
- Fix cpu_offload_with_hook code snippet by @pcuenca in #1047
- Use create_task by @muellerzr in #1052
- Fix args by adding in the defaults by @muellerzr in #1053
- deepspeed `hidden_size` auto value default fixes by @pacman100 in #1060
- Introduce PartialState by @muellerzr in #1055
- Flag for deprecation by @muellerzr in #1061
- Try with this by @muellerzr in #1062
- Update integrations by @muellerzr in #1063
- Swap utils over to use PartialState by @muellerzr in #1065
- update fsdp docs and removing deepspeed version pinning by @pacman100 in #1059
- Fix/implement process-execution decorators on the Accelerator by @muellerzr in #1070
- Refactor state and make `PartialState` first class citizen by @muellerzr in #1071
- Add error if passed --config_file does not exist by @muellerzr in #1074
- SageMaker image_uri is now optional by @ in #1077
- Allow custom SageMaker Estimator arguments by @ in #1080
- Fix tpu_cluster arg by @muellerzr in #1081
- Update complete_cv_example.py by @fcossio in #1082
- Added SageMaker local mode config section by @ in #1084
- Fix config by @muellerzr in #1090
- adds missing "lfs" in pull by @CSchoel in #1091
- add multi_cpu support to reduce by @alex-hh in #1094
- Update README.md by @BM-K in #1100
- Tracker rewrite and lazy process checker by @muellerzr in #1079
- Update performance.mdx by @fcossio in #1107
- Attempt to unwrap tracker. by @pcuenca in #1109
- TensorBoardTracker: wrong arg def by @stas00 in #1111
- Actually raise if exception by @muellerzr in #1124
- Add test for ops and fix reduce by @muellerzr in #1122
- Deep merge SageMaker `additional_args`, allowing more flexible configuration and `env` variable support by @dbpprt in #1113
- Move dynamo.optimize to the end of model preparation by @ymwangg in #1128
- Refactor `launch` for greater extensibility by @Yard1 in #1123
- [Big model loading] Correct GPU only loading by @patrickvonplaten in #1121
- Add tee and role to launch by @muellerzr in #1132
- Expand warning and grab all GPUs available by default by @muellerzr in #1134
- Fix multinode with GPU ids when each node has 1 by @muellerzr in #1127
- deepspeed dataloader prepare fix by @pacman100 in #1126
- fix ds dist init kwargs issue by @pacman100 in #1138
- fix lr scheduler issue by @pacman100 in #1140
- fsdp bf16 enable autocast by @pacman100 in #1125
- Fix notebook_launcher by @muellerzr in #1141
- fix partial state by @pacman100 in #1144
- FSDP enhancements and fixes by @pacman100 in #1145
- Fixed typos in notebook by @SamuelLarkin in #1146
- Include a note in the gradient synchronization docs on "what can go wrong" and show the timings by @muellerzr in #1153
- [Safetensors] Relax missing metadata constraint by @patrickvonplaten in #1151
- Solve arrow keys being environment dependant for accelerate config by @p1atdev (direct commit on main)
- Load custom state to cpu by @Guangxuan-Xiao in #1156
- 📝 add a couple more trackers to the docs by @nateraw in #1158
- Let GradientState know active dataloaders and reset the remainder by @muellerzr in #1162
- Attempt to fix import error when PyTorch is build without `torch.distributed` module by @mfuntowicz in #1108
- [`Accelerator`] Fix issue with 8bit models by @younesbelkada in #1155
- Document skip_first_batches in the checkpoint usage guides by @muellerzr in #1164
- Fix what files get deleted through `total_limit` by @muellerzr in #1165
- Remove outdated command directions and use in tests by @muellerzr in #1166
Significant community contributions
The following contributors have made significant changes to the library over the last release:
v0.16.0: Improved and Interactive Documentation, DataLoader Improvements
New code exploration doc tool
A new interactive tool has been introduced to the documentation to help users quickly learn how to utilize features of the framework before providing more details on them.
Not only does it provide a code diff, but it also includes an explanation and links to more resources the user should check out to learn more.
Try it out today in the docs
- Add in code exploration tool to docs by @muellerzr in #1014
- Light vs dark theme based on pick by @muellerzr in #1023
Skip batches in dataloaders
When resuming training, you can more efficiently skip batches in your dataloader with the new `skip_first_batches` function (also available as a method on your `Accelerator`).
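A minimal sketch assuming training is resumed mid-epoch; the dataloader and the number of already-seen batches are placeholders:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

dataloader = DataLoader(TensorDataset(torch.randn(100, 10)), batch_size=10)
dataloader = accelerator.prepare(dataloader)

# After restoring a checkpoint taken mid-epoch, skip the batches already seen.
skipped_dataloader = accelerator.skip_first_batches(dataloader, num_batches=4)
for batch in skipped_dataloader:
    ...  # training resumes from the fifth batch of the epoch
```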
DeepSpeed integration enhancements
A new ZeRO-3 init context manager is added to provide granular control to users in situations involving nested/multiple models. DeepSpeed config file support has been refactored to remove ambiguity between it and the Accelerate config.
Adding support for `auto` entries in the DeepSpeed config file to be filled via the `accelerate launch` command. Try it out today by referring to the section "Things to note when using DeepSpeed Config File".
- ds zero-3 init context manager by @pacman100 in #932
- raise error for duplicate accelerate config values when using `deepspeed_config_file` by @pacman100 in #941
What's new?
- Flag to silence subprocess.CalledProcessError in launch by @Cyberes in #902
- Add usage examples by @muellerzr in #904
- Expand sanity checks by @muellerzr in #905
- Fix conditional by @muellerzr in #907
- fix issue that amp bf16 does not work for cpu in env with cuda. by @sywangyi in #906
- fsdp enhancements by @pacman100 in #911
- Fix typos accelerate -> accelerator by @pcuenca in #915
- 🚨🚨🚨 Act on deprecations 🚨🚨🚨 by @muellerzr in #917
- fix accelerate test failure with cpu config by @sywangyi in #909
- Introduce `project_dir` and limit the number of saved checkpoints by @muellerzr in #916
- Specify inference by @muellerzr in #921
- Support `init_on_device` by @thomasw21 in #926
- ds-z3-init and prepending ds env variables with `ACCELERATE_` by @pacman100 in #928
- Honor model dtype in `load_checkpoint` by @sgugger in #920
- ds zero-3 init context manager by @pacman100 in #932
- Fix silly typo by @tornikeo in #939
- add `mixed_precision_type` property to `AcceleratorState` by @pacman100 in #935
- fix batch size in prepare_dataloader for iterable datasets by @sanderland in #937
- fix mp related test fails by @pacman100 in #943
- Fix tracker by @muellerzr in #942
- Fix offload when weights are on the GPU by @sgugger in #945
- raise error for duplicate accelerate config values when using `deepspeed_config_file` by @pacman100 in #941
- Add is_initialized method and refactor by @muellerzr in #949
- Fix DeepSpeed tests by @muellerzr in #950
- Don't automatically offload buffers when loading checkpoints by @sgugger in #951
- Typo fix in src/accelerate/utils/modeling.py by @ryderwishart in #955
- support master port when using ds multi-node launcher by @pacman100 in #959
- Allowing encoded configuration for DeepSpeed by @cli99 in #895
- Update README.md by @Don9wanKim in #968
- Raise minimum version for distrib launch by @muellerzr in #978
- Fix tied parameters test in big model inference by @sgugger in #979
- Fix type error on line 36 by @dhar174 in #981
- Ensure that last batch doesn't get dropped if perfectly even in gather_for_metrics by @muellerzr in #982
- Skip wandb test for now by @muellerzr in #984
- Fix test for converting tensor to proper dtype by @sgugger in #983
- in sync with trfs, removing style_doc utils and using doc-builder instead by @pacman100 in #988
- Add new release_memory util by @muellerzr in #990
- adding support for kwargs in `load_state` by @pacman100 in #989
- Fix scheduler incorrect steps when gradient accumulation enabled by @markovalexander in #999
- Fix parameters tying in dispatch_model by @sgugger in #1000
- improve deepspeed notes by @stas00 in #1003
- Update toctree by @muellerzr in #1008
- Add styleguide by @muellerzr in #1007
- Maintain accumulation steps by @muellerzr in #1011
- Saving and loading state hooks by @patrickvonplaten in #991
- Fix test introduced in PR and introduce AcceleratorTestCase by @muellerzr in #1016
- Allow the torch device to be set with an env var by @Yard1 in #1009
- Fix import of LrScheduler by @sgugger in #1017
- Don't force mixed precision as no in examples by @sgugger in #1018
- Include steppage in performance docs by @muellerzr in #1013
- Fix env var by @muellerzr in #1024
- Change default for keep_fp32_wrapper by @muellerzr in #1025
- Fix slow test by keeping tied weights on the same GPU by @sgugger in #1026
- Start of adding examples by @muellerzr in #1001
- More improvements to docstrings + examples by @muellerzr in #1010
- With example by @muellerzr in #1027
- sagemaker launcher fixes by @pacman100 in #1031
v0.15.0: Pytorch 2.0 stack support
PyTorch 2.0 stack support
We are very excited by the newly announced PyTorch 2.0 stack, and you can try it using Accelerate on any model by using the `dynamo_backend` argument of the `Accelerator`, or when filling your config with `accelerate config`.
Note that to get the best performance, we recommend:
- using an Ampere GPU (or more recent)
- sticking to fixed shapes for now
New CLI commands
- Added two new commands, `accelerate config update` and `accelerate config default`. The first will update a config file to have the latest keys added from later releases of Accelerate, and the second will create a default configuration file automatically, mimicking `write_default_config()` introduced in #851 and #853 by @muellerzr
- Also introduced a filterable help for `accelerate launch` which will show options relevant to the choices shown; for example, `accelerate launch --multi_gpu` will show launch parameters relevant to multi-GPU training.
What's new?
- fix 🐛 by @pacman100 in #836
- Deepspeed example should use gather_for_metrics by @HammadB in #821
- Highlight selection with pretty colors by @muellerzr in #839
- Add `join_uneven_inputs` context manager to Accelerator by @Chris-hughes10 in #820
- Introduce `default-config` command by @muellerzr in #840
- Fix log error and add log level to get_logger by @muellerzr in #842
- Fix if/else by @muellerzr in #849
- Fix complete_cv example by @muellerzr in #848
- Refactor Accelerate config and introduce a multi-argument CLI interface by @muellerzr in #851
- Clean up, add update command by @muellerzr in #853
- Revert "Update pr docs actions" by @mishig25 in #827
- Switch default log to warn by @muellerzr in #859
- Remove mixed precision hook as part of the unwrap_model by @muellerzr in #860
- update deepspeed error message wrt `batch_size` by @pacman100 in #861
- fix failing deepspeed test by @pacman100 in #868
- Even more log level refined, leave alone if not explicitly set by @muellerzr in #871
- Solve pickling issues by @muellerzr in #872
- Spring cleaning by @muellerzr in #865
- fixing lr_scheduler prepare issue when using pytorch nightly by @pacman100 in #878
- fix fsdp state_dict_config because of PyTorch changes by @pacman100 in #877
- Update deprecated logging warn by @SHi-ON in #881
- fix a bug by @xiaohu2015 in #887
- Allow safetensors offload by @sgugger in #873
- fixing lr scheduler for pytorch nightly by @pacman100 in #884
- Prefix all accelerate env vars with ACCELERATE by @muellerzr in #890
- fix prefix issues in tests by @pacman100 in #891
- Fix windows cli selector by @muellerzr in #893
- Better description for improper kwargs by @muellerzr in #894
- Support bfloat16 in load_offloaded_weight by @sgugger in #892
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @Chris-hughes10
- Add `join_uneven_inputs` context manager to Accelerator (#820)
- Add
v0.14.0: Megatron-LM integration and support for PyTorch 1.13
Megatron LM integration
Accelerate now supports Megatron-LM for the three model classes (BERT, GPT-2 and T5). You can learn more in the documentation.
- Megatron-LM integration by @pacman100 in #667
- ensure megatron is 2.2.0+ by @jeffra in #755
- updating docs to use fork of megatron-lm and minor example/docs fix by @pacman100 in #766
- adding support to return logits and generate for Megatron-LM GPT models by @pacman100 in #819
PyTorch 1.13 support
Fixes a bug that returned SIGKILL errors on Windows.
- Isolate distrib_run by @muellerzr in #828
Kaggle support with the notebook_launcher
With Kaggle now giving instances with two T4 GPUs, Accelerate can leverage this to do multi-GPU training from the notebook. A minimal sketch follows the PR reference below.
- Work in kaggle! by @muellerzr in #783
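A minimal sketch from inside a notebook; the training function body is a placeholder:

```python
from accelerate import notebook_launcher

def training_function():
    # Placeholder: build the Accelerator, model, and training loop here.
    print("Training...")

# On a Kaggle instance with two T4s this spawns one process per GPU.
notebook_launcher(training_function, num_processes=2)
```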
What's new?
- Add `non_blocking` kwarg to `send_to_device()` by @NouamaneTazi in #607
- [ds launcher] un-hijack PYTHONPATH by @stas00 in #741
- Fix num_processes is not defined by @muellerzr in #746
- [Device map] nn.Parameter don't have children by @patrickvonplaten in #747
- Use HTML relative paths for tiles by @lewtun in #749
- Add gpu_ids to SageMakerConfig though it should never be set by @muellerzr in #751
- Change num_cpu_threads_per_process default by @muellerzr in #753
- Return unclipped gradient from grad_clip_norm_ by @samuelstevens in #756
- refactor by @pacman100 in #758
- update docs by @pacman100 in #759
- Only wrap modules in DDP if they require grad by @samuelstevens in #761
- Move io_same_device hook to before attach_align_device hook on cpu_offload and disk_offload. by @piEsposito in #768
- Regression cli tests by @muellerzr in #772
- Fix number of devices in get_balanced_memory by @sgugger in #774
- Fix all github actions issues + depreciations by @muellerzr in #773
- Fix flakey wandb test by @muellerzr in #775
- Add defaults for launchers by @muellerzr in #778
- Allow BatchSamplerShard to not even out batches by @sgugger in #776
- Make rich toggleable and seperate out a new environment utility file by @muellerzr in #779
- Add same_network + docs by @muellerzr in #780
- fix transformers tests by @ArthurZucker in #777
- Add Dev Container configuration by @Chris-hughes10 in #782
- separate dataloader generator from sampler generator by @pacman100 in #789
- Consider top-level buffers when computing `infer_auto_device_map` by @younesbelkada in #792
- Add `even_batches` keyword to Accelerator by @Chris-hughes10 in #781
- Fix device_map="auto" on CPU-only envs by @sgugger in #797
- Fix extraction of state dict in offload by @sgugger in #795
- fix: add pdsh as default launcher by @zanussbaum in #800
- Deal with optimizer.differentiable in PyTorch 1.13.0 by @comaniac in #803
- Introduce a pod-config command by @muellerzr in #802
- Refactor CLI to improve readability by @muellerzr in #810
- adding support to pickle and unpickle `AcceleratedOptimizer` by @pacman100 in #811
- add `recurse` argument in `remove_hook_from_module` by @younesbelkada in #812
- Act on deprecations by @muellerzr in #813
- Mlflow-tracker-v2 🔥 by @nbroad1881 in #794
- Update CLI docs and use mps rather than mps_device by @muellerzr in #814
- Rename pod-config to tpu-config + docs by @muellerzr in #818
- Update docs by @muellerzr in #823
- rename sklearn to proper dep by @muellerzr in #825
- Rename by @muellerzr in #824
- Update pr docs actions by @mishig25 in #827
Significant community contributions
The following contributors have made significant changes to the library over the last release:
v0.13.2 Patch release
- [Device map] nn.Parameter don't have children in #747 by @patrickvonplaten
v0.13.1 Patch release
- Fix num_processes is not defined #746 by @muellerzr