
🚀 Composer v0.13.1

Introducing the composer PyPI package!

Composer v0.13.1 is released!

Composer can now also be installed via pip using the new composer PyPI package:

pip install composer==0.13.1

The legacy package name still works via pip:

pip install mosaicml==0.13.1

Note: The mosaicml==0.13.0 PyPI package was yanked due to some minor packaging issues discovered after release. The package was re-released as Composer v0.13.1, so these release notes cover both v0.13.0 and v0.13.1.
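To confirm which version is installed, a quick check from Python:

import composer

# Both package names provide the same composer library
print(composer.__version__)  # expected output: 0.13.1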

New Features

  1. 🤙 New and Updated Callbacks

    • New HealthChecker Callback (#2002)

      The callback logs a warning if the GPUs on a given node appear to be in poor health (low utilization). It can also be configured to send a Slack message, as sketched after the example below!

      from composer import Trainer
      from composer.callbacks import HealthChecker
      
      # Warn if GPU utilization difference drops below 10%
      health_checker = HealthChecker(
          threshold=10,
      )
      
      # Construct Trainer
      trainer = Trainer(
          ...,
          callbacks=health_checker,
      )
      
      # Train!
      trainer.fit()
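      To send Slack alerts, the callback can be pointed at an incoming webhook. A minimal sketch, assuming a slack_webhook_url argument and an environment-provided webhook URL (sending messages requires the slack_sdk package, see #2031):

      import os

      from composer.callbacks import HealthChecker

      # Assumed configuration: slack_webhook_url is taken to accept a Slack
      # incoming-webhook URL, posted to when poor GPU health is detected
      health_checker = HealthChecker(
          threshold=10,
          slack_webhook_url=os.environ.get('SLACK_WEBHOOK_URL'),
      )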
    • Updated MemoryMonitor to report memory in gigabytes (GB) (#1940)

    • New RuntimeEstimator Callback (#1991)

      Estimate the remaining runtime of your job! The callback approximates the time remaining by observing the current throughput and comparing it to the number of batches remaining.

      from composer import Trainer
      from composer.callbacks import RuntimeEstimator
      
      # Construct trainer with RuntimeEstimator callback
      trainer = Trainer(
          ...,
          callbacks=RuntimeEstimator(),
      )
      
      # Train!
      trainer.fit()
    • Updated SpeedMonitor throughput metrics (#1987)

      Expands the throughput metrics to track several different units, both in aggregate and per device:

      • throughput/batches_per_sec and throughput/device/batches_per_sec
      • throughput/tokens_per_sec and throughput/device/tokens_per_sec
      • throughput/flops_per_sec and throughput/device/flops_per_sec
      • throughput/device/samples_per_sec

      Also adds a throughput/device/mfu metric to compute per-device MFU. Simply enable the SpeedMonitor callback as usual to log these new metrics! Please see the SpeedMonitor documentation for more information.
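      For example, a minimal sketch; window_size (the number of batches in the moving-average window used to smooth throughput) is assumed from the callback's existing interface:

      from composer import Trainer
      from composer.callbacks import SpeedMonitor

      # Log the expanded throughput metrics, averaged over 100 batches
      trainer = Trainer(
          ...,
          callbacks=SpeedMonitor(window_size=100),
      )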

  2. ⣿ FSDP Sharded Checkpoints (#1902)

    Users can now specify the state_dict_type in the fsdp_config dictionary to enable sharded checkpoints. For example:

    from composer import Trainer
    
    fsdp_config = {
        'sharding_strategy': 'FULL_SHARD',
        'state_dict_type': 'local',
    }
    
    trainer = Trainer(
        ...,
        fsdp_config=fsdp_config,
        save_folder='checkpoints',
        save_filename='ba{batch}_rank{rank}.pt',
        save_interval='10ba',
    )

    Please see the PyTorch FSDP docs and Composer's Distributed Training notes for more information.
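    To resume from sharded checkpoints, each rank loads its own shard. A minimal sketch, assuming load_path supports the same {rank} format variable used in save_filename:

    from composer import Trainer

    trainer = Trainer(
        ...,
        fsdp_config=fsdp_config,
        # Assumption: {rank} is substituted per process, mirroring the
        # save_filename template above
        load_path='checkpoints/ba10_rank{rank}.pt',
    )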

  3. 🤗 HuggingFace Improvements

    • Update HuggingFaceModel class to support encoder-decoder batches without decoder_input_ids (#1950)
    • Allow evaluation metrics to be passed to HuggingFaceModel directly (#1971)
    • Add a utility function to load a Composer checkpoint of a HuggingFaceModel and write out the expected config.json and pytorch_model.bin in the HuggingFace pretrained folder (#1974), as sketched after this list
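    A minimal sketch of the new checkpoint utility from #1974; the exact import path and the file paths below should be treated as assumptions:

    from composer.models import write_huggingface_pretrained_from_composer_checkpoint

    # Extracts config.json and pytorch_model.bin from a Composer checkpoint
    # into a folder loadable with transformers' from_pretrained
    write_huggingface_pretrained_from_composer_checkpoint(
        'checkpoints/ep1.pt',  # hypothetical Composer checkpoint path
        'hf_pretrained',       # hypothetical output folder
    )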
  4. 🛟 Nvidia H100 Alpha Support - Added amp_fp8 data type

    In preparation for H100's arrival, we've added the amp_fp8 precision type. Currently, setting amp_fp8 enables a new precision context using transformer_engine.pytorch.fp8_autocast. For more details, please see NVIDIA's new Transformer Engine and the specific fp8 recipe we utilize.

    from composer import Trainer
    
    trainer = Trainer(
        ...,
        precision='amp_fp8',
    )

API changes

  • The torchmetrics package has been upgraded to 0.11.x.

    The torchmetrics.Accuracy metric now requires a task argument, which can take a value of binary, multiclass, or multilabel. Please see the Torchmetrics Accuracy docs for details.

    Additionally, since specifying task='multiclass' requires an additional num_classes field, we've updated ComposerClassifier to accept a num_classes argument. Please see PRs #2017 and #2025 for additional details.
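    A minimal sketch of the updated usage; the module and the num_classes value are illustrative:

    import torch.nn as nn
    from torchmetrics import Accuracy
    from composer.models import ComposerClassifier

    # torchmetrics 0.11.x requires task; 'multiclass' also requires num_classes
    metric = Accuracy(task='multiclass', num_classes=10)

    # ComposerClassifier now accepts the matching num_classes argument
    model = ComposerClassifier(module=nn.Linear(32, 10), num_classes=10)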

  • Surgery algorithms used in functional form now return None (#1543)
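    For example, a minimal sketch using apply_blurpool; the toy model is illustrative, and the same behavior applies to any surgery algorithm used functionally:

    import torch.nn as nn

    import composer.functional as cf

    model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.MaxPool2d(2))

    # The model is modified in place; the functional API now returns None
    ret = cf.apply_blurpool(model)
    assert ret is None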

Deprecations

  • Deprecate HFCrossEntropy and Perplexity (#1857)
  • Remove Jenkins CI (#1943, #1954)
  • Changed the DeprecationWarning to a Warning when specifying ProgressBarLogger or ConsoleLogger in loggers (#1846)

Bug Fixes

  • Fixed an issue introduced in 0.12.1 where HuggingFaceModel crashes if config.return_dict = False (#1948)
  • Refactor EMA to improve memory efficiency (#1941)
  • Make wandb checkpoint logging compatible with wandb model registry (#1973)
  • Fix ICL race conditions (#1978)
  • Update epoch metric name to trainer/epoch (#1986)
  • Reset scaler (#1999)
  • Sync the optimization logger across ranks (#1970)
  • Update Docker images to resolve vulnerability scan issues (#2007)
  • Fix eval duplicate logging issue (#2018)
  • Extend test and patch bug (#2028)
  • Protect against missing slack_sdk import (#2031)

Known Issues

  • Docker Image Security Vulnerability
    • CVE-2022-45907: The mosaicml/pytorch:1.12.1*, mosaicml/pytorch:1.11.0*, mosaicml/pytorch_vision:1.12.1*, and mosaicml/pytorch_vision:1.11.0* images are impacted; they remain available for legacy use cases. We recommend users upgrade to images with PyTorch >1.13. The affected images will be removed in the next Composer release.

Full Changelog: v0.12.1...v0.13.1