Hugging face trainer callback integration (#33)
Co-authored-by: Jae-Won Chung <jwnchung@umich.edu>
parthraut and jaywonchung authored Feb 16, 2024
1 parent 8bcf4f7 commit 5c41ab4
Showing 8 changed files with 947 additions and 2 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -11,3 +11,4 @@ dist/
*.json
**/.DS_Store
.cache/
env/
15 changes: 15 additions & 0 deletions docs/getting_started/index.md
@@ -106,6 +106,21 @@ for epoch in range(100):
The [`GlobalPowerLimitOptimizer`][zeus.optimizer.power_limit.GlobalPowerLimitOptimizer] supports multiple [`OptimumSelector`][zeus.optimizer.power_limit.OptimumSelector]s, each of which chooses one power limit among all the profiled power limits.
Currently implemented selectors are [`Energy`][zeus.optimizer.power_limit.Energy], [`Time`][zeus.optimizer.power_limit.Time], [`ZeusCost`][zeus.optimizer.power_limit.ZeusCost], and [`MaxSlowdownConstraint`][zeus.optimizer.power_limit.MaxSlowdownConstraint].
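
For illustration, here is a minimal sketch of picking a non-default selector (the `optimum_selector` keyword and the `factor` argument are assumptions; check the API reference for exact signatures):

```python
from zeus.monitor import ZeusMonitor
from zeus.optimizer.power_limit import GlobalPowerLimitOptimizer, MaxSlowdownConstraint

monitor = ZeusMonitor()
# Pick the lowest power limit whose slowdown stays within 10% of the fastest setting.
plo = GlobalPowerLimitOptimizer(
    monitor,
    optimum_selector=MaxSlowdownConstraint(factor=1.1),
)
```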

### `HFGlobalPowerLimitOptimizer`
For easy use with [HuggingFace πŸ€— Transformers](https://huggingface.co/docs/transformers/en/index), [`HFGlobalPowerLimitOptimizer`][zeus.optimizer.power_limit.HFGlobalPowerLimitOptimizer] is a drop-in compatible [HuggingFace πŸ€— Trainer Callback](https://huggingface.co/docs/transformers/en/main_classes/callback). When initializing a [HuggingFace πŸ€— Trainer](https://huggingface.co/docs/transformers/main_classes/trainer), initialize and pass in [`HFGlobalPowerLimitOptimizer`][zeus.optimizer.power_limit.HFGlobalPowerLimitOptimizer] as shown below:

```python
from transformers import Trainer
from zeus.monitor import ZeusMonitor
from zeus.optimizer.power_limit import HFGlobalPowerLimitOptimizer

monitor = ZeusMonitor()
optimizer = HFGlobalPowerLimitOptimizer(monitor)

# Initialize HuggingFace πŸ€— Trainer with the optimizer as a callback
trainer = Trainer(
    ...,  # your model, datasets, and training arguments
    callbacks=[optimizer],  # Add the `HFGlobalPowerLimitOptimizer` callback
)
```
Refer to our [HuggingFace πŸ€— example training code for fine-tuning using HFGlobalPowerLimitOptimizer](https://github.com/ml-energy/zeus/tree/master/examples/huggingface/) for complete, runnable examples of single-GPU and multi-GPU training with the [HuggingFace πŸ€— Trainer](https://huggingface.co/docs/transformers/main_classes/trainer).
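
As a quick illustration of the multi-GPU case, a minimal launch sketch assuming the example directory's `run_clm.py` and PyTorch's standard `torchrun` launcher on 4 GPUs (flag values are illustrative; see the example README for details):

```bash
torchrun --nproc_per_node=4 run_clm.py \
    --model_name_or_path gpt2 \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --do_train \
    --do_eval \
    --output_dir /tmp/test-clm
```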

## Recurring jobs

!!! Info
47 changes: 47 additions & 0 deletions examples/huggingface/README.md
@@ -0,0 +1,47 @@
# Integrating Zeus with HuggingFace πŸ€—

This example demonstrates how to integrate Zeus with the `HuggingFace πŸ€— Trainer` using `HFGlobalPowerLimitOptimizer`.

[`run_clm.py`](run_clm.py) was adapted from [HuggingFace πŸ€—'s example training code for fine-tuning language models](https://github.com/huggingface/transformers/tree/f3aa7db439a2a3942f76c115197fe953984ac334/examples/pytorch/language-modeling).

## Dependencies

Use the included `requirements.txt` file to install all extra dependencies:
```sh
pip install -r requirements.txt
```

## `ZeusMonitor` and `HFGlobalPowerLimitOptimizer`

- [`ZeusMonitor`](http://ml.energy/zeus/reference/monitor/#zeus.monitor.ZeusMonitor): Measures the GPU time and energy consumption of arbitrary code blocks (see the sketch below).
- [`HFGlobalPowerLimitOptimizer`](https://ml.energy/zeus/reference/optimizer/power_limit/#zeus.optimizer.power_limit.HFGlobalPowerLimitOptimizer): Online-profiles each power limit with `ZeusMonitor` and finds the cost-optimal power limit. Calls `GlobalPowerLimitOptimizer` under the hood.
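
As a quick illustration, a minimal sketch of `ZeusMonitor`'s measurement windows (the window name and GPU index are arbitrary choices for this example):

```python
from zeus.monitor import ZeusMonitor

# Measure time and energy on GPU 0 (defaults to all visible GPUs if omitted).
monitor = ZeusMonitor(gpu_indices=[0])

monitor.begin_window("fine-tuning")
# ... the code block you want to measure ...
measurement = monitor.end_window("fine-tuning")
print(f"Took {measurement.time} s and consumed {measurement.total_energy} J.")
```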

## Integration with HuggingFace πŸ€— Trainer
For easy use with [HuggingFace πŸ€— Transformers](https://huggingface.co/docs/transformers/en/index), [`HFGlobalPowerLimitOptimizer`](https://ml.energy/zeus/reference/optimizer/power_limit/#zeus.optimizer.power_limit.HFGlobalPowerLimitOptimizer) is a drop-in compatible [HuggingFace πŸ€— Trainer Callback](https://huggingface.co/docs/transformers/en/main_classes/callback). When initializing a [HuggingFace πŸ€— Trainer](https://huggingface.co/docs/transformers/main_classes/trainer), initialize and pass in [`HFGlobalPowerLimitOptimizer`](https://ml.energy/zeus/reference/optimizer/power_limit/#zeus.optimizer.power_limit.HFGlobalPowerLimitOptimizer) as shown below:

```python
from transformers import Trainer
from zeus.monitor import ZeusMonitor
from zeus.optimizer.power_limit import HFGlobalPowerLimitOptimizer

monitor = ZeusMonitor()
optimizer = HFGlobalPowerLimitOptimizer(monitor)

# Initialize HuggingFace πŸ€— Trainer with the optimizer as a callback
trainer = Trainer(
    ...,  # your model, datasets, and training arguments
    callbacks=[optimizer],  # Add the `HFGlobalPowerLimitOptimizer` callback
)
```

## Running the Example

By default, `Trainer` uses all available GPUs. If you would like to use only a subset of the GPUs, set the `CUDA_VISIBLE_DEVICES` environment variable, which Zeus will also automatically respect (see the second command below).

```bash
python run_clm.py \
--model_name_or_path gpt2 \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 8 \
--do_train \
--do_eval \
--output_dir /tmp/test-clm
```
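
For example, a minimal sketch restricting training (and Zeus) to the first two GPUs (the GPU indices are illustrative):

```bash
CUDA_VISIBLE_DEVICES=0,1 python run_clm.py \
    --model_name_or_path gpt2 \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --do_train \
    --do_eval \
    --output_dir /tmp/test-clm
```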
8 changes: 8 additions & 0 deletions examples/huggingface/requirements.txt
@@ -0,0 +1,8 @@
accelerate >= 0.12.0
torch >= 1.3
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
evaluate
scikit-learn
transformers>=4.37.2
