This repository has been archived by the owner on Nov 1, 2024. It is now read-only.

Sweep code for studying model population stats (2 of 2) (#144)
Summary:
This is a *major update* and introduces powerful new functionality to pycls.

The pycls codebase now provides powerful support for studying *design spaces* and, more generally, *population statistics* of models as introduced in [On Network Design Spaces for Visual Recognition](https://arxiv.org/abs/1905.13214) and [Designing Network Design Spaces](https://arxiv.org/abs/2003.13678). The idea is that instead of planning a single pycls job (e.g., testing a specific model configuration), one can study the behavior of an entire population of models. This allows for quite powerful and succinct experimental design, and elevates the study of individual model behavior to the study of the behavior of model populations. Please see [`SWEEP_INFO`](docs/SWEEP_INFO.md) for details.

This is commit 2 of 2 for the sweep code. It is focused on sweep analysis, sweep examples, and documentation.

Pull Request resolved: #144

Reviewed By: rajprateek

Differential Revision: D28586390

Pulled By: pdollar

fbshipit-source-id: 55856f9aaf7ae49243f4870c787a144b03e5d2a9

Co-authored-by: Raj Prateek Kosaraju <rajprateek@users.noreply.github.com>
Co-authored-by: Piotr Dollar <699682+pdollar@users.noreply.github.com>
3 people authored and facebook-github-bot committed May 20, 2021
1 parent bd65938 commit 2d71381
Showing 12 changed files with 1,046 additions and 11 deletions.
22 changes: 16 additions & 6 deletions README.md
@@ -9,9 +9,7 @@

## Introduction

The goal of **pycls** is to provide a simple and flexible codebase for image classification. It is designed to support rapid implementation and evaluation of research ideas. **pycls** also provides a large collection of baseline results ([Model Zoo](MODEL_ZOO.md)).

The codebase supports efficient single-machine multi-gpu training, powered by the PyTorch distributed package, and provides implementations of standard models including [ResNet](https://arxiv.org/abs/1512.03385), [ResNeXt](https://arxiv.org/abs/1611.05431), [EfficientNet](https://arxiv.org/abs/1905.11946), and [RegNet](https://arxiv.org/abs/2003.13678).
The goal of **pycls** is to provide a simple and flexible codebase for image classification. It is designed to support rapid implementation and evaluation of research ideas. **pycls** also provides a large collection of baseline results ([Model Zoo](MODEL_ZOO.md)). The codebase supports efficient single-machine multi-gpu training, powered by the PyTorch distributed package, and provides implementations of standard models including [ResNet](https://arxiv.org/abs/1512.03385), [ResNeXt](https://arxiv.org/abs/1611.05431), [EfficientNet](https://arxiv.org/abs/1905.11946), and [RegNet](https://arxiv.org/abs/2003.13678).

## Using pycls

@@ -21,13 +19,18 @@ Please see [`GETTING_STARTED`](docs/GETTING_STARTED.md) for brief installation i

We provide a large set of baseline results and pretrained models available for download in the **pycls** [Model Zoo](MODEL_ZOO.md), including the simple, fast, and effective [RegNet](https://arxiv.org/abs/2003.13678) models that we hope can serve as solid baselines across a wide range of flop regimes.

## Sweep Code

The pycls codebase now provides powerful support for studying *design spaces* and, more generally, *population statistics* of models as introduced in [On Network Design Spaces for Visual Recognition](https://arxiv.org/abs/1905.13214) and [Designing Network Design Spaces](https://arxiv.org/abs/2003.13678). The idea is that instead of planning a single pycls job (e.g., testing a specific model configuration), one can study the behavior of an entire population of models. This allows for quite powerful and succinct experimental design, and elevates the study of individual model behavior to the study of the behavior of model populations. Please see [`SWEEP_INFO`](docs/SWEEP_INFO.md) for details.
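
One way to compare model populations, following the two papers cited above, is the error empirical distribution function (EDF): the fraction of sampled models whose error falls below a given threshold. The sketch below is illustrative only. The function name `error_edf` and the error values are hypothetical and are not part of the pycls API.

```python
from typing import List, Sequence


def error_edf(errors: Sequence[float], thresholds: Sequence[float]) -> List[float]:
    """Return, for each threshold e, the fraction of models with error < e."""
    n = len(errors)
    if n == 0:
        raise ValueError("need at least one model error")
    return [sum(err < e for err in errors) / n for e in thresholds]


# Hypothetical top-1 errors (%) for two small populations of trained models.
population_a = [6.1, 6.4, 7.0, 7.9, 9.5]
population_b = [5.8, 6.0, 6.3, 6.9, 7.2]
thresholds = [6.0, 7.0, 8.0]

edf_a = error_edf(population_a, thresholds)
edf_b = error_edf(population_b, thresholds)
```

A population whose EDF is higher at every threshold contains a larger share of good models, which is the sense in which one design space can be said to be better than another.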

## Projects

A number of projects at FAIR have been built on top of **pycls**:

- [On Network Design Spaces for Visual Recognition](https://arxiv.org/abs/1905.13214)
- [Exploring Randomly Wired Neural Networks for Image Recognition](https://arxiv.org/abs/1904.01569)
- [Designing Network Design Spaces](https://arxiv.org/abs/2003.13678)
- [Fast and Accurate Model Scaling](https://arxiv.org/abs/2103.06877)
- [Are Labels Necessary for Neural Architecture Search?](https://arxiv.org/abs/2003.12056)
- [PySlowFast Video Understanding Codebase](https://github.com/facebookresearch/SlowFast)

@@ -40,22 +43,29 @@ If you find **pycls** helpful in your research or refer to the baseline results
```
@InProceedings{Radosavovic2019,
title = {On Network Design Spaces for Visual Recognition},
author = {Radosavovic, Ilija and Johnson, Justin and Xie, Saining and Lo, Wan-Yen and Doll{\'a}r, Piotr},
author = {Ilija Radosavovic and Justin Johnson and Saining Xie and Wan-Yen Lo and Piotr Doll{\'a}r},
booktitle = {ICCV},
year = {2019}
}
@InProceedings{Radosavovic2020,
title = {Designing Network Design Spaces},
author = {Radosavovic, Ilija and Kosaraju, Raj Prateek and Girshick, Ross and He, Kaiming and Doll{\'a}r, Piotr},
author = {Ilija Radosavovic and Raj Prateek Kosaraju and Ross Girshick and Kaiming He and Piotr Doll{\'a}r},
booktitle = {CVPR},
year = {2020}
}
@InProceedings{Dollar2021,
title = {Fast and Accurate Model Scaling},
author = {Piotr Doll{\'a}r and Mannat Singh and Ross Girshick},
booktitle = {CVPR},
year = {2021}
}
```

## License

**pycls** is released under the MIT license. Please see the [LICENSE](LICENSE) file for more information.
**pycls** is released under the MIT license. Please see the [`LICENSE`](LICENSE) file for more information.

## Contributing

87 changes: 87 additions & 0 deletions configs/sweeps/cifar/cifar_best.yaml
@@ -0,0 +1,87 @@
DESC:
  Example CIFAR sweep 3 of 3 (trains the best model from cifar_regnet sweep).
  Train the best RegNet-125M from cifar_regnet sweep for variable epoch lengths.
  Trains 3 copies of every model (to obtain mean and std of the error).
  The purpose of this sweep is to show how to train FINAL version of a model.
NAME: cifar/cifar_best
SETUP:
  # Number of configs to sample
  NUM_CONFIGS: 12
  # SAMPLERS for optimization parameters
  SAMPLERS:
    OPTIM.MAX_EPOCH:
      TYPE: value_sampler
      VALUES: [50, 100, 200, 400]
    RNG_SEED:
      TYPE: int_sampler
      RAND_TYPE: uniform
      RANGE: [1, 3]
      QUANTIZE: 1
  CONSTRAINTS:
    REGNET:
      NUM_STAGES: [2, 2]
  # BASE_CFG is RegNet-125MF (best model from cifar_regnet sweep)
  BASE_CFG:
    MODEL:
      TYPE: regnet
      NUM_CLASSES: 10
    REGNET:
      STEM_TYPE: res_stem_cifar
      SE_ON: True
      STEM_W: 16
      DEPTH: 12
      W0: 96
      WA: 19.5
      WM: 2.942
      GROUP_W: 8
    OPTIM:
      BASE_LR: 1.0
      LR_POLICY: cos
      MAX_EPOCH: 50
      MOMENTUM: 0.9
      NESTEROV: True
      WARMUP_EPOCHS: 5
      WEIGHT_DECAY: 0.0005
      EMA_ALPHA: 0.00025
      EMA_UPDATE_PERIOD: 32
    BN:
      USE_CUSTOM_WEIGHT_DECAY: True
    TRAIN:
      DATASET: cifar10
      SPLIT: train
      BATCH_SIZE: 1024
      IM_SIZE: 32
      MIXED_PRECISION: True
      LABEL_SMOOTHING: 0.1
      MIXUP_ALPHA: 0.5
    TEST:
      DATASET: cifar10
      SPLIT: test
      BATCH_SIZE: 1000
      IM_SIZE: 32
    NUM_GPUS: 1
    DATA_LOADER:
      NUM_WORKERS: 4
    LOG_PERIOD: 25
    VERBOSE: False
# Launch config options
LAUNCH:
  PARTITION: devlab
  NUM_GPUS: 1
  PARALLEL_JOBS: 12
  TIME_LIMIT: 180
# Analyze config options
ANALYZE:
  PLOT_METRIC_VALUES: False
  PLOT_COMPLEXITY_VALUES: False
  PLOT_CURVES_BEST: 3
  PLOT_CURVES_WORST: 0
  PLOT_MODELS_BEST: 1
  METRICS: []
  COMPLEXITY: [flops, params, acts, memory, epoch_fw_bw, epoch_time]
  PRE_FILTERS: {done: [0, 1, 1]}
  SPLIT_FILTERS:
    epochs=050: {cfg.OPTIM.MAX_EPOCH: [ 50, 50, 50]}
    epochs=100: {cfg.OPTIM.MAX_EPOCH: [100, 100, 100]}
    epochs=200: {cfg.OPTIM.MAX_EPOCH: [200, 200, 200]}
    epochs=400: {cfg.OPTIM.MAX_EPOCH: [400, 400, 400]}
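
The `DEPTH`, `W0`, `WA`, and `WM` values in the `REGNET` block of this config parameterize per-block widths through the quantized linear rule described in Designing Network Design Spaces. The sketch below follows that rule as stated in the paper. It is a standalone illustration: the function name `regnet_widths` and the width multiple `q=8` are assumptions, and the pycls implementation may differ in details such as group-width adjustment.

```python
import math
from typing import List, Tuple


def regnet_widths(
    w0: float, wa: float, wm: float, depth: int, q: int = 8
) -> Tuple[List[int], List[int]]:
    """Return (stage widths, stage depths) from the quantized linear rule."""
    per_block = []
    for j in range(depth):
        u = w0 + wa * j  # continuous linear width for block j
        s = round(math.log(u / w0) / math.log(wm))  # nearest integer power of wm
        w = w0 * wm ** s  # width snapped to w0 * wm^s
        per_block.append(int(round(w / q)) * q)  # snap to a multiple of q
    widths = sorted(set(per_block))
    depths = [per_block.count(w) for w in widths]
    return widths, depths


# Values taken from the REGNET block of this config.
widths, depths = regnet_widths(w0=96, wa=19.5, wm=2.942, depth=12)
print(widths, depths)
```

Under these assumptions the config values produce two distinct widths, which is consistent with the `NUM_STAGES: [2, 2]` constraint.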
76 changes: 76 additions & 0 deletions configs/sweeps/cifar/cifar_optim.yaml
@@ -0,0 +1,76 @@
DESC:
  Example CIFAR sweep 1 of 3 (find lr and wd for cifar_regnet and cifar_best sweeps).
  Tunes the learning rate (lr) and weight decay (wd) for ResNet-56 at 50 epochs.
  The purpose of this sweep is to show how to optimize OPTIM parameters.
NAME: cifar/cifar_optim
SETUP:
  # Number of configs to sample
  NUM_CONFIGS: 64
  # SAMPLERS for optimization parameters
  SAMPLERS:
    OPTIM.BASE_LR:
      TYPE: float_sampler
      RAND_TYPE: log_uniform
      RANGE: [0.25, 5.0]
      QUANTIZE: 1.0e-10
    OPTIM.WEIGHT_DECAY:
      TYPE: float_sampler
      RAND_TYPE: log_uniform
      RANGE: [5.0e-5, 1.0e-3]
      QUANTIZE: 1.0e-10
  # BASE_CFG is R-56 with large batch size and stronger augmentation
  BASE_CFG:
    MODEL:
      TYPE: anynet
      NUM_CLASSES: 10
    ANYNET:
      STEM_TYPE: res_stem_cifar
      STEM_W: 16
      BLOCK_TYPE: res_basic_block
      DEPTHS: [9, 9, 9]
      WIDTHS: [16, 32, 64]
      STRIDES: [1, 2, 2]
    OPTIM:
      BASE_LR: 1.0
      LR_POLICY: cos
      MAX_EPOCH: 50
      MOMENTUM: 0.9
      NESTEROV: True
      WARMUP_EPOCHS: 5
      WEIGHT_DECAY: 0.0005
      EMA_ALPHA: 0.00025
      EMA_UPDATE_PERIOD: 32
    BN:
      USE_CUSTOM_WEIGHT_DECAY: True
    TRAIN:
      DATASET: cifar10
      SPLIT: train
      BATCH_SIZE: 1024
      IM_SIZE: 32
      MIXED_PRECISION: True
      LABEL_SMOOTHING: 0.1
      MIXUP_ALPHA: 0.5
    TEST:
      DATASET: cifar10
      SPLIT: test
      BATCH_SIZE: 1000
      IM_SIZE: 32
    NUM_GPUS: 1
    DATA_LOADER:
      NUM_WORKERS: 4
    LOG_PERIOD: 25
    VERBOSE: False
# Launch config options
LAUNCH:
  PARTITION: devlab
  NUM_GPUS: 1
  PARALLEL_JOBS: 32
  TIME_LIMIT: 60
# Analyze config options
ANALYZE:
  PLOT_CURVES_BEST: 3
  PLOT_METRIC_VALUES: True
  PLOT_COMPLEXITY_VALUES: True
  METRICS: [lr, wd, lr_wd]
  COMPLEXITY: [flops, params, acts, memory, epoch_fw_bw, epoch_time]
  PRE_FILTERS: {done: [1, 1, 1]}
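
The two `float_sampler` entries in this config draw the learning rate and weight decay with `RAND_TYPE: log_uniform`. A minimal sketch of what such a sampler might do is shown below, assuming that `log_uniform` means uniform in log space over `RANGE` and that `QUANTIZE` rounds the result to a multiple of the given step. The function `sample_log_uniform` is hypothetical and is not the pycls sampler.

```python
import math
import random


def sample_log_uniform(low: float, high: float, q: float, rng: random.Random) -> float:
    """Draw a value uniformly in log space from [low, high], rounded to a multiple of q."""
    x = math.exp(rng.uniform(math.log(low), math.log(high)))
    return round(x / q) * q


rng = random.Random(0)  # fixed seed so the draw is repeatable
# Ranges and quantization steps copied from the SAMPLERS block of this config.
lrs = [sample_log_uniform(0.25, 5.0, 1.0e-10, rng) for _ in range(64)]
wds = [sample_log_uniform(5.0e-5, 1.0e-3, 1.0e-10, rng) for _ in range(64)]
```

Sampling in log space spreads the 64 configs evenly across orders of magnitude, which suits parameters such as learning rate whose useful range spans more than a decade.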
78 changes: 78 additions & 0 deletions configs/sweeps/cifar/cifar_regnet.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
DESC:
  Example CIFAR sweep 2 of 3 (uses lr and wd found by cifar_optim sweep).
  This sweep searches for a good RegNet-125MF model on cifar (same flops as R56).
  The purpose of this sweep is to show how to optimize REGNET parameters.
NAME: cifar/cifar_regnet
SETUP:
  # Number of configs to sample
  NUM_CONFIGS: 32
  # SAMPLER for RegNet
  SAMPLERS:
    REGNET:
      TYPE: regnet_sampler
      DEPTH: [6, 16]
      GROUP_W: [1, 32]
  # CONSTRAINTS for complexity (roughly based on R-56)
  CONSTRAINTS:
    CX:
      FLOPS: [0.12e+9, 0.13e+9]
      PARAMS: [0, 2.0e+6]
      ACTS: [0, 1.0e+6]
    REGNET:
      NUM_STAGES: [2, 2]
  # BASE_CFG is R-56 with large batch size and stronger augmentation
  BASE_CFG:
    MODEL:
      TYPE: regnet
      NUM_CLASSES: 10
    REGNET:
      STEM_TYPE: res_stem_cifar
      SE_ON: True
      STEM_W: 16
    OPTIM:
      BASE_LR: 1.0
      LR_POLICY: cos
      MAX_EPOCH: 50
      MOMENTUM: 0.9
      NESTEROV: True
      WARMUP_EPOCHS: 5
      WEIGHT_DECAY: 0.0005
      EMA_ALPHA: 0.00025
      EMA_UPDATE_PERIOD: 32
    BN:
      USE_CUSTOM_WEIGHT_DECAY: True
    TRAIN:
      DATASET: cifar10
      SPLIT: train
      BATCH_SIZE: 1024
      IM_SIZE: 32
      MIXED_PRECISION: True
      LABEL_SMOOTHING: 0.1
      MIXUP_ALPHA: 0.5
    TEST:
      DATASET: cifar10
      SPLIT: test
      BATCH_SIZE: 1000
      IM_SIZE: 32
    NUM_GPUS: 1
    DATA_LOADER:
      NUM_WORKERS: 4
    LOG_PERIOD: 25
    VERBOSE: False
# Launch config options
LAUNCH:
  PARTITION: devlab
  NUM_GPUS: 1
  PARALLEL_JOBS: 32
  TIME_LIMIT: 60
# Analyze config options
ANALYZE:
  PLOT_METRIC_VALUES: True
  PLOT_COMPLEXITY_VALUES: True
  PLOT_CURVES_BEST: 3
  PLOT_CURVES_WORST: 0
  PLOT_MODELS_BEST: 8
  PLOT_MODELS_WORST: 0
  METRICS: [regnet_depth, regnet_w0, regnet_wa, regnet_wm, regnet_gw]
  COMPLEXITY: [flops, params, acts, memory, epoch_fw_bw, epoch_time]
  PRE_FILTERS: {done: [0, 1, 1]}
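
The `CX` constraints in this config restrict sampled models to a narrow complexity band. One plausible reading is that each sampled config is kept only if every complexity metric lies within its `[min, max]` range. The sketch below illustrates that check. The helper `satisfies_constraints`, the inclusive bounds, and the candidate complexity values are all assumptions made for illustration, not pycls code or measured results.

```python
from typing import Dict, List, Tuple

# Bounds copied from the CX block of this config: metric -> (min, max).
CX_CONSTRAINTS: Dict[str, Tuple[float, float]] = {
    "flops": (0.12e9, 0.13e9),
    "params": (0.0, 2.0e6),
    "acts": (0.0, 1.0e6),
}


def satisfies_constraints(
    cx: Dict[str, float], constraints: Dict[str, Tuple[float, float]]
) -> bool:
    """Return True if every constrained metric lies within its inclusive range."""
    return all(lo <= cx[name] <= hi for name, (lo, hi) in constraints.items())


# Hypothetical complexity measurements for three sampled candidates.
candidates: List[Dict[str, float]] = [
    {"flops": 0.125e9, "params": 1.0e6, "acts": 0.5e6},  # inside every range
    {"flops": 0.200e9, "params": 1.0e6, "acts": 0.5e6},  # too many flops
    {"flops": 0.125e9, "params": 3.0e6, "acts": 0.5e6},  # too many params
]
accepted = [cx for cx in candidates if satisfies_constraints(cx, CX_CONSTRAINTS)]
```

A sampler built this way would keep drawing candidates until `NUM_CONFIGS` of them pass, so a tight band such as the flops range here trades sampling time for a population of comparable models.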
8 changes: 4 additions & 4 deletions docs/DATA.md
@@ -36,14 +36,14 @@ Create a directory containing symlinks:
mkdir -p /path/pycls/pycls/datasets/data
```

Symlink ImageNet:
Symlink ImageNet (`/datasets01/imagenet_full_size/061417/` on FAIR cluster):

```
ln -s /path/imagenet /path/pycls/pycls/datasets/data/imagenet
ln -sv /path/imagenet /path/pycls/pycls/datasets/data/imagenet
```

Symlink CIFAR-10:
Symlink CIFAR-10 (`/datasets01/cifar-10-batches-py/060817/` on FAIR cluster):

```
ln -s /path/cifar10 /path/pycls/pycls/datasets/data/cifar10
ln -sv /path/cifar10 /path/pycls/pycls/datasets/data/cifar10
```
2 changes: 1 addition & 1 deletion docs/GETTING_STARTED.md
@@ -97,7 +97,7 @@ python tools/time_net.py
PREC_TIME.NUM_ITER 50
```

### MODEL SCALING
### Model Scaling

Scale a RegNetY-4GF by 4x using fast compound scaling (see https://arxiv.org/abs/2103.06877):
