This repository has been archived by the owner on Dec 3, 2024. It is now read-only.

DyNAS-T

Caution

PROJECT NOT UNDER ACTIVE MANAGEMENT

  • This project will no longer be maintained by Intel.
  • Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.
  • Intel no longer accepts patches to this project.
  • If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.

DyNAS-T (Dynamic Neural Architecture Search Toolkit) is a super-network neural architecture search (NAS) optimization package designed for efficiently discovering optimal deep neural network (DNN) architectures for a variety of performance objectives such as accuracy, latency, multiply-and-accumulates, and model size.

Background

Neural architecture search, the study of automating the discovery of optimal deep neural network architectures for tasks in domains such as computer vision and natural language processing, has seen rapid growth in the machine learning research community. Evaluating DNN architectures during the search is computationally expensive because each candidate requires its own training and validation cycle. Weight-sharing approaches known as one-shot or super-networks [1] mitigate this overhead, reducing training times from thousands of GPU days to a few. These approaches train a task-specific super-network architecture with a weight-sharing mechanism that allows the sub-networks to be treated as unique individual architectures. This enables sub-network model extraction and validation without a separate training cycle.

To learn more about super-networks and how to define/train them, please see our super-network tutorial.

Algorithms

Evolutionary algorithms, specifically genetic algorithms, have a history of usage in NAS and continue to gain popularity as a highly efficient way to explore the architecture objective space. DyNAS-T supports a wide range of evolutionary algorithms (EAs) such as NSGA-II [2] by leveraging the pymoo library.

A unique capability of DyNAS-T is the Lightweight Iterative NAS (LINAS) that pairs evolutionary algorithms with lightly trained objective predictors in an iterative cycle to accelerate architectural exploration [3]. This technique is ~4x more sample efficient than typical one-shot predictor-based NAS approaches.
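
The iterative cycle described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not DyNAS-T's implementation: the search space, the synthetic `validate` function (standing in for a costly validation evaluation), and the additive `fit_predictor` model are all hypothetical, and a simple mutation-based search takes the place of the pymoo algorithms.

```python
import random

# Hypothetical toy search space: 6 elastic parameters with 3 choices each.
NUM_PARAMS, CHOICES = 6, (0, 1, 2)


def validate(config):
    """Stand-in for a costly validation evaluation (synthetic score)."""
    return sum((i + 1) * c for i, c in enumerate(config))


def fit_predictor(history):
    """Fit a tiny additive predictor: mean validated score per (position, choice)."""
    totals, counts = {}, {}
    for cfg, score in history.items():
        for key in enumerate(cfg):  # key = (position, choice)
            totals[key] = totals.get(key, 0.0) + score
            counts[key] = counts.get(key, 0) + 1
    fallback = sum(history.values()) / len(history)

    def predict(cfg):
        return sum(totals[k] / counts[k] if k in counts else fallback
                   for k in enumerate(cfg))

    return predict


def mutate(config, rng):
    """Re-draw one randomly chosen parameter."""
    i = rng.randrange(NUM_PARAMS)
    return config[:i] + (rng.choice(CHOICES),) + config[i + 1:]


def random_config(rng):
    return tuple(rng.choice(CHOICES) for _ in range(NUM_PARAMS))


def linas_sketch(num_evals=30, population=10, inner_generations=20, seed=0):
    rng = random.Random(seed)
    history = {}  # validated config -> measured score
    # Seed the predictor with a few real validation measurements.
    while len(history) < population:
        cfg = random_config(rng)
        history[cfg] = validate(cfg)
    while len(history) < num_evals:
        predict = fit_predictor(history)  # retrain on all measurements so far
        # Inner loop: cheap evolutionary search scored by the predictor only.
        pool = sorted(history, key=history.get, reverse=True)[:population]
        for _ in range(inner_generations):
            children = [mutate(p, rng) for p in pool]
            pool = sorted(set(pool + children), key=predict,
                          reverse=True)[:population]
        # Outer loop: validate the best unseen candidates.
        unseen = [c for c in pool if c not in history]
        if not unseen:  # search stalled, fall back to a random candidate
            unseen = [random_config(rng)]
        for cfg in unseen[: num_evals - len(history)]:
            history[cfg] = validate(cfg)
    return history


history = linas_sketch()
best = max(history, key=history.get)
print(len(history), "validated configurations; best:", best, history[best])
```

Only the outer loop pays for validation measurements. The inner loop can run many generations cheaply because it queries the predictor alone, which is the source of the sample efficiency described above.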

[Figure: DyNAS-T Design Flow]

The following optimization algorithms are supported by DyNAS-T in both standard and LINAS formats.

| 1 Objective (Single-Objective) | 2 Objectives (Multi-Objective) | 3 Objectives (Many-Objective) |
|---|---|---|
| GA* 'ga' | NSGA-II* 'nsga2' | UNSGA-III* 'unsga3' |
| CMA-ES 'cmaes' | AGE-MOEA 'age' | CTAEA 'ctaea' |
| | | MOEAD 'moead' |

*Recommended for stability of search results
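
As a quick sanity check before launching a long search, the grouping in the table above can be encoded as a lookup. The helper `check_search_algo` below is hypothetical and not part of DyNAS-T. It only mirrors the table, places 'moead' in the three-objective column as an assumption, and DyNAS-T may apply different rules internally.

```python
# Algorithm keys per number of objectives, transcribed from the table above.
# Placing "moead" under three objectives is an assumption about the table layout.
SUPPORTED_ALGOS = {
    1: ("ga", "cmaes"),
    2: ("nsga2", "age"),
    3: ("unsga3", "ctaea", "moead"),
}


def check_search_algo(search_algo, optimization_metrics):
    """Return search_algo if it matches the number of objectives, else raise."""
    num_objectives = len(optimization_metrics)
    allowed = SUPPORTED_ALGOS.get(num_objectives, ())
    if search_algo not in allowed:
        raise ValueError(
            f"'{search_algo}' is not listed for {num_objectives} objective(s); "
            f"listed choices: {', '.join(allowed) or 'none'}"
        )
    return search_algo


print(check_search_algo("nsga2", ["accuracy_top1", "macs"]))
```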

Super-networks

DyNAS-T includes support for the following super-network frameworks, such as Once-for-All (OFA).

| Super-Network | Model Name | Dataset | Objectives/Measurements Supported |
|---|---|---|---|
| OFA MobileNetV3-w1.0 | ofa_mbv3_d234_e346_k357_w1.0 | ImageNet 1K | accuracy_top1, macs, params, latency |
| OFA MobileNetV3-w1.2 | ofa_mbv3_d234_e346_k357_w1.2 | ImageNet 1K | accuracy_top1, macs, params, latency |
| OFA ResNet50 | ofa_resnet50 | ImageNet 1K | accuracy_top1, macs, params, latency |
| Quantization-aware OFA ResNet50 | inc_quantization_ofa_resnet50 | ImageNet 1K | accuracy_top1, model_size, params, latency |
| OFA ProxylessNAS | ofa_proxyless_d234_e346_k357_w1.3 | ImageNet 1K | accuracy_top1, macs, params, latency |
| TransformerLT | transformer_lt_wmt_en_de | WMT En-De | bleu (BLEU Score), macs, params, latency |
| BERT-SST2 | bert_base_sst2 | SST2 | latency, macs, params, accuracy_sst2 |
| Quantization-aware BERT-SST2 | bert_base_sst2_quantized | SST2 | latency, model_size, accuracy_sst2 |
| BootstrapNAS | - | - | accuracy_top1, macs, params, latency |
| Vision Transformer | vit_base_imagenet | ImageNet 1K | accuracy_top1, macs, params, latency |

ImageNet: When using any of the OFA super-networks, the ImageNet directory tree must contain a separate directory for each class in both the train and val sets. To prepare your ImageNet dataset for use with OFA, follow the instructions available here.

WMT En-De: To obtain and prepare the dataset, follow the instructions available here.

BootstrapNAS: BootstrapNAS is currently only available through the Python interface. To learn more about using DyNAS-T on the BootstrapNAS search space, refer to the example notebook.
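
The ImageNet layout requirement can be checked before starting a search. The sketch below is a hypothetical helper, not part of DyNAS-T, and it assumes the two splits live in subdirectories literally named `train` and `val` under the dataset root.

```python
from pathlib import Path


def check_imagenet_layout(root):
    """Return the sorted class names if train/ and val/ hold the same class dirs."""
    root = Path(root)
    classes = {}
    for split in ("train", "val"):  # assumed split directory names
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # One subdirectory per class is expected inside each split.
        classes[split] = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    if not classes["train"]:
        raise ValueError("no class directories found in train/")
    if classes["train"] != classes["val"]:
        raise ValueError("train/ and val/ contain different class directories")
    return classes["train"]
```

Calling `check_imagenet_layout` on the directory you intend to pass as `--dataset_path` returns the list of class names, or raises if a split or class directory is missing.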

Intel Library Support

The following software libraries are compatible with DyNAS-T:

Getting Started

To set up DyNAS-T from source code, run pip install -e . or make a local copy of the dynast subfolder in your local subnetwork repository with the requirements.txt dependencies installed.

You can also install DyNAS-T from PyPI:

pip install dynast

Installing DyNAS-T with pip will make a dynast command available in your CLI.

Running DyNAS-T

The dynast/cli.py template (which you can invoke with the dynast command) provides a starting point for running the NAS process. An evaluation is the process of determining the fitness of an architectural candidate. A validation evaluation is the costly process of running the full validation set. A predictor evaluation uses a pre-trained performance predictor.

  • supernet - Name of the pre-trained super-network. See list of supported super-networks. For a custom super-network, you will have to modify the code including the dynast_manager.py and supernetwork_registry.py files.
  • optimization_metrics - The metrics that the NAS process optimizes for. Note that the number of objectives you specify must be compatible with the chosen search algorithm.
  • measurements - In addition to the optimization metrics, you can specify which measurements you would like to take during a full evaluation.
  • search_tactic - Either linas (Lightweight Iterative NAS, recommended) or evolutionary (good for benchmarking and testing new super-networks).
  • search_algo - Determines which evolutionary algorithm to run for the linas low-fidelity inner loop or the evolutionary search tactic.
  • num_evals - Number of evaluations (full validation measurements) to take. For example, if 1 validation measurement takes 5 minutes, 120 evaluations would take 10 hours.
  • seed - Random seed.
  • population - The size of the pool of candidates for each evolutionary generation. 50 is recommended for most cases, though this can be treated as a tunable hyperparameter.
  • results_path - The location of the csv file that stores information about the DNN candidates during the search process. The csv file is used for plotting NAS results.
  • dataset_path - Location of the dataset used for training the super-network of interest.
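
The parameters above map directly onto command-line flags, and the num_evals arithmetic is easy to reproduce. The sketch below uses two hypothetical helpers (`build_dynast_command` and `estimated_validation_hours`, neither part of DyNAS-T) to assemble an argument list from the flags shown in this README and to estimate validation time. It does not handle value-less flags such as --distributed, and the estimate ignores predictor training and any other overhead.

```python
import shlex


def build_dynast_command(**params):
    """Turn keyword parameters into a `dynast` argument list."""
    args = ["dynast"]
    for name, value in params.items():
        args.append(f"--{name}")
        if isinstance(value, (list, tuple)):
            # Multi-valued options are passed as space-separated values.
            args.extend(str(v) for v in value)
        else:
            args.append(str(value))
    return args


def estimated_validation_hours(num_evals, minutes_per_eval):
    """Wall-clock estimate for the validation measurements alone."""
    return num_evals * minutes_per_eval / 60


cmd = build_dynast_command(
    supernet="ofa_mbv3_d234_e346_k357_w1.0",
    optimization_metrics=["accuracy_top1", "macs"],
    measurements=["accuracy_top1", "macs", "params"],
    search_tactic="linas",
    search_algo="nsga2",
    num_evals=250,
    seed=42,
    population=50,
    results_path="results.csv",
)
print(shlex.join(cmd))
print(estimated_validation_hours(120, 5))  # 10.0
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once `dynast` is installed.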

Single-Objective

Example 1a. NAS process for the OFA MobileNetV3-w1.0 super-network that optimizes for ImageNet Top-1 accuracy using a simple evolutionary genetic algorithm (GA) approach.

dynast \
    --supernet ofa_mbv3_d234_e346_k357_w1.0 \
    --optimization_metrics accuracy_top1 \
    --measurements accuracy_top1 macs params \
    --results_path mbnv3w10_ga_acc.csv \
    --search_tactic evolutionary \
    --num_evals 250 \
    --search_algo ga

Example 1b. NAS process for the OFA MobileNetV3-w1.2 super-network that optimizes for ImageNet Top-1 accuracy using a LINAS + GA approach.

dynast \
    --supernet ofa_mbv3_d234_e346_k357_w1.2 \
    --optimization_metrics accuracy_top1 \
    --measurements accuracy_top1 macs params \
    --results_path mbnv3w12_linasga_acc.csv \
    --search_tactic linas \
    --num_evals 250 \
    --search_algo ga

Multi-Objective

Example 2a. NAS process for the OFA MobileNetV3-w1.0 super-network that optimizes for ImageNet Top-1 accuracy and multiply-and-accumulates (MACs) using a LINAS+NSGA-II approach.

dynast \
    --supernet ofa_mbv3_d234_e346_k357_w1.0 \
    --optimization_metrics accuracy_top1 macs \
    --measurements accuracy_top1 macs params \
    --results_path mbnv3w10_linasnsga2_acc_macs.csv \
    --search_tactic linas \
    --num_evals 250 \
    --search_algo nsga2

Example 2b. NAS process for the OFA ResNet50 super-network that optimizes for ImageNet Top-1 accuracy and model size (parameters) using an evolutionary AGE-MOEA approach.

dynast \
    --supernet ofa_resnet50 \
    --optimization_metrics accuracy_top1 params \
    --measurements accuracy_top1 macs params \
    --results_path resnet50_age_acc_params.csv \
    --search_tactic evolutionary \
    --num_evals 500 \
    --search_algo age

Many-Objective

Example 3a. NAS process for the OFA ResNet50 super-network that optimizes for ImageNet Top-1 accuracy, model size (parameters), and multiply-and-accumulates (MACs) using an evolutionary UNSGA-III (unsga3) approach.

dynast \
    --supernet ofa_resnet50 \
    --optimization_metrics accuracy_top1 macs params \
    --measurements accuracy_top1 macs params \
    --results_path resnet50_unsga3_acc_macs_params.csv \
    --search_tactic evolutionary \
    --num_evals 500 \
    --search_algo unsga3

Example 3b. NAS process for the OFA MobileNetV3-w1.0 super-network that optimizes for ImageNet Top-1 accuracy, model size (parameters), and multiply-and-accumulates (MACs) using a LINAS + UNSGA-III (unsga3) approach.

dynast \
    --supernet ofa_mbv3_d234_e346_k357_w1.0 \
    --optimization_metrics accuracy_top1 macs params \
    --measurements accuracy_top1 macs params \
    --results_path mbnv3w10_linasunsga3_acc_macs_params.csv \
    --search_tactic linas \
    --num_evals 500 \
    --search_algo unsga3

A Multi-Objective search using both the LINAS+NSGA-II and standard NSGA-II algorithms yields results in the following format.

[Figure: DyNAS-T Results]
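
When comparing multi-objective runs, it is often useful to reduce the measured candidates to their non-dominated set. The sketch below is a minimal, generic Pareto filter for an accuracy-versus-MACs trade-off. It works on plain tuples because the column layout of the results csv is not described here, and the sample values are made up for illustration rather than taken from any search.

```python
def pareto_front(points):
    """Keep points not dominated by another (maximize accuracy, minimize MACs)."""
    front = []
    for acc, macs in points:
        dominated = any(
            (a >= acc and m <= macs) and (a > acc or m < macs)
            for a, m in points
        )
        if not dominated:
            front.append((acc, macs))
    return sorted(front, key=lambda p: p[1])


# Hypothetical (accuracy_top1 in %, MACs in millions) pairs, not real results.
candidates = [
    (75.0, 220.0),
    (76.5, 300.0),
    (74.0, 250.0),
    (77.0, 400.0),
    (76.0, 310.0),
]
print(pareto_front(candidates))
# [(75.0, 220.0), (76.5, 300.0), (77.0, 400.0)]
```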

Quantization-aware Search

This approach allows you to run search on your FP32 super-network and find optimal model configurations w.r.t. both architecture and Post-Training Quantization policy. DyNAS-T's implementation uses Intel® Neural Compressor as an underlying backend for quantizing models. This search approach is specific to the CPU, and so --device=cpu has to be used.

Example 4. Quantization-aware search on OFA ResNet50 super-network.

dynast \
        --results_path dynast_ofaresnet50_quant.csv \
        --dataset_path /ML_datasets/imagenet/ilsvrc12_raw \
        --supernet inc_quantization_ofa_resnet50 \
        --device cpu \
        --batch_size 128 \
        --search_tactic linas \
        --measurements latency accuracy_top1 \
        --optimization_metrics latency accuracy_top1 \
        --seed 42

Distributed Search

Search can be performed with multiple workers using the MPI / torch.distributed library. To use this functionality, call your script with the mpirun/mpiexec command and set the additional --distributed param (DyNAS([...], distributed=True)).

Note: When run with torchrun, torch.distributed sets OMP_NUM_THREADS=1 unless it is explicitly specified, which may result in slow evaluation. It is good practice to explicitly set OMP_NUM_THREADS to (total_core_count)/(num_workers) (optional for MPI).
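
The thread-count rule in the note above is a simple division. The helper below is hypothetical and not part of DyNAS-T. When no core count is given it falls back to `os.cpu_count()`, which reports logical CPUs and may therefore overcount on machines with hyper-threading.

```python
import os


def omp_threads_per_worker(num_workers, total_core_count=None):
    """Split the available cores evenly across workers (at least 1 each)."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if total_core_count is None:
        total_core_count = os.cpu_count() or 1  # logical CPUs, an approximation
    return max(1, total_core_count // num_workers)


# Assuming a 56-core machine shared by two workers.
print(omp_threads_per_worker(2, total_core_count=56))  # 28
```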

Example 5. Distributed NAS process with two OpenMPI workers for the OFA MobileNetV3-w1.0 super-network that optimizes for ImageNet Top-1 accuracy and multiply-and-accumulates (MACs).

OMP_NUM_THREADS=28 mpirun \
    --report-bindings \
    -x MASTER_ADDR=127.0.0.1 \
    -x MASTER_PORT=1234 \
    -np 2 \
    -bind-to socket \
    -map-by socket \
    dynast \
        --supernet ofa_mbv3_d234_e346_k357_w1.0 \
        --optimization_metrics accuracy_top1 macs \
        --results_path results.csv \
        --search_tactic linas \
        --distributed \
        --population 50 \
        --num_evals 250

References

[1] Cai, H., Gan, C., & Han, S. (2020). Once for All: Train One Network and Specialize it for Efficient Deployment. ArXiv, abs/1908.09791.

[2] Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2), 182-197. doi: 10.1109/4235.996017.

[3] Cummings, D., Sarah, A., Sridhar, S.N., Szankin, M., Muñoz, J.P., & Sundaresan, S. (2022). A Hardware-Aware Framework for Accelerating Neural Architecture Search Across Modalities. ArXiv, abs/2205.10358.

Legal Disclaimer and Notices

This “research quality code” is for Non-Commercial purposes provided by Intel “As Is” without any express or implied warranty of any kind. Please see the dataset's applicable license for terms and conditions. Intel does not own the rights to this data set and does not confer any rights to it. Intel does not warrant or assume responsibility for the accuracy or completeness of any information, text, graphics, links or other items within the code. A thorough security review has not been performed on this code. ImageNet, WMT, SST2: Please see the dataset's applicable license for terms and conditions. Intel does not own the rights to this data set and does not confer any rights to it.