This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[Numpy] Benchmark the backbone models + Some fixes + Always use python3 + Fix conversion tool (#1292)

* update

update

Update benchmark_hf.py

Update benchmark_hf.py

Update benchmark_hf.py

Create requirements.txt

Update benchmark_hf.py

Update benchmark_hf.py

Update benchmark_hf.py

Update benchmark_hf.py

Update benchmark_hf.py

Update benchmark_hf.py

Update benchmark_hf.py

Update benchmark_hf.py

Update benchmark_hf.py

Update benchmark_hf.py

Update benchmark_hf.py

Update benchmark_hf.py

Update benchmark_hf.py

Update requirements.txt

update

Update README.md

Update benchmark_hf.py

Update benchmark_hf.py

Update benchmark_hf.py

Update benchmark_hf.py

Update benchmark_hf.py

Update benchmark_hf.py

fix

fix

Update test_models_bart.py

Update test_models_bart.py

Update bart.py

update

Update __init__.py

Update electra.py

update

update

Update convert_bert_from_tf_hub.sh

update

Update unittests.yml

fix conversion

update

fix bert conversion

update

fix

fix

Update __init__.py

fix bug

fix css

Update benchmark_utils.py

Update benchmark_utils.py

update

update

Update misc.py

Update benchmark_utils.py

Update benchmark_utils.py

Update benchmark_utils.py

Update benchmark_utils.py

Update benchmark_utils.py

Update benchmark_utils.py

Update benchmark_utils.py

Update benchmark_utils.py

no multiprocessing

Update benchmark_utils.py

Update benchmark_utils.py

Update benchmark_utils.py

fix bug

Update benchmark_utils.py

Update benchmark_utils.py

try to use mxnet profiler

Update benchmark_utils.py

Update benchmark_utils.py

Update benchmark_utils.py

Update benchmark_utils.py

Update benchmark_utils.py

Update benchmark_utils.py

fix

update

Update benchmark_utils.py

Update benchmark_utils.py

Update benchmark_utils.py

Update benchmark_utils.py

fix

Update benchmark_utils.py

Update bart.py

Update bart.py

fix

fix

Update benchmark_utils.py

Update benchmark_utils.py

Update benchmark_utils.py

Update benchmark_utils.py

Update benchmark_gluonnlp.py

Update benchmark_gluonnlp.py

Update benchmark_gluonnlp.py

Update benchmark_utils.py

Update benchmark_utils.py

Update benchmark_utils.py

Update README.md

* Update benchmark_utils.py

* Update benchmark_utils.py

* Update requirements.txt

* Update benchmark_utils.py

* Update benchmark_utils.py

* Update benchmark_utils.py

* Update benchmark_utils.py

* Update benchmark_utils.py

* Update benchmark_utils.py

* debug

* Update benchmark_utils.py

* Update benchmark_gluonnlp.py

* Update benchmark_gluonnlp.py

* Update benchmark_utils.py

* Update pretraining_utils.py

* Update benchmark_utils.py

* update

* Update benchmark_utils.py

* Update benchmark_utils.py

* fix convert

* tiny fix

* python3

* fix

* lower tolerance for albert large and xlarge

* Update benchmark_utils.py

* fix xlmr

* lower tolerance for albert large

* update

* Update benchmark_utils.py

* Update benchmark_utils.py

* Update benchmark_utils.py

* Update benchmark_utils.py

* fix

* Squashed commit of the following:

commit bd05969
Author: ZheyuYe <zheyu.ye1995@gmail.com>
Date:   Tue Aug 11 23:44:53 2020 +0800

    lower tolerance for albert large

commit f0f9cd6
Author: ZheyuYe <zheyu.ye1995@gmail.com>
Date:   Tue Aug 11 14:59:06 2020 +0800

    fix xlmr

commit edd6655
Author: ZheyuYe <zheyu.ye1995@gmail.com>
Date:   Tue Aug 11 14:49:36 2020 +0800

    lower tolerance for albert large and xlarge

commit d651730
Author: ZheyuYe <zheyu.ye1995@gmail.com>
Date:   Tue Aug 11 14:34:55 2020 +0800

    fix

commit e097c3b
Author: ZheyuYe <zheyu.ye1995@gmail.com>
Date:   Tue Aug 11 14:02:13 2020 +0800

    python3

commit d6f3fc4
Author: ZheyuYe <zheyu.ye1995@gmail.com>
Date:   Tue Aug 11 14:00:28 2020 +0800

    tiny fix

commit 93bd659
Author: ZheyuYe <zheyu.ye1995@gmail.com>
Date:   Tue Aug 11 13:08:34 2020 +0800

    fix convert

commit 9238d56
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date:   Mon Aug 10 21:03:13 2020 -0700

    Update benchmark_utils.py

commit 9bbc581
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date:   Mon Aug 10 12:58:04 2020 -0700

    Update benchmark_utils.py

commit b1f5955
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date:   Mon Aug 10 11:18:43 2020 -0700

    update

commit a43e65b
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date:   Mon Aug 10 10:32:55 2020 -0700

    Update benchmark_utils.py

commit 13db82f
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date:   Mon Aug 10 10:16:46 2020 -0700

    Update pretraining_utils.py

commit fdd9df5
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date:   Mon Aug 10 08:49:17 2020 -0700

    Update benchmark_utils.py

commit 44f9c8b
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date:   Mon Aug 10 05:07:45 2020 -0700

    Update benchmark_gluonnlp.py

commit 45c58b6
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date:   Mon Aug 10 05:06:05 2020 -0700

    Update benchmark_gluonnlp.py

commit f0ae933
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date:   Mon Aug 10 05:04:41 2020 -0700

    Update benchmark_utils.py

commit 9735edb
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date:   Mon Aug 10 04:59:58 2020 -0700

    debug

commit d9daf58
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date:   Mon Aug 10 04:57:17 2020 -0700

    Update benchmark_utils.py

commit 9e0f631
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date:   Mon Aug 10 04:56:52 2020 -0700

    Update benchmark_utils.py

commit 37f224f
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date:   Mon Aug 10 04:56:06 2020 -0700

    Update benchmark_utils.py

commit 1cf5c7b
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date:   Mon Aug 10 04:54:34 2020 -0700

    Update benchmark_utils.py

commit 15272f1
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date:   Mon Aug 10 04:49:28 2020 -0700

    Update benchmark_utils.py

commit 8215df6
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date:   Mon Aug 10 04:48:20 2020 -0700

    Update benchmark_utils.py

commit 1451f03
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date:   Mon Aug 10 04:42:21 2020 -0700

    Update requirements.txt

commit 626739d
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date:   Mon Aug 10 04:38:54 2020 -0700

    Update benchmark_utils.py

commit 1955197
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date:   Mon Aug 10 04:31:30 2020 -0700

    Update benchmark_utils.py

commit 2fd7e3b
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date:   Thu Aug 6 23:56:49 2020 -0700

    update

    update

    Update benchmark_hf.py

    Update benchmark_hf.py

    Update benchmark_hf.py

    Create requirements.txt

    Update benchmark_hf.py

    Update benchmark_hf.py

    Update benchmark_hf.py

    Update benchmark_hf.py

    Update benchmark_hf.py

    Update benchmark_hf.py

    Update benchmark_hf.py

    Update benchmark_hf.py

    Update benchmark_hf.py

    Update benchmark_hf.py

    Update benchmark_hf.py

    Update benchmark_hf.py

    Update benchmark_hf.py

    Update requirements.txt

    update

    Update README.md

    Update benchmark_hf.py

    Update benchmark_hf.py

    Update benchmark_hf.py

    Update benchmark_hf.py

    Update benchmark_hf.py

    Update benchmark_hf.py

    fix

    fix

    Update test_models_bart.py

    Update test_models_bart.py

    Update bart.py

    update

    Update __init__.py

    Update electra.py

    update

    update

    Update convert_bert_from_tf_hub.sh

    update

    Update unittests.yml

    fix conversion

    update

    fix bert conversion

    update

    fix

    fix

    Update __init__.py

    fix bug

    fix css

    Update benchmark_utils.py

    Update benchmark_utils.py

    update

    update

    Update misc.py

    Update benchmark_utils.py

    Update benchmark_utils.py

    Update benchmark_utils.py

    Update benchmark_utils.py

    Update benchmark_utils.py

    Update benchmark_utils.py

    Update benchmark_utils.py

    Update benchmark_utils.py

    no multiprocessing

    Update benchmark_utils.py

    Update benchmark_utils.py

    Update benchmark_utils.py

    fix bug

    Update benchmark_utils.py

    Update benchmark_utils.py

    try to use mxnet profiler

    Update benchmark_utils.py

    Update benchmark_utils.py

    Update benchmark_utils.py

    Update benchmark_utils.py

    Update benchmark_utils.py

    Update benchmark_utils.py

    fix

    update

    Update benchmark_utils.py

    Update benchmark_utils.py

    Update benchmark_utils.py

    Update benchmark_utils.py

    fix

    Update benchmark_utils.py

    Update bart.py

    Update bart.py

    fix

    fix

    Update benchmark_utils.py

    Update benchmark_utils.py

    Update benchmark_utils.py

    Update benchmark_utils.py

    Update benchmark_gluonnlp.py

    Update benchmark_gluonnlp.py

    Update benchmark_gluonnlp.py

    Update benchmark_utils.py

    Update benchmark_utils.py

    Update benchmark_utils.py

    Update README.md

* fix squad

* fix typo

* Update benchmark_utils.py

* Update benchmark_hf.py

* Update benchmark_gluonnlp.py

* Update benchmark_gluonnlp.py

* Update benchmark_gluonnlp.py

* Update benchmark_utils.py

* Update benchmark_gluonnlp.py

* update

* Update benchmark_gluonnlp.py

* Update benchmark_gluonnlp.py

* Update benchmark_gluonnlp.py

* Update benchmark_gluonnlp.py

* Update README.md

* update

* Update benchmark_hf.py

* Update benchmark_hf.py

* Update requirements.txt

* Update benchmark_hf.py

* Delete conversion_tool_test.yml

* Update README.md

* Update README.md

* Update README.md

* move python --> python3

* try to fix test

* fix test case

* add test cases

* Update README.md

* update

* update logging config

* fix logging config

Co-authored-by: ZheyuYe <zheyu.ye1995@gmail.com>
sxjscience and zheyuye authored Aug 14, 2020
1 parent 9e268c0 commit 32e87d4
Showing 46 changed files with 2,008 additions and 313 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/unittests.yml
@@ -35,7 +35,7 @@ jobs:
          python -m pip install --user --upgrade pip
          python -m pip install --user setuptools pytest pytest-cov contextvars
          python -m pip install --upgrade cython
-          python -m pip install --pre --user "mxnet>=2.0.0b20200716" -f https://dist.mxnet.io/python
+          python -m pip install --pre --user "mxnet>=2.0.0b20200802" -f https://dist.mxnet.io/python
          python -m pip install --user -e .[extras]
      - name: Test project
        run: |
23 changes: 10 additions & 13 deletions README.md
@@ -20,35 +20,32 @@ First of all, install the latest MXNet. You may use the following commands:

```bash
# Install the version with CUDA 10.0
-pip install -U --pre "mxnet-cu100>=2.0.0b20200802" -f https://dist.mxnet.io/python
+python3 -m pip install -U --pre "mxnet-cu100>=2.0.0b20200802" -f https://dist.mxnet.io/python

# Install the version with CUDA 10.1
-pip install -U --pre "mxnet-cu101>=2.0.0b20200802" -f https://dist.mxnet.io/python
+python3 -m pip install -U --pre "mxnet-cu101>=2.0.0b20200802" -f https://dist.mxnet.io/python

# Install the version with CUDA 10.2
-pip install -U --pre "mxnet-cu102>=2.0.0b20200802" -f https://dist.mxnet.io/python
+python3 -m pip install -U --pre "mxnet-cu102>=2.0.0b20200802" -f https://dist.mxnet.io/python

# Install the cpu-only version
-pip install -U --pre "mxnet>=2.0.0b20200802" -f https://dist.mxnet.io/python
+python3 -m pip install -U --pre "mxnet>=2.0.0b20200802" -f https://dist.mxnet.io/python
```


-To install, use
+To install GluonNLP, use

```bash
-pip install -U -e .
+python3 -m pip install -U -e .

# Also, you may install all the extra requirements via
-pip install -U -e .[extras]
-
-# In case you are using zsh, try to use the following command for installing
-pip install -U -e ."[extras]"
+python3 -m pip install -U -e ."[extras]"
```

If you find that you do not have the permission, you can also install to the user folder:

```bash
-pip install -U -e . --user
+python3 -m pip install -U -e . --user
```

For Windows users, we recommend to use the [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/about).
@@ -68,8 +65,8 @@ nlp_data help
nlp_preprocess help

# Also, you can use `python -m` to access the toolkits
-python -m gluonnlp.cli.data help
-python -m gluonnlp.cli.preprocess help
+python3 -m gluonnlp.cli.data help
+python3 -m gluonnlp.cli.preprocess help

```

12 changes: 7 additions & 5 deletions docs/_static/custom.css
@@ -20,9 +20,11 @@
}

@media (max-width: 650px) {
-    .install .option, .install .title {
-        width: 90%;
-    }
-    .install .title {
-        margin-top: 1em;
+    .install .option, .install .title {
+        width: 90%;
+    }
+
+    .install .title {
+        margin-top: 1em;
+    }
}
45 changes: 45 additions & 0 deletions scripts/benchmarks/README.md
@@ -0,0 +1,45 @@
# Benchmarking the Performance of NLP Backbones

We benchmark the latency and peak memory usage of a single training (forward + backward) and inference (forward-only) step
of the NLP backbones.
For comparison, we also report the numbers of the corresponding models in HuggingFace Transformers.

## Backbones in HuggingFace

We use the [huggingface benchmark](https://github.com/huggingface/transformers/tree/master/examples/benchmarking)
to benchmark the training + inference speed of common workloads in NLP.

```bash
python3 -m pip install -U -r requirements.txt --user
python3 benchmark_hf.py
```

It will generate the following CSV files:

```
├── pytorch_train_fp32.csv
├── pytorch_train_fp16.csv
├── pytorch_infer_fp32.csv
├── pytorch_infer_fp16.csv
├── pytorch_infer_fp32_ts.csv
```
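
To inspect these results programmatically, a minimal sketch like the one below concatenates whatever result files the run produced into a single DataFrame. The exact columns are whatever the HuggingFace benchmarking script writes and may vary with the installed `transformers` version, so treat them as an assumption.

```python
import glob

import pandas as pd

# Collect every result file written by benchmark_hf.py (file names as listed above).
frames = []
for path in sorted(glob.glob('pytorch_*.csv')):
    frame = pd.read_csv(path)
    frame['source'] = path  # remember which setting (train/infer, fp32/fp16, ts) a row came from
    frames.append(frame)

results = pd.concat(frames, ignore_index=True)
print(results.head())
```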

## GluonNLP Backbones based on MXNet-2.0

We profile three options: `NT` layout, `NT` layout with `TN` as the compute layout,
and `TN` layout.

```bash
python3 -m pip install -U -r requirements.txt --user
bash benchmark_gluonnlp.sh
```

It will generate CSV files with the `gluonnlp_` prefix:
```
├── gluonnlp_train_fp32_NT_NT.csv
├── gluonnlp_train_fp32_NT_TN.csv
├── gluonnlp_train_fp32_TN_TN.csv
├── gluonnlp_infer_fp32_NT_NT.csv
├── gluonnlp_infer_fp32_NT_TN.csv
├── gluonnlp_infer_fp32_TN_TN.csv
```
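
Here `N` denotes the batch axis and `T` the sequence (time) axis, so the `NT` layout roughly corresponds to `(batch_size, seq_length)` inputs while `TN` corresponds to `(seq_length, batch_size)`; the compute layout is the layout used inside the backbone blocks. To put the three layout settings side by side, a minimal pandas sketch such as the following can be used. It assumes the inference run above has completed and relies on the column names written by `benchmark_gluonnlp.py`.

```python
import pandas as pd

# The three layout settings produced by benchmark_gluonnlp.sh.
LAYOUTS = ['NT_NT', 'NT_TN', 'TN_TN']

frames = []
for layout in LAYOUTS:
    frame = pd.read_csv('gluonnlp_infer_fp32_{}.csv'.format(layout), index_col=0)
    frame['layout'] = layout
    frames.append(frame)

merged = pd.concat(frames, ignore_index=True)
# One row per (model, workload), one column per layout setting.
summary = merged.pivot_table(index=['model', 'batch_size', 'sequence_length'],
                             columns='layout', values='latency')
print(summary)
```

Note that models containing `bart` are skipped when the compute layout differs from the input layout, so the mixed-layout column will be empty for those rows.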
130 changes: 130 additions & 0 deletions scripts/benchmarks/benchmark_gluonnlp.py
@@ -0,0 +1,130 @@
import mxnet as mx
import argparse
import os
import pandas as pd
from benchmark_utils import GluonNLPBackboneBenchmark
import multiprocessing as mp
from multiprocessing import Process
mx.npx.set_np()


MODELS = [
    'google_en_uncased_bert_base',
    'google_en_uncased_bert_large',
    'google_albert_base_v2',
    'google_albert_large_v2',
    'google_albert_xlarge_v2',
    'google_albert_xxlarge_v2',
    'google_electra_small',
    'google_electra_base',
    'google_electra_large',
    'google_uncased_mobilebert',
    'fairseq_bart_base',
    'fairseq_bart_large'
]

# (batch_size, seq_length)
train_workloads =\
    [(4, 128),
     (8, 128),
     (16, 128),
     (32, 128),
     (1, 512),
     (2, 512),
     (4, 512),
     (8, 512)]


inference_workloads = [
    (1, 128),
    (1, 384),
    (1, 512),
    (8, 32),
    (8, 128),
    (8, 512),
    (32, 512),
    (256, 128),
    (400, 100),
]


def get_parser():
    parser = argparse.ArgumentParser(description='Process some integers.')
    parser.add_argument('--layout', type=str, default='NT',
                        help='The layout of the computation')
    parser.add_argument('--compute_layout', type=str, default=None,
                        help='The compute layout of the computation')
    parser.add_argument('--mode', type=str, default='train',
                        choices=['train', 'inference'])
    return parser


def run_benchmark(workload, model_name, out_file_name, is_train):
    if is_train:
        benchmark = GluonNLPBackboneBenchmark(
            workloads=workload,
            model_names=model_name,
            profile_inference=False,
            profile_train=True,
            to_csv=True,
            train_out_csv_file=out_file_name)
        benchmark.run()
    else:
        benchmark = GluonNLPBackboneBenchmark(
            workloads=workload,
            model_names=model_name,
            profile_inference=True,
            profile_train=False,
            to_csv=True,
            inference_out_csv_file=out_file_name)
        benchmark.run()
    return


if __name__ == '__main__':
    mp.set_start_method('spawn')
    parser = get_parser()
    args = parser.parse_args()
    if args.compute_layout is None:
        args.compute_layout = args.layout
    for layout, compute_layout in [(args.layout, args.compute_layout)]:
        if compute_layout != layout:
            profile_models = [ele for ele in MODELS if 'bart' not in ele]
        else:
            profile_models = [ele for ele in MODELS]
        if args.mode == 'inference':
            out_dir = 'infer_fp32_{}_{}'.format(layout, compute_layout)
            df = pd.DataFrame(columns=['model', 'batch_size', 'sequence_length',
                                       'latency', 'memory'])
            os.makedirs(out_dir, exist_ok=True)
            for model_name in profile_models:
                for workload in inference_workloads:
                    out_path = os.path.join(out_dir, '{}_{}_{}.csv'.format(model_name, workload[0],
                                                                           workload[1]))
                    # Run each benchmark in a freshly spawned child process so that memory and
                    # profiler state are fully released between workloads.
                    process = Process(
                        target=run_benchmark,
                        args=(workload, model_name, out_path, False))
                    process.start()
                    process.join()
                    new_df = pd.read_csv(out_path)
                    df = df.append(new_df, ignore_index=True)
            df.to_csv('gluonnlp_infer_fp32_{}_{}.csv'.format(layout, compute_layout))
        elif args.mode == 'train':
            out_dir = 'train_fp32_{}_{}'.format(layout, compute_layout)
            df = pd.DataFrame(columns=['model', 'batch_size', 'sequence_length',
                                       'latency', 'memory'])
            os.makedirs(out_dir, exist_ok=True)
            for model_name in profile_models:
                for workload in train_workloads:
                    out_path = os.path.join(out_dir, '{}_{}_{}.csv'.format(model_name, workload[0],
                                                                           workload[1]))
                    process = Process(
                        target=run_benchmark,
                        args=(workload, model_name, out_path, True))
                    process.start()
                    process.join()
                    new_df = pd.read_csv(out_path)
                    df = df.append(new_df, ignore_index=True)
            df.to_csv('gluonnlp_train_fp32_{}_{}.csv'.format(layout, compute_layout))
        else:
            raise NotImplementedError
14 changes: 14 additions & 0 deletions scripts/benchmarks/benchmark_gluonnlp.sh
@@ -0,0 +1,14 @@
for mode in train inference
do
python3 benchmark_gluonnlp.py --layout NT --compute_layout NT --mode $mode
done

for mode in train inference
do
python3 benchmark_gluonnlp.py --layout NT --compute_layout TN --mode $mode
done

for mode in train inference
do
python3 benchmark_gluonnlp.py --layout TN --compute_layout TN --mode $mode
done