Modify tf bert-base model with frozen graph #122

Open
wants to merge 8 commits into main
Conversation

@wangkl2 wangkl2 (Member) commented Nov 28, 2022

  • Modify the instructions for eval and infer separately.
  • Enable padding for dGPU when the number of examples is not a multiple of the batch size, to avoid the performance degradation caused by dynamic shapes (see the sketch after this list).
  • Add perf calculation for each step.
  • Add the link to the bert-base model with frozen graph in the "language modelling" section of the README. There is also another bert-base inference link in the "language translation" section, but the performance with the checkpoint is not as good as with the frozen graph.
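
As a rough sketch of the padding idea in the second bullet (not the PR's actual code): the evaluation features can be padded with dummy copies of the last example, flagged so they are excluded from accuracy, until the total count is a multiple of the batch size, which keeps every batch at a static shape on dGPU. The helper name `pad_to_batch_multiple` and the `is_real_example` field below are assumptions for illustration only.

```python
import copy

def pad_to_batch_multiple(features, batch_size):
    """Pad a list of feature dicts with copies of the last example so that
    len(features) becomes a multiple of batch_size; padded entries are
    marked is_real_example=False so accuracy can skip them."""
    padded = [dict(f, is_real_example=True) for f in features]
    remainder = len(padded) % batch_size
    if remainder:
        filler = copy.deepcopy(padded[-1])
        filler["is_real_example"] = False
        padded.extend(copy.deepcopy(filler) for _ in range(batch_size - remainder))
    return padded

# e.g. 1003 examples with batch size 32 are padded to 1024, so the frozen
# graph always sees full, statically shaped batches.
```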

WafaaT and others added 8 commits September 19, 2022 09:28
* revert bf16 changes (#488)

* Add partials and spec yml for the end2end DLSA pipeline (#460)

* Add partials and specs for the end2end DLSA pipeline

* Add missing end line

* Update name to include ipex

* update specs to use the public image as a base on one and SPR for the other

* Dockerfile updates for the updated DLSA repo

* Update pip install list

* Rename to public

* Removing partials that aren't used anymore

* Fixes for 'kmp-blocktime' env var (#493)

* Fixes for 'kmp-blocktime' env var

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* update per review feedback

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Add 'kmp-blocktime' for mlperf-gnmt (#494)

* Add 'kmp-blocktime' for mlperf-gnmt

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Remove duplicate parameter definition

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* add sample_input for resnet50 training (#495)

* remove the case when fragment_size not equal args.batch_size (#500)

* Changed the transformer_mlperf fp32 model so that we can fuse the ops… (#389)

* Changed the transformer_mlperf fp32 model so that we can fuse the ops in the model, and also minor changes for python3

* Changed the transformer_mlperf int8 model so that we can fuse the ops in the model, and also minor changes for python3

* SPR updates for WW12, 2022 (#492)

* SPR updates for WW12, 2022

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Update for PyTorch SPR WW2022-12

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Update pytorch base for SPR too

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Stick with specific 'keras-nightly' version

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Updates per code review

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* update maskrcnn training_multinode.sh (#502)

* Fixed a bug in the transformer_mlperf model threads setting (#482)

* Fixed a bug in the transformer_mlperf model threads setting

* Fix failing tests

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

Co-authored-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Added the default threads setting for transformer_mlperf inference in… (#504)

* Added the default threads setting for transformer_mlperf inference in case there is no command line input

* Fix unit tests

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

Co-authored-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* PyTorch Image Classification TL notebook (#490)

* Adds new TL notebook with documentation

* Added newline

* Added to main TL README

* Small fixes

* Updated for review feedback

* Added more models and a download limit arg

* Removed py3.9 requirement and changed default model

* Adds Kitti torchvision dataset to TL notebook (#512)

* Adds Kitti torchvision dataset to TL notebook

* Fixed citations formatting

* update maskrcnn model (#515)

* minor update. (#465)

* Create unit-test github action workflow (#518)

* Create unit-test github action workflow

Tested here: https://github.com/sriester/frameworks.ai.models.intel-models/runs/6089350443?check_suite_focus=true
Runs tox py.test on push.

* Containerize job

* Update unit-test.yml

* Update unit-test.yml

* Update unit-test.yml

* Update unit-test.yml

* Update unit-test.yml

* Update unit-test.yml

* Added login credentials to docker

Trying to fix pull rate issue

* Update unit-test.yml

* Update unit-test.yml

* Update unit-test.yml

Changed pip install command.

* Update unit-test.yml

* Update unit-test.yml

* Update unit-test.yml

Changed docker credentials to imzbot

* Update to Horovod commit 11c1389 to fix TF v2.9 + Horovod install failure (#519)

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* update distilbert model to  4.18 transformers and enable int8 path (#521)

* rnnt: use launcher to set output file path and name (#524)

* Update BareMetalSetup.md (#526)

Always use the latest torchvision

* Reduce memory usage for dlrm acc test (#527)

* update distilbert with text_classification (#529)

* add patch for distilbert (#530)

* Update the model-builder dockerfile to use ubuntu 20.04 (#532)

* Add script for coco training dataset processing (#525)

* and update tensorflow ssd-resnet34 training dataset instructions

* update patch (#533)

Co-authored-by: Wang, Chuanqi <chuanqi.wang@intel.com>

* [RNN-T training] Enable FP32 gemm using oneDNN (#531)

* Update the Readme guide for distilbert (#534)

* Update the Readme guide for distilbert

* Fix accuracy grep bug, and grep accuracy for distilbert

Co-authored-by: Weizhuo Zhang <weizhuo.zhang@intel.com>

* Update end2end public dockerfile to look for IPEX in the conda directory (#535)

* Notebook to script conversion example (#516)

* Add notebook script conversion example

* Fixed doc

* Replaces custom preprocessor with built-in one

* Changed tag to remove_for_custom_dataset

* Add URL check prior to calling urlretrieve (#538)

* Add URL check prior to calling urlretrieve

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix a typo

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* disable for ssd since fused cat cat kernel is slow (#537)

* fix bug when adding steps in rnnt inference (#528)

* Fix and updates for TensorFlow WW18-2022 SPR (#542)

* Fix and updates for TensorFlow WW18-2022 SPR

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix TensorFlow SPR nightly versions

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Update pre-trained models download URLs

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Install Python 3.8 development tools

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix OpenMPI install and setup

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Update to Horovod commit 11c1389 to fix TF v2.9 + Horovod install failure (#519)

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix Horovod Installation for SPR and CentOS

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix Python3.8 version for CentOS

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix a typo in TensorFlow 3d-unet partial

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix a broken partial

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Add TCMalloc to TF base container for SPR and remove OpenSSL

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Remove some repositories

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Add 'matplotlib' for '3d-unet'

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* switch to build OpenMPI due to issue in Market Place provided version

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix PYTORCH_WHEEL and IPEX_WHEEL arg values

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix and updates for PyTorch WW14-2022 SPR (#543)

* Fix and updates for PyTorch WW14-2022 SPR

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix and updates for TensorFlow WW18-2022 SPR

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix TensorFlow SPR nightly versions

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Update pre-trained models download URLs

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Install Python 3.8 development tools

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix OpenMPI install and setup

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Update to Horovod commit 11c1389 to fix TF v2.9 + Horovod install failure (#519)

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix Horovod Installation for SPR and CentOS

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix Python3.8 version for CentOS

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix a typo in TensorFlow 3d-unet partial

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix a broken partial

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Add TCMalloc to TF base container for SPR and remove OpenSSL

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Updates required to the base image

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Remove some repositories

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Add 'matplotlib' for '3d-unet'

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* switch to build OpenMPI due to issue in Market Place provided version

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix PYTORCH_WHEEL and IPEX_WHEEL arg values

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix PYT resnet50 quickstart scripts for both Linux and Windows (#547)

* fix quickstart scripts, detect platform type, update to run with pytorch only

* Fix SPR PyTorch MaskRCNN inference documentation for CHECKPOINT_DIR (#548)

* Enable bert large multi stream inference (#554)

* test bert multi stream module

* enable input split and output concat for accuracy run

* change the default num_streams batchsize cores to 56

* change ssd multi stream throughput to 1 core 1 batch

* change the default parameter for rn50 ssd multi stream module

* modify enable_ipex_for_squad.diff to align new multistream hint implementation

* enable warmup and multi socket support

* change default parameter for rn50 ssd multi stream inference

* Add train-no-eval for rn50 pytorch (#555)

* PyTorch SPR BERT large training updates (h5py and dataset instructions) and update LD_PRELOAD for SPR entrypoints (#550)

* Add h5py install to bert training dockerfile

* documentation updates

* update docs, and add input_preprocessing to the wrapper package

* Update LD_PRELOAD trailing :

* Fix syntax

* removing unnecessary change

* Update DLRM entrypoint

* Update docs to note that phase2 has bert_config.json in the CHECKPOINT_DIR

* Fix syntax

* increase shm-size to 10g

* [RNN-T training] Update scripts -- run on 1S (#561)

* Update maskrcnn training script to run on 1s (#562)

* use single node to do ssd-rn34 training (#563)

* Update training.sh (#564)

* Update training.sh (#565)

Use tcmalloc instead of jemalloc

* use single node to do resnet50 training (#568)

* add numactl -C and remove jit warm in main thread (#569)

* Update unit-test.yml (#546)

* Update unit-test.yml

* Update unit-test.yml

* Update unit-test.yml

* Update unit-test.yml

* Update unit-test.yml

* Update unit-test.yml

* Update unit-test.yml

* Update unit-test.yml

* Update unit-test.yml

* Fixed make command, updated pip install.

Fixed make command to run from the root directory. Replaced pip install tox with a pip install -r requirements-tests.txt to install all dependencies for the tests.

* Add tox to test dependencies. 

Added tox to the dependencies so that the Workflow and others may install it with pip install -r requirements-test.txt and be covered for running make lint and make unit-test.

* Update unit-test.yml

Changed 'make unit-test' to 'make unit_test' as that is the actual target defined in the Makefile.

* Update unit-test.yml

Changed apt-get install command.

* re-enable int8 for api change (#579)

* separate full convergence test from training test (#581)

Co-authored-by: jianan-gu <jianan.gu@intel.com>

* ssd enable new int8 (#580)

* v1

* enable new int8 method

* Revert "ssd enable new int8 (#580)" (#584)

This reverts commit 9eb3211.

* Revert "re-enable int8 for api change (#579)" (#583)

This reverts commit 0bded92.

* Update training script using 1s (#560)

* Enable checkpoint during training for bert-large (#573)

* minor fix

* Add readme for enabling checkpoint

* update phase1 to enable checkpoint by default

* Update README.md

* Enable ssd bf32 inference training (#589)

* enable ssd bf32 inference

* enable ssd bf32 train

* enable RNN-T bf32 inference (#591)

* Enable bf32 for bert and distilbert for inference (#593)

* enable bf32 distilbert

* enable bert bf32

* Enable RNN-T bf32 training (#594)

* enable maskrcnn bf32 inference and training (#595)

* enable resnet50 and resnext101 bf16 path (#596)

* enable bert bf32 train (#600)

* update resnet int8 path using new int8 api (#603)

* re-enable int8 for api change (#604)

Co-authored-by: jianan-gu <jianan.gu@intel.com>

* Leslie/ssd enable new int8 (#605)

* v1

* enable new int8 method

* update json file

* add rn50 int8 weight sharing

Co-authored-by: Jiang, Xiaofei <xiaofei.jiang@intel.com>

* update ssd training bs to a multiple of the core count (#606)

* enable bf32 for dlrm (#607)

Co-authored-by: jianan-gu <jianan.gu@intel.com>

* Update IPEX new int8 API enabling for distilbert/bert-large (#608)

* enable distilbert

* enable bert

* fix max-ind-range and add memory info (#609)

Co-authored-by: jianan-gu <jianan.gu@intel.com>

* Remove debug code (#610)

* update training steps (#611)

* fix bandit scan fails (#612)

* PYT Image recognition models support on Windows (#549)

* fix all image recognition scripts to run on windows and linux with PYT, and only linux with IPEX

* [RNN-T training] fix bandit scan fails (#614)

* RNN-T inference: fix IMZ Bandit scan fails (#615)

* Update unit-test.yml (#570)

Changed the docker user credential to utilize GitHub Secret.

* MaskRCNN: fix IMZ Bandit scan fails (#623)

* Fix for horovod-related failures in TF nightly runs (#613)

* cpp17 horovod failure fix

* minor debugging changes

* minor fixes - directory name

* cleanup

* addressing reviewer comments

* Minor fix for Horovod install and adding 'tf_slim' for SSD ResNet34 (#624)

* Minor fix for Horovod install and adding 'tf_slim' for SSD ResNet34

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Set 'HOROVOD_WITH_MPI=1' explicitly

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* update GCC version to GCC 9

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Add 'horovodrun --check-build' for sanity check

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* remove force install inside Docker

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* [RNN-T training] Fix ddp sample number issue (#625)

* update BF32 usage (#627)

* resnet50 training: add warm up before collecting time (#628)

* image to bf16 (#629)

* Update end2end DLSA dockerfile due to SPR wheel path update and removing int8 patch (#631)

* Update mlpc path for SPR wheels

* remove patch

* Update Horovod commit id for BareMetal, Docker will be updated next (#630)

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* fix dlrm convergence and change training performance BS to 32K (#633)

Co-authored-by: jianan-gu <jianan.gu@intel.com>

* [RNN-T training] Merge sh files to one (#635)

* update torch-ccl into 1.12 (#636)

* Liangan1/update torch ccl version (#637)

* Update torch_ccl version

* resnet50_distributed_training: don't set MASTER_ADDR by user (#638)

* Update torch_ccl in script (#639)

* Enable offline download distilbert (#632)

* enable offline download distilbert

* add convert

* Update README.md

* add accuracy.py

* add file

* refine download

* refine path

* refine path

* add license

* Update dlrm_s_pytorch.py (#643)

* Update README.md (#649)

* init pytorch T5 language model (#648)

* init pytorch T5 language model

* update README.md

* update doc

* update fpn models (#650)

* pytorch resnet50: directly call ipex.quantization (#653)

* fix int8 accuracy (#655)

Co-authored-by: Zhang, Weizhuo <weizhuo.zhang@intel.com>

* Made fixes to the broken links (#652)

* Made fixes to the broken links

* Changed the ResNet50v1_5 version back to v2_7_0

* Modified the setup AI kit instructions

Co-authored-by: msalopan <msalopan@mlp-prod-skx-99155.ra.intel.com>

* Update Security Center URL (#657)

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Weizhuoz/fix for pt 1.12 (#656)

* fix vgg11_bn accuracy syntax error

* remove exact_match from roberta-base

* modify maskrcnn BS to 2*num_cores

* Update dlrm_s_pytorch.py (#660)

* Update dlrm_s_pytorch.py

Reduce int8 memory usage.

* Update dlrm_s_pytorch.py

* Update dlrm_s_pytorch.py

* Update dlrm_s_pytorch.py

* Update dlrm_s_pytorch.py

* Add BF32 DDP for bert-large (#663)

* Update run_ddp_bert_pretrain_phase1.sh

* Update run_ddp_bert_pretrain_phase2.sh

* Update README.md

* move OMP_NUM_THREADS=1 into dlrm_s_pytorch.py (#664)

minor changes

* remove rn50 ao (#665)

* Re-organize models list to be grouped by framework  (#654)

* re-organize models list to be grouped by framework

* update tensorflow ssd-resnet34 training dataset

* add T5 in benchmark/README.md

* manually set torch num threads only for int8 (#666)

* Update inference_performance.sh (#669)

* improve ssdrn34 perf. (#671)

* improve ssdrn34 perf.

* minor update.

* Fix linting

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix unit tests too

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

Co-authored-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* update py version in base spec (#678)

* TF addons upgrade to 0.17.1 (#689)

* updated tf addons version

* remove comment

* Sriniva2/ssd rn34 (#682)

* improve ssdrn34 perf.

* minor update.

* enabling synthetic data.

* Update base_benchmark_util.py

* Fix linting error

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

Co-authored-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Update Dockerfiles prior to IMZ 2.8 release (#693)

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Update Documents prior to IMZ 2.8 release (#694)

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* add support for open SUSE leap operating system (#708) (#715)

* updated tpps (#725)

* remove tf bert int8 from main readmes, model is not supported in this release. (#743)

* Adding Scipy for TensorFlow serving SSD-MobileNet model (#764) (#766)

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* remove .github

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>
Co-authored-by: leslie-fang-intel <leslie.fang@intel.com>
Co-authored-by: Dina Suehiro Jones <dina.s.jones@intel.com>
Co-authored-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>
Co-authored-by: XiaobingZhang <xiaobing.zhang@intel.com>
Co-authored-by: Xiaoming (Jason) Cui <xiaoming.cui@intel.com>
Co-authored-by: jiayisunx <jiayi.sun@intel.com>
Co-authored-by: Melanie Buehler <melanie.h.buehler@intel.com>
Co-authored-by: Srini511 <srinivasan.narayanamoorthy@intel.com>
Co-authored-by: Sean-Michael Riesterer <sean-michael.riesterer@intel.com>
Co-authored-by: jianan-gu <jianan.gu@intel.com>
Co-authored-by: Chunyuan WU <chunyuan.wu@intel.com>
Co-authored-by: zhuhaozhe <haozhe.zhu@intel.com>
Co-authored-by: Wang, Chuanqi <chuanqi.wang@intel.com>
Co-authored-by: YanbingJiang <yanbing.jiang@intel.com>
Co-authored-by: Weizhuo Zhang <weizhuo.zhang@intel.com>
Co-authored-by: xiaofeij <xiaofei.jiang@intel.com>
Co-authored-by: liangan1 <liangang.zhang@intel.com>
Co-authored-by: blzheng <beilei.zheng@intel.com>
Co-authored-by: Om Thakkar <om.thakkar@intel.com>
Co-authored-by: mahathis <36486206+Mahathi-Vatsal@users.noreply.github.com>
Co-authored-by: msalopan <msalopan@mlp-prod-skx-99155.ra.intel.com>
Co-authored-by: Jitendra Patil <jitendra.patil@intel.com>

* revert bf16 changes (#488)

* Add partials and spec yml for the end2end DLSA pipeline (#460)

* Add partials and specs for the end2end DLSA pipeline

* Add missing end line

* Update name to include ipex

* update specs to use the public image as a base on one and SPR for the other

* Dockerfile updates for the updated DLSA repo

* Update pip install list

* Rename to public

* Removing partials that aren't used anymore

* Fixes for 'kmp-blocktime' env var (#493)

* Fixes for 'kmp-blocktime' env var

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* update per review feedback

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Add 'kmp-blocktime' for mlperf-gnmt (#494)

* Add 'kmp-blocktime' for mlperf-gnmt

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Remove duplicate parameter definition

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* add sample_input for resnet50 training (#495)

* remove the case when fragment_size not equal args.batch_size (#500)

* Changed the transformer_mlperf fp32 model so that we can fuse the ops… (#389)

* Changed the transformer_mlperf fp32 model so that we can fuse the ops in the model, and also minor changes for python3

* Changed the transformer_mlperf int8 model so that we can fuse the ops in the model, and also minor changes for python3

* SPR updates for WW12, 2022 (#492)

* SPR updates for WW12, 2022

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Update for PyTorch SPR WW2022-12

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Update pytorch base for SPR too

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Stick with specific 'keras-nightly' version

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Updates per code review

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* update maskrcnn training_multinode.sh (#502)

* Fixed a bug in the transformer_mlperf model threads setting (#482)

* Fixed a bug in the transformer_mlperf model threads setting

* Fix failing tests

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

Co-authored-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Added the default threads setting for transformer_mlperf inference in… (#504)

* Added the default threads setting for transformer_mlperf inference in case there is no command line input

* Fix unit tests

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

Co-authored-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* PyTorch Image Classification TL notebook (#490)

* Adds new TL notebook with documentation

* Added newline

* Added to main TL README

* Small fixes

* Updated for review feedback

* Added more models and a download limit arg

* Removed py3.9 requirement and changed default model

* Adds Kitti torchvision dataset to TL notebook (#512)

* Adds Kitti torchvision dataset to TL notebook

* Fixed citations formatting

* update maskrcnn model (#515)

* minor update. (#465)

* Create unit-test github action workflow (#518)

* Create unit-test github action workflow

* Update to Horovod commit 11c1389 to fix TF v2.9 + Horovod install failure (#519)

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* update distilbert model to  4.18 transformers and enable int8 path (#521)

* rnnt: use launcher to set output file path and name (#524)

* Update BareMetalSetup.md (#526)

Always use the latest torchvision

* Reduce memory usage for dlrm acc test (#527)

* update distilbert with text_classification (#529)

* add patch for distilbert (#530)

* Update the model-builder dockerfile to use ubuntu 20.04 (#532)

* Add script for coco training dataset processing (#525)

* and update tensorflow ssd-resnet34 training dataset instructions

* update patch (#533)

Co-authored-by: Wang, Chuanqi <chuanqi.wang@intel.com>

* [RNN-T training] Enable FP32 gemm using oneDNN (#531)

* Update the Readme guide for distilbert (#534)

* Update the Readme guide for distilbert

* Fix accuracy grep bug, and grep accuracy for distilbert

Co-authored-by: Weizhuo Zhang <weizhuo.zhang@intel.com>

* Update end2end public dockerfile to look for IPEX in the conda directory (#535)

* Notebook to script conversion example (#516)

* Add notebook script conversion example

* Fixed doc

* Replaces custom preprocessor with built-in one

* Changed tag to remove_for_custom_dataset

* Add URL check prior to calling urlretrieve (#538)

* Add URL check prior to calling urlretrieve

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix a typo

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* disable for ssd since fused cat cat kernel is slow (#537)

* fix bug when adding steps in rnnt inference (#528)

* Fix and updates for TensorFlow WW18-2022 SPR (#542)

* Fix and updates for TensorFlow WW18-2022 SPR

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix TensorFlow SPR nightly versions

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Update pre-trained models download URLs

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Install Python 3.8 development tools

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix OpenMPI install and setup

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Update to Horovod commit 11c1389 to fix TF v2.9 + Horovod install failure (#519)

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix Horovod Installation for SPR and CentOS

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix Python3.8 version for CentOS

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix a typo in TensorFlow 3d-unet partial

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix a broken partial

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Add TCMalloc to TF base container for SPR and remove OpenSSL

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Remove some repositories

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Add 'matplotlib' for '3d-unet'

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* switch to build OpenMPI due to issue in Market Place provided version

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix PYTORCH_WHEEL and IPEX_WHEEL arg values

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix and updates for PyTorch WW14-2022 SPR (#543)

* Fix and updates for PyTorch WW14-2022 SPR

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix and updates for TensorFlow WW18-2022 SPR

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix TensorFlow SPR nightly versions

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Update pre-trained models download URLs

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Install Python 3.8 development tools

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix OpenMPI install and setup

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Update to Horovod commit 11c1389 to fix TF v2.9 + Horovod install failure (#519)

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix Horovod Installation for SPR and CentOS

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix Python3.8 version for CentOS

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix a typo in TensorFlow 3d-unet partial

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix a broken partial

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Add TCMalloc to TF base container for SPR and remove OpenSSL

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Updates required to the base image

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Remove some repositories

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Add 'matplotlib' for '3d-unet'

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* switch to build OpenMPI due to issue in Market Place provided version

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix PYTORCH_WHEEL and IPEX_WHEEL arg values

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix PYT resnet50 quickstart scripts for both Linux and Windows (#547)

* fix quickstart scripts, detect platform type, update to run with pytorch only

* Fix SPR PyTorch MaskRCNN inference documentation for CHECKPOINT_DIR (#548)

* Enable bert large multi stream inference (#554)

* test bert multi stream module

* enable input split and output concat for accuracy run

* change the default num_streams batchsize cores to 56

* change ssd multi stream throughput to 1 core 1 batch

* change the default parameter for rn50 ssd multi stream module

* modify enable_ipex_for_squad.diff to align new multistream hint implementation

* enable warmup and multi socket support

* change default parameter for rn50 ssd multi stream inference

* Add train-no-eval for rn50 pytorch (#555)

* PyTorch SPR BERT large training updates (h5py and dataset instructions) and update LD_PRELOAD for SPR entrypoints (#550)

* Add h5py install to bert training dockerfile

* documentation updates

* update docs, and add input_preprocessing to the wrapper package

* Update LD_PRELOAD trailing :

* Fix syntax

* removing unnecessary change

* Update DLRM entrypoint

* Update docs to note that phase2 has bert_config.json in the CHECKPOINT_DIR

* Fix syntax

* increase shm-size to 10g

* [RNN-T training] Update scripts -- run on 1S (#561)

* Update maskrcnn training script to run on 1s (#562)

* use single node to do ssd-rn34 training (#563)

* Update training.sh (#564)

* Update training.sh (#565)

Use tcmalloc instead of jemalloc

* use single node to do resnet50 training (#568)

* add numactl -C and remove jit warm in main thread (#569)

* Update unit-test.yml (#546)

* re-enable int8 for api change (#579)

* separate full convergence test from training test (#581)

Co-authored-by: jianan-gu <jianan.gu@intel.com>

* ssd enable new int8 (#580)

* v1

* enable new int8 method

* Revert "ssd enable new int8 (#580)" (#584)

This reverts commit 9eb3211.

* Revert "re-enable int8 for api change (#579)" (#583)

This reverts commit 0bded92.

* Update training script using 1s (#560)

* Enable checkpoint during training for bert-large (#573)

* minor fix

* Add readme for enabling checkpoint

* update phase1 to enable checkpoint by default

* Update README.md

* Enable ssd bf32 inference training (#589)

* enable ssd bf32 inference

* enable ssd bf32 train

* enable RNN-T bf32 inference (#591)

* Enable bf32 for bert and distilbert for inference (#593)

* enable bf32 distilbert

* enable bert bf32

* Enable RNN-T bf32 training (#594)

* enable maskrcnn bf32 inference and training (#595)

* enable resnet50 and resnext101 bf16 path (#596)

* enable bert bf32 train (#600)

* update resnet int8 path using new int8 api (#603)

* re-enable int8 for api change (#604)

Co-authored-by: jianan-gu <jianan.gu@intel.com>

* Leslie/ssd enable new int8 (#605)

* v1

* enable new int8 method

* update json file

* add rn50 int8 weight sharing

Co-authored-by: Jiang, Xiaofei <xiaofei.jiang@intel.com>

* update ssd training bs to a multiple of the core count (#606)

* enable bf32 for dlrm (#607)

Co-authored-by: jianan-gu <jianan.gu@intel.com>

* Update IPEX new int8 API enabling for distilbert/bert-large (#608)

* enable distilbert

* enable bert

* fix max-ind-range and add memory info (#609)

Co-authored-by: jianan-gu <jianan.gu@intel.com>

* Remove debug code (#610)

* update training steps (#611)

* fix bandit scan fails (#612)

* PYT Image recognition models support on Windows (#549)

* fix all image recognition scripts to run on windows and linux with PYT, and only linux with IPEX

* [RNN-T training] fix bandit scan fails (#614)

* RNN-T inference: fix IMZ Bandit scan fails (#615)

* Update unit-test.yml (#570)

* MaskRCNN: fix IMZ Bandit scan fails (#623)

* Fix for horovod-related failures in TF nightly runs (#613)

* cpp17 horovod failure fix

* minor debugging changes

* minor fixes - directory name

* cleanup

* addressing reviewer comments

* Minor fix for Horovod install and adding 'tf_slim' for SSD ResNet34 (#624)

* Minor fix for Horovod install and adding 'tf_slim' for SSD ResNet34

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Set 'HOROVOD_WITH_MPI=1' explicitly

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* update GCC version to GCC 9

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Add 'horovodrun --check-build' for sanity check

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* remove force install inside Docker

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* [RNN-T training] Fix ddp sample number issue (#625)

* update BF32 usage (#627)

* resnet50 training: add warm up before collecting time (#628)

* image to bf16 (#629)

* Update end2end DLSA dockerfile due to SPR wheel path update and removing int8 patch (#631)

* Update mlpc path for SPR wheels

* remove patch

* Update Horovod commit id for BareMetal, Docker will be updated next (#630)

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* fix dlrm convergence and change training performance BS to 32K (#633)

Co-authored-by: jianan-gu <jianan.gu@intel.com>

* [RNN-T training] Merge sh files to one (#635)

* update torch-ccl into 1.12 (#636)

* Liangan1/update torch ccl version (#637)

* Update torch_ccl version

* resnet50_distributed_training: don't set MASTER_ADDR by user (#638)

* Update torch_ccl in script (#639)

* Enable offline download distilbert (#632)

* enable offline download distilbert

* add convert

* Update README.md

* add accuracy.py

* add file

* refine download

* refine path

* refine path

* add license

* Update dlrm_s_pytorch.py (#643)

* Update README.md (#649)

* init pytorch T5 language model (#648)

* init pytorch T5 language model

* update README.md

* update doc

* update fpn models (#650)

* pytorch resnet50: directly call ipex.quantization (#653)

* fix int8 accuracy (#655)

Co-authored-by: Zhang, Weizhuo <weizhuo.zhang@intel.com>

* Made fixes to the broken links (#652)

* Made fixes to the broken links

* Changed the ResNet50v1_5 version back to v2_7_0

* Modified the setup AI kit instructions

Co-authored-by: msalopan <msalopan@mlp-prod-skx-99155.ra.intel.com>

* Update Security Center URL (#657)

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Weizhuoz/fix for pt 1.12 (#656)

* fix vgg11_bn accuracy syntax error

* remove exact_match from roberta-base

* modify maskrcnn BS to 2*num_cores

* Update dlrm_s_pytorch.py (#660)

* Update dlrm_s_pytorch.py

Reduce int8 memory usage.

* Update dlrm_s_pytorch.py

* Update dlrm_s_pytorch.py

* Update dlrm_s_pytorch.py

* Update dlrm_s_pytorch.py

* Add BF32 DDP for bert-large (#663)

* Update run_ddp_bert_pretrain_phase1.sh

* Update run_ddp_bert_pretrain_phase2.sh

* Update README.md

* move OMP_NUM_THREADS=1 into dlrm_s_pytorch.py (#664)

minor changes

* remove rn50 ao (#665)

* Re-organize models list to be grouped by framework  (#654)

* re-organize models list to be grouped by framework

* update tensorflow ssd-resnet34 training dataset

* add T5 in benchmark/README.md

* manually set torch num threads only for int8 (#666)

* Update inference_performance.sh (#669)

* improve ssdrn34 perf. (#671)

* improve ssdrn34 perf.

* minor update.

* Fix linting

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Fix unit tests too

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

Co-authored-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Use IPEX Pytorch whls instead of building IPEX from source (#674)

Co-authored-by: Clayne Robison <clayne.b.robison@intel.com>

* Lpot2inc (#446)

Co-authored-by: ltsai1 <louie.tsai@intel.com>

* Sriniva2/ssd rn34 (#682)

* improve ssdrn34 perf.

* minor update.

* enabling synthetic data.

* Update base_benchmark_util.py

* Fix linting error

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

Co-authored-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Add doc updates for '--synthetic-data' option (#683)

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Change checkpoint setting for Bert train phase 1 (#602)

* Change checkpoint setting for Bert train phase 1

* fix model and config saving

* fix error when runing gpu path (#686)

* fix load pretrained model error when using torch_ccl (#688)

* update py version in base spec (#678) (#690)

* TF addons upgrade to 0.17.1 (#689) (#691)

* updated tf addons version

* remove comment

* Update Dockerfiles prior to IMZ 2.8 release (#693)

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Update Documents prior to IMZ 2.8 release (#694)

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Update README.md (#697)

* change numpy version requirement (#703)

* Remove MiniGo training from IMZ (#644)

* remove MiniGo training scripts and unit test

* [RNN-T] [Inference] optimize the batch decoder (#711)

* reduce fill_ OP in rnnt embedding kernel

* optimize add between int and log to reduce dtype conversion

* rnnt: support dump tracing file and print profile table (#712)

* add support for open SUSE leap operating system (#708)

* rnnt inference: pre convert data to bf16 (#713)

* remove squeeze/slice/transpose (#714)

* update resnet50 training code (#710)

* update resnet50 training code

* not using ipex optimize for resnet50 training

* use ipex.optimize() on the whole model (#718)

* resnet50 bf32: calling ipex.optimize to enable bf32 path (#719)

* Added batch size as an env variable to the quickstart scripts (#676)

Co-authored-by: Clayne Robison <clayne.b.robison@intel.com>

* Added batchsize as an env variable to quickstart scripts (#680)

* updated readme: nit fix (#723)

Co-authored-by: Rahul Nair <rahulunair@users.noreply.github.com>

* compute throughput by test_mini_batch_size (#740)

* pytorch resnet50: fix bf32 training path error (#739)

* Fix a subtle 'E275' style issue that causes unknown behavior (#742)

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* rearrange the paragraphs and fix Markdown headers (#744)

* Align Transformers version for BERT models (#738)

* align transformers version (4.18) for bert models

* change scripts to legacy

* redo calibration

* patch fix

* Update README.md (#746)

* Add support for stock PYT- object detection models (#732)

* stock PYT and windows support for object detection models

* Weizhuoz/reduce model zoo steps (#762)

* reduce steps for bert-base, roberta, fpn models

* modify max_iter for fpn models

* reduce all img classification models steps

* update new config for bert models (#763)

* Adding Scipy for TensorFlow serving SSD-MobileNet model (#764)

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* Update TF ResNet50v1.5 inference for SPR (baremetal) (#749)

* Added matplotlib dependency to image_segmentation requirements (#768)

* Update readmes for the path to output directory (#769)

* update wide & deep readme for the path to pretrained model directory (#771)

* add a check for ubuntu 22.04 support (#721)

* Changes to add bfloat16 support for DIEN training (#679)

* Changes to add bfloat16 support for DIEN training
* Some fixes for reporting performance
* Fixes for dien training and unit tests

* updated tpp file withr2.8 approvals (#773)

* Add Windows stock PyTorch support for TransNet v2 (#779)

* update TransNet v2 to work with stock pytorch
* update Windows.md path in all relevant docs

* add P99 metric for LZ models (#780)

Co-authored-by: Weizhuo Zhang <weizhuo.zhang@intel.com>

* Rn50 training multiple epoches output 1 KPI and add training_steps argument. (#775)

* enable --training_steps and 1 training KPI output with multiple epoches

* add prefix

* update print freq

* fix display bug

* enable PyTorch resnet50 fp16 path (#783)

* enable PyTorch resnet50 fp16 path

* fix conflict

* Extract p99 metric from log to summary (#784)

* enable fp16 bert train and inference (#782)

* Vruddarr/pt update windows readmes (#778)

* remove bfloat16 experimental support note (#786)

* Update IPEX installation path (#788)

* Clean up _pycache_ files, remove symlinks, and add license headers for dien training bf16 (#787)

* update readme for jemalloc and iomp path (#789)

* update readme for jemalloc and iomp path

* Updated IOMP path as path to the intel-openmp directory

* PyTorch: fix resnext101 running script (#795)

* Update 3dunet mlperf bash scripts and README (#797)

* update 3dunet mlperf doc to use quickstart scripts, rename quickstart scripts for multi-instance

* fix tests job (#803)

* rnnt inference: align replace lstm API due to IPEX change (#802)

* Adding quick start scripts to MobileNetV1 bfloat16 precision (#793)

* Adding quick start scripts to ssd-mobilenet bfloat16 precision (#798)

* Update T5 model with windows quick start scripts (#790)

* Update T5 model with windows quick start scripts

* Updated Readme by specifying values to environment variables

* Update inference int8 readme and script of 4 CV models using INC (#698)

* update docs to add INC int8 models as an option
* add instructions for how to quantize a fp32 model using INC

* rnnt: fix stft due to PyTorch API change (#811)

* rnnt training: fix stft due to PyTorch API change (#813)

* Update BareMetalSetup.md (#817)

* Gerardod/build container (#807)

First phase of GHA WF to build the image of a Model Zoo workload container and push it to CAAS.

* Sharvils/tf workload (#808)

* TFv2.10 support added. Horovod version updated.

* Vruddarr/tf add language translation bert fp32 quick start scripts (#804)

* Adding quick start scripts to language translation BERT FP32 model

* Updated TL notebooks for SPR Launch (#810)

* Updates for TL PyTorch notebook

* Edits for two more TL notebooks

* Reverting previous change for virtualenv

* Removed --no-deps and some nonexistent links

* Added TFHub cache dir

* Updated TL notebook README for legal/branding

* Update typo in Readme (#821)

Co-authored-by: veena.mounika.ruddarraju <vruddarr@mlp-prod-skx-7756.ra.intel.com>

* PyTorch: using ipex.optimize for bf16 training (#824)

* Fix CVEs for Pillow and notebook packages (#831)

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>

* add intel-alphafold2 optimized w/ IPEX from realm of AIDD (#737)

* add alphafold2 from AIDD realm

* Remove unused variable in mlperf 3DUnet performance run (#832)

* Update Model Zoo name, Python version and message for IPEX (#833)

* Update instruction for Miniconda, Jemalloc, PyTorch and IPEX and updt… (#830)

* Update models main tables (#836)

* update main readmes

* Adding jemalloc instructions and environment variables (#838)

* Add support for dGPU models (#840)

* add support for dGPU models

* remove spr dockerfiles and spec files (#842)

* delete links to 3dunet mlperf and bert large int8 (#841)

* update tbb files (#843)

* fix vulnerability issues reported by snyk scans (#848)

* update for new precision (#849)

* upgrade for ipex 1.13

* delete workflows

Signed-off-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>
Co-authored-by: leslie-fang-intel <leslie.fang@intel.com>
Co-authored-by: Dina Suehiro Jones <dina.s.jones@intel.com>
Co-authored-by: Abolfazl Shahbazi <abolfazl.shahbazi@intel.com>
Co-authored-by: XiaobingZhang <xiaobing.zhang@intel.com>
Co-authored-by: Xiaoming (Jason) Cui <xiaoming.cui@intel.com>
Co-authored-by: jiayisunx <jiayi.sun@intel.com>
Co-authored-by: Melanie Buehler <melanie.h.buehler@intel.com>
Co-authored-by: Srini511 <srinivasan.narayanamoorthy@intel.com>
Co-authored-by: Sean-Michael Riesterer <sean-michael.riesterer@intel.com>
Co-authored-by: jianan-gu <jianan.gu@intel.com>
Co-authored-by: Chunyuan WU <chunyuan.wu@intel.com>
Co-authored-by: zhuhaozhe <haozhe.zhu@intel.com>
Co-authored-by: Wang, Chuanqi <chuanqi.wang@intel.com>
Co-authored-by: YanbingJiang <yanbing.jiang@intel.com>
Co-authored-by: Weizhuo Zhang <weizhuo.zhang@intel.com>
Co-authored-by: xiaofeij <xiaofei.jiang@intel.com>
Co-authored-by: liangan1 <liangang.zhang@intel.com>
Co-authored-by: blzheng <beilei.zheng@intel.com>
Co-authored-by: Om Thakkar <om.thakkar@intel.com>
Co-authored-by: mahathis <36486206+Mahathi-Vatsal@users.noreply.github.com>
Co-authored-by: Clayne Robison <clayne.b.robison@intel.com>
Co-authored-by: root <root@mlp-prod-clx-6957.ra.intel.com>
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
Co-authored-by: ltsai1 <louie.tsai@intel.com>
Co-authored-by: Jitendra Patil <jitendra.patil@intel.com>
Co-authored-by: Kanvi Khanna <kanvi.khanna@intel.com>
Co-authored-by: Rahul Nair <rahulunair@users.noreply.github.com>
Co-authored-by: Veena2207 <111923243+Veena2207@users.noreply.github.com>
Co-authored-by: jojivk-intel-nervana <jojimon.varghese@intel.com>
Co-authored-by: xiangdong <40376367+zxd1997066@users.noreply.github.com>
Co-authored-by: Huang, Zhiwei <zhiwei.huang@intel.com>
Co-authored-by: gera-aldama <111396864+gera-aldama@users.noreply.github.com>
Co-authored-by: Sharvil Shah <shahsharvil96@gmail.com>
Co-authored-by: wyang2 <99377901+intelyoungway@users.noreply.github.com>
Co-authored-by: Yimei Sun <yimei.sun@intel.com>
If padding is enabled, the "end" function will not be executed, so the perf calculation is added for each step in "after_run" to report the perf results.
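
For context, a minimal sketch of what such a hook could look like, assuming a tf.estimator-style SessionRunHook; the class name PerStepPerfHook and its constructor arguments are made up for illustration and are not the exact code in this PR:

```python
import time
import tensorflow as tf

class PerStepPerfHook(tf.estimator.SessionRunHook):
    """Print latency/throughput from after_run on every step, so perf is
    still reported even when end() never runs (e.g. in the padded path)."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.step = 0
        self.start = None

    def before_run(self, run_context):
        # Record the wall-clock time just before the session step runs.
        self.start = time.time()

    def after_run(self, run_context, run_values):
        # Compute per-step latency and throughput after every step.
        elapsed = time.time() - self.start
        self.step += 1
        print("step %d: %.4f sec/step, %.2f examples/sec"
              % (self.step, elapsed, self.batch_size / elapsed))
```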