This directory stores models used for benchmarking.
- Int8 BERT, quantized with Quantization-Aware Training following the steps in https://github.com/NVIDIA/FasterTransformer/tree/main/bert-quantization/bert-pyt-quantization#quantization-aware-fine-tuning and converted to ONNX manually using this function (see the export sketch after this list). The model and the run_squad.py script that the export code is based on are both licensed under Apache-2.0.
- EfficientNetV2-M: the original TF2 model is from https://github.com/google/automl/tree/master/efficientnetv2 and was converted to ONNX following the steps in https://github.com/NVIDIA/TensorRT/tree/master/samples/python/efficientnet#2-efficientnet-v2 (see the conversion sketch after this list). Both the original model and the ONNX export code are licensed under Apache-2.0.