Add benchmark models that are not easily accessible #5
Conversation
cc @driazati as there's some overlap with tlc-pack/ci-data
IIUC, you want to add the serialized binary files of these models to this repo. In terms of the license, I'd like to confirm: did you generate these binary files yourself based on FasterTransformer, or are they directly copied from somewhere in that repo? I checked the FasterTransformer repo and it is under the Apache-2.0 license, so it is fine for us to use any code from it. We only need to add one line saying this is modified from the FasterTransformer repo. In the case of binary files, I think a separate README.md under the same directory also works.
Yes, I generated both of them myself. I added an ugly hack in one of the scripts in the FasterTransformer repo to export them. I was not aware of tlc-pack/ci-data.
Got it. Then IMHO we just need to explicitly say that these binaries were generated from the FasterTransformer repo, e.g. in a README.md under the same directory.
Ok, added more details on the export process and I think it is good to go. |
LGTM
An example of how to use the quantized BERT model (running it requires apache/tvm#10596):
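A minimal sketch of such a script, assuming the model ships as a serialized Relay module plus a parameter dict; the file names (`int8_bert.json`, `int8_bert.params`) and the input names and shapes are hypothetical placeholders, not the repo's actual layout:

```python
# Minimal sketch: load a serialized Relay module and its parameters,
# build it, and run one inference. File names, input names, and shapes
# below are hypothetical placeholders.
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

with open("int8_bert.json", "r") as f:
    mod = tvm.ir.load_json(f.read())
with open("int8_bert.params", "rb") as f:
    params = relay.load_param_dict(f.read())

# Compile for CPU; swap the target (e.g. "cuda") as needed.
target = "llvm"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

dev = tvm.device(target, 0)
rt = graph_executor.GraphModule(lib["default"](dev))

# BERT-style dummy inputs; batch size and sequence length are assumptions.
seq_len = 128
rt.set_input("input_ids", np.zeros((1, seq_len), dtype="int64"))
rt.set_input("attention_mask", np.ones((1, seq_len), dtype="int64"))
rt.run()
print(rt.get_output(0).numpy().shape)
```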
Example output:
The motivation is that there has been growing interest in testing and benchmarking int8 BERT. But BERT and other transformer models are hard to quantize or import into TVM properly, so until recently they were not available to us for benchmarking.
Recently I found that the NVIDIA FasterTransformer repo has an example of quantizing BERT by PTQ or QAT using TensorRT's `pytorch_quantization` tool. And thanks to the recent work in apache/tvm#10239, we can now import those "fake-quantized" QAT BERT models into Relay and convert them into a fully-integer model. Example usages will be provided soon in the tvm repo under `python/tvm/meta_schedule/testing/XXX.py`. Also see #5 (comment).

I wonder if we need to worry about license issues?
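For reference, a rough sketch of the import-and-convert flow described above, assuming a BERT module that has already been fake-quantized with `pytorch_quantization`; the function, input names, and shapes are illustrative placeholders, not the actual export script used for these binaries:

```python
# Rough sketch: import a fake-quantized (QAT) PyTorch model into Relay
# and rewrite it into a fully-integer model. `qat_bert` stands in for a
# BERT module quantized with TensorRT's pytorch_quantization tool.
import torch
import tvm
from tvm import relay


def import_qat_bert(qat_bert: torch.nn.Module, seq_len: int = 128):
    example_inputs = (
        torch.zeros(1, seq_len, dtype=torch.int64),  # input_ids
        torch.ones(1, seq_len, dtype=torch.int64),   # attention_mask
    )
    scripted = torch.jit.trace(qat_bert.eval(), example_inputs)

    input_infos = [
        ("input_ids", ((1, seq_len), "int64")),
        ("attention_mask", ((1, seq_len), "int64")),
    ]
    mod, params = relay.frontend.from_pytorch(scripted, input_infos)

    # Fold the qnn.quantize / qnn.dequantize pairs left by the importer
    # into integer ops, yielding a fully-integer graph. The use_qat flag
    # is assumed to be the QAT support added in apache/tvm#10239.
    mod = relay.transform.FakeQuantizationToInteger(use_qat=True)(mod)
    return mod, params
```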
cc @junrushao1994 @areusch @tqchen @comaniac