
QKeras to QONNX converter #7

Closed · wants to merge 23 commits

Conversation

@vloncar (Contributor) commented Nov 10, 2021:

Long-overdue, incomplete, horribly-coded first stab at the QKeras to QONNX converter.

It relies on tf2onnx to perform the actual conversion. The conversion is done in several steps:

  1. The model is stripped of QKeras-specific attributes (kernel, bias and activation quantizers), and the QKeras layers are replaced with generic Keras layers. A map of layer names to their quantizers is saved (a rough sketch of this step is shown after this list).
  2. The stripped model is fed into tf2onnx.
  3. We add hooks into the standard tf2onnx handlers for Keras layers and insert Quant nodes based on the quantizer map.
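
As an illustration of step 1, here is a minimal sketch of the strip-and-remember idea. This is not the PR's actual code: the helper name, the map format, and the reliance on get_quantizers()/get_config() are assumptions based on the public QKeras/Keras APIs, and the config cleanup is only hinted at.

    # Hypothetical sketch of stripping QKeras layers while remembering their quantizers.
    import tensorflow as tf

    def strip_qkeras_model(model):
        quantizer_map = {}
        for layer in model.layers:
            # QKeras layers (QDense, QConv2D, QActivation, ...) expose their quantizers
            if hasattr(layer, "get_quantizers"):
                quantizer_map[layer.name] = layer.get_quantizers()

        # Rebuild the architecture with plain Keras layer classes and copy the
        # (unquantized) weights over. In practice the quantizer-specific
        # constructor arguments also have to be dropped from each layer config.
        config = model.get_config()
        for layer_cfg in config["layers"]:
            if layer_cfg["class_name"].startswith("Q"):
                layer_cfg["class_name"] = layer_cfg["class_name"][1:]  # e.g. QDense -> Dense
        stripped = tf.keras.Model.from_config(config)
        stripped.set_weights(model.get_weights())
        return stripped, quantizer_map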

The third step could in theory be replaced with a ModelWrapper-based approach applied after the initial conversion; however, ModelWrapper would first need to be extended with graph-manipulation routines (insert before/after a specific node, remove a node, etc.). Also, the name mangling that happens when the model comes out of tf2onnx makes it harder to map the ONNX node names back to the saved quantizer map. This was the original idea, but I couldn't find a reliable way to deduce the final layer names. It makes sense to revisit it once I understand the internals of tf2onnx better.
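
For reference, a Quant node of the kind that step 3 inserts (or that a future ModelWrapper-based pass would splice in) could be built with the plain onnx helper API roughly as below; the tensor names, quantization parameters and the custom-op domain string are placeholders and depend on the QONNX/finn-base version.

    # Sketch of constructing a QONNX Quant node with the onnx helper API.
    import numpy as np
    from onnx import helper, numpy_helper

    # Quantization parameters stored as initializers (placeholder values).
    scale = numpy_helper.from_array(np.array(0.125, dtype=np.float32), "act0_scale")
    zeropt = numpy_helper.from_array(np.array(0.0, dtype=np.float32), "act0_zeropt")
    bitwidth = numpy_helper.from_array(np.array(4.0, dtype=np.float32), "act0_bitwidth")

    quant_node = helper.make_node(
        "Quant",
        inputs=["act0_out", "act0_scale", "act0_zeropt", "act0_bitwidth"],
        outputs=["act0_out_quant"],
        name="act0_quant",
        domain="qonnx.custom_op.general",  # placeholder; the domain name has changed over time
        signed=1,               # quantize to a signed integer range
        narrow=0,               # full (non-narrow) range
        rounding_mode="ROUND",  # round to nearest
    )
    # The three parameter tensors go into graph.initializer, the node into graph.node,
    # and every consumer of "act0_out" is rewired to read "act0_out_quant".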

Some known caveats:

  • The current version (1.9.2) of tf2onnx has an API bug, so until a new version is released we call the API incorrectly as a workaround; once 1.9.3 is out, this should be updated and the project should depend on that version.
  • Overriding tf2onnx handlers may not be foolproof. Some layers are handled in unexpected ways, which requires a deeper understanding of the conversion process to figure out what actually needs to be overridden.
  • The included test doesn't really check that the conversion is numerically correct, only that it completes.
  • The list of supported Q layers and quantizers is rather incomplete.
  • pre-commit doesn't like the code.

@heborras (Member) commented:

Hi @vloncar,
the converter already looks very interesting and the approach of using hooks into tf2onnx is quite nice.
I've had a closer look at the code, and in particular at the test cases. I made some additions to get them to run in tox on my machine and to pass the pre-commit hooks; you may notice that some lines of unused code were simply commented out to make this work, so at a later stage we should take another look at the commented-out code.
After this I had a look at the conversion itself and fixed some smaller things to get the network through the QONNX cleanup procedure:

  • Some inputs of the Quant nodes were set as attributes, so I moved them to the node inputs.
  • tf2onnx defines the input and output shape of the network as [?,...] and finn-base can only handle known batch sizes, so I replaced this with [1,...].
  • One really puzzling thing was that for some reason the output datatype of the Quant node was set to literally nothing, i.e. not even defined, so I manually set these types to float32. There might be some deeper underlying reason for this behavior in the tf2onnx API which I couldn't spot, so we should keep an eye on it. (A sketch of these last two fixes follows below.)
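
For reference, the last two fixes amount to roughly the following when expressed with the plain onnx API (a minimal sketch, not the actual converter code; the file names are placeholders):

    # Sketch of the batch-size and missing-datatype fixes at the ONNX graph level.
    import onnx
    from onnx import TensorProto

    model = onnx.load("qkeras_converted.onnx")  # placeholder path
    graph = model.graph

    # Replace the unknown batch dimension ("?") of the graph inputs/outputs with 1.
    for vi in list(graph.input) + list(graph.output):
        dim0 = vi.type.tensor_type.shape.dim[0]
        if not dim0.HasField("dim_value"):
            dim0.dim_value = 1

    # Give tensors whose datatype ended up undefined an explicit float32 type.
    for vi in graph.value_info:
        if vi.type.tensor_type.elem_type == TensorProto.UNDEFINED:
            vi.type.tensor_type.elem_type = TensorProto.FLOAT

    onnx.save(model, "qkeras_converted_fixed.onnx")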

I have yet to run actual execution tests comparing the QKeras and ONNX execution, but with these changes that should become easier to implement in the next few days.

As an aside: Do you know if the tf2onnx API has some auto-generated documentation somewhere? I noticed that there are comments in the code for such a process, but on their project page I could only find their general documentation for the tool, not the API.

@maltanar (Collaborator) commented:

This is a great start, thanks @vloncar! A couple of comments and questions from a joint discussion with @HenniOVP, @volcacius and myself that came up while reviewing:

  • It would be beneficial to have some single-layer export/conversion tests with execution comparison, to easily catch and debug issues at a smaller granularity. We have tests like this in FINN that define a single quantized layer in Brevitas, export it to FINN-ONNX, and then compare the execution against PyTorch/Brevitas; they helped discover plenty of bugs. To start with, I'd suggest single-layer tests for the layers of interest so far, i.e. quantized convs, ReLU and FC layers (a rough sketch of such a test follows after this list).
  • It seems that quantized ReLU nodes are currently converted as Quant -> ReLU, while in Brevitas they are exported as ReLU -> Quant. Does QKeras implement quantized activations in a different way, or is this an error?
  • The core idea of extracting/stripping the quantization info and then re-introducing it on a node-name basis in ONNX seems to work, but we wondered whether the tf2onnx hooks could be used earlier, without stripping the quant layers. Is this because exporting QKeras layer types directly with tf2onnx is not possible?
  • If a QKeras layer will turn into more than one ONNX node (e.g. FC/conv layer with quantized bias, where the bias turns into a separate Add node), will the current mechanism for tracking the quantization information handle this correctly? If not, I would propose to add assertions to catch this if the user tries to convert such a model.
  • Minor: @HenniOVP's comment above re. "the output datatype of the Quant node got set to literally nothing" may warrant a closer look, since it seems an ONNX ValueInfoProto is created for these tensors but without any datatype. I'm not sure exactly where these get created, but I suspect that one of the insertion functions from tf2onnx (insert_new_node_on_output or insert_new_node_on_input) may take an extra argument to specify the ONNX datatype correctly.
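
As an illustration of the single-layer test idea from the first point above, such a test could look roughly like the sketch below. The converter entry point (from_keras), the execution helper (execute_onnx) and the graph input name are assumptions; the corresponding helpers lived in finn-base at the time of this PR, so the actual module paths may differ.

    # Hypothetical single-layer QKeras -> QONNX test with execution comparison.
    import numpy as np
    import tensorflow as tf
    from qkeras import QDense, quantized_bits

    from qonnx.converters.keras import from_keras        # assumed entry point
    from qonnx.core.modelwrapper import ModelWrapper
    from qonnx.core.onnx_exec import execute_onnx         # assumed execution helper

    def test_single_qdense_conversion():
        inp = tf.keras.Input(shape=(16,), name="input")
        out = QDense(
            8,
            kernel_quantizer=quantized_bits(4, 0, alpha=1),
            bias_quantizer=quantized_bits(4, 0, alpha=1),
            name="qdense0",
        )(inp)
        keras_model = tf.keras.Model(inp, out)

        onnx_model, _ = from_keras(keras_model)            # assumed return signature
        qonnx_model = ModelWrapper(onnx_model)

        x = np.random.uniform(-1, 1, size=(1, 16)).astype(np.float32)
        y_keras = keras_model.predict(x)
        out_name = qonnx_model.graph.output[0].name
        y_qonnx = execute_onnx(qonnx_model, {"input": x})[out_name]

        np.testing.assert_allclose(y_qonnx, y_keras, rtol=1e-5, atol=1e-5)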

@vloncar (Contributor, Author) commented Nov 24, 2021:

A brief summary of discussion on @maltanar's questions:

  • Totally agree on the single-layer tests. Will be included in the next few commits.
  • The observed issue comes from the quirkiness of the model; the real bug here is that the Quant node is missing after the activation. We can explore special cases where the Quant node before the activation can be removed with a transformation pass post-conversion (a rough sketch of such a pass follows after this list).
  • The approach with hooks for the QKeras layers themselves was attempted, but there were issues. It will be revisited now that the internals of tf2onnx are better understood.
  • A 1-to-many mapping between QKeras nodes and ONNX nodes can be handled in the tf2onnx handler.
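
Regarding the transformation pass mentioned in the first point, a rough sketch could look like the code below, assuming the Transformation base class and ModelWrapper helpers from qonnx (finn-base at the time). The matching condition is deliberately simplistic and ignores the question of when removing the Quant node is actually safe.

    # Hypothetical cleanup pass: drop a Quant node that feeds directly into a Relu.
    from qonnx.transformation.base import Transformation

    class RemoveQuantBeforeRelu(Transformation):
        def apply(self, model):
            graph = model.graph
            graph_modified = False
            for node in list(graph.node):
                if node.op_type != "Quant":
                    continue
                consumers = model.find_consumers(node.output[0])
                if consumers and len(consumers) == 1 and consumers[0].op_type == "Relu":
                    # Rewire the Relu to read the Quant node's input directly,
                    # then remove the now-dangling Quant node.
                    consumers[0].input[0] = node.input[0]
                    graph.node.remove(node)
                    graph_modified = True
            return model, graph_modified

Applying it would then just be model = model.transform(RemoveQuantBeforeRelu()).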

@jmduarte (Member) commented:

Just checking in on this PR, and doing some housekeeping. I also checked which tests fail:

======================================================= short test summary info ========================================================
FAILED tests/test_keras_convert.py::test_qkeras_qactivation[4] - AssertionError: 
FAILED tests/test_keras_convert.py::test_qkeras_qactivation[5] - AssertionError: 
FAILED tests/test_keras_convert.py::test_qkeras_qactivation[6] - AssertionError: 
FAILED tests/test_keras_convert.py::test_qkeras_qactivation[7] - AssertionError: 
FAILED tests/test_keras_convert.py::test_qkeras_qactivation[8] - AssertionError: 
FAILED tests/test_keras_convert.py::test_qkeras_qactivation[9] - AssertionError: 
FAILED tests/test_keras_convert.py::test_qkeras_qactivation[10] - AssertionError: 
FAILED tests/test_keras_convert.py::test_qkeras_qactivation[11] - AssertionError: 
FAILED tests/test_keras_convert.py::test_qkeras_qactivation[12] - AssertionError: 
FAILED tests/test_keras_convert.py::test_qkeras_qactivation[13] - AssertionError: 
FAILED tests/test_keras_convert.py::test_qkeras_qactivation[14] - AssertionError: 
FAILED tests/test_keras_convert.py::test_qkeras_qactivation[15] - AssertionError: 
FAILED tests/test_keras_convert.py::test_qkeras_qactivation[16] - AssertionError: 
FAILED tests/test_keras_convert.py::test_qkeras_conv2d_conversion - onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeErr...
============================================ 14 failed, 32 passed, 1073 warnings in 27.01s =============================================

Only test_qkeras_conv2d_conversion seems very concerning at the moment, in that inference fails entirely for the ONNX model. Looking at the converted ONNX model in Netron, I wonder if it has to do with shape inference not working properly.

Error message:

E           onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Conv node. Name:'Conv_1' Status Message: Input channels C is not equal to kernel channels * group. C: 14 kernel channels: 3 group: 1

../miniconda3/envs/qonnx/lib/python3.8/site-packages/onnxruntime/capi/session.py:110: Fail
--------------------------------------------------------- Captured stderr call ---------------------------------------------------------
2022-04-21 18:45:32.772521537 [E:onnxruntime:, sequential_executor.cc:309 Execute] Non-zero status code returned while running Conv node. Name:'Conv_1' Status Message: Input channels C is not equal to kernel channels * group. C: 14 kernel channels: 3 group: 1
---------------------------------------------------------- Captured log call -----------------------------------------------------------
ERROR    tf2onnx.tfonnx:tfonnx.py:302 Failed to convert node 'model_45/conv2d_1_m/BiasAdd' (fct=<function process_parsed_graph.<locals>.compat_handler at 0x7fe330156790>)
'OP=Conv\nName=model_45/conv2d_1_m/BiasAdd\nInputs:\n\tmodel_45/act0_m/Relu_activation_quantizer:0=Quant, None, 1\n\tmodel_45/conv2d_1_m/Conv2D/ReadVariableOp:0=Const, [3, 3, 32, 64], 1\n\tmodel_45/conv2d_1_m/BiasAdd/ReadVariableOp:0=Const, [64], 1\nOutpus:\n\tmodel_45/conv2d_1_m/BiasAdd:0=[-1, 6, 6, 64], 1'
Traceback (most recent call last):
  File "/crucial/1/jduarte/miniconda3/envs/qonnx/lib/python3.8/site-packages/tf2onnx/tfonnx.py", line 292, in tensorflow_onnx_mapping
    func(g, node, **kwargs, initialized_tables=initialized_tables, dequantize=dequantize)
  File "/crucial/1/jduarte/miniconda3/envs/qonnx/lib/python3.8/site-packages/tf2onnx/tfonnx.py", line 534, in compat_handler
    return func(ctx, node, name, args)
  File "/crucial/1/jduarte/qonnx/src/qonnx/converters/qkeras/onnx.py", line 109, in conv2d_handler
    ConvOp.any_version(11, ctx, node)
  File "/crucial/1/jduarte/miniconda3/envs/qonnx/lib/python3.8/site-packages/tf2onnx/onnx_opset/nn.py", line 367, in any_version
    shape_dim = ctx.get_shape(node.input[0])[3]
TypeError: 'NoneType' object is not subscriptable

model: (Netron view of the converted model attached in the original comment)

@heborras (Member) commented:

Hi @jmduarte,
the missing shapes are definitely an issue. The ?x64 shape after the second Reshape node might also be confusing the shape inference. I can maybe take a look at what's happening next week.

@thesps (Contributor) commented Apr 25, 2022:

To add some info on the other failing tests: they mostly have the pattern of an activation layer like QActivation(activation=quantized_bits(...)), which should just become a single Quant node in the 'activation path'. I think what happens is that _strip_qkeras_model returns that layer as Activation(activation='linear'), which is then optimized away by tf2onnx during the tf2onnx.convert.from_keras call, before we're able to reinsert the Quant node (a minimal example of the pattern is shown below). I haven't come up with a fix yet.
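
For reference, a minimal model exhibiting the problematic pattern (layer and tensor names are just for illustration):

    # Minimal example of the pattern that currently gets lost in conversion:
    # a standalone QActivation with a quantized_bits quantizer, which should map
    # to a single Quant node in the exported graph.
    import tensorflow as tf
    from qkeras import QActivation, quantized_bits

    inp = tf.keras.Input(shape=(16,), name="input")
    out = QActivation(activation=quantized_bits(4, 0, alpha=1), name="act0")(inp)
    model = tf.keras.Model(inp, out)
    # After stripping, act0 becomes Activation("linear"), which tf2onnx folds away
    # before the Quant node can be reinserted.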

@heborras (Member) commented:

So the shape inference seems to be doing fine, in the sense that it works as expected. It simply propagates the ? shape, though, which we should probably avoid, since the FINN execution engine likely can't work with that.
This is what the network looks like after shape inference (Netron view attached in the original comment).

However, the actual error comes from somewhere else. It appears to have to do with the spurious Transpose in the graph. I'm not sure where it comes from, but it converts the data to channels-last, and interestingly the weights of the following Conv node are also in a channels-last shape. Since ONNX doesn't support execution in the channels-last format, the attempt at inference then results in the error we see in the test.
Interestingly, the third and fourth Conv nodes also have weights in a channels-last shape. The shape inference itself is somewhat oblivious to the issue and just predicts shapes, which later won't work with the Dense layer.

Any idea where this partial channels-last conversion and the Transpose come from?

@jmduarte (Member) commented:

I think it comes from the fact that the shapes are not correct for the quantized relu. I added some debug statements in ConvOp here: https://github.com/onnx/tensorflow-onnx/blob/310949ef8246d73c0392311826ac9050e292412f/tf2onnx/onnx_opset/nn.py#L378

        print("debug")
        print("node inputs", node.input)
        print("node input 0", node.input[0])
        print("shape", ctx.get_shape(node.input[0]))

output for test_keras_conv2d_conversion:

tests/test_keras_convert.py::test_keras_conv2d_conversion debug
node inputs ['input', 'model/conv2d_0_m/Conv2D/ReadVariableOp:0']
node input 0 input
shape [-1, 28, 28, 1]
debug
node inputs ['model/act0_m/Relu:0', 'model/conv2d_1_m/Conv2D/ReadVariableOp:0']
node input 0 model/act0_m/Relu:0
shape [-1, 14, 14, 32]
debug
node inputs ['model/conv2d_1_m/Relu:0', 'model/conv2d_2_m/Conv2D/ReadVariableOp:0']
node input 0 model/conv2d_1_m/Relu:0
shape [-1, 6, 6, 64]

output for test_qkeras_conv2d_conversion:

tests/test_keras_convert.py::test_qkeras_conv2d_conversion debug
node inputs ['input', 'model/conv2d_0_m/Conv2D/ReadVariableOp:0', 'model/conv2d_0_m/BiasAdd/ReadVariableOp:0']
node input 0 input
shape [-1, 28, 28, 1]
debug
node inputs ['model/act0_m/Relu_activation_quantizer:0', 'model/conv2d_1_m/Conv2D/ReadVariableOp:0', 'model/conv2d_1_m/BiasAdd/ReadVariableOp:0']
node input 0 model/act0_m/Relu_activation_quantizer:0
shape None
debug
node inputs ['model/conv2d_1_m/Relu_activation_quantizer:0', 'model/conv2d_2_m/Conv2D/ReadVariableOp:0']
node input 0 model/conv2d_1_m/Relu_activation_quantizer:0
shape None
debug
node inputs ['model/act2_m/Relu_activation_quantizer:0', 'model/conv2d_3_m/Conv2D/ReadVariableOp:0']
node input 0 model/act2_m/Relu_activation_quantizer:0
shape None

@jmduarte (Member) commented:

To do:

@maltanar mentioned this pull request Aug 12, 2022
@jmduarte added the "enhancement" (new feature or request) label Nov 22, 2022
@jmduarte (Member) commented:

Discussion can continue in #28.

@jmduarte closed this Dec 22, 2022