Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(//core/): Add support for tensorrt plugin and implement instance… #285

Closed
wants to merge 1 commit into from

Conversation

ChaoFang-TRI
Copy link
Contributor

@ChaoFang-TRI ChaoFang-TRI commented Jan 14, 2021

… norm layer

Fix #210

Signed-off-by: Chao Fang chao.fang@tri.global

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant and/or add your own.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes

… norm layer

Fix pytorch#210

Signed-off-by: Chao Fang <chao.fang@tri.global>
@ChaoFang-TRI
Copy link
Contributor Author

@narendasan The tensorRT plugin support need more tests. It would be nice if any NVIDIA folks could help.

@ChaoFang-TRI
Copy link
Contributor Author

I keep getting linter erro:
ERROR: Skipping '//tools/linter:cpplint_diff': error loading package 'tools/linter': Unable to find package for @pylinter_deps//:requirements.bzl: The repository '@pylinter_deps' could not be resolved.

@narendasan
Copy link
Collaborator

I keep getting linter erro:
ERROR: Skipping '//tools/linter:cpplint_diff': error loading package 'tools/linter': Unable to find package for @pylinter_deps//:requirements.bzl: The repository '@pylinter_deps' could not be resolved.

Do you have the Python dependencies in the WORKSPACE enabled? It should just be picking up the requirements.txt file in //tools/linter

https://github.com/NVIDIA/TRTorch/blob/72bf74b2a2e425e6c83542f99c92b9e551148149/WORKSPACE#L142

@narendasan narendasan requested a review from peri044 January 14, 2021 19:09
@narendasan
Copy link
Collaborator

@narendasan The tensorRT plugin support need more tests. It would be nice if any NVIDIA folks could help.

Yeah for sure we can take a look here. Next test that would be good to see is if a TRT plugin based model is portable. So compile a module, save to disk, start a new process, load the module and see if it runs or something similar

Copy link
Collaborator

@narendasan narendasan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is a great addition. Left some preliminary comments. Would also like @peri044 to take a look. I think we need to think a bit about how to make sure plugins are available both at conversion and runtime.

@@ -45,6 +48,25 @@ ConversionCtx::ConversionCtx(BuilderSettings build_settings)
"[TRTorch Conversion Context] - ",
util::logging::get_logger().get_reportable_severity(),
util::logging::get_logger().get_is_colored_output_on()) {
// Get list of all available plugin creators

initLibNvInferPlugins(&logger, "");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that this will need to occur in a more global manner since it will need to cover cases were we only use the runtime and the conversion context is therefore never created. We also probably should use the global logger in that case as well instead of the conversion logger.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this would be located somewhere in the runtime module and run on library load since we would also want this for libtrtorchrt.so as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure where is the best place to initialize the plugin. Can you suggest a specific place?

@@ -66,6 +66,9 @@ struct ConversionCtx {
// copy of the values
std::vector<void*> builder_resources;

//Registry of official tensorrt plugin layers.
std::unordered_map<std::string, nvinfer1::IPluginCreator*> mPluginRegistry;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering we are starting to ship our own TRTorch specific plugins as well maybe we could unify the various registries in a separate module which is a dep of conversion and runtime. It might solve the global init issue as well. @peri044 thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a nice idea. We can have a single registry and our own version of initLibNvInferPlugins which can initialize TRT and TRTorch plugins.
But here are some questions in mind
Do we plan on integrating custom plugins with libtrtorch.so as well ? or do we have something like libtrtorch_plugins.so ?

  • For the former case, If the extracted TRT engine has a plugin that doesn't depend on aten:calls and uses custom cuda kernels (as for general plugins), this sample needs to load libnvinfer.so, libtrtorch.so which depends on libtorch.so ?
  • In the later case, the sample can load libnvinfer.so and libtrtorch_plugins.so. Maybe this would be ideal from a deployment perspective ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this is touching the core requirement of TRTorch. I like the second idea to load load libnvinfer.so and libtrtorch_plugins.so.

ASSERT_TRUE(trtorch::tests::util::almostEqual(jit_results[0], trt_results[0].reshape_as(jit_results[0]), 2e-6));
}

TEST(Converters, ATenInstanceNormWithConvertsCorrectly) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another good test would be to make sure we are hitting both paths of the instance norm converter

Copy link
Collaborator

@peri044 peri044 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The testcase which hits one path of instance norm works on my local.

fc.fields = f.data();
nvinfer1::IPluginV2* pluginV2 = ctx->mPluginRegistry.at(pluginName)->createPlugin("instancenorm", &fc);

TRTORCH_CHECK(pluginV2, "Unable to create interpolation plugin from node" << *n);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interpolation plugin name -> instance norm plugin. To distinguish from line 143 check, this message can be changed to something like "Unable to create instance norm plugin from TensorRT plugin registry"

@@ -66,6 +66,9 @@ struct ConversionCtx {
// copy of the values
std::vector<void*> builder_resources;

//Registry of official tensorrt plugin layers.
std::unordered_map<std::string, nvinfer1::IPluginCreator*> mPluginRegistry;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a nice idea. We can have a single registry and our own version of initLibNvInferPlugins which can initialize TRT and TRTorch plugins.
But here are some questions in mind
Do we plan on integrating custom plugins with libtrtorch.so as well ? or do we have something like libtrtorch_plugins.so ?

  • For the former case, If the extracted TRT engine has a plugin that doesn't depend on aten:calls and uses custom cuda kernels (as for general plugins), this sample needs to load libnvinfer.so, libtrtorch.so which depends on libtorch.so ?
  • In the later case, the sample can load libnvinfer.so and libtrtorch_plugins.so. Maybe this would be ideal from a deployment perspective ?

@peri044
Copy link
Collaborator

peri044 commented Jan 27, 2021

@ChaoFang-TRI Thanks for the PR. Any thoughts on the above review comments ? Can you address some of them ?

@ChaoFang-TRI
Copy link
Contributor Author

@ChaoFang-TRI Thanks for the PR. Any thoughts on the above review comments ? Can you address some of them ?

Hi Peri044, I was occupied for family issue in the past month, sorry for the delayed response. Please feel free to jump in on addressing comments! I will push some commits as well to address some commits.

@peri044
Copy link
Collaborator

peri044 commented Apr 3, 2021

I tried to add this instance norm to the plugin suite PR . The testcase provided in this PR does not use TensorRT plugin but rather hits the else section (using native TRT layers).
The following testcase graph uses plugin block in the instance norm code in this PR; but upon testing this, I face seg fault issue. So I'm holding off this plugin for now and later add this once the issues are fixed.

const auto graph = R"IR(
      graph(%0 : Tensor,
            %1: Float(5, strides=[1]),
            %2: Float(5, strides=[1]),
            %3: Float(5, strides=[1]),
            %4: Float(5, strides=[1])):
        %9 : Tensor? = prim::Constant()
        %5 : bool = prim::Constant[value=1]()
        %6 : float = prim::Constant[value=0.10000000000000001]()
        %7 : float = prim::Constant[value=1.0000000000000001e-05]()
        %8 : Tensor = aten::instance_norm(%0, %1, %2, %9, %9, %5, %6, %7, %5)
        return (%8))IR";

@narendasan
Copy link
Collaborator

We should be merging in support for nvinfer plugin in #425, Can we reduce the scope of this PR to only include the additional converters?

@narendasan
Copy link
Collaborator

Closing in favor of #573

@narendasan narendasan closed this Aug 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

✨[Feature] Please support aten::instance_norm
3 participants