Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collection Support [Inprogress] #802

Closed
wants to merge 22 commits into from
Closed

Conversation

inocsin
Copy link
Contributor

@inocsin inocsin commented Jan 13, 2022

TODO list and current progress

  • Make torch_tensorrt::core::ir::Input and torch_tensorrt::Input compatible with IValue.
  • Support simple case of model with tuple inputs
  • Support simple case of model with list inputs
  • Support output of list type
  • Support output of tuple type
  • Add unit test for tuple input and output
  • Add unit test for list input and output
  • Test compatibility of traditional compile spec
  • Support user defined datatype in list/tuple inputs (fp16, fp32, etc.)
  • Python api
  • Add more unit test of different models
  • Handle prim::ListConstruct without fallback them (currently we have to fallback those two operator to support list input and output)
  • Handle aten::__getitem__ without fallback them

Description

Fixes #629 #428

Type of change

  • New feature: collection support

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes

@inocsin
Copy link
Contributor Author

inocsin commented Mar 17, 2022

@narendasan @peri044, the first version of collection feature is ready, can you review the code?
There are some diffenrences from the method mentioned in #629
(1) This implementation can handle input and output of list/tuple type of the whole graph. While the trt-segment doesn't have to handle input and output of list/tuple type, because ListConstruct and TupleConsturct are included and handled in torch-segment
(2) I didn't find any api to calculate the list size of Value which has the type of in->type()->kind() == torch::jit::TypeKind::ListType, so I didn't flatten the input IValue of ir::Input to vector, instead, I use vector<vector> as flattened spec, because it's easier to index Input spec without knowing the size of List. For example, given ([a,b],(c,d)) as input, we can get Input spec of c with input_spec[1][0] without know the size of list [a,b].

Copy link
Collaborator

@narendasan narendasan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a high level pass. I think the logic seems mostly fine, lots of usability issues however re the messaging.

core/ir/ir.cpp Outdated
InputSpecMap pair_input_vals_with_specs(std::vector<const torch::jit::Value*> vals, std::vector<Input> specs) {
LOG_DEBUG("pair_input_vals_with_specs");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have a better message here or just remove it if its not useful. Perhaps dump what the pairings are?

core/ir/ir.cpp Outdated
@@ -27,19 +36,59 @@ InputSpecMap pair_input_vals_with_specs(std::vector<const torch::jit::Value*> va
return a;
}

CollectionInputSpecMap pair_input_vals_with_specs_collection(std::vector<const torch::jit::Value*> vals, std::vector<std::vector<Input>>& specs) {
LOG_DEBUG("pair_input_vals_with_specs collection");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here


CollectionInputSpecMap a;
for (size_t i = 0; i < vals.size(); i++) {
LOG_DEBUG("Paring " << i << ": " << vals[i]->debugName() << " : " << specs[i]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to make this slightly more formatted so it stands out in the log

core/ir/ir.cpp Outdated
std::vector<const torch::jit::Value*> get_tensor_inputs(
std::shared_ptr<torch::jit::Graph>& g,
StaticParams& static_params) {
std::vector<const torch::jit::Value*> input_tensors;
auto inputs = g->inputs();
LOG_DEBUG("Inputs size " << inputs.size());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we need to keep this, or if we should this needs to be more discriptive

core/ir/ir.cpp Outdated
for (auto in : inputs) {
LOG_DEBUG("input debug name: " << in->debugName());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here


TEST(CppAPITests, TestCollectionListInput) {

std::string path = "/root/Torch-TensorRT/list_input.ts";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here

// ir::TypeMap& first_use_type_map) {
// Associate input specs with inputs
// cfg.convert_info.inputs = std::move(ir::associate_specs_with_inputs(g, cfg.inputs, static_params));
cfg.convert_info.collection_inputs = std::move(ir::associate_specs_with_collection_inputs(g, cfg.graph_inputs, static_params));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we call this twice with different apis?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will get the input spec map(collection_inputs maybe confused with GraphInput.collection_input, it should be changed into collection_input_spec_map)

cfg.convert_info.collection_inputs = std::move(ir::associate_specs_with_collection_inputs(g, cfg.graph_inputs, static_params));

auto collection_inputs = ir::get_collection_inputs(g, static_params);
LOG_DEBUG("In MapInputsAndDetermineDTypes " << "g->inputs() size " << g->inputs().size() << ", collection_inputs size " << collection_inputs.size());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better logging messages

LOG_INFO(
"Since input type is not explicitly defined, infering using first tensor calculation\n Found input "
<< in->debugName() << " has type " << est_type_opt[i].value()
<< ". If this is incorrect explicitly set dtype for input and file a bug");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think we should tell them to file a bug here since we just use a heuristic, its not always going to be right.

// If we can calculate the type from the graph and the type was not defined by the user then use the calculated
// type
LOG_INFO(
"Since input type is not explicitly defined, infering using first tensor calculation\n Found input "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inferred instead of Found

@narendasan
Copy link
Collaborator

@inocsin can you rebase quickly?

@inocsin
Copy link
Contributor Author

inocsin commented Apr 5, 2022

@inocsin can you rebase quickly?

@narendasan rebased to master

@inocsin
Copy link
Contributor Author

inocsin commented Apr 8, 2022

@naren,I think we should remove aten::__getitem__ and prim::ListConstruct converters, since tensorrt doesn't support tensorList as output, so the ListConstruct should be done in torch subgraph. And since we cannot evaluate item in tensorList, we should remove aten::getitem. What do you think?

# info.inputs = [i._to_internal() for i in inputs]
info.graph_inputs.inputs = [i._to_internal() for i in inputs]
else:
info.graph_inputs.input_signature = _parse_collection_input(compile_spec["inputs"])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still cannot pass python value (torch.classes.tensorrt._Input()) to info.graph_inpus.input_signature (torch::jit::IValue)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@narendasan do you have any suggestions?

@narendasan
Copy link
Collaborator

We need prim::ListConstruct because some operations just group and ungroup tensors but dont actually need to be executed at runtime. By forcing it to run PyTorch we would incur a performance cost for switching between TensorRT and PyTorch a bunch

inocsin added 18 commits April 12, 2022 10:08
…sorrt::Input compatible with IValue. Support simple case of tuple input model. Add unit test.

Signed-off-by: inocsin <vcheungyi@163.com>
… elements. Using two level vector to store ir::Input

Signed-off-by: inocsin <vcheungyi@163.com>
Signed-off-by: inocsin <vcheungyi@163.com>
Signed-off-by: inocsin <vcheungyi@163.com>
Signed-off-by: inocsin <vcheungyi@163.com>
Signed-off-by: inocsin <vcheungyi@163.com>
Signed-off-by: inocsin <vcheungyi@163.com>
Signed-off-by: inocsin <vcheungyi@163.com>
Signed-off-by: inocsin <vcheungyi@163.com>
Signed-off-by: inocsin <vcheungyi@163.com>
…sionInfo.collection_input_spec_map

Signed-off-by: inocsin <vcheungyi@163.com>
Signed-off-by: inocsin <vcheungyi@163.com>
Signed-off-by: inocsin <vcheungyi@163.com>
Signed-off-by: inocsin <vcheungyi@163.com>
Signed-off-by: inocsin <vcheungyi@163.com>
Signed-off-by: inocsin <vcheungyi@163.com>
Signed-off-by: inocsin <vcheungyi@163.com>
…ually

Signed-off-by: inocsin <vcheungyi@163.com>
@inocsin
Copy link
Contributor Author

inocsin commented Apr 14, 2022

Now can keep prim::ListConstruct in prim.cpp, and don't have to fallback it manually. Only fallback ListConstrct to torch when the output of tensorrt subgraph is ListConstruct.

inocsin added 2 commits April 14, 2022 21:57
Signed-off-by: inocsin <vcheungyi@163.com>
…and all the nodes can be converted

Signed-off-by: inocsin <vcheungyi@163.com>
# raise KeyError("Input specs should be either torch_tensorrt.Input or torch.Tensor, found types: {}".format(
# [type(i) for i in compile_spec["inputs"]]))

if isinstance(compile_spec["inputs"], list) and all([isinstance(i, torch.Tensor) or isinstance(i, Input) for i in compile_spec["inputs"]]):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these be reduced into a common code path?

Signed-off-by: inocsin <vcheungyi@163.com>
@ncomly-nvidia
Copy link
Contributor

Fixes #469

@facebook-github-bot
Copy link
Contributor

Hi @inocsin!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@facebook-github-bot
Copy link
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@ncomly-nvidia
Copy link
Contributor

Should fix #798

@ntakouris
Copy link

When this is done, it will probably make many (if not all) torchvision models much closer to automatic conversion to trt:

Relevant Issue

@narendasan
Copy link
Collaborator

Closing in favor of #1201

@narendasan narendasan closed this Aug 4, 2022
@JXQI
Copy link

JXQI commented Nov 1, 2023

After I merge this changes, When I import torch_tensorrt raise errors:
File "/media/tx-deepocean/Data/poetry_cache/virtualenvs/chest-ct-engine-jmgQEMqM-py3.7/lib/python3.7/site-packages/torch_tensorrt/_enums.py", line 1, in <module> from torch_tensorrt._C import dtype, DeviceType, EngineCapability, TensorFormat ImportError: /media/tx-deepocean/Data/poetry_cache/virtualenvs/chest-ct-engine-jmgQEMqM-py3.7/lib/python3.7/site-packages/torch_tensorrt/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN14torch_tensorrt4core2ir11GraphInputsC1ERN3c106IValueE

can you help me?thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla signed component: core Issues re: The core compiler release: v1.2 Tagged to be included in v1.2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants