Collection IO #629
-
Hi @narendasan, I have a few questions:

```cpp
void AddEngineToGraph(
    torch::jit::script::Module mod,
    std::shared_ptr<torch::jit::Graph>& g,
    const std::string& serialized_engine,
    runtime::CudaDevice& device_info,
    GraphIO graph_io,
    std::string engine_id = "",
    bool fallback = false)
{
  ...
  // Add inputs to graph
  // Setting the input binding relation
  torch::jit::Node* nested_inputs = xxx;
  g->block()->appendNode(nested_inputs);
  ...
  auto execute_node = g->create(
      c10::Symbol::fromQualString("tensorrt::execute_engine"),
      torch::jit::ArrayRef<torch::jit::Value*>(execute_node_inputs),
      1);
  ...
  // Set the output binding relation
  // Register outputs
  ...
}
```
-
A couple of questions regarding this spec for input:
This may significantly simplify the implementation (for the MVP at least), since we should only need to unwrap the input container to reduce the problem to a form we can already handle today (ordering is kept as given in the input container). I suspect this would cover the most common use cases, since users can simply append to their container if they need to mix tensors with tensor containers. The input shapes also have a natural 1:1 mapping to the inputs here.
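To make the unwrapping idea concrete, here is a minimal Python sketch (the helper name and spec layout are made up for illustration; this is not an existing API):

```python
from typing import List, Tuple, Union

import torch

# Assumed: a "spec" is either a single tensor-like input or a single
# tuple/list container of tensor-like inputs (one level of nesting).
Spec = Union[torch.Tensor, Tuple[torch.Tensor, ...], List[torch.Tensor]]

def unwrap_input_container(spec: Spec) -> List[torch.Tensor]:
    """Reduce a single top-level container to the flat list form that is
    already supported today, preserving the order given in the container."""
    if isinstance(spec, (tuple, list)):
        return list(spec)
    return [spec]

# Order is kept as given in the container, so shapes map 1:1 to inputs.
flat = unwrap_input_container((torch.randn(1, 3), torch.randn(1, 3)))
assert len(flat) == 2
```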
-
@inocsin How far along are you on the implementation of this? Would you happen to have a public dev branch that is usable? If not, I don't mind taking a crack at the implementation here, focusing specifically on input for now.
-
For a simple model with a tuple as input, as below:

```python
import torch
import copy
import torch.nn as nn
import torch.nn.functional as F
from typing import Tuple, List, Dict

class TestModel(nn.Module):
    def __init__(self):
        super(TestModel, self).__init__()

    def forward(self, z: Tuple[torch.Tensor, torch.Tensor]):
        r = z[0] + z[1]
        return r

test_model = TestModel()
ts = torch.jit.script(test_model)
print(ts.graph)
```

The original graph is:
If we use
And models that take a list as input, as below, will produce:

```python
class TestModel(nn.Module):
    def __init__(self):
        super(TestModel, self).__init__()

    def forward(self, z: List[torch.Tensor]):
        r = z[0] + z[1]
        return r
```

When I was implementing this feature, I ran into a few questions, as below; do you have any suggestions? Thanks. @narendasan
-
Hey, I wanted to follow up on this since it's been a while; have there been any additional updates?
-
As of v1.2.0 there will be experimental collections support (#1201). It comes with the following caveats:
This means users will not have access to features like dynamic shape, and performance may not be optimal. The intention is to address this limitation in v1.3.0.
-
When we go to implement the graph synthesis component, we should include the ability to handle optional tensors, which may or may not be provided:

```python
def forward(self, x: Tuple[torch.Tensor, torch.Tensor], y: Optional[torch.Tensor], z: List[Optional[torch.Tensor]]):
```

A user should be able to do this:

```python
torch_tensorrt.compile(mod,
    input_signature=((a, b), None, [c, None, d],)
)
```
-
Further Work and Suggestions
In regard to models which output tuples or other complex types, certain unexpected failures can stem from specifying

```
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Evaluating %729 : (Float(1, 128, 768, strides=[98304, 768, 1], requires_grad=1, device=cpu)) = prim::TupleConstruct(%input)
DEBUG: [Torch-TensorRT TorchScript Conversion Context] - Found the value to be: (<__torch__.torch.classes._torch_tensorrt_eval_ivalue_types.TensorContainer object at 0x881822b0>,)
RuntimeError: [Error thrown at core/conversion/conversion.cpp:230] Tuple type. Only a single tensor or a TensorList type is supported.
```

This error does not appear, however, when
To resolve this issue, it could be helpful to modify the function at TensorRT/core/conversion/conversion.cpp, lines 214 to 246 (commit 5063b14). If this function can handle tuple-formatted outputs, and more generally, if Input/Output tensor formatting could be handled symbolically, as is discussed in the RFC and comments above, this could be a promising approach to resolving the above error as well as other related ones. Additionally, this would provide a performance boost to such models, as requiring a Torch-executed block simply for casting inputs and outputs to the correct nesting format can slow down inference. Depending on the scope of the solution, this could require updates to
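For reference, the kind of model being discussed is roughly the following (a minimal sketch, not the model that produced the log above):

```python
import torch
import torch.nn as nn
from typing import Tuple

class TupleOutputModel(nn.Module):
    """Returns a tuple, so the scripted graph ends in a prim::TupleConstruct
    node that the conversion path has to handle at the outputs."""
    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        y = torch.relu(x)
        return y, y + 1

ts = torch.jit.script(TupleOutputModel())
print(ts.graph)  # the return value is produced by a prim::TupleConstruct
```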
-
Dictionary
-
Collections
Goal
Currently, TRTorch programs that can be compiled must be trivially reducible to the form `f([Tensor]) -> [Tensor]`. Cases like `f(Tensor) -> ((Tensor, Tensor))` are supported through this method. This means that any sort of Input/Output formatting is not currently handled by TRTorch. We would like to add support for cases like `f(Tensor[]) -> (Tensor, Tensor, (Tensor, Tensor))` or `f(Tensor, Tensor, (Tensor, Tensor)) -> (Tensor, (Tensor, Tensor))`, which have non-trivial subgrouping of tensors.
API Considerations
Considering that the formatting of the function signature is now more complex, we might want to think about ways to make it easy to convey the input specification.
Proposed API
For a module with a signature such as:
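As a running illustration, consider a hypothetical module like the following (the class and argument names are made up for this sketch):

```python
import torch
import torch.nn as nn
from typing import Tuple

class ExampleModule(nn.Module):
    def forward(self, x: torch.Tensor, y: Tuple[torch.Tensor, torch.Tensor]):
        return x + y[0] + y[1]
```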
We could change the API to expect a tuple formatted in the same way someone might call the function. In conjunction with the example tensor feature (#616), this might provide a natural way to reuse or more easily provide input specs, rather than requiring users to mentally align specs with the inputs.
Example
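For the hypothetical module above, the proposed call might look like this (`torch_tensorrt.compile` and `torch_tensorrt.Input` are used as stand-in names; treat this as a sketch of the proposal, not the final API):

```python
import torch
import torch_tensorrt

mod = torch.jit.script(ExampleModule())  # the module sketched above

# The spec mirrors how forward() would be called: one Input per tensor,
# grouped exactly as in the signature (x, (y[0], y[1])).
trt_mod = torch_tensorrt.compile(
    mod,
    inputs=(
        torch_tensorrt.Input(shape=[8, 16]),
        (
            torch_tensorrt.Input(shape=[8, 16]),
            torch_tensorrt.Input(shape=[8, 16]),
        ),
    ),
)
```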
This is as opposed to
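something like the flat form below (again only a sketch, with the same stand-in names):

```python
trt_mod = torch_tensorrt.compile(
    mod,  # the same scripted ExampleModule
    inputs=[
        torch_tensorrt.Input(shape=[8, 16]),  # x
        torch_tensorrt.Input(shape=[8, 16]),  # y[0]
        torch_tensorrt.Input(shape=[8, 16]),  # y[1]
    ],
)
```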
Where the inputs must be aligned properly and paired internally with the graph signature.
The advantage is that we can create an internal structure which encodes the format of the inputs for the user directly from the tuple provided. It also gives us an input of fixed size. Alternative methods that examine the graph input signature may have these fixed sizes obfuscated by type information. For instance, for a graph signature that uses a list to group subsets of arguments instead of a tuple, you might see a signature like the one sketched below.
This will not tell us how to align the inputs provided by the user as a flat list.
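A quick way to see this (a sketch; the exact printed strings may vary by PyTorch version):

```python
import torch
import torch.nn as nn
from typing import List

class ListGrouped(nn.Module):
    def forward(self, xs: List[torch.Tensor], y: torch.Tensor):
        return xs[0] + xs[1] + y

g = torch.jit.script(ListGrouped()).graph
# Prints something like ['__torch__.ListGrouped', 'Tensor[]', 'Tensor'];
# the Tensor[] type does not say how many tensors the user will pass.
print([str(i.type()) for i in g.inputs()])
```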
One limitation of this design may be its usage in C++; more exploration will be required to determine whether it is ergonomic and consistent with PyTorch.
Internal Implementation
1. Inputs
We could look to make `trtorch::core::ir::Input` compatible with IValues by registering it as a Torch custom class. This would let us nest Inputs in PyTorch types, which means we can pass around one IValue that holds the full input spec. This can then be parsed in the graph construction phase directly.
1. Go from user spec to IValue
It's unclear what the exact process is to go from a presumably standard Python or C++ tuple to an IValue, but this is something PyTorch is able to do, so it should just require looking at the PyTorch source.
2. Assign IDs to Inputs and create list of Inputs to pass to TensorRT
The next step is to populate a data structure like the one below which assigns each input an ID so that we can create a flattened vector of inputs to pass to TensorRT.
We should add a field to the `trtorch::core::ir::Input` class called ID. This will be the unique identifier for the Input during compilation. The order in which we add these inputs will be determined by an in-order traversal of the tuple provided by the user; we only increment the ID counter when we hit a new unlabeled Input (i.e. the leaves of the syntax tree). At the same time we can create a list of Inputs which will be passed to the conversion phase. This likely should be stored in a single struct. This object should then be added to the CompileSpec (it could potentially replace the vector of Inputs we use right now).
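To make the traversal concrete, here is a small Python sketch of the ID-assignment step (the helper and field names are made up; the real implementation would operate on `ir::Input` objects in C++):

```python
from typing import Any, List, Tuple, Union

# A leaf stands in for an ir::Input spec; containers are tuples/lists of specs.
Leaf = dict  # e.g. {"shape": [1, 3, 224, 224]}
Spec = Union[Leaf, Tuple["Spec", ...], List["Spec"]]

def assign_ids(spec: Spec, flat: List[Leaf]) -> Any:
    """In-order traversal: every leaf gets the next ID and is appended to the
    flat list handed to conversion; containers keep their structure."""
    if isinstance(spec, (tuple, list)):
        packed = [assign_ids(s, flat) for s in spec]
        return tuple(packed) if isinstance(spec, tuple) else packed
    spec = dict(spec, id=len(flat))  # leaf: label with the next ID
    flat.append(spec)
    return spec

flat_inputs: List[Leaf] = []
structured = assign_ids(
    ({"shape": [1, 3]}, ({"shape": [1, 4]}, {"shape": [1, 5]})), flat_inputs
)
assert [i["id"] for i in flat_inputs] == [0, 1, 2]
```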
3. Parse IValue and Construct Graph
Once we get to the graph construction phase, we need to amend it so that the first step is to create the collection input to the graph and then flatten it to a list where each index of the list corresponds to the ID of each Input in TorchScript. This will involve using the IValue created in step 1 as the spec for the access procedure for each Input.
2. Outputs
1. Evaluating collection operations to get list of outputs
The evaluation system should automatically construct any collections that will be used in the output during conversion. However, currently MarkOutputs only handles ITensors and TensorContainers; it will need to be extended to handle parsing the collection types. At this point we should construct an IValue, similar to the Input IValue, which encodes the indices from the output of TensorRT to the final output tuple. This IValue should be returned from the conversion process along with the serialized TensorRT engine. We already have an ID for each output to deal with the fact that TensorRT doesn't guarantee output order; these IDs can be reused in the IValue.
2. Parse IValue and Construct Graph
In the graph construction phase, once the TensorRT engine is embedded, we need to add the nodes that pack the outputs into the right format. This should use a system similar to the input system, except that it packs Tensors from a list into a format rather than unpacking them.
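A Python sketch of that packing step, assuming the index-encoding IValue can be thought of as a nested tuple whose leaves are integer output IDs (this representation is an assumption for illustration):

```python
from typing import List, Tuple, Union

import torch

# The format spec is a nested tuple whose leaves are TensorRT output IDs.
FormatSpec = Union[int, Tuple["FormatSpec", ...]]

def pack_outputs(flat_outputs: List[torch.Tensor], spec: FormatSpec):
    """Rebuild the user-facing nested output from the flat engine outputs."""
    if isinstance(spec, tuple):
        return tuple(pack_outputs(flat_outputs, s) for s in spec)
    return flat_outputs[spec]  # leaf: an index into the engine outputs

outs = [torch.zeros(1), torch.ones(1), torch.full((1,), 2.0)]
packed = pack_outputs(outs, (0, (1, 2)))   # -> (Tensor, (Tensor, Tensor))
assert torch.equal(packed[1][1], outs[2])
```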
Data Structures
GraphIO is a pair where the first element is a struct holding both the formatted input tuple containing `core::ir::Input` structs and a flattened version of the input tuple. The second element holds an IValue which is a formatted tuple of ints defining how to go from the list output of TensorRT to the output tuple.
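Conceptually (sketched here in Python purely for readability; the real structure would live in C++ alongside `core::ir::Input`, and the field names are made up), GraphIO would carry something like:

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class GraphInputs:
    input_signature: Any  # nested tuple mirroring the forward() call, leaves are Input specs
    flattened_inputs: List[Any] = field(default_factory=list)  # same leaves in ID order, handed to TensorRT

@dataclass
class GraphIO:
    inputs: GraphInputs
    output_format: Any    # nested tuple of ints mapping engine outputs to the output tuple
```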
Implementation Phases
WAR
We should first check whether partial compilation can handle some of this trivially, so that users can get unblocked.
MVP
We should implement support for one or two simple collection types. I think tuples will likely be the simplest, so we should start with those and get the system working end to end, from the user API to graph synthesis.
Additional Data Types
The next least complex type would most likely be lists. They should be implementable like tuples with very few changes if we use the API described above. After that we may want to look at dictionaries (this could even be pushed to a later release), which have the added complexity of keys.
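For reference on the dictionary case, the keys would have to become part of the spec, since a scripted signature only fixes the key and value types, not the keys themselves; a small illustration:

```python
import torch
import torch.nn as nn
from typing import Dict

class DictModel(nn.Module):
    def forward(self, feats: Dict[str, torch.Tensor]):
        return feats["a"] + feats["b"]

g = torch.jit.script(DictModel()).graph
# The dict argument's type only records the key/value types (str, Tensor);
# the specific keys "a" and "b" are not visible in the type.
print([str(i.type()) for i in g.inputs()])
```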
Syntax Sugar
Finally, we should consider whether there is any way to make the API simpler than what we have proposed here, and whether there is any work we could do on the user's behalf.