
Unable to load engine in C++ API #4339

Open
ninono12345 opened this issue Jan 26, 2025 · 4 comments
Assignees: kevinch-nv
Labels: Engine Build (issues with engine build), triaged (issue has been triaged by maintainers)

ninono12345 commented Jan 26, 2025

Description

A year ago I was working on a small project: I converted a model to TensorRT in Python and ran inference in C++, and it worked. Now, with TensorRT 10 out, I am facing problems. I am still able to convert ONNX to TensorRT and run inference through the Python API, but I cannot load the engine through the C++ API.

The code is untouched; it worked last time, and as far as I know the same code should work from TensorRT 8.6 to 10. Could it be that I am now using Visual Studio 2022? Last time I built with CMake on Windows/Linux and everything worked.
```cpp
#include <fstream>
#include <iostream>
#include <stdexcept>
#include <tuple>
#include <vector>

#include <NvInfer.h>

std::tuple<nvinfer1::ICudaEngine*, nvinfer1::IExecutionContext*> load_feature_extractor(std::string engine_file_name)
{
    std::cout << engine_file_name << std::endl;

    // Open the serialized engine and seek to the end to get its size.
    std::ifstream file(engine_file_name, std::ios::binary | std::ios::ate);
    if (!file) {
        std::cout << "failed to load engine" << std::endl;
        throw std::runtime_error("failed to load engine");
    }

    std::streamsize size = file.tellg();
    file.seekg(0, std::ios::beg);

    // Read the whole plan file into host memory.
    std::vector<char> buffer(size);
    if (!file.read(buffer.data(), size)) {
        throw std::runtime_error("unable to read engine");
    }

    // NOTE: TensorRT requires the logger to outlive the runtime and engine,
    // so a function-local logger like this is risky.
    Logger m_l = Logger();
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(m_l);

    // Deserialize the plan into an engine. This is where the error below is logged.
    nvinfer1::ICudaEngine* engine = runtime->deserializeCudaEngine(buffer.data(), buffer.size());
    if (!engine) {
        std::cout << "engine is null" << std::endl;
        throw std::runtime_error("engine is null");
    }

    for (int i = 0; i < engine->getNbIOTensors(); i++) {
        std::cout << engine->getIOTensorName(i) << std::endl;
    }

    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    return std::make_tuple(engine, context);
}
```

output:
IRuntime::deserializeCudaEngine: Error Code 1: Internal Error (Unexpected call to stub loadRunner for ShuffleRunner.)
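
The "Unexpected call to stub loadRunner" message could indicate that the plan was built by one TensorRT version and deserialized by another, for example when the headers the program compiles against and the nvinfer DLL it loads at runtime come from different installs. A minimal diagnostic sketch (not from the original thread, assuming only the public TensorRT headers): print the header version next to the version reported by the loaded library and check that they match:

```cpp
#include <cstdio>

#include <NvInfer.h>  // pulls in NvInferVersion.h, which defines NV_TENSORRT_VERSION

int main()
{
    // Version encoded in the headers this program was compiled against.
    std::printf("header version : %d\n", NV_TENSORRT_VERSION);
    // Version reported by the nvinfer library actually loaded at runtime.
    std::printf("library version: %d\n", getInferLibVersion());
    return 0;
}
```

If the two numbers differ, the program is loading a different TensorRT than it was compiled against, which would be consistent with internal deserialization errors.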

I also tried to create the engine directly from the C++ API:

```cpp
#include <iostream>

#include <NvInfer.h>
#include <NvOnnxParser.h>

using namespace nvinfer1;
using namespace nvonnxparser;

class Logger : public ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        // suppress info-level messages
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
} logger;

int main()
{
    IBuilder* builder = createInferBuilder(logger);
    auto flags = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    INetworkDefinition* network = builder->createNetworkV2(flags);

    IParser* parser = createParser(*network, logger);

    parser->parseFromFile("feature_extractor_tompnet_50.onnx",
                          static_cast<int32_t>(ILogger::Severity::kWARNING));
    for (int32_t i = 0; i < parser->getNbErrors(); ++i)
    {
        std::cout << parser->getError(i)->desc() << std::endl;
    }

    IBuilderConfig* config = builder->createBuilderConfig();
    // config->setMemoryPoolLimit(MemoryPoolType::kWORKSPACE, 1U << 20);
    // config->setMemoryPoolLimit(MemoryPoolType::kTACTIC_SHARED_MEMORY, 48 << 10);

    IHostMemory* serializedModel = builder->buildSerializedNetwork(*network, *config);

    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config); // THIS FAILS
    return 0;
}
```

The engine is successfully serialized, but loading it still fails.

buildEngineWithConfig outputs:
Unexpected Internal Error: [virtualMemoryBuffer.cpp::nvinfer1::StdVirtualMemoryBufferImpl::~StdVirtualMemoryBufferImpl::123] Error Code 1: Cuda Driver (TensorRT internal error)
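
One workaround worth noting here: buildEngineWithConfig has been deprecated since TensorRT 8.0 in favor of buildSerializedNetwork, so the already-serialized plan can be turned into an engine through an IRuntime instead; this is the flow the TensorRT samples use. A minimal sketch, assuming the builder, network, config, and logger objects from the snippet above:

```cpp
// Minimal sketch (assumes builder, network, config, and logger from the snippet
// above): build the serialized plan once, then deserialize it through IRuntime
// instead of calling the deprecated buildEngineWithConfig.
IHostMemory* plan = builder->buildSerializedNetwork(*network, *config);
if (plan != nullptr)
{
    IRuntime* runtime = createInferRuntime(logger);
    ICudaEngine* engine = runtime->deserializeCudaEngine(plan->data(), plan->size());
    if (engine == nullptr)
    {
        std::cout << "deserializeCudaEngine failed" << std::endl;
    }
}
```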

If I print all logs, I get:

[screenshot of the full build log]

If config->setMemoryPoolLimit(MemoryPoolType::kWORKSPACE, 1U << 20) is uncommented, then both buildSerializedNetwork and buildEngineWithConfig also output:
UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 10616832 detected for tactic 0x0000000000000000.

Keep in mind that with the Python API I can convert and run inference successfully; it is the C++ API that fails.

Could this be due to Visual Studio?

P.S. trtexec is also able to convert and run inference. The most important thing for me is to be able to run inference in C++; I can convert the engine in Python.

Thank you

Environment

TensorRT Version: 10.7

NVIDIA GPU: GTX 1660 Ti

NVIDIA Driver Version: 561.19

CUDA Version: 12.4

CUDNN Version: 8.9.7

Operating System: Windows 10

IDE: Visual Studio 2022

Python Version (if applicable): 3.10

PyTorch Version (if applicable): latest, with CUDA 12.4

ONNX Model link:
https://drive.google.com/file/d/1S2O6FAm5tbzkbFUUIcuZiAmdSUcSSosa/view?usp=sharing

ninono12345 (Author) commented:

@zerollzeng I remember you were really helpful to me last time; perhaps you could give me insight into what I am missing. Thank you

kevinch-nv added the Engine Build and triaged labels on Feb 5, 2025
kevinch-nv self-assigned this on Feb 5, 2025
kevinch-nv (Collaborator) commented:

The API usage looks fine to me; the errors suggest an environment issue:

[virtualMemoryBuffer.cpp::nvinfer1::StdVirtualMemoryBufferImpl::~StdVirtualMemoryBufferImpl::123] Error Code 1: Cuda Driver (TensorRT internal error)

UNSUPPORTED_STATE: Skipping tactic 0 due to insufficient memory on requested size of 10616832 detected for tactic 0x0000000000000000

Can you double check that your program in Visual Studio is linking properly to all the TensorRT and CUDA libraries?
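
(On the second message: with kWORKSPACE capped at 1U << 20 bytes, i.e. 1 MiB, skipping a tactic that asks for 10616832 bytes, roughly 10 MiB, is expected behavior rather than a separate failure.) For the linking question, one quick sanity check that can run inside the same Visual Studio project is to query the CUDA driver and runtime versions the process actually sees; a minimal sketch, assuming the project links against cudart:

```cpp
#include <cstdio>
#include <cuda_runtime_api.h>

int main()
{
    int driverVersion = 0;
    int runtimeVersion = 0;

    // Highest CUDA version supported by the installed display driver.
    cudaDriverGetVersion(&driverVersion);
    // Version of the CUDA runtime (cudart) this program linked against.
    cudaRuntimeGetVersion(&runtimeVersion);

    std::printf("CUDA driver supports : %d\n", driverVersion);
    std::printf("CUDA runtime linked  : %d\n", runtimeVersion);

    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    std::printf("cudaGetDeviceCount   : %s (%d device(s))\n", cudaGetErrorString(err), deviceCount);
    return 0;
}
```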

ninono12345 (Author) commented Feb 6, 2025

Thank you for your answer. You see, there is not a lot of information on Visual Studio; almost everything out there uses CMake.

Here is what my import settings look like; please tell me if you notice anything that should be changed:

[three screenshots of the Visual Studio project's include, library, and linker settings]

Of course I should probably switch to CMake, but this is a very strange error: I can import the libraries and create an nvinfer1::IRuntime, yet creating an engine fails. Likewise, building a serialized TensorRT network from ONNX succeeds, but loading it fails. Both paths fail at the point of creating the ICudaEngine.

ninono12345 (Author) commented:

I can indeed confirm that a simple CMake build works; the engine loads without error. But I still wonder what I could have missed in the Visual Studio setup.
