
[BUG]: empty model dirs causes triton 22.08 to core dump #475

Closed
dagardner-nv opened this issue Nov 17, 2022 · 3 comments
Labels: bug (Something isn't working)

@dagardner-nv (Contributor)

### Version

22.11

### Which installation method(s) does this occur on?

No response

### Describe the bug

Launching Triton from the root of the Morpheus repo, as documented in `docs/source/developer_guide/guides/2_real_world_phishing.md`:

```shell
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --log-info=true
```

This causes Triton to core dump:

#0  0x00007fe14850042b in triton::backend::tensorrt::UseTensorRTv2API(std::shared_ptr<nvinfer1::ICudaEngine> const&) () from /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
#1  0x00007fe1484dd08e in triton::backend::tensorrt::ModelState::AutoCompleteConfigHelper(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
   from /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
#2  0x00007fe1484dfc60 in triton::backend::tensorrt::ModelState::AutoCompleteConfig() () from /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
#3  0x00007fe1484e0588 in triton::backend::tensorrt::ModelState::Create(TRITONBACKEND_Model*, triton::backend::tensorrt::ModelState**) () from /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
#4  0x00007fe1484e0a3a in TRITONBACKEND_ModelInitialize () from /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so
#5  0x00007fe1559d5fde in triton::core::TritonModel::Create(triton::core::InferenceServer*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, 
std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, inference::ModelConfig const&, std::unique_ptr<triton::core::TritonModel, std::default_delete<triton::core::TritonModel> >*) () from /opt/tritonserver/bin/../lib/libtritonserver.so
#6  0x00007fe155a903a4 in triton::core::ModelLifeCycle::CreateModel(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, triton::core::ModelLifeCycle::ModelInfo*) ()
   from /opt/tritonserver/bin/../lib/libtritonserver.so
#7  0x00007fe155a96e38 in std::_Function_handler<void (), triton::core::ModelLifeCycle::AsyncLoad(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, inference::ModelConfig const&, std::shared_ptr<triton::core::TritonRepoAgentModelList> const&, std::function<void (triton::core::Status)>&&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /opt/tritonserver/bin/../lib/libtritonserver.so
#8  0x00007fe155bc9b00 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<triton::common::ThreadPool::ThreadPool(unsigned long)::{lambda()#1}> > >::_M_run() ()
   from /opt/tritonserver/bin/../lib/libtritonserver.so
#9  0x00007fe155523de4 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007fe156874609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#11 0x00007fe15520e133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The model works fine with Triton 22.02.

### Minimum reproducible example

```shell
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --log-info=true
```
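Since the title points at empty model version directories as the trigger, a quick pre-flight scan of the model repository can flag them before launching the container. This is a hypothetical helper, not part of the Morpheus repo; the directory and file names in the demo are illustrative:

```shell
# Hypothetical pre-flight check: warn about Triton model version
# directories that contain no loadable model files (only a README, or
# nothing at all), which trigger the crash in the 22.08 TensorRT backend.
check_model_repo() {
  repo="$1"
  status=0
  for version_dir in "$repo"/*/[0-9]*/; do
    [ -d "$version_dir" ] || continue
    # Any regular file other than a README counts as a model artifact.
    if ! find "$version_dir" -maxdepth 1 -type f ! -iname 'readme*' | grep -q .; then
      echo "WARNING: no model files in $version_dir"
      status=1
    fi
  done
  return $status
}

# Example: build a tiny repo with one empty version dir and scan it.
demo=$(mktemp -d)
mkdir -p "$demo/phishing-bert-trt/1" "$demo/ok-model/1"
touch "$demo/ok-model/1/model.onnx"
check_model_repo "$demo" || true   # warns about phishing-bert-trt/1 only
rm -rf "$demo"
```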


### Relevant log output

_Same stack trace as in the bug description above._

### Full env printout

_No response_

### Other/Misc.

_No response_

### Code of Conduct

- [X] I agree to follow Morpheus' Code of Conduct
- [X] I have searched the [open bugs](https://github.com/nv-morpheus/Morpheus/issues?q=is%3Aopen+is%3Aissue+label%3Abug) and have found no duplicates for this bug report
@dagardner-nv (Contributor, Author)

Worked around the issue by using Triton 22.06 in the documentation (#477). Marking this issue as blocked on triton-inference-server/server#5084.

@dagardner-nv dagardner-nv changed the title [BUG]: phishing-bert-onnx causes triton 22.08 to core dump [BUG]: empty model dirs causes triton 22.08 to core dump Nov 30, 2022
@dagardner-nv (Contributor, Author)

The issue is due to `models/triton-model-repo/phishing-bert-trt/1` and `models/triton-model-repo/sid-minibert-trt/1` not containing models; instead, these dirs contain a README instructing the user how to generate the model with `morpheus tools onnx-to-trt`. The crash on an empty version directory is a known issue (https://jirasw.nvidia.com/browse/DLIS-4354).
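For context, Triton expects each model's numeric version directory to contain the model artifact itself. A sketch of the expected layout (file names illustrative; `model.plan` is the conventional name for a TensorRT engine):

```
models/triton-model-repo/
└── phishing-bert-trt/
    ├── config.pbtxt
    └── 1/
        └── model.plan    # absent in the shipped repo; only a README is present
```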

We could:

  • Remove the dirs; the README in them instructs the user to run `morpheus tools onnx-to-trt`, which requires TensorRT, which isn't in our environment by default
  • Add the `--disable-auto-complete-config` flag to the Triton commands in our docs
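The second option would amount to a launch command like the following (the documented command with the flag appended; with auto-complete disabled, Triton no longer opens the engine files to infer the config, so each model must ship a complete `config.pbtxt`):

```shell
# Documented launch command, with Triton's config auto-complete disabled
# so the TensorRT backend never touches the (empty) version directories.
docker run --rm -ti --gpus=all \
  -p8000:8000 -p8001:8001 -p8002:8002 \
  -v $PWD/models:/models \
  nvcr.io/nvidia/tritonserver:22.08-py3 \
  tritonserver --model-repository=/models/triton-model-repo \
    --exit-on-error=false --log-info=true \
    --disable-auto-complete-config
```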

ghost pushed a commit that referenced this issue Dec 16, 2022
* Includes changes from PR #538 
* Fixes casing for proper nouns (NVIDIA, Docker, Triton, Python, Conda) when not referring to a command
* Document `StreamPair` (docstrings for globals need to appear after the definition)
* Other fixes suggested by Zenobia
* Brings grade up from an A to A+
* Fixes CSS theme issue #543 
* Various fixes to the `getting_started.md` doc fixes #539
* Fixes issue with the `docker/run_container_*.sh` scripts for users who have the NVIDIA Container Toolkit installed but do not have nvidia set as the default runtime
* Documents launching a pre-built Morpheus container
* Fix errant entry in docstring for `AppShieldSource` which was showing up in the command line help
* Documentation work-around for #475
* Fix a few remaining references to 'srf'

Authors:
  - David Gardner (https://github.com/dagardner-nv)
  - https://github.com/bsuryadev
  - Bhargav Suryadevara (https://github.com/bsuryadevara)

Approvers:
  - Christopher Harris (https://github.com/cwharris)
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: #547
@dagardner-nv (Contributor, Author)

Following up on this one: the bug in question was fixed in Triton 2.28 (the 22.11 version of the Docker container). Currently, `--disable-auto-complete-config` is working for us.
