
Terminated by signal SIGSEGV (Address boundary error) when importing onnxruntime before onnxruntime-genai #257

Closed
trajepl opened this issue Apr 10, 2024 · 26 comments · Fixed by #393

Comments

@trajepl

trajepl commented Apr 10, 2024

og.Model cannot be created if the user imports onnxruntime before onnxruntime-genai.
The job is terminated by signal SIGSEGV (Address boundary error)

import onnxruntime as ort  # importing this first triggers the crash below

def genai_run(prompt, model_path, max_length=200):

    import time

    import onnxruntime_genai as og

    print("Loading model...")
    app_started_timestamp = time.time()
    model = og.Model(model_path)
    model_loaded_timestamp = time.time()
    print("Model loaded in {:.2f} seconds".format(model_loaded_timestamp - app_started_timestamp))
    tokenizer = og.Tokenizer(model)
    tokenizer_stream = tokenizer.create_stream()
    input_tokens = tokenizer.encode(prompt)
    started_timestamp = time.time()

    print("Creating generator ...")
    params = og.GeneratorParams(model)
    params.set_search_options(
        {
            "do_sample": False,
            "max_length": max_length,
            "min_length": 0,
            "top_p": 0.9,
            "top_k": 40,
            "temperature": 1.0,
            "repetition_penalty": 1.0,
        }
    )
    params.input_ids = input_tokens
    generator = og.Generator(model, params)
    print("Generator created")

    first = True
    new_tokens = []

    while not generator.is_done():
        generator.compute_logits()
        generator.generate_next_token()
        if first:
            first_token_timestamp = time.time()
            first = False

        new_token = generator.get_next_tokens()[0]
        print(tokenizer_stream.decode(new_token), end="")
        new_tokens.append(new_token)

    run_time = time.time() - started_timestamp
    print(
        f"Prompt tokens: {len(input_tokens)}, New tokens: {len(new_tokens)},"
        f" Time to first: {(first_token_timestamp - started_timestamp):.2f}s,"
        f" New tokens per second: {len(new_tokens)/run_time:.2f} tps"
    )


model_path = "xxxx"
genai_run("hello world", model_path)

How to reproduce:

  1. python -m onnxruntime_genai.models.builder -m microsoft/phi-2 -e cpu -p int4 -o ./models/phi2
  2. Run the script above.
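The two steps above can be condensed into a minimal sketch (my own, not from the thread; the model path is the one produced by step 1, and the guard simply skips the repro when the wheels or the built model are absent). Only the import order matters for the crash.

```python
import importlib.util
import os

def packages_available():
    """True when both wheels are installed, so the repro can actually run."""
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("onnxruntime", "onnxruntime_genai")
    )

if packages_available() and os.path.isdir("./models/phi2"):
    import onnxruntime              # importing this first pulls in libstdc++ symbols
    import onnxruntime_genai as og
    og.Model("./models/phi2")       # the SIGSEGV is reported here with the cuda wheel
else:
    print("prerequisites missing; skipping repro")
```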
@trajepl
Author

trajepl commented Apr 10, 2024

It is not only onnxruntime: the script also fails when importing torch or transformers first.

@baijumeswani
Collaborator

Could you share more details on this error? And if possible, a stacktrace. That would help me reproduce this.

I tried running your script and was able to successfully execute it. Here is the output:

Loading model...
Model loaded in 15.46 seconds
Creating generator ...
Generator created
.

In the end, the friends realized that their journey had not only deepened their understanding of the world but also strengthened their bond. They had learned the importance of embracing diversity, respecting different perspectives, and finding common ground.

As they bid farewell to the Enchanted Forest, they carried with them the memories of their adventure and the wisdom gained from their encounters. They knew that their shared experiences would forever shape their lives and inspire them to continue exploring the wonders of the world.

And so, the three friends returned to their small town, forever changed by their journey. They became advocates for unity, spreading the message of acceptance and understanding wherever they went. Their story became a testament to the power of friendship, curiosity, and the pursuit of knowledge.

As they looked back on their adventure, they realized that the Enchanted Forest had not only taught them about the world but also about themselves. They had discovered their own strengths, overcome their fears, andPrompt tokens: 3, New tokens: 197, Time to first: 0.25s, New tokens per second: 14.55 tps

I am using onnxruntime-genai 0.1.0 from PyPI for this test. Could you share the following information:

Platform: windows, linux...
ort version
ort-genai version
python version
torch version
transformers version
stacktrace if possible
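A small helper along these lines (mine, not part of the thread) can collect all of the requested details at once; packages that are not installed are reported as such instead of raising.

```python
import platform
from importlib import metadata  # Python 3.8+

def env_report(packages=("onnxruntime", "onnxruntime-gpu", "onnxruntime-genai",
                         "torch", "transformers")):
    """Gather the platform/version details requested above into one dict."""
    report = {
        "platform": platform.platform(),
        "python": platform.python_version(),
    }
    for pkg in packages:
        try:
            report[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            report[pkg] = "not installed"
    return report

for key, value in env_report().items():
    print(f"{key}: {value}")
```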

@trajepl
Author

trajepl commented Apr 11, 2024

  • Platform: linux ubuntu20.04

  • ort version: onnxruntime-gpu 1.17.1

  • ort-genai version: onnxruntime-genai-cuda (pip install onnxruntime-genai-cuda --pre --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-genai/pypi/simple/)

    I tested the cpu version of ort-genai just now and it worked for me. The failure happens with the genai-cuda version regardless of whether we target CPU or GPU to run the model.

  • python version: 3.8

  • torch version: 2.2.2

  • transformers version: 4.39.3

  • stacktrace if possible: only terminated by signal SIGSEGV (Address boundary error) was thrown when model = og.Model(model_path) was called

@trajepl
Author

trajepl commented Apr 11, 2024

Thanks for trying, @baijumeswani :) Please let me know if anything is missing. Thanks!

@baijumeswani
Collaborator

baijumeswani commented Apr 11, 2024

Ok, I was able to reproduce the problem on my end. I was also able to narrow down the problem to be related to std::filesystem.

Summary of the problem:

  • onnxruntime-genai-cuda is built with GCC 8.5.
  • onnxruntime-genai (cpu variant) is built with GCC 12.
  • onnxruntime-gpu and probably torch/transformers use GCC > 8 for their build.

The problem is introduced because when importing onnxruntime/torch/transformers first, the symbols for std::filesystem are loaded from libstdc++.so.6. These symbols are incompatible with the symbols needed for std::filesystem for GCC 8. There is more meaningful information provided here: https://bugs.launchpad.net/ubuntu/+source/gcc-8/+bug/1824721/comments/6
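One way to check this explanation on a live process (a Linux-only diagnostic of my own, not from the thread) is to look at which libstdc++ objects are mapped before and after the first import:

```python
def libstdcxx_mappings():
    """Return the libstdc++ shared objects mapped into this process (Linux only)."""
    try:
        with open("/proc/self/maps") as maps:
            lines = maps.read().splitlines()
    except OSError:
        return []  # not on Linux, or /proc is unavailable
    return sorted({line.split()[-1] for line in lines if "libstdc++" in line})

# Calling this before and after `import onnxruntime` shows which
# libstdc++.so.6 the first import brought into the process.
print(libstdcxx_mappings())
```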

I think to resolve this problem, we might need to publish a patch release that is built with a higher GCC version.

cc @jchen351

@snnn
Member

snnn commented Apr 11, 2024

I can reproduce the problem on Mariner 2 Linux, where supposedly the two packages use the same compiler and there is no discrepancy between them. I could see the GenAI package using its own fs implementation instead of the one from libstdc++.

@snnn
Member

snnn commented Apr 11, 2024

I am rebuilding the packages with symbols.

@snnn
Member

snnn commented Apr 12, 2024

Now I have an onnxruntime-gpu debug package, but I don't have a GenAI debug package yet, so I still do not have enough clues. But I can see that when it crashed, the callstack had both packages' native code in it. It seems that the onnxruntime-gpu package somehow calls into GenAI, which I didn't expect to happen. Is it possible that the onnxruntime-gpu code constructed a std::filesystem::path object and then passed it to the GenAI package's C++ code?

@snnn
Member

snnn commented Apr 12, 2024

Do the two packages pass C++ objects around?

@baijumeswani
Collaborator

No, there should be no interaction between the two packages.

My hypothesis is that this is caused because we statically link against stdc++fs; when importing any python module that requires libstdc++.so, the symbols are not distinguishable (between the statically linked code and the shared lib), and ort-genai ends up invoking std::filesystem symbols from libstdc++.so, which results in the crash.

@baijumeswani
Collaborator

I am guessing that if we tried doing this same thing on an ubuntu 18.04 machine, we wouldn't see this crash.

@snnn
Member

snnn commented Apr 12, 2024

But when it crashed, I saw both packages' native code in the callstack, which is abnormal.

@baijumeswani
Collaborator

ort-genai calls into the onnxruntime.so library which is embedded inside the ort-genai python package. Are you sure you saw native code from the ort python package in the callstack, and not ort-genai calling into the ort.so inside the ort-genai python package? Could you paste your callstack?

@snnn
Member

snnn commented Apr 12, 2024

Native code from the ort python package was in the callstack. I am 100% sure of that.

@snnn
Member

snnn commented Apr 12, 2024

Here is the callstack.

#0  0x00007fff8c858d57 in std::filesystem::__cxx11::path::~path() ()
   from /home/chasun/.local/lib/python3.9/site-packages/onnxruntime_genai/onnxruntime_genai.cpython-39-x86_64-linux-gnu.so
#1  0x00007fff8c858d74 in std::filesystem::__cxx11::path::~path() ()
   from /home/chasun/.local/lib/python3.9/site-packages/onnxruntime_genai/onnxruntime_genai.cpython-39-x86_64-linux-gnu.so
#2  0x00007fff8c836972 in ?? ()
   from /home/chasun/.local/lib/python3.9/site-packages/onnxruntime_genai/onnxruntime_genai.cpython-39-x86_64-linux-gnu.so
#3  0x00007fff8c84159a in ?? ()
   from /home/chasun/.local/lib/python3.9/site-packages/onnxruntime_genai/onnxruntime_genai.cpython-39-x86_64-linux-gnu.so
#4  0x00007fff8c84fb5f in ?? ()
   from /home/chasun/.local/lib/python3.9/site-packages/onnxruntime_genai/onnxruntime_genai.cpython-39-x86_64-linux-gnu.so
#5  0x00007ffff7d849d2 in cfunction_call (func=0x7fff8fd0fef0, args=<optimized out>, kwargs=<optimized out>)
    at Objects/methodobject.c:543
#6  0x00007ffff7d68390 in _PyObject_MakeTpCall (tstate=0x555555559b80, callable=0x7fff8fd0fef0, args=0x7fffffffd7e0,
    nargs=2, keywords=0x0) at Objects/call.c:191
#7  0x00007ffff7cc351c in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fffffffd7e0,
    callable=0x7fff8fd0fef0, tstate=0x555555559b80) at ./Include/cpython/abstract.h:116
#8  _PyObject_VectorcallTstate (kwnames=0x0, nargsf=2, args=0x7fffffffd7e0, callable=0x7fff8fd0fef0,
    tstate=0x555555559b80) at ./Include/cpython/abstract.h:103
#9  method_vectorcall (method=<optimized out>, args=0x7ffff784a838, nargsf=<optimized out>, kwnames=0x0)
    at Objects/classobject.c:83
#10 0x00007ffff7d8f6a0 in slot_tp_init (self=self@entry=0x7fff8fcffa70, args=args@entry=0x7ffff784a820,
    kwds=kwds@entry=0x0) at Objects/typeobject.c:6974
#11 0x00007ffff7d8e59f in type_call (type=<optimized out>, args=0x7ffff784a820, kwds=0x0) at Objects/typeobject.c:1028
#12 0x00007ffff528b911 in pybind11::detail::pybind11_meta_call (type=0x55555619b370, args=0x7ffff784a820, kwargs=0x0)
    at /build/Debug/_deps/pybind11_project-src/include/pybind11/detail/class.h:187
#13 0x00007ffff7d68390 in _PyObject_MakeTpCall (tstate=0x555555559b80, callable=0x55555619b370, args=0x5555555904b0,
    nargs=1, keywords=0x0) at Objects/call.c:191
#14 0x00007ffff7daf4ee in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775809, args=0x5555555904b0,
    callable=0x55555619b370, tstate=<optimized out>) at ./Include/cpython/abstract.h:116
#15 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775809, args=0x5555555904b0, callable=0x55555619b370,
    tstate=<optimized out>) at ./Include/cpython/abstract.h:103
#16 PyObject_Vectorcall (kwnames=0x0, nargsf=9223372036854775809, args=0x5555555904b0, callable=0x55555619b370)
    at ./Include/cpython/abstract.h:127
#17 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x555555559b80)
    at Python/ceval.c:5077
#18 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x5555555902a0, throwflag=<optimized out>)
    at Python/ceval.c:3489
#19 0x00007ffff7da9b2c in _PyEval_EvalFrame (throwflag=0, f=0x5555555902a0, tstate=0x555555559b80)
    at ./Include/internal/pycore_ceval.h:40
#20 _PyEval_EvalCode (tstate=tstate@entry=0x555555559b80, _co=<optimized out>, globals=<optimized out>,
    locals=locals@entry=0x0, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x5555555b3f80,
    kwcount=0, kwstep=1, defs=0x7ffff784ada8, defcount=1, kwdefs=0x0, closure=0x0, name=0x7ffff781d5b0,
    qualname=0x7ffff781d5b0) at Python/ceval.c:4329
#21 0x00007ffff7d68ab5 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>,
    kwnames=<optimized out>) at Objects/call.c:396
#22 0x00007ffff7daaaaa in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x5555555b3f70,
    callable=0x7ffff78cc160, tstate=0x555555559b80) at ./Include/cpython/abstract.h:118
#23 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x5555555b3f70, callable=<optimized out>)
    at ./Include/cpython/abstract.h:127
#24 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x555555559b80)
    at Python/ceval.c:5077
#25 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x5555555b3e00, throwflag=<optimized out>)
    at Python/ceval.c:3520
#26 0x00007ffff7da9b2c in _PyEval_EvalFrame (throwflag=0, f=0x5555555b3e00, tstate=0x555555559b80)
    at ./Include/internal/pycore_ceval.h:40
#27 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>,
    args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0,
    defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4329
#28 0x00007ffff7e1f675 in _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>,
    locals=0x7ffff788d440, args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, kwargs=0x0,
    kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4361
#29 0x00007ffff7e1f60d in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>,
    args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0,
    closure=0x0) at Python/ceval.c:4377
#30 0x00007ffff7e1f5bf in PyEval_EvalCode (co=co@entry=0x7ffff781f500, globals=globals@entry=0x7ffff788d440,
    locals=locals@entry=0x7ffff788d440) at Python/ceval.c:828
#31 0x00007ffff7e326d4 in run_eval_code_obj (tstate=0x555555559b80, co=0x7ffff781f500, globals=0x7ffff788d440,
    locals=0x7ffff788d440) at Python/pythonrun.c:1221
#32 0x00007ffff7e32666 in run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x7ffff788d440,
    locals=0x7ffff788d440, flags=<optimized out>, arena=<optimized out>) at Python/pythonrun.c:1242
#33 0x00007ffff7d1ded0 in pyrun_file (fp=fp@entry=0x555555559520, filename=filename@entry=0x7ffff790b760,
    start=start@entry=257, globals=globals@entry=0x7ffff788d440, locals=locals@entry=0x7ffff788d440,
    closeit=closeit@entry=1, flags=0x7fffffffe098) at Python/pythonrun.c:1140
#34 0x00007ffff7d1dc27 in pyrun_simple_file (flags=0x7fffffffe098, closeit=1, filename=0x7ffff790b760,
    fp=0x555555559520) at Python/pythonrun.c:450
#35 PyRun_SimpleFileExFlags (fp=fp@entry=0x555555559520, filename=<optimized out>, closeit=closeit@entry=1,
    flags=flags@entry=0x7fffffffe098) at Python/pythonrun.c:483
#36 0x00007ffff7d1e9c3 in PyRun_AnyFileExFlags (fp=fp@entry=0x555555559520, filename=<optimized out>,
    closeit=closeit@entry=1, flags=flags@entry=0x7fffffffe098) at Python/pythonrun.c:92
#37 0x00007ffff7e3acbd in pymain_run_file (cf=0x7fffffffe098, config=0x55555555b580) at Modules/main.c:373
#38 pymain_run_python (exitcode=0x7fffffffe090) at Modules/main.c:598
#39 Py_RunMain () at Modules/main.c:677
#40 0x00007ffff7e3a85d in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:731
#41 0x00007ffff7a6e57d in __libc_start_call_main (main=main@entry=0x555555555050 <main>, argc=argc@entry=2,
    argv=argv@entry=0x7fffffffe2b8) at ../sysdeps/nptl/libc_start_call_main.h:58
#42 0x00007ffff7a6e630 in __libc_start_main_impl (main=0x555555555050 <main>, argc=2, argv=0x7fffffffe2b8,
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe2a8)
    at ../csu/libc-start.c:392
#43 0x0000555555555081 in _start () at ../sysdeps/x86_64/start.S:115

@snnn
Member

snnn commented Apr 12, 2024

The function call at the line "#12 0x00007ffff528b911 in pybind11::detail::pybind11_meta_call" was from ORT's python package:

pybind11_meta_call + 77 in section .text of /home/chasun/.local/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.cpython-39-x86_64-linux-gnu.so

@baijumeswani
Collaborator

It is unexpected for ort and ort-genai to share any objects, especially since the python script does not invoke anything on onnxruntime (other than importing it, and the crash does not happen while importing onnxruntime).

@snnn
Member

snnn commented Apr 12, 2024

I got a GenAI debug package, but the callstack is weird. However, I found an issue.
When I ran

nm -C -g --defined-only /home/chasun/.local/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.cpython-39-x86_64-linux-gnu.so

It only prints two things:

00000000000e8595 T PyInit_onnxruntime_pybind11_state
0000000000000000 A VERS_1.0

However, with the same command the GenAI package's binary prints a lot of stdc++ symbols, which means the symbols were not hidden per the suggestion from https://gcc.gnu.org/wiki/Visibility
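The nm check above is easy to script; a rough helper (my sketch, assuming binutils' nm is on PATH) counts the demangled std:: symbols a shared object exports:

```python
import shutil
import subprocess

def exported_std_symbols(so_path):
    """Count defined, exported std:: symbols in a shared object using nm.

    Returns None when nm is unavailable or the file cannot be inspected.
    """
    if shutil.which("nm") is None:
        return None
    proc = subprocess.run(
        ["nm", "-C", "-g", "--defined-only", so_path],
        capture_output=True, text=True,
    )
    if proc.returncode != 0:
        return None
    return sum(1 for line in proc.stdout.splitlines() if "std::" in line)

# A module with properly hidden symbols should report 0 here.
print(exported_std_symbols("onnxruntime_genai.cpython-39-x86_64-linux-gnu.so"))
```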

@snnn
Member

snnn commented Apr 12, 2024

Does GenAI have this file? python/version_script.lds

@snnn
Member

snnn commented Apr 12, 2024

The only symbol exported from GenAI should be PyInit_onnxruntime_genai.
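For reference, a linker version script that enforces this might look like the following sketch (modeled on the ORT approach; the version node name is illustrative, and the script would be applied at link time via -Wl,--version-script=...):

```
/* Hypothetical python/version_script.lds: export only the module init symbol. */
VERS_1.0 {
    global:
        PyInit_onnxruntime_genai;
    local:
        *;
};
```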

@snnn
Member

snnn commented Apr 13, 2024

Got a new callstack with GenAI's debug symbols.

#0  0x00007fff8cc5ec25 in std::__cxx1998::vector<std::filesystem::__cxx11::path::_Cmpt, std::allocator<std::filesystem::__cxx11::path::_Cmpt> >::~vector (this=0x23, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/stl_vector.h:567
#1  0x00007fff8cc5bb26 in std::filesystem::__cxx11::path::~path (this=0x3, __in_chrg=<optimized out>)
    at /usr/include/c++/8/bits/fs_path.h:209
#2  0x00007fff8cc62fb8 in std::filesystem::__cxx11::path::_Cmpt::~_Cmpt (this=0x3, __in_chrg=<optimized out>)
    at /usr/include/c++/8/bits/fs_path.h:644
#3  0x00007fff8cc62fd3 in std::_Destroy<std::filesystem::__cxx11::path::_Cmpt> (__pointer=0x3)
    at /usr/include/c++/8/bits/stl_construct.h:98
#4  0x00007fff8cc61efd in std::_Destroy_aux<false>::__destroy<std::filesystem::__cxx11::path::_Cmpt*> (__first=0x3,
    __last=0x0) at /usr/include/c++/8/bits/stl_construct.h:108
#5  0x00007fff8cc60cd5 in std::_Destroy<std::filesystem::__cxx11::path::_Cmpt*> (__first=0x3, __last=0x0)
    at /usr/include/c++/8/bits/stl_construct.h:137
#6  0x00007fff8cc5fcc5 in std::_Destroy<std::filesystem::__cxx11::path::_Cmpt*, std::filesystem::__cxx11::path::_Cmpt>
    (__first=0x3, __last=0x0) at /usr/include/c++/8/bits/stl_construct.h:206
#7  0x00007fff8cc5ec3b in std::__cxx1998::vector<std::filesystem::__cxx11::path::_Cmpt, std::allocator<std::filesystem::__cxx11::path::_Cmpt> >::~vector (this=0x7fffffffcdb0, __in_chrg=<optimized out>)
    at /usr/include/c++/8/bits/stl_vector.h:567
#8  0x00007fff8cc5bb26 in std::filesystem::__cxx11::path::~path (this=0x7fffffffcd90, __in_chrg=<optimized out>)
    at /usr/include/c++/8/bits/fs_path.h:209
#9  0x00007fff8cc71291 in std::make_unique<Generators::Config, char const*&> ()
    at /usr/include/c++/8/bits/unique_ptr.h:835
#10 0x00007fff8cc6cd0c in Generators::CreateModel (ort_env=..., config_path=0x7fffffffcfc0 "xxxx")
    at /ort_genai_src/src/models/model.cpp:345
#11 0x00007fff8cbee330 in Generators::<lambda(const string&)>::operator()(const std::__cxx11::string &) const (
    __closure=0x5555561a0c58, config_path="xxxx") at /ort_genai_src/src/python/python.cpp:212
#12 0x00007fff8cbefee6 in pybind11::detail::initimpl::factory<Generators::pybind11_init_onnxruntime_genai(pybind11::module_&)::<lambda(const string&)>, pybind11::detail::void_type (*)(), std::shared_ptr<Generators::Model>(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&), pybind11::detail::void_type()>::<lambda(pybind11::detail::value_and_holder&, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)>::operator()(pybind11::detail::value_and_holder &, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > &) const (this=0x5555561a0c58, v_h=..., args#0="xxxx")
    at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/detail/init.h:297
#13 0x00007fff8cbf3a97 in pybind11::detail::argument_loader<pybind11::detail::value_and_holder&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>::call_impl<void, pybind11::detail::initimpl::factory<Func, pybind11::detail::void_type (*)(), Return(Args ...)>::execute(Class&, const Extra& ...) && [with Class = pybind11::class_<Generators::Model, std::shared_ptr<Generators::Model> >; Extra = {}; Func = Generators::pybind11_init_onnxruntime_genai(pybind11::module_&)::<lambda(const string&)>; Return = std::shared_ptr<Generators::Model>; Args = {const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&}]::<lambda(pybind11::detail::value_and_holder&, const std::__cxx11::basic_string<char>&)>&, 0, 1, pybind11::detail::void_type>(pybind11::detail::initimpl::factory<Generators::pybind11_init_onnxruntime_genai(pybind11::module_&)::<lambda(const string&)>, pybind11::detail::void_type (*)(), std::shared_ptr<Generators::Model>(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&), pybind11::detail::void_type()>::<lambda(pybind11::detail::value_and_holder&, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)> &, std::index_sequence, pybind11::detail::void_type &&) (
    this=0x7fffffffcfb0, f=...) at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/cast.h:1439
#14 0x00007fff8cbf3985 in pybind11::detail::argument_loader<pybind11::detail::value_and_holder&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>::call<void, pybind11::detail::void_type, pybind11::detail::initimpl::factory<Func, pybind11::detail::void_type (*)(), Return(Args ...)>::execute(Class&, const Extra& ...) && [with Class = pybind11::class_<Generators::Model, std::shared_ptr<Generators::Model> >; Extra = {}; Func = Generators::pybind11_init_onnxruntime_genai(pybind11::module_&)::<lambda(const string&)>; Return = std::shared_ptr<Generators::Model>; Args = {const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&}]::<lambda(pybind11::detail::value_and_holder&, const std::__cxx11::basic_string<char>&)>&>(pybind11::detail::initimpl::factory<Generators::pybind11_init_onnxruntime_genai(pybind11::module_&)::<lambda(const string&)>, pybind11::detail::void_type (*)(), std::shared_ptr<Generators::Model>(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&), pybind11::detail::void_type()>::<lambda(pybind11::detail::value_and_holder&, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)> &) (this=0x7fffffffcfb0, f=...)
    at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/cast.h:1413
#15 0x00007fff8cbf3463 in pybind11::cpp_function::<lambda(pybind11::detail::function_call&)>::operator()(pybind11::detail::function_call &) const (this=0x0, call=...)
    at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/pybind11.h:249
#16 0x00007fff8cbf34e6 in pybind11::cpp_function::<lambda(pybind11::detail::function_call&)>::_FUN(pybind11::detail::function_call &) () at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/pybind11.h:224
#17 0x00007fff8cc049bb in pybind11::cpp_function::dispatcher (self=0x7fff90097ed0, args_in=0x7ffff77cc500,
    kwargs_in=0x0) at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/pybind11.h:929
#18 0x00007ffff7d849d2 in cfunction_call (func=0x7fff9009e270, args=<optimized out>, kwargs=<optimized out>)
    at Objects/methodobject.c:543
#19 0x00007ffff7d68390 in _PyObject_MakeTpCall (tstate=0x555555559bc0, callable=0x7fff9009e270, args=0x7fffffffd780,
    nargs=2, keywords=0x0) at Objects/call.c:191
#20 0x00007ffff7cc351c in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fffffffd780,
    callable=0x7fff9009e270, tstate=0x555555559bc0) at ./Include/cpython/abstract.h:116
#21 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=2, args=0x7fffffffd780, callable=0x7fff9009e270,
    tstate=0x555555559bc0) at ./Include/cpython/abstract.h:103
#22 method_vectorcall (method=<optimized out>, args=0x7ffff784a808, nargsf=<optimized out>, kwnames=0x0)
    at Objects/classobject.c:83
#23 0x00007ffff7d8f6a0 in slot_tp_init (self=self@entry=0x7fff9007b270, args=args@entry=0x7ffff784a7f0,
    kwds=kwds@entry=0x0) at Objects/typeobject.c:6974
#24 0x00007ffff7d8e59f in type_call (type=<optimized out>, args=0x7ffff784a7f0, kwds=0x0) at Objects/typeobject.c:1028
#25 0x00007fff8cbfe650 in pybind11::detail::pybind11_meta_call (type=0x5555561a0780, args=0x7ffff784a7f0, kwargs=0x0)
    at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/detail/class.h:187
#26 0x00007ffff7d68390 in _PyObject_MakeTpCall (tstate=0x555555559bc0, callable=0x5555561a0780, args=0x555555e01050,
    nargs=1, keywords=0x0) at Objects/call.c:191
#27 0x00007ffff7daf4ee in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775809, args=0x555555e01050,
    callable=0x5555561a0780, tstate=<optimized out>) at ./Include/cpython/abstract.h:116
#28 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775809, args=0x555555e01050, callable=0x5555561a0780,
    tstate=<optimized out>) at ./Include/cpython/abstract.h:103
#29 PyObject_Vectorcall (kwnames=0x0, nargsf=9223372036854775809, args=0x555555e01050, callable=0x5555561a0780)
    at ./Include/cpython/abstract.h:127
#30 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x555555559bc0)
    at Python/ceval.c:5077
#31 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x555555e00e40, throwflag=<optimized out>)
    at Python/ceval.c:3489
#32 0x00007ffff7da9b2c in _PyEval_EvalFrame (throwflag=0, f=0x555555e00e40, tstate=0x555555559bc0)
    at ./Include/internal/pycore_ceval.h:40
#33 _PyEval_EvalCode (tstate=tstate@entry=0x555555559bc0, _co=<optimized out>, globals=<optimized out>,
    locals=locals@entry=0x0, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x5555555b3ec0,
    kwcount=0, kwstep=1, defs=0x7ffff784ad78, defcount=1, kwdefs=0x0, closure=0x0, name=0x7ffff781d5f0,
    qualname=0x7ffff781d5f0) at Python/ceval.c:4329
#34 0x00007ffff7d68ab5 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>,
    kwnames=<optimized out>) at Objects/call.c:396
#35 0x00007ffff7daaaaa in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x5555555b3eb0,
    callable=0x7ffff78cc0d0, tstate=0x555555559bc0) at ./Include/cpython/abstract.h:118
#36 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x5555555b3eb0, callable=<optimized out>)
    at ./Include/cpython/abstract.h:127
#37 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x555555559bc0)
    at Python/ceval.c:5077
#38 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x5555555b3d40, throwflag=<optimized out>)
    at Python/ceval.c:3520
#39 0x00007ffff7da9b2c in _PyEval_EvalFrame (throwflag=0, f=0x5555555b3d40, tstate=0x555555559bc0)
    at ./Include/internal/pycore_ceval.h:40
#40 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>,
    args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0,
    defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4329
#41 0x00007ffff7e1f675 in _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>,
    locals=0x7ffff788d480, args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, kwargs=0x0,
    kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4361
#42 0x00007ffff7e1f60d in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>,
    args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0,
    closure=0x0) at Python/ceval.c:4377
#43 0x00007ffff7e1f5bf in PyEval_EvalCode (co=co@entry=0x7ffff781f500, globals=globals@entry=0x7ffff788d480,
    locals=locals@entry=0x7ffff788d480) at Python/ceval.c:828
#44 0x00007ffff7e326d4 in run_eval_code_obj (tstate=0x555555559bc0, co=0x7ffff781f500, globals=0x7ffff788d480,
    locals=0x7ffff788d480) at Python/pythonrun.c:1221
#45 0x00007ffff7e32666 in run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x7ffff788d480,
    locals=0x7ffff788d480, flags=<optimized out>, arena=<optimized out>) at Python/pythonrun.c:1242
#46 0x00007ffff7d1ded0 in pyrun_file (fp=fp@entry=0x555555559520, filename=filename@entry=0x7ffff790b760,
    start=start@entry=257, globals=globals@entry=0x7ffff788d480, locals=locals@entry=0x7ffff788d480,
    closeit=closeit@entry=1, flags=0x7fffffffe038) at Python/pythonrun.c:1140
#47 0x00007ffff7d1dc27 in pyrun_simple_file (flags=0x7fffffffe038, closeit=1, filename=0x7ffff790b760,
    fp=0x555555559520) at Python/pythonrun.c:450
#48 PyRun_SimpleFileExFlags (fp=fp@entry=0x555555559520, filename=<optimized out>, closeit=closeit@entry=1,
    flags=flags@entry=0x7fffffffe038) at Python/pythonrun.c:483
#49 0x00007ffff7d1e9c3 in PyRun_AnyFileExFlags (fp=fp@entry=0x555555559520, filename=<optimized out>,
    closeit=closeit@entry=1, flags=flags@entry=0x7fffffffe038) at Python/pythonrun.c:92
#50 0x00007ffff7e3acbd in pymain_run_file (cf=0x7fffffffe038, config=0x55555555b580) at Modules/main.c:373
#51 pymain_run_python (exitcode=0x7fffffffe030) at Modules/main.c:598
#52 Py_RunMain () at Modules/main.c:677
#53 0x00007ffff7e3a85d in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:731
#54 0x00007ffff7a6e57d in __libc_start_call_main (main=main@entry=0x555555555050 <main>, argc=argc@entry=2,
    argv=argv@entry=0x7fffffffe258) at ../sysdeps/nptl/libc_start_call_main.h:58
#55 0x00007ffff7a6e630 in __libc_start_main_impl (main=0x555555555050 <main>, argc=2, argv=0x7fffffffe258,
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe248)
    at ../csu/libc-start.c:392
#56 0x0000555555555081 in _start () at ../sysdeps/x86_64/start.S:115

Update: the above call stack is not very useful, because the program had already gone wrong before that point.

@trajepl
Author

trajepl commented Apr 17, 2024

Any updates? :)

@baijumeswani
Collaborator

A few solutions for us to explore:

  1. Try making the pybind symbols private using version_script.lds (the way ort does).
  2. If that does not help, try avoiding the use of std::filesystem.
  3. Failing that, try using a newer GCC compiler.

I'll work on finding the right path forward.

@snnn
Member

snnn commented Apr 17, 2024

I think I can confirm the root cause is the one described in https://stackoverflow.com/questions/63902528/program-crashes-when-filesystempath-is-destroyed. After I set a breakpoint at "std::filesystem::__cxx11::path::_M_split_cmpts()", I got the following stacktrace:

#0  0x00007ffff55e14a0 in std::filesystem::__cxx11::path::_M_split_cmpts() () from /lib/libstdc++.so.6
#1  0x00007fff8cc6efb6 in std::filesystem::__cxx11::path::path<char const*, std::filesystem::__cxx11::path> (this=0x7fffffffcdf0,
    __source=@0x7fffffffce58: 0x7fffffffd020 "xxxx") at /usr/include/c++/8/bits/fs_path.h:185
#2  0x00007fff8cc71234 in std::make_unique<Generators::Config, char const*&> () at /usr/include/c++/8/bits/unique_ptr.h:835
#3  0x00007fff8cc6cd0c in Generators::CreateModel (ort_env=..., config_path=0x7fffffffd020 "xxxx") at /ort_genai_src/src/models/model.cpp:345
#4  0x00007fff8cbee330 in Generators::<lambda(const string&)>::operator()(const std::__cxx11::string &) const (__closure=0x5555561a0ad8,
    config_path="xxxx") at /ort_genai_src/src/python/python.cpp:212
#5  0x00007fff8cbefee6 in pybind11::detail::initimpl::factory<Generators::pybind11_init_onnxruntime_genai(pybind11::module_&)::<lambda(const string&)>, pybind11::detail::void_type (*)(), std::shared_ptr<Generators::Model>(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&), pybind11::detail::void_type()>::<lambda(pybind11::detail::value_and_holder&, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)>::operator()(pybind11::detail::value_and_holder &, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > &) const (this=0x5555561a0ad8, v_h=..., args#0="xxxx") at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/detail/init.h:297
#6  0x00007fff8cbf3a97 in pybind11::detail::argument_loader<pybind11::detail::value_and_holder&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>::call_impl<void, pybind11::detail::initimpl::factory<Func, pybind11::detail::void_type (*)(), Return(Args ...)>::execute(Class&, const Extra& ...) && [with Class = pybind11::class_<Generators::Model, std::shared_ptr<Generators::Model> >; Extra = {}; Func = Generators::pybind11_init_onnxruntime_genai(pybind11::module_&)::<lambda(const string&)>; Return = std::shared_ptr<Generators::Model>; Args = {const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&}]::<lambda(pybind11::detail::value_and_holder&, const std::__cxx11::basic_string<char>&)>&, 0, 1, pybind11::detail::void_type>(pybind11::detail::initimpl::factory<Generators::pybind11_init_onnxruntime_genai(pybind11::module_&)::<lambda(const string&)>, pybind11::detail::void_type (*)(), std::shared_ptr<Generators::Model>(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&), pybind11::detail::void_type()>::<lambda(pybind11::detail::value_and_holder&, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)> &, std::index_sequence, pybind11::detail::void_type &&) (this=0x7fffffffd010, f=...)
    at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/cast.h:1439
#7  0x00007fff8cbf3985 in pybind11::detail::argument_loader<pybind11::detail::value_and_holder&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>::call<void, pybind11::detail::void_type, pybind11::detail::initimpl::factory<Func, pybind11::detail::void_type (*)(), Return(Args ...)>::execute(Class&, const Extra& ...) && [with Class = pybind11::class_<Generators::Model, std::shared_ptr<Generators::Model> >; Extra = {}; Func = Generators::pybind11_init_onnxruntime_genai(pybind11::module_&)::<lambda(const string&)>; Return = std::shared_ptr<Generators::Model>; Args = {const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&}]::<lambda(pybind11::detail::value_and_holder&, const std::__cxx11::basic_string<char>&)>&>(pybind11::detail::initimpl::factory<Generators::pybind11_init_onnxruntime_genai(pybind11::module_&)::<lambda(const string&)>, pybind11::detail::void_type (*)(), std::shared_ptr<Generators::Model>(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&), pybind11::detail::void_type()>::<lambda(pybind11::detail::value_and_holder&, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)> &) (this=0x7fffffffd010, f=...) at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/cast.h:1413
#8  0x00007fff8cbf3463 in pybind11::cpp_function::<lambda(pybind11::detail::function_call&)>::operator()(pybind11::detail::function_call &) const (
    this=0x0, call=...) at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/pybind11.h:249
#9  0x00007fff8cbf34e6 in pybind11::cpp_function::<lambda(pybind11::detail::function_call&)>::_FUN(pybind11::detail::function_call &) ()
    at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/pybind11.h:224
#10 0x00007fff8cc049bb in pybind11::cpp_function::dispatcher (self=0x7fff90095ed0, args_in=0x7ffff77cf580, kwargs_in=0x0)
    at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/pybind11.h:929
#11 0x00007ffff7d849d2 in cfunction_call (func=0x7fff9009c220, args=<optimized out>, kwargs=<optimized out>) at Objects/methodobject.c:543
#12 0x00007ffff7d68390 in _PyObject_MakeTpCall (tstate=0x555555559b80, callable=0x7fff9009c220, args=0x7fffffffd7e0, nargs=2, keywords=0x0)
    at Objects/call.c:191

Though onnxruntime_genai.cpython-39-x86_64-linux-gnu.so contains a copy of the function:

$ nm -C  /home/chasun/.local/lib/python3.9/site-packages/onnxruntime_genai/onnxruntime_genai.cpython-39-x86_64-linux-gnu.so |grep std::filesystem::__cxx11::path::_M_split_cmpts
00000000002d2140 T std::filesystem::__cxx11::path::_M_split_cmpts()
00000000001ec838 t std::filesystem::__cxx11::path::_M_split_cmpts() [clone .cold.121]

At runtime, as the call stack shows, the implementation from /lib/libstdc++.so.6 was actually used instead. Given that the layout of the std::filesystem::path object differs between GCC 8 and later GCC versions (and is incompatible), code compiled against one should never call into the other.
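Until a fix lands, the workaround implied by the issue title is to control the import order so that onnxruntime_genai is loaded before onnxruntime, letting the genai module's bundled symbols bind first. A hedged sketch (the helper name is hypothetical; whether this avoids the crash on a given system depends on the dynamic linker's binding order):

```python
import importlib
import importlib.util

def import_genai_first():
    """Import onnxruntime_genai before onnxruntime, skipping any package
    that is not installed. Returns the list of modules actually imported."""
    loaded = []
    for name in ("onnxruntime_genai", "onnxruntime"):
        # find_spec avoids raising ImportError when a package is absent
        if importlib.util.find_spec(name) is not None:
            importlib.import_module(name)
            loaded.append(name)
    return loaded

loaded = import_genai_first()
```

Putting the genai import at the very top of the entry-point script (before any module that transitively imports onnxruntime) has the same effect.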

@trajepl
Author

trajepl commented Apr 18, 2024

Thanks!

@snnn
Member

snnn commented May 24, 2024

I started upgrading the GCC.

snnn added a commit to microsoft/onnxruntime that referenced this issue May 24, 2024
### Description
Use a common set of prebuilt manylinux base images to build the
packages, to avoid building the manylinux part again and again. The base
images can be used in GenAI and other projects too.
This PR also updates the GCC version for inference python CUDA11/CUDA12
builds from 8 to 11. Later on I will update all other CUDA pipelines to
use GCC 11, to avoid the issue described in
onnx/onnx#6047 and
microsoft/onnxruntime-genai#257 .

### Motivation and Context
To extract the common part as a reusable build infra among different
ONNX Runtime projects.
snnn added a commit that referenced this issue May 31, 2024
1. Use an internal prebuilt base image instead of a public image, so
that we do not need to rebuild the manylinux part again and again.
2. The new CUDA 11 base image is based on almalinux 8 instead of ubi8,
so that we can get GCC 11. ubi8 only has GCC 8 and GCC 12, but GCC 12 is
not compatible with CUDA 11. So before this change we used GCC 8 in the CUDA 11
build. After this change we will use GCC 11 instead.
3. Drop the support for GCC 10 and below.

This PR provides another solution for #257 .
baijumeswani pushed a commit that referenced this issue Jun 12, 2024
1. Use an internal prebuilt base image instead of a public image, so
that we do not need to rebuild the manylinux part again and again.
2. The new CUDA 11 base image is based on almalinux 8 instead of ubi8,
so that we can get GCC 11. ubi8 only has GCC 8 and GCC 12, but GCC 12 is
not compatible with CUDA 11. So before this change we used GCC 8 in the CUDA 11
build. After this change we will use GCC 11 instead.
3. Drop the support for GCC 10 and below.

This PR provides another solution for #257 .
yf711 pushed a commit to microsoft/onnxruntime that referenced this issue Jun 18, 2024
### Description
Use a common set of prebuilt manylinux base images to build the
packages, to avoid building the manylinux part again and again. The base
images can be used in GenAI and other projects too.
This PR also updates the GCC version for inference python CUDA11/CUDA12
builds from 8 to 11. Later on I will update all other CUDA pipelines to
use GCC 11, to avoid the issue described in
onnx/onnx#6047 and
microsoft/onnxruntime-genai#257 .

### Motivation and Context
To extract the common part as a reusable build infra among different
ONNX Runtime projects.
baijumeswani pushed a commit to microsoft/onnxruntime that referenced this issue Jun 20, 2024
Use a common set of prebuilt manylinux base images to build the
packages, to avoid building the manylinux part again and again. The base
images can be used in GenAI and other projects too.
This PR also updates the GCC version for inference python CUDA11/CUDA12
builds from 8 to 11. Later on I will update all other CUDA pipelines to
use GCC 11, to avoid the issue described in
onnx/onnx#6047 and
microsoft/onnxruntime-genai#257 .

To extract the common part as a reusable build infra among different
ONNX Runtime projects.