
Terminated by signal SIGSEGV (Address boundary error) when importing onnxruntime before onnxruntime-genai #257

Closed
trajepl opened this issue Apr 10, 2024 · 26 comments · Fixed by #393

Comments

@trajepl

trajepl commented Apr 10, 2024

og.Model cannot be created if the user imports onnxruntime before onnxruntime-genai.
The job is terminated by signal SIGSEGV (Address boundary error)

import onnxruntime as ort  # importing this first triggers the crash below

def genai_run(prompt, model_path, max_length=200):

    import time

    import onnxruntime_genai as og

    print("Loading model...")
    app_started_timestamp = time.time()
    model = og.Model(model_path)
    model_loaded_timestamp = time.time()
    print("Model loaded in {:.2f} seconds".format(model_loaded_timestamp - app_started_timestamp))
    tokenizer = og.Tokenizer(model)
    tokenizer_stream = tokenizer.create_stream()
    input_tokens = tokenizer.encode(prompt)
    started_timestamp = time.time()

    print("Creating generator ...")
    params = og.GeneratorParams(model)
    params.set_search_options(
        {
            "do_sample": False,
            "max_length": max_length,
            "min_length": 0,
            "top_p": 0.9,
            "top_k": 40,
            "temperature": 1.0,
            "repetition_penalty": 1.0,
        }
    )
    params.input_ids = input_tokens
    generator = og.Generator(model, params)
    print("Generator created")

    first = True
    new_tokens = []

    while not generator.is_done():
        generator.compute_logits()
        generator.generate_next_token()
        if first:
            first_token_timestamp = time.time()
            first = False

        new_token = generator.get_next_tokens()[0]
        print(tokenizer_stream.decode(new_token), end="")
        new_tokens.append(new_token)

    run_time = time.time() - started_timestamp
    print(
        f"Prompt tokens: {len(input_tokens)}, New tokens: {len(new_tokens)},"
        f" Time to first: {(first_token_timestamp - started_timestamp):.2f}s,"
        f" New tokens per second: {len(new_tokens)/run_time:.2f} tps"
    )


model_path = "xxxx"
genai_run("hello world", model_path)

How to reproduce:

  1. python -m onnxruntime_genai.models.builder -m microsoft/phi-2 -e cpu -p int4 -o ./models/phi2
  2. Run the script above.
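The two steps above can be condensed into a minimal sketch (my own, not from the thread; the model path is the one produced by step 1, and the guard simply skips the repro when the wheels or the built model are absent). Only the import order matters for the crash.

```python
import importlib.util
import os

def packages_available():
    """True when both wheels are installed, so the repro can actually run."""
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("onnxruntime", "onnxruntime_genai")
    )

if packages_available() and os.path.isdir("./models/phi2"):
    import onnxruntime              # importing this first pulls in libstdc++ symbols
    import onnxruntime_genai as og
    og.Model("./models/phi2")       # the SIGSEGV is reported here with the cuda wheel
else:
    print("prerequisites missing; skipping repro")
```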
@trajepl
Author

trajepl commented Apr 10, 2024

It is not only onnxruntime: the script also fails when importing torch or transformers first.

@baijumeswani
Collaborator

Could you share more details on this error? And if possible, a stacktrace. That would help me reproduce this.

I tried running your script and was able to successfully execute it. Here is the output:

Loading model...
Model loaded in 15.46 seconds
Creating generator ...
Generator created
.

In the end, the friends realized that their journey had not only deepened their understanding of the world but also strengthened their bond. They had learned the importance of embracing diversity, respecting different perspectives, and finding common ground.

As they bid farewell to the Enchanted Forest, they carried with them the memories of their adventure and the wisdom gained from their encounters. They knew that their shared experiences would forever shape their lives and inspire them to continue exploring the wonders of the world.

And so, the three friends returned to their small town, forever changed by their journey. They became advocates for unity, spreading the message of acceptance and understanding wherever they went. Their story became a testament to the power of friendship, curiosity, and the pursuit of knowledge.

As they looked back on their adventure, they realized that the Enchanted Forest had not only taught them about the world but also about themselves. They had discovered their own strengths, overcome their fears, andPrompt tokens: 3, New tokens: 197, Time to first: 0.25s, New tokens per second: 14.55 tps

I am using onnxruntime-genai 0.1.0 from PyPI for this test. Could you share the following information:

Platform: windows, linux...
ort version
ort-genai version
python version
torch version
transformers version
stacktrace if possible
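A small helper along these lines (mine, not part of the thread) can collect all of the requested details at once; packages that are not installed are reported as such instead of raising.

```python
import platform
from importlib import metadata  # Python 3.8+

def env_report(packages=("onnxruntime", "onnxruntime-gpu", "onnxruntime-genai",
                         "torch", "transformers")):
    """Gather the platform/version details requested above into one dict."""
    report = {
        "platform": platform.platform(),
        "python": platform.python_version(),
    }
    for pkg in packages:
        try:
            report[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            report[pkg] = "not installed"
    return report

for key, value in env_report().items():
    print(f"{key}: {value}")
```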

@trajepl
Author

trajepl commented Apr 11, 2024

  • Platform: linux ubuntu20.04

  • ort version: onnxruntime-gpu 1.17.1

  • ort-genai version: onnxruntime-genai-cuda (pip install onnxruntime-genai-cuda --pre --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-genai/pypi/simple/)

    I tested the cpu version of ort-genai just now and it worked for me. The failure happens with the genai-cuda version regardless of whether we target CPU or GPU to run the model.

  • python version: 3.8

  • torch version: 2.2.2

  • transformers version: 4.39.3

  • stacktrace if possible: only terminated by signal SIGSEGV (Address boundary error) was thrown when model = og.Model(model_path) was called

@trajepl
Author

trajepl commented Apr 11, 2024

Thanks for trying, @baijumeswani :) Please let me know if anything is missing. Thanks!

@baijumeswani
Collaborator

baijumeswani commented Apr 11, 2024

Ok, I was able to reproduce the problem on my end. I was also able to narrow down the problem to be related to std::filesystem.

Summary of the problem:

  • onnxruntime-genai-cuda is built with GCC 8.5.
  • onnxruntime-genai (cpu variant) is built with GCC 12.
  • onnxruntime-gpu and probably torch/transformers use GCC > 8 for their build.

The problem is introduced because when importing onnxruntime/torch/transformers first, the symbols for std::filesystem are loaded from libstdc++.so.6. These symbols are incompatible with the symbols needed for std::filesystem for GCC 8. There is more meaningful information provided here: https://bugs.launchpad.net/ubuntu/+source/gcc-8/+bug/1824721/comments/6
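One way to check this explanation on a live process (a Linux-only diagnostic of my own, not from the thread) is to look at which libstdc++ objects are mapped before and after the first import:

```python
def libstdcxx_mappings():
    """Return the libstdc++ shared objects mapped into this process (Linux only)."""
    try:
        with open("/proc/self/maps") as maps:
            lines = maps.read().splitlines()
    except OSError:
        return []  # not on Linux, or /proc is unavailable
    return sorted({line.split()[-1] for line in lines if "libstdc++" in line})

# Calling this before and after `import onnxruntime` shows which
# libstdc++.so.6 the first import brought into the process.
print(libstdcxx_mappings())
```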

I think to resolve this problem, we might need to publish a patch release that is built with a higher GCC version.

cc @jchen351

@snnn
Member

snnn commented Apr 11, 2024

I can reproduce the problem on Mariner 2 Linux, where supposedly the two packages use the same compiler and there is no discrepancy between them. I could see the GenAI package using its own fs implementation instead of the one from libstdc++.

@snnn
Member

snnn commented Apr 11, 2024

I am rebuilding the packages with symbols.

@snnn
Member

snnn commented Apr 12, 2024

Now I have an onnxruntime-gpu debug package, but I don't have a GenAI debug package yet, so I still do not have enough clues. But I can see that when it crashed, the callstack had both packages' native code in it. It seems that the onnxruntime-gpu package somehow calls into GenAI, which I didn't expect to happen. Is it possible that the onnxruntime-gpu code constructed a std::filesystem::path object and then passed it to the GenAI package's C++ code?

@snnn
Member

snnn commented Apr 12, 2024

Do the two packages pass C++ objects around?

@baijumeswani
Collaborator

No, there should be no interaction between the two packages.

My hypothesis is that this is caused because we statically link against stdc++fs; when importing any python module that requires libstdc++.so, the symbols are not distinguishable (between the statically linked code and the shared lib), and ort-genai ends up invoking std::filesystem symbols from libstdc++.so, which results in the crash.

@baijumeswani
Collaborator

I am guessing that if we tried doing this same thing on an ubuntu 18.04 machine, we wouldn't see this crash.

@snnn
Member

snnn commented Apr 12, 2024

But when it crashed, I saw both packages' native code in the callstack, which is abnormal.

@baijumeswani
Collaborator

ort-genai calls into the onnxruntime.so library which is embedded inside the ort-genai python package. Are you sure you saw native code from the ort python package in the callstack, and not ort-genai calling into the ort.so inside the ort-genai python package? Could you paste your callstack?

@snnn
Member

snnn commented Apr 12, 2024

Native code from the ort python package was in the callstack. I am 100% sure of that.

@snnn
Member

snnn commented Apr 12, 2024

Here is the callstack.

#0  0x00007fff8c858d57 in std::filesystem::__cxx11::path::~path() ()
   from /home/chasun/.local/lib/python3.9/site-packages/onnxruntime_genai/onnxruntime_genai.cpython-39-x86_64-linux-gnu.so
#1  0x00007fff8c858d74 in std::filesystem::__cxx11::path::~path() ()
   from /home/chasun/.local/lib/python3.9/site-packages/onnxruntime_genai/onnxruntime_genai.cpython-39-x86_64-linux-gnu.so
#2  0x00007fff8c836972 in ?? ()
   from /home/chasun/.local/lib/python3.9/site-packages/onnxruntime_genai/onnxruntime_genai.cpython-39-x86_64-linux-gnu.so
#3  0x00007fff8c84159a in ?? ()
   from /home/chasun/.local/lib/python3.9/site-packages/onnxruntime_genai/onnxruntime_genai.cpython-39-x86_64-linux-gnu.so
#4  0x00007fff8c84fb5f in ?? ()
   from /home/chasun/.local/lib/python3.9/site-packages/onnxruntime_genai/onnxruntime_genai.cpython-39-x86_64-linux-gnu.so
#5  0x00007ffff7d849d2 in cfunction_call (func=0x7fff8fd0fef0, args=<optimized out>, kwargs=<optimized out>)
    at Objects/methodobject.c:543
#6  0x00007ffff7d68390 in _PyObject_MakeTpCall (tstate=0x555555559b80, callable=0x7fff8fd0fef0, args=0x7fffffffd7e0,
    nargs=2, keywords=0x0) at Objects/call.c:191
#7  0x00007ffff7cc351c in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fffffffd7e0,
    callable=0x7fff8fd0fef0, tstate=0x555555559b80) at ./Include/cpython/abstract.h:116
#8  _PyObject_VectorcallTstate (kwnames=0x0, nargsf=2, args=0x7fffffffd7e0, callable=0x7fff8fd0fef0,
    tstate=0x555555559b80) at ./Include/cpython/abstract.h:103
#9  method_vectorcall (method=<optimized out>, args=0x7ffff784a838, nargsf=<optimized out>, kwnames=0x0)
    at Objects/classobject.c:83
#10 0x00007ffff7d8f6a0 in slot_tp_init (self=self@entry=0x7fff8fcffa70, args=args@entry=0x7ffff784a820,
    kwds=kwds@entry=0x0) at Objects/typeobject.c:6974
#11 0x00007ffff7d8e59f in type_call (type=<optimized out>, args=0x7ffff784a820, kwds=0x0) at Objects/typeobject.c:1028
#12 0x00007ffff528b911 in pybind11::detail::pybind11_meta_call (type=0x55555619b370, args=0x7ffff784a820, kwargs=0x0)
    at /build/Debug/_deps/pybind11_project-src/include/pybind11/detail/class.h:187
#13 0x00007ffff7d68390 in _PyObject_MakeTpCall (tstate=0x555555559b80, callable=0x55555619b370, args=0x5555555904b0,
    nargs=1, keywords=0x0) at Objects/call.c:191
#14 0x00007ffff7daf4ee in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775809, args=0x5555555904b0,
    callable=0x55555619b370, tstate=<optimized out>) at ./Include/cpython/abstract.h:116
#15 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775809, args=0x5555555904b0, callable=0x55555619b370,
    tstate=<optimized out>) at ./Include/cpython/abstract.h:103
#16 PyObject_Vectorcall (kwnames=0x0, nargsf=9223372036854775809, args=0x5555555904b0, callable=0x55555619b370)
    at ./Include/cpython/abstract.h:127
#17 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x555555559b80)
    at Python/ceval.c:5077
#18 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x5555555902a0, throwflag=<optimized out>)
    at Python/ceval.c:3489
#19 0x00007ffff7da9b2c in _PyEval_EvalFrame (throwflag=0, f=0x5555555902a0, tstate=0x555555559b80)
    at ./Include/internal/pycore_ceval.h:40
#20 _PyEval_EvalCode (tstate=tstate@entry=0x555555559b80, _co=<optimized out>, globals=<optimized out>,
    locals=locals@entry=0x0, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x5555555b3f80,
    kwcount=0, kwstep=1, defs=0x7ffff784ada8, defcount=1, kwdefs=0x0, closure=0x0, name=0x7ffff781d5b0,
    qualname=0x7ffff781d5b0) at Python/ceval.c:4329
#21 0x00007ffff7d68ab5 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>,
    kwnames=<optimized out>) at Objects/call.c:396
#22 0x00007ffff7daaaaa in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x5555555b3f70,
    callable=0x7ffff78cc160, tstate=0x555555559b80) at ./Include/cpython/abstract.h:118
#23 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x5555555b3f70, callable=<optimized out>)
    at ./Include/cpython/abstract.h:127
#24 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x555555559b80)
    at Python/ceval.c:5077
#25 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x5555555b3e00, throwflag=<optimized out>)
    at Python/ceval.c:3520
#26 0x00007ffff7da9b2c in _PyEval_EvalFrame (throwflag=0, f=0x5555555b3e00, tstate=0x555555559b80)
    at ./Include/internal/pycore_ceval.h:40
#27 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>,
    args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0,
    defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4329
#28 0x00007ffff7e1f675 in _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>,
    locals=0x7ffff788d440, args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, kwargs=0x0,
    kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4361
#29 0x00007ffff7e1f60d in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>,
    args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0,
    closure=0x0) at Python/ceval.c:4377
#30 0x00007ffff7e1f5bf in PyEval_EvalCode (co=co@entry=0x7ffff781f500, globals=globals@entry=0x7ffff788d440,
    locals=locals@entry=0x7ffff788d440) at Python/ceval.c:828
#31 0x00007ffff7e326d4 in run_eval_code_obj (tstate=0x555555559b80, co=0x7ffff781f500, globals=0x7ffff788d440,
    locals=0x7ffff788d440) at Python/pythonrun.c:1221
#32 0x00007ffff7e32666 in run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x7ffff788d440,
    locals=0x7ffff788d440, flags=<optimized out>, arena=<optimized out>) at Python/pythonrun.c:1242
#33 0x00007ffff7d1ded0 in pyrun_file (fp=fp@entry=0x555555559520, filename=filename@entry=0x7ffff790b760,
    start=start@entry=257, globals=globals@entry=0x7ffff788d440, locals=locals@entry=0x7ffff788d440,
    closeit=closeit@entry=1, flags=0x7fffffffe098) at Python/pythonrun.c:1140
#34 0x00007ffff7d1dc27 in pyrun_simple_file (flags=0x7fffffffe098, closeit=1, filename=0x7ffff790b760,
    fp=0x555555559520) at Python/pythonrun.c:450
#35 PyRun_SimpleFileExFlags (fp=fp@entry=0x555555559520, filename=<optimized out>, closeit=closeit@entry=1,
    flags=flags@entry=0x7fffffffe098) at Python/pythonrun.c:483
#36 0x00007ffff7d1e9c3 in PyRun_AnyFileExFlags (fp=fp@entry=0x555555559520, filename=<optimized out>,
    closeit=closeit@entry=1, flags=flags@entry=0x7fffffffe098) at Python/pythonrun.c:92
#37 0x00007ffff7e3acbd in pymain_run_file (cf=0x7fffffffe098, config=0x55555555b580) at Modules/main.c:373
#38 pymain_run_python (exitcode=0x7fffffffe090) at Modules/main.c:598
#39 Py_RunMain () at Modules/main.c:677
#40 0x00007ffff7e3a85d in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:731
#41 0x00007ffff7a6e57d in __libc_start_call_main (main=main@entry=0x555555555050 <main>, argc=argc@entry=2,
    argv=argv@entry=0x7fffffffe2b8) at ../sysdeps/nptl/libc_start_call_main.h:58
#42 0x00007ffff7a6e630 in __libc_start_main_impl (main=0x555555555050 <main>, argc=2, argv=0x7fffffffe2b8,
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe2a8)
    at ../csu/libc-start.c:392
#43 0x0000555555555081 in _start () at ../sysdeps/x86_64/start.S:115

@snnn
Member

snnn commented Apr 12, 2024

The function call at the line "#12 0x00007ffff528b911 in pybind11::detail::pybind11_meta_call" was from ORT's python package:

pybind11_meta_call + 77 in section .text of /home/chasun/.local/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.cpython-39-x86_64-linux-gnu.so

@baijumeswani
Collaborator

It is unexpected for ort and ort-genai to share any objects, especially since the python script does not invoke anything on onnxruntime (other than importing it, and the crash does not happen while importing onnxruntime).

@snnn
Member

snnn commented Apr 12, 2024

I got a GenAI debug package, but the callstack is weird. However, I found an issue.
When I ran

nm -C -g --defined-only /home/chasun/.local/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.cpython-39-x86_64-linux-gnu.so

It only prints two things:

00000000000e8595 T PyInit_onnxruntime_pybind11_state
0000000000000000 A VERS_1.0

However, with the same command the GenAI package's binary prints a lot of stdc++ symbols, which means the symbols were not hidden per the suggestion from https://gcc.gnu.org/wiki/Visibility
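The nm check above is easy to script; a rough helper (my sketch, assuming binutils' nm is on PATH) counts the demangled std:: symbols a shared object exports:

```python
import shutil
import subprocess

def exported_std_symbols(so_path):
    """Count defined, exported std:: symbols in a shared object using nm.

    Returns None when nm is unavailable or the file cannot be inspected.
    """
    if shutil.which("nm") is None:
        return None
    proc = subprocess.run(
        ["nm", "-C", "-g", "--defined-only", so_path],
        capture_output=True, text=True,
    )
    if proc.returncode != 0:
        return None
    return sum(1 for line in proc.stdout.splitlines() if "std::" in line)

# A module with properly hidden symbols should report 0 here.
print(exported_std_symbols("onnxruntime_genai.cpython-39-x86_64-linux-gnu.so"))
```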

@snnn
Member

snnn commented Apr 12, 2024

Does GenAI have this file? python/version_script.lds

@snnn
Member

snnn commented Apr 12, 2024

The only symbol exported from GenAI should be PyInit_onnxruntime_genai.
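For reference, a linker version script that enforces this might look like the following sketch (modeled on the ORT approach; the version node name is illustrative, and the script would be applied at link time via -Wl,--version-script=...):

```
/* Hypothetical python/version_script.lds: export only the module init symbol. */
VERS_1.0 {
    global:
        PyInit_onnxruntime_genai;
    local:
        *;
};
```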

@snnn
Member

snnn commented Apr 13, 2024

Got a new callstack with GenAI's debug symbols.

#0  0x00007fff8cc5ec25 in std::__cxx1998::vector<std::filesystem::__cxx11::path::_Cmpt, std::allocator<std::filesystem::__cxx11::path::_Cmpt> >::~vector (this=0x23, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/stl_vector.h:567
#1  0x00007fff8cc5bb26 in std::filesystem::__cxx11::path::~path (this=0x3, __in_chrg=<optimized out>)
    at /usr/include/c++/8/bits/fs_path.h:209
#2  0x00007fff8cc62fb8 in std::filesystem::__cxx11::path::_Cmpt::~_Cmpt (this=0x3, __in_chrg=<optimized out>)
    at /usr/include/c++/8/bits/fs_path.h:644
#3  0x00007fff8cc62fd3 in std::_Destroy<std::filesystem::__cxx11::path::_Cmpt> (__pointer=0x3)
    at /usr/include/c++/8/bits/stl_construct.h:98
#4  0x00007fff8cc61efd in std::_Destroy_aux<false>::__destroy<std::filesystem::__cxx11::path::_Cmpt*> (__first=0x3,
    __last=0x0) at /usr/include/c++/8/bits/stl_construct.h:108
#5  0x00007fff8cc60cd5 in std::_Destroy<std::filesystem::__cxx11::path::_Cmpt*> (__first=0x3, __last=0x0)
    at /usr/include/c++/8/bits/stl_construct.h:137
#6  0x00007fff8cc5fcc5 in std::_Destroy<std::filesystem::__cxx11::path::_Cmpt*, std::filesystem::__cxx11::path::_Cmpt>
    (__first=0x3, __last=0x0) at /usr/include/c++/8/bits/stl_construct.h:206
#7  0x00007fff8cc5ec3b in std::__cxx1998::vector<std::filesystem::__cxx11::path::_Cmpt, std::allocator<std::filesystem::__cxx11::path::_Cmpt> >::~vector (this=0x7fffffffcdb0, __in_chrg=<optimized out>)
    at /usr/include/c++/8/bits/stl_vector.h:567
#8  0x00007fff8cc5bb26 in std::filesystem::__cxx11::path::~path (this=0x7fffffffcd90, __in_chrg=<optimized out>)
    at /usr/include/c++/8/bits/fs_path.h:209
#9  0x00007fff8cc71291 in std::make_unique<Generators::Config, char const*&> ()
    at /usr/include/c++/8/bits/unique_ptr.h:835
#10 0x00007fff8cc6cd0c in Generators::CreateModel (ort_env=..., config_path=0x7fffffffcfc0 "xxxx")
    at /ort_genai_src/src/models/model.cpp:345
#11 0x00007fff8cbee330 in Generators::<lambda(const string&)>::operator()(const std::__cxx11::string &) const (
    __closure=0x5555561a0c58, config_path="xxxx") at /ort_genai_src/src/python/python.cpp:212
#12 0x00007fff8cbefee6 in pybind11::detail::initimpl::factory<Generators::pybind11_init_onnxruntime_genai(pybind11::module_&)::<lambda(const string&)>, pybind11::detail::void_type (*)(), std::shared_ptr<Generators::Model>(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&), pybind11::detail::void_type()>::<lambda(pybind11::detail::value_and_holder&, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)>::operator()(pybind11::detail::value_and_holder &, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > &) const (this=0x5555561a0c58, v_h=..., args#0="xxxx")
    at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/detail/init.h:297
#13 0x00007fff8cbf3a97 in pybind11::detail::argument_loader<pybind11::detail::value_and_holder&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>::call_impl<void, pybind11::detail::initimpl::factory<Func, pybind11::detail::void_type (*)(), Return(Args ...)>::execute(Class&, const Extra& ...) && [with Class = pybind11::class_<Generators::Model, std::shared_ptr<Generators::Model> >; Extra = {}; Func = Generators::pybind11_init_onnxruntime_genai(pybind11::module_&)::<lambda(const string&)>; Return = std::shared_ptr<Generators::Model>; Args = {const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&}]::<lambda(pybind11::detail::value_and_holder&, const std::__cxx11::basic_string<char>&)>&, 0, 1, pybind11::detail::void_type>(pybind11::detail::initimpl::factory<Generators::pybind11_init_onnxruntime_genai(pybind11::module_&)::<lambda(const string&)>, pybind11::detail::void_type (*)(), std::shared_ptr<Generators::Model>(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&), pybind11::detail::void_type()>::<lambda(pybind11::detail::value_and_holder&, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)> &, std::index_sequence, pybind11::detail::void_type &&) (
    this=0x7fffffffcfb0, f=...) at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/cast.h:1439
#14 0x00007fff8cbf3985 in pybind11::detail::argument_loader<pybind11::detail::value_and_holder&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>::call<void, pybind11::detail::void_type, pybind11::detail::initimpl::factory<Func, pybind11::detail::void_type (*)(), Return(Args ...)>::execute(Class&, const Extra& ...) && [with Class = pybind11::class_<Generators::Model, std::shared_ptr<Generators::Model> >; Extra = {}; Func = Generators::pybind11_init_onnxruntime_genai(pybind11::module_&)::<lambda(const string&)>; Return = std::shared_ptr<Generators::Model>; Args = {const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&}]::<lambda(pybind11::detail::value_and_holder&, const std::__cxx11::basic_string<char>&)>&>(pybind11::detail::initimpl::factory<Generators::pybind11_init_onnxruntime_genai(pybind11::module_&)::<lambda(const string&)>, pybind11::detail::void_type (*)(), std::shared_ptr<Generators::Model>(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&), pybind11::detail::void_type()>::<lambda(pybind11::detail::value_and_holder&, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)> &) (this=0x7fffffffcfb0, f=...)
    at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/cast.h:1413
#15 0x00007fff8cbf3463 in pybind11::cpp_function::<lambda(pybind11::detail::function_call&)>::operator()(pybind11::detail::function_call &) const (this=0x0, call=...)
    at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/pybind11.h:249
#16 0x00007fff8cbf34e6 in pybind11::cpp_function::<lambda(pybind11::detail::function_call&)>::_FUN(pybind11::detail::function_call &) () at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/pybind11.h:224
#17 0x00007fff8cc049bb in pybind11::cpp_function::dispatcher (self=0x7fff90097ed0, args_in=0x7ffff77cc500,
    kwargs_in=0x0) at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/pybind11.h:929
#18 0x00007ffff7d849d2 in cfunction_call (func=0x7fff9009e270, args=<optimized out>, kwargs=<optimized out>)
    at Objects/methodobject.c:543
#19 0x00007ffff7d68390 in _PyObject_MakeTpCall (tstate=0x555555559bc0, callable=0x7fff9009e270, args=0x7fffffffd780,
    nargs=2, keywords=0x0) at Objects/call.c:191
#20 0x00007ffff7cc351c in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fffffffd780,
    callable=0x7fff9009e270, tstate=0x555555559bc0) at ./Include/cpython/abstract.h:116
#21 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=2, args=0x7fffffffd780, callable=0x7fff9009e270,
    tstate=0x555555559bc0) at ./Include/cpython/abstract.h:103
#22 method_vectorcall (method=<optimized out>, args=0x7ffff784a808, nargsf=<optimized out>, kwnames=0x0)
    at Objects/classobject.c:83
#23 0x00007ffff7d8f6a0 in slot_tp_init (self=self@entry=0x7fff9007b270, args=args@entry=0x7ffff784a7f0,
    kwds=kwds@entry=0x0) at Objects/typeobject.c:6974
#24 0x00007ffff7d8e59f in type_call (type=<optimized out>, args=0x7ffff784a7f0, kwds=0x0) at Objects/typeobject.c:1028
#25 0x00007fff8cbfe650 in pybind11::detail::pybind11_meta_call (type=0x5555561a0780, args=0x7ffff784a7f0, kwargs=0x0)
    at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/detail/class.h:187
#26 0x00007ffff7d68390 in _PyObject_MakeTpCall (tstate=0x555555559bc0, callable=0x5555561a0780, args=0x555555e01050,
    nargs=1, keywords=0x0) at Objects/call.c:191
#27 0x00007ffff7daf4ee in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775809, args=0x555555e01050,
    callable=0x5555561a0780, tstate=<optimized out>) at ./Include/cpython/abstract.h:116
#28 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775809, args=0x555555e01050, callable=0x5555561a0780,
    tstate=<optimized out>) at ./Include/cpython/abstract.h:103
#29 PyObject_Vectorcall (kwnames=0x0, nargsf=9223372036854775809, args=0x555555e01050, callable=0x5555561a0780)
    at ./Include/cpython/abstract.h:127
#30 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x555555559bc0)
    at Python/ceval.c:5077
#31 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x555555e00e40, throwflag=<optimized out>)
    at Python/ceval.c:3489
#32 0x00007ffff7da9b2c in _PyEval_EvalFrame (throwflag=0, f=0x555555e00e40, tstate=0x555555559bc0)
    at ./Include/internal/pycore_ceval.h:40
#33 _PyEval_EvalCode (tstate=tstate@entry=0x555555559bc0, _co=<optimized out>, globals=<optimized out>,
    locals=locals@entry=0x0, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x5555555b3ec0,
    kwcount=0, kwstep=1, defs=0x7ffff784ad78, defcount=1, kwdefs=0x0, closure=0x0, name=0x7ffff781d5f0,
    qualname=0x7ffff781d5f0) at Python/ceval.c:4329
#34 0x00007ffff7d68ab5 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>,
    kwnames=<optimized out>) at Objects/call.c:396
#35 0x00007ffff7daaaaa in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x5555555b3eb0,
    callable=0x7ffff78cc0d0, tstate=0x555555559bc0) at ./Include/cpython/abstract.h:118
#36 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x5555555b3eb0, callable=<optimized out>)
    at ./Include/cpython/abstract.h:127
#37 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x555555559bc0)
    at Python/ceval.c:5077
#38 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x5555555b3d40, throwflag=<optimized out>)
    at Python/ceval.c:3520
#39 0x00007ffff7da9b2c in _PyEval_EvalFrame (throwflag=0, f=0x5555555b3d40, tstate=0x555555559bc0)
    at ./Include/internal/pycore_ceval.h:40
#40 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>,
    args=<optimized out>, argcount=<optimized out>, kwnames=0x0, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0,
    defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4329
#41 0x00007ffff7e1f675 in _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>,
    locals=0x7ffff788d480, args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>, kwargs=0x0,
    kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4361
#42 0x00007ffff7e1f60d in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>,
    args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0,
    closure=0x0) at Python/ceval.c:4377
#43 0x00007ffff7e1f5bf in PyEval_EvalCode (co=co@entry=0x7ffff781f500, globals=globals@entry=0x7ffff788d480,
    locals=locals@entry=0x7ffff788d480) at Python/ceval.c:828
#44 0x00007ffff7e326d4 in run_eval_code_obj (tstate=0x555555559bc0, co=0x7ffff781f500, globals=0x7ffff788d480,
    locals=0x7ffff788d480) at Python/pythonrun.c:1221
#45 0x00007ffff7e32666 in run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x7ffff788d480,
    locals=0x7ffff788d480, flags=<optimized out>, arena=<optimized out>) at Python/pythonrun.c:1242
#46 0x00007ffff7d1ded0 in pyrun_file (fp=fp@entry=0x555555559520, filename=filename@entry=0x7ffff790b760,
    start=start@entry=257, globals=globals@entry=0x7ffff788d480, locals=locals@entry=0x7ffff788d480,
    closeit=closeit@entry=1, flags=0x7fffffffe038) at Python/pythonrun.c:1140
#47 0x00007ffff7d1dc27 in pyrun_simple_file (flags=0x7fffffffe038, closeit=1, filename=0x7ffff790b760,
    fp=0x555555559520) at Python/pythonrun.c:450
#48 PyRun_SimpleFileExFlags (fp=fp@entry=0x555555559520, filename=<optimized out>, closeit=closeit@entry=1,
    flags=flags@entry=0x7fffffffe038) at Python/pythonrun.c:483
#49 0x00007ffff7d1e9c3 in PyRun_AnyFileExFlags (fp=fp@entry=0x555555559520, filename=<optimized out>,
    closeit=closeit@entry=1, flags=flags@entry=0x7fffffffe038) at Python/pythonrun.c:92
#50 0x00007ffff7e3acbd in pymain_run_file (cf=0x7fffffffe038, config=0x55555555b580) at Modules/main.c:373
#51 pymain_run_python (exitcode=0x7fffffffe030) at Modules/main.c:598
#52 Py_RunMain () at Modules/main.c:677
#53 0x00007ffff7e3a85d in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:731
#54 0x00007ffff7a6e57d in __libc_start_call_main (main=main@entry=0x555555555050 <main>, argc=argc@entry=2,
    argv=argv@entry=0x7fffffffe258) at ../sysdeps/nptl/libc_start_call_main.h:58
#55 0x00007ffff7a6e630 in __libc_start_main_impl (main=0x555555555050 <main>, argc=2, argv=0x7fffffffe258,
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe248)
    at ../csu/libc-start.c:392
#56 0x0000555555555081 in _start () at ../sysdeps/x86_64/start.S:115

Update: the above call stack is not very useful, because the program had already gone wrong before that point.

@trajepl
Author

trajepl commented Apr 17, 2024

Any updates? :)

@baijumeswani
Collaborator

A few solutions for us to explore:

  1. Try making the pybind symbols private using version_script.lds (the way ort does).
  2. If that does not help, try avoiding the use of std::filesystem.
  3. Failing that, try using a newer GCC compiler.

I'll work on finding the right path forward.

@snnn
Member

snnn commented Apr 17, 2024

I think I can confirm the root cause is the one described in https://stackoverflow.com/questions/63902528/program-crashes-when-filesystempath-is-destroyed. After I set a breakpoint at "std::filesystem::__cxx11::path::_M_split_cmpts()", I got the following stacktrace:

#0  0x00007ffff55e14a0 in std::filesystem::__cxx11::path::_M_split_cmpts() () from /lib/libstdc++.so.6
#1  0x00007fff8cc6efb6 in std::filesystem::__cxx11::path::path<char const*, std::filesystem::__cxx11::path> (this=0x7fffffffcdf0,
    __source=@0x7fffffffce58: 0x7fffffffd020 "xxxx") at /usr/include/c++/8/bits/fs_path.h:185
#2  0x00007fff8cc71234 in std::make_unique<Generators::Config, char const*&> () at /usr/include/c++/8/bits/unique_ptr.h:835
#3  0x00007fff8cc6cd0c in Generators::CreateModel (ort_env=..., config_path=0x7fffffffd020 "xxxx") at /ort_genai_src/src/models/model.cpp:345
#4  0x00007fff8cbee330 in Generators::<lambda(const string&)>::operator()(const std::__cxx11::string &) const (__closure=0x5555561a0ad8,
    config_path="xxxx") at /ort_genai_src/src/python/python.cpp:212
#5  0x00007fff8cbefee6 in pybind11::detail::initimpl::factory<Generators::pybind11_init_onnxruntime_genai(pybind11::module_&)::<lambda(const string&)>, pybind11::detail::void_type (*)(), std::shared_ptr<Generators::Model>(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&), pybind11::detail::void_type()>::<lambda(pybind11::detail::value_and_holder&, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)>::operator()(pybind11::detail::value_and_holder &, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > &) const (this=0x5555561a0ad8, v_h=..., args#0="xxxx") at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/detail/init.h:297
#6  0x00007fff8cbf3a97 in pybind11::detail::argument_loader<pybind11::detail::value_and_holder&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>::call_impl<void, pybind11::detail::initimpl::factory<Func, pybind11::detail::void_type (*)(), Return(Args ...)>::execute(Class&, const Extra& ...) && [with Class = pybind11::class_<Generators::Model, std::shared_ptr<Generators::Model> >; Extra = {}; Func = Generators::pybind11_init_onnxruntime_genai(pybind11::module_&)::<lambda(const string&)>; Return = std::shared_ptr<Generators::Model>; Args = {const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&}]::<lambda(pybind11::detail::value_and_holder&, const std::__cxx11::basic_string<char>&)>&, 0, 1, pybind11::detail::void_type>(pybind11::detail::initimpl::factory<Generators::pybind11_init_onnxruntime_genai(pybind11::module_&)::<lambda(const string&)>, pybind11::detail::void_type (*)(), std::shared_ptr<Generators::Model>(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&), pybind11::detail::void_type()>::<lambda(pybind11::detail::value_and_holder&, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)> &, std::index_sequence, pybind11::detail::void_type &&) (this=0x7fffffffd010, f=...)
    at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/cast.h:1439
#7  0x00007fff8cbf3985 in pybind11::detail::argument_loader<pybind11::detail::value_and_holder&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>::call<void, pybind11::detail::void_type, pybind11::detail::initimpl::factory<Func, pybind11::detail::void_type (*)(), Return(Args ...)>::execute(Class&, const Extra& ...) && [with Class = pybind11::class_<Generators::Model, std::shared_ptr<Generators::Model> >; Extra = {}; Func = Generators::pybind11_init_onnxruntime_genai(pybind11::module_&)::<lambda(const string&)>; Return = std::shared_ptr<Generators::Model>; Args = {const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&}]::<lambda(pybind11::detail::value_and_holder&, const std::__cxx11::basic_string<char>&)>&>(pybind11::detail::initimpl::factory<Generators::pybind11_init_onnxruntime_genai(pybind11::module_&)::<lambda(const string&)>, pybind11::detail::void_type (*)(), std::shared_ptr<Generators::Model>(const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&), pybind11::detail::void_type()>::<lambda(pybind11::detail::value_and_holder&, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)> &) (this=0x7fffffffd010, f=...) at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/cast.h:1413
#8  0x00007fff8cbf3463 in pybind11::cpp_function::<lambda(pybind11::detail::function_call&)>::operator()(pybind11::detail::function_call &) const (
    this=0x0, call=...) at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/pybind11.h:249
#9  0x00007fff8cbf34e6 in pybind11::cpp_function::<lambda(pybind11::detail::function_call&)>::_FUN(pybind11::detail::function_call &) ()
    at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/pybind11.h:224
#10 0x00007fff8cc049bb in pybind11::cpp_function::dispatcher (self=0x7fff90095ed0, args_in=0x7ffff77cf580, kwargs_in=0x0)
    at /ort_genai_src/build/cuda/_deps/pybind11_project-src/include/pybind11/pybind11.h:929
#11 0x00007ffff7d849d2 in cfunction_call (func=0x7fff9009c220, args=<optimized out>, kwargs=<optimized out>) at Objects/methodobject.c:543
#12 0x00007ffff7d68390 in _PyObject_MakeTpCall (tstate=0x555555559b80, callable=0x7fff9009c220, args=0x7fffffffd7e0, nargs=2, keywords=0x0)
    at Objects/call.c:191

Though onnxruntime_genai.cpython-39-x86_64-linux-gnu.so contains a copy of the function:

$ nm -C  /home/chasun/.local/lib/python3.9/site-packages/onnxruntime_genai/onnxruntime_genai.cpython-39-x86_64-linux-gnu.so |grep std::filesystem::__cxx11::path::_M_split_cmpts
00000000002d2140 T std::filesystem::__cxx11::path::_M_split_cmpts()
00000000001ec838 t std::filesystem::__cxx11::path::_M_split_cmpts() [clone .cold.121]

At runtime, as the call stack shows, the implementation from /lib/libstdc++.so.6 was actually used instead. Given that the layout of the std::filesystem::path object differs between GCC 8 and later GCC versions (and is incompatible), code compiled against one should never call into the other.
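Until a fix lands, the workaround implied by the issue title is to control the import order so that onnxruntime_genai is loaded before onnxruntime, letting the genai module's bundled symbols bind first. A hedged sketch (the helper name is hypothetical; whether this avoids the crash on a given system depends on the dynamic linker's binding order):

```python
import importlib
import importlib.util

def import_genai_first():
    """Import onnxruntime_genai before onnxruntime, skipping any package
    that is not installed. Returns the list of modules actually imported."""
    loaded = []
    for name in ("onnxruntime_genai", "onnxruntime"):
        # find_spec avoids raising ImportError when a package is absent
        if importlib.util.find_spec(name) is not None:
            importlib.import_module(name)
            loaded.append(name)
    return loaded

loaded = import_genai_first()
```

Putting the genai import at the very top of the entry-point script (before any module that transitively imports onnxruntime) has the same effect.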

@trajepl
Author

trajepl commented Apr 18, 2024

Thanks!

@snnn
Member

snnn commented May 24, 2024

I started upgrading the GCC.

snnn added a commit to microsoft/onnxruntime that referenced this issue May 24, 2024
### Description
Use a common set of prebuilt manylinux base images to build the
packages, to avoid building the manylinux part again and again. The base
images can be used in GenAI and other projects too.
This PR also updates the GCC version for inference python CUDA11/CUDA12
builds from 8 to 11. Later on I will update all other CUDA pipelines to
use GCC 11, to avoid the issue described in
onnx/onnx#6047 and
microsoft/onnxruntime-genai#257 .

### Motivation and Context
To extract the common part as a reusable build infra among different
ONNX Runtime projects.
snnn added a commit that referenced this issue May 31, 2024
1. Use an internal prebuilt base image instead of a public image, so
that we do not need to rebuild the manylinux part again and again.
2. The new CUDA 11 base image is based on almalinux 8 instead of ubi8,
so that we can get GCC 11. ubi8 only has GCC 8 and GCC 12, but GCC 12 is
not compatible with CUDA 11. So before this change we used GCC 8 in the CUDA 11
build. After this change we will use GCC 11 instead.
3. Drop the support for GCC 10 and below.

This PR provides another solution for #257 .
baijumeswani pushed a commit that referenced this issue Jun 12, 2024
1. Use an internal prebuilt base image instead of a public image, so
that we do not need to rebuild the manylinux part again and again.
2. The new CUDA 11 base image is based on almalinux 8 instead of ubi8,
so that we can get GCC 11. ubi8 only has GCC 8 and GCC 12, but GCC 12 is
not compatible with CUDA 11. So before this change we used GCC 8 in the CUDA 11
build. After this change we will use GCC 11 instead.
3. Drop the support for GCC 10 and below.

This PR provides another solution for #257 .
yf711 pushed a commit to microsoft/onnxruntime that referenced this issue Jun 18, 2024
### Description
Use a common set of prebuilt manylinux base images to build the
packages, to avoid building the manylinux part again and again. The base
images can be used in GenAI and other projects too.
This PR also updates the GCC version for inference python CUDA11/CUDA12
builds from 8 to 11. Later on I will update all other CUDA pipelines to
use GCC 11, to avoid the issue described in
onnx/onnx#6047 and
microsoft/onnxruntime-genai#257 .

### Motivation and Context
To extract the common part as a reusable build infra among different
ONNX Runtime projects.
baijumeswani pushed a commit to microsoft/onnxruntime that referenced this issue Jun 20, 2024
Use a common set of prebuilt manylinux base images to build the
packages, to avoid building the manylinux part again and again. The base
images can be used in GenAI and other projects too.
This PR also updates the GCC version for inference python CUDA11/CUDA12
builds from 8 to 11. Later on I will update all other CUDA pipelines to
use GCC 11, to avoid the issue described in
onnx/onnx#6047 and
microsoft/onnxruntime-genai#257 .

To extract the common part as a reusable build infra among different
ONNX Runtime projects.