MLPerf R50 inference with tensorflow backend is segfaulting while using protobuf version 4.24.0 #13484

Closed
arjunsuresh opened this issue Aug 9, 2023 · 2 comments
@arjunsuresh

What version of protobuf and what language are you using?
Version: 4.24.0 (Python package)
Language: Python

What operating system (Linux, Windows, ...) and version?
Ubuntu 22.04

What did you do?
Steps to reproduce the behavior:

python3 -m pip install cmind
cm run script --tags=generate-run-cmds,inference --backend=tf --model=resnet50  --adr.protobuf.version=4.24.0  --rerun

The above command segfaults. Backtrace from gdb:

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x0000ffff8ab095c8 in upb_Message_DeepCopy () from /home/cmuser/.local/lib/python3.10/site-packages/google/_upb/_message.abi3.so
(gdb) bt
#0  0x0000ffff8ab095c8 in upb_Message_DeepCopy () from /home/cmuser/.local/lib/python3.10/site-packages/google/_upb/_message.abi3.so
#1  0x0000ffff8ab04e04 in PyUpb_Message_CopyFrom () from /home/cmuser/.local/lib/python3.10/site-packages/google/_upb/_message.abi3.so
#2  0x0000aaaac9b133c4 in ?? ()
#3  0x0000aaaac9b02ce8 in _PyEval_EvalFrameDefault ()
#4  0x0000aaaac9b14808 in _PyFunction_Vectorcall ()
#5  0x0000aaaac9b003b0 in _PyEval_EvalFrameDefault ()
#6  0x0000aaaac9b14808 in _PyFunction_Vectorcall ()
#7  0x0000aaaac9b02ce8 in _PyEval_EvalFrameDefault ()
#8  0x0000aaaac9b14808 in _PyFunction_Vectorcall ()
#9  0x0000aaaac9b02ce8 in _PyEval_EvalFrameDefault ()
#10 0x0000aaaac9b14808 in _PyFunction_Vectorcall ()
#11 0x0000aaaac9afeb94 in _PyEval_EvalFrameDefault ()
#12 0x0000aaaac9b2347c in ?? ()
#13 0x0000aaaac9affb9c in _PyEval_EvalFrameDefault ()
#14 0x0000aaaac9b14808 in _PyFunction_Vectorcall ()
#15 0x0000aaaac9afeb94 in _PyEval_EvalFrameDefault ()
#16 0x0000aaaac9bf7604 in ?? ()
#17 0x0000aaaac9bf7514 in PyEval_EvalCode ()
#18 0x0000aaaac9c2d6ec in ?? ()
#19 0x0000aaaac9c24d94 in ?? ()
#20 0x0000aaaac9c2d37c in ?? ()
#21 0x0000aaaac9c2c464 in _PyRun_SimpleFileObject ()
#22 0x0000aaaac9c2c104 in _PyRun_AnyFileObject ()
#23 0x0000aaaac9c1ab5c in Py_RunMain ()
#24 0x0000aaaac9be8b48 in Py_BytesMain ()
#25 0x0000ffff97667780 in __libc_start_call_main (main=main@entry=0xaaaac9be8b20, argc=argc@entry=25, argv=argv@entry=0xffffd9faf308) at ../sysdeps/nptl/libc_start_call_main.h:58
#26 0x0000ffff97667858 in __libc_start_main_impl (main=0xaaaac9be8b20, argc=25, argv=0xffffd9faf308, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=<optimized out>) at ../csu/libc-start.c:381
#27 0x0000aaaac9be8a30 in _start ()
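
The top two frames point at Message.CopyFrom in the upb backend. As a rough sketch for exercising that call outside TensorFlow, something like the following could be tried. It assumes a well-known type (Struct) as a stand-in, because the message type involved in the MLPerf run is not identified here, and the helper name copyfrom_roundtrip is hypothetical. It may well not trigger the crash.

```python
def copyfrom_roundtrip():
    """Copy one protobuf message into another and report whether they match.

    Returns None when the protobuf package is not installed.
    """
    try:
        from google.protobuf import struct_pb2
    except ImportError:
        return None

    # Stand-in payload (assumption): nested fields and a list, so the deep
    # copy has to walk sub-messages and repeated values.
    src = struct_pb2.Struct()
    src.update({"model": "resnet50", "nested": {"batch": 1.0}, "items": [1.0, 2.0]})

    dst = struct_pb2.Struct()
    dst.CopyFrom(src)  # the Python-level call seen in frame #1
    return dst == src


if __name__ == "__main__":
    print("CopyFrom round trip:", copyfrom_roundtrip())
```

If this runs cleanly under 4.24.0, the crash probably depends on the specific messages TensorFlow copies rather than on CopyFrom in general.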

With protobuf 4.23.4, the same command works as expected:

python3 -m pip install cmind
cm run script --tags=generate-run-cmds,inference --backend=tf --model=resnet50  --adr.protobuf.version=4.23.4  --rerun
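
Since the only difference between the two runs is the protobuf version, it can help to confirm what the failing environment actually loads. The sketch below reports the installed package version and the active Python backend. The helper name protobuf_environment is hypothetical; api_implementation.Type() is an internal protobuf module, so treat its use as an assumption that may change between releases.

```python
import os
from importlib import metadata


def protobuf_environment():
    """Collect the installed protobuf version and the backend in use."""
    info = {
        "version": None,
        "backend": None,
        # Environment variable that selects the backend ("upb", "cpp" or "python").
        "backend_override": os.environ.get("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"),
    }
    try:
        info["version"] = metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return info  # protobuf is not installed
    try:
        from google.protobuf.internal import api_implementation
        info["backend"] = api_implementation.Type()
    except ImportError:
        pass
    return info


if __name__ == "__main__":
    print(protobuf_environment())
```

The backtrace is inside google/_upb, so setting PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python before running may be worth trying as a diagnostic: it selects the slower pure-Python backend and would show whether the crash is specific to upb.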
arjunsuresh added the untriaged label on Aug 9, 2023
fowles added the python and 24.x labels and removed the untriaged label on Aug 9, 2023
@anandolee
Contributor

I cannot reproduce the error. Even if the error could be reproduced with these steps, we may not be able to debug it, because we do not know where in the Python source code the segmentation fault is raised.
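
One possible way to find the Python-level location is the standard-library faulthandler module, which dumps the Python stack of every thread when the process receives SIGSEGV. A minimal sketch follows, assuming it can be called near the top of the failing script. The helper name enable_crash_traceback and the optional log path are hypothetical.

```python
import faulthandler
import sys

# Kept open for the life of the process: faulthandler writes to the file
# descriptor directly, so the file must not be closed or garbage collected.
_crash_log = None


def enable_crash_traceback(log_path=None):
    """Dump Python tracebacks on fatal signals to log_path, or to stderr."""
    global _crash_log
    if log_path is not None:
        _crash_log = open(log_path, "w")
        target = _crash_log
    else:
        target = sys.stderr
    faulthandler.enable(file=target, all_threads=True)
    return faulthandler.is_enabled()
```

When editing the script is not practical, running with python3 -X faulthandler or exporting PYTHONFAULTHANDLER=1 enables the same handler. The environment variable is inherited by child processes, so it should also reach a Python interpreter launched by the cm command.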

This looks like a duplicate of #13485. I will close this one and focus on the other, since that issue provides more information.

@arjunsuresh
Author

Thank you @anandolee for looking into this. #13485 does look like the same issue, so I will wait for a resolution there.
