Coredump caused by modifying the same vector in multiple Driver threads #5169

Open
mayunlei opened this issue Jun 7, 2023 · 0 comments
Labels
bug Something isn't working triage Newly created issue that needs attention.

mayunlei commented Jun 7, 2023

Bug description

How to reproduce coredump

select strpos(key_0,'$') from table limit 100

Here table can be any table, and key_0 can be any varchar column.

The key ingredients are the strpos function and LIMIT 100: running strpos together with LIMIT 100 crashes roughly 90% of the time.

SQL Plan:

Fragment 1 [SINGLE]
    Cost: CPU 1.15ms, Input: 0 rows (0B), Output: 0 rows (0B)
    Output layout: [strpos]
    Output partitioning: SINGLE []
    - Project[] => [strpos:bigint]
            Cost: 0.00%, Input: 0 rows (0B), Output: 0 rows (0B), Filtered: ?%
            Input avg.: 0.00 lines, Input std.dev.: ?%
            strpos := "strpos"("key", '$')              <---------------------------- coredump in this operator
        - LocalExchange[ROUND_ROBIN] () => key:varchar
                Cost: 0.00%, Output: 0 rows (0B)
                Input avg.: 0.00 lines, Input std.dev.: ?%
            - Limit[100] => [key:varchar]
                    Cost: 0.00%, Output: 0 rows (0B)
                    Input avg.: 0.00 lines, Input std.dev.: ?%
                - LocalExchange[SINGLE] () => key:varchar
                        Cost: 0.00%, Output: 0 rows (0B)
                        Input avg.: 0.00 lines, Input std.dev.: ?%
                    - RemoteSource[2] => [key:varchar]
                            Cost: 0.00%, Output: 0 rows (0B)
                            Input avg.: 0.00 lines, Input std.dev.: ?%

Fragment 2 [SOURCE]
    Cost: CPU 981.78us, Input: 0 rows (0B), Output: 0 rows (0B)
    Output layout: [key]
    Output partitioning: SINGLE []
    - LimitPartial[100] => [key:varchar]
            Cost: 0.00%, Output: 0 rows (0B)
            Input avg.: 0.00 lines, Input std.dev.: ?%
        - TableScan[ , originalConstraint = true] => [key:varchar]
                Cost: 100.00%, Output: 0 rows (0B)
                Input avg.: 0.00 lines, Input std.dev.: ?%
                key := ColumnHandle{connectorId= , columnName=key_0, columnType=varchar, ordinalPosition=7}

Core stack:

#0 0x0000000006283888 in std::__fill_n_a<unsigned long*, unsigned long, unsigned long> (__first=0xffffffff6ffff830, __n=2, __value=@0x7ec9c83b5d78: 0) at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/stl_algobase.h:772
#1 0x000000000627d852 in std::fill_n<unsigned long*, unsigned long, unsigned long> (__first=0xffffffff6ffff830, __n=2, __value=@0x7ec9c83b5d78: 0) at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/stl_algobase.h:808
#2 0x00000000062776ff in std::__uninitialized_fill_n::__uninit_fill_n<unsigned long*, unsigned long, unsigned long> (__first=0xffffffff6ffff830, __n=2, __x=@0x7ec9c83b5d78: 0)
at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/stl_uninitialized.h:240
#3 0x000000000626ee55 in std::uninitialized_fill_n<unsigned long*, unsigned long, unsigned long> (__first=0xffffffff6ffff830, __n=2, __x=@0x7ec9c83b5d78: 0)
at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/stl_uninitialized.h:273
#4 0x0000000006263e28 in std::__uninitialized_fill_n_a<unsigned long*, unsigned long, unsigned long, unsigned long> (__first=0xffffffff6ffff830, __n=2, __x=@0x7ec9c83b5d78: 0)
at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/stl_uninitialized.h:384
#5 0x000000000625b444 in std::vector<unsigned long, std::allocator<unsigned long> >::_M_fill_insert (this=0x7ec9000020f8, __position=<error reading variable: Cannot access memory at address 0x0>, __n=2, __x=@0x7ec9c83b5d78: 0)
at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/vector.tcc:566
#6 0x000000000625262b in std::vector<unsigned long, std::allocator<unsigned long> >::resize (this=0x7ec9000020f8, __new_size=2, __x=@0x7ec9c83b5d78: 0) at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/stl_vector.h:957
#7 0x000000000624b2e8 in facebook::velox::SelectivityVector::resize (this=0x7ec9000020e8, size=98, value=false) at /workspace/sls-sql-velox/presto-native-execution/velox/velox/vector/SelectivityVector.h:181
#8 0x0000000009595657 in facebook::velox::SimpleVector<facebook::velox::StringView>::ensureIsAsciiCapacity<facebook::velox::StringView> (this=0x7ec900002080, size=98)
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/vector/SimpleVector.h:277
#9 0x000000000b6cde2b in facebook::velox::SimpleVector<facebook::velox::StringView>::computeAndSetIsAscii<facebook::velox::StringView> (this=0x7ec900002080, rows=...)
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/vector/SimpleVector.h:220
#10 0x000000000b6c6aa0 in facebook::velox::exec::(anonymous namespace)::computeIsAsciiForInputs (vectorFunction=0x7ec920017070, inputValues=std::vector of length 2, capacity 2 = {...}, rows=...)
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/expression/Expr.cpp:1193
#11 0x000000000b6c8485 in facebook::velox::exec::Expr::applyFunction (this=0x7ec920017800, rows=..., context=..., result=std::shared_ptr (empty) 0x0)
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/expression/Expr.cpp:1544
#12 0x000000000b6c7634 in facebook::velox::exec::Expr::evalAllImpl (this=0x7ec920017800, rows=..., context=..., result=std::shared_ptr (empty) 0x0)
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/expression/Expr.cpp:1379
#13 0x000000000b6c713c in facebook::velox::exec::Expr::evalAll (this=0x7ec920017800, rows=..., context=..., result=std::shared_ptr (empty) 0x0)
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/expression/Expr.cpp:1326
#14 0x000000000b6c5eeb in facebook::velox::exec::Expr::evalWithNulls (this=0x7ec920017800, rows=..., context=..., result=std::shared_ptr (empty) 0x0)
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/expression/Expr.cpp:1074
#15 0x000000000b6c651b in facebook::velox::exec::Expr::evalWithMemo (this=0x7ec920017800, rows=..., context=..., result=std::shared_ptr (empty) 0x0)
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/expression/Expr.cpp:1142
#16 0x000000000b6c5489 in facebook::velox::exec::Expr::evalEncodings (this=0x7ec920017800, rows=..., context=..., result=std::shared_ptr (empty) 0x0)
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/expression/Expr.cpp:941
#17 0x000000000b6c41b7 in facebook::velox::exec::Expr::eval (this=0x7ec920017800, rows=..., context=..., result=std::shared_ptr (empty) 0x0, topLevel=true)
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/expression/Expr.cpp:622
#18 0x000000000b6ca0fa in facebook::velox::exec::ExprSet::eval (this=0x7ec920016da0, begin=0, end=1, initialize=true, rows=..., context=..., result=std::vector of length 1, capacity 1 = {...})
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/expression/Expr.cpp:1806
#19 0x00000000098ccf2a in facebook::velox::exec::FilterProject::project (this=0x7ec920015f60, rows=..., evalCtx=...) at /workspace/sls-sql-velox/presto-native-execution/velox/velox/exec/FilterProject.cpp:187
#20 0x00000000098ccb7e in facebook::velox::exec::FilterProject::getOutput (this=0x7ec920015f60) at /workspace/sls-sql-velox/presto-native-execution/velox/velox/exec/FilterProject.cpp:148
#21 0x00000000096d3d42 in facebook::velox::exec::Driver::runInternal (this=0x7ec9200176c0, self=std::shared_ptr (count 3, weak 1) 0x7ec9200176c0, blockingState=std::shared_ptr (empty) 0x0, result=std::shared_ptr (empty) 0x0)
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/exec/Driver.cpp:374
#22 0x00000000096d4a24 in facebook::velox::exec::Driver::run (self=std::shared_ptr (count 3, weak 1) 0x7ec9200176c0) at /workspace/sls-sql-velox/presto-native-execution/velox/velox/exec/Driver.cpp:494
#23 0x00000000096d27d8 in facebook::velox::exec::Driver::<lambda()>::operator()(void) const (__closure=0x7ec9c83b6b70) at /workspace/sls-sql-velox/presto-native-execution/velox/velox/exec/Driver.cpp:181
#24 0x00000000096d71f4 in folly::detail::function::FunctionTraits<void()>::callSmall<facebook::velox::exec::Driver::enqueue(std::shared_ptr<facebook::velox::exec::Driver>)::<lambda()> >(folly::detail::function::Data &) (p=...)
at /usr/local/include/folly/Function.h:363

The core dump originates in frame 7 (SelectivityVector::resize). I found another thread calling resize on the same object.

The stack of the other thread is:

#0 0x00007f1a6c762e9d in nanosleep () from /lib64/libpthread.so.0
#1 0x000000000bc3c65b in folly::symbolizer::(anonymous namespace)::innerSignalHandler (info=0x7ec9c87b95f0, signum=11)
at /build_workspace/sls-sql-velox/presto-native-execution/sls_deploy/_build/folly/folly/experimental/symbolizer/SignalHandler.cpp:442
#2 folly::symbolizer::(anonymous namespace)::signalHandler (signum=11, info=0x7ec9c87b95f0, uctx=<optimized out>)
at /build_workspace/sls-sql-velox/presto-native-execution/sls_deploy/_build/folly/folly/experimental/symbolizer/SignalHandler.cpp:470
#3 <signal handler called>
#4 0x00007f1a6abfab86 in __memmove_ssse3_back () from /lib64/libc.so.6
#5 0x000000000628b6bb in std::__copy_move<true, true, std::random_access_iterator_tag>::__copy_m (__first=0x7ec970001aa0, __last=0x0, __result=0x7ec8f0001bf0)
at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/stl_algobase.h:386
#6 0x000000000628a476 in std::__copy_move_a<true, unsigned long*, unsigned long*> (__first=0x7ec970001aa0, __last=0x0, __result=0x7ec8f0001bf0) at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/stl_algobase.h:404
#7 0x00000000062873b1 in std::__copy_move_a2<true, unsigned long*, unsigned long*> (__first=0x7ec970001aa0, __last=0x0, __result=0x7ec8f0001bf0) at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/stl_algobase.h:440
#8 0x000000000628384f in std::copy<std::move_iterator<unsigned long*>, unsigned long*> (__first=..., __last=..., __result=0x7ec8f0001bf0) at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/stl_algobase.h:474
#9 0x000000000627d7a8 in std::__uninitialized_copy::__uninit_copy<std::move_iterator<unsigned long*>, unsigned long*> (__first=..., __last=..., __result=0x7ec8f0001bf0)
at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/stl_uninitialized.h:101
#10 0x00000000062776a1 in std::uninitialized_copy<std::move_iterator<unsigned long*>, unsigned long*> (__first=..., __last=..., __result=0x7ec8f0001bf0) at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/stl_uninitialized.h:140
#11 0x000000000626ed72 in std::__uninitialized_copy_a<std::move_iterator<unsigned long*>, unsigned long*, unsigned long> (__first=..., __last=..., __result=0x7ec8f0001bf0)
at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/stl_uninitialized.h:307
#12 0x0000000006263f40 in std::__uninitialized_move_if_noexcept_a<unsigned long*, unsigned long*, std::allocator<unsigned long> > (__first=0x7ec970001aa0, __last=0x0, __result=0x7ec8f0001bf0, __alloc=...)
at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/stl_uninitialized.h:329
#13 0x000000000625b480 in std::vector<unsigned long, std::allocator<unsigned long> >::_M_fill_insert (this=0x7ec9000020f8, __position=<error reading variable: Cannot access memory at address 0x0>, __n=2, __x=@0x7ec9c87b9d78: 0)
at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/vector.tcc:573
#14 0x000000000625262b in std::vector<unsigned long, std::allocator<unsigned long> >::resize (this=0x7ec9000020f8, __new_size=2, __x=@0x7ec9c87b9d78: 0) at /opt/rh/devtoolset-9/root/usr/include/c++/9/bits/stl_vector.h:957
#15 0x000000000624b2e8 in facebook::velox::SelectivityVector::resize (this=0x7ec9000020e8, size=99, value=false) at /workspace/sls-sql-velox/presto-native-execution/velox/velox/vector/SelectivityVector.h:181
#16 0x0000000009595657 in facebook::velox::SimpleVector<facebook::velox::StringView>::ensureIsAsciiCapacity<facebook::velox::StringView> (this=0x7ec900002080, size=99)
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/vector/SimpleVector.h:277
#17 0x000000000b6cde2b in facebook::velox::SimpleVector<facebook::velox::StringView>::computeAndSetIsAscii<facebook::velox::StringView> (this=0x7ec900002080, rows=...)
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/vector/SimpleVector.h:220
#18 0x000000000b6c6aa0 in facebook::velox::exec::(anonymous namespace)::computeIsAsciiForInputs (vectorFunction=0x7ec920019840, inputValues=std::vector of length 2, capacity 2 = {...}, rows=...)
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/expression/Expr.cpp:1193
#19 0x000000000b6c8485 in facebook::velox::exec::Expr::applyFunction (this=0x7ec920019fd0, rows=..., context=..., result=std::shared_ptr (empty) 0x0)
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/expression/Expr.cpp:1544
#20 0x000000000b6c7634 in facebook::velox::exec::Expr::evalAllImpl (this=0x7ec920019fd0, rows=..., context=..., result=std::shared_ptr (empty) 0x0)
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/expression/Expr.cpp:1379
#21 0x000000000b6c713c in facebook::velox::exec::Expr::evalAll (this=0x7ec920019fd0, rows=..., context=..., result=std::shared_ptr (empty) 0x0)
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/expression/Expr.cpp:1326
#22 0x000000000b6c5eeb in facebook::velox::exec::Expr::evalWithNulls (this=0x7ec920019fd0, rows=..., context=..., result=std::shared_ptr (empty) 0x0)
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/expression/Expr.cpp:1074
#23 0x000000000b6c651b in facebook::velox::exec::Expr::evalWithMemo (this=0x7ec920019fd0, rows=..., context=..., result=std::shared_ptr (empty) 0x0)
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/expression/Expr.cpp:1142
#24 0x000000000b6c5489 in facebook::velox::exec::Expr::evalEncodings (this=0x7ec920019fd0, rows=..., context=..., result=std::shared_ptr (empty) 0x0)
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/expression/Expr.cpp:941
#25 0x000000000b6c41b7 in facebook::velox::exec::Expr::eval (this=0x7ec920019fd0, rows=..., context=..., result=std::shared_ptr (empty) 0x0, topLevel=true)
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/expression/Expr.cpp:622
#26 0x000000000b6ca0fa in facebook::velox::exec::ExprSet::eval (this=0x7ec920019530, begin=0, end=1, initialize=true, rows=..., context=..., result=std::vector of length 1, capacity 1 = {...})
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/expression/Expr.cpp:1806
#27 0x00000000098ccf2a in facebook::velox::exec::FilterProject::project (this=0x7ec920018650, rows=..., evalCtx=...) at /workspace/sls-sql-velox/presto-native-execution/velox/velox/exec/FilterProject.cpp:187
#28 0x00000000098ccb7e in facebook::velox::exec::FilterProject::getOutput (this=0x7ec920018650) at /workspace/sls-sql-velox/presto-native-execution/velox/velox/exec/FilterProject.cpp:148
#29 0x00000000096d3d42 in facebook::velox::exec::Driver::runInternal (this=0x7ec92001a7f0, self=std::shared_ptr (count 3, weak 1) 0x7ec92001a7f0, blockingState=std::shared_ptr (empty) 0x0, result=std::shared_ptr (empty) 0x0)
at /workspace/sls-sql-velox/presto-native-execution/velox/velox/exec/Driver.cpp:374
#30 0x00000000096d4a24 in facebook::velox::exec::Driver::run (self=std::shared_ptr (count 3, weak 1) 0x7ec92001a7f0) at /workspace/sls-sql-velox/presto-native-execution/velox/velox/exec/Driver.cpp:494
#31 0x00000000096d27d8 in facebook::velox::exec::Driver::<lambda()>::operator()(void) const (__closure=0x7ec9c87bab70) at /workspace/sls-sql-velox/presto-native-execution/velox/velox/exec/Driver.cpp:181
#32 0x00000000096d71f4 in folly::detail::function::FunctionTraits<void()>::callSmall<facebook::velox::exec::Driver::enqueue(std::shared_ptr<facebook::velox::exec::Driver>)::<lambda()> >(folly::detail::function::Data &) (p=...)
at /usr/local/include/folly/Function.h:363

The observed facts and my guess

The this pointer in frame 7 of stack 1 and the this pointer in frame 15 of stack 2 are exactly the same (0x7ec9000020e8). Both threads are therefore trying to resize the same std::vector (this=0x7ec9000020f8 in frames 6 and 14).

It seems that multiple driver threads of the same pipeline are processing the same batch of vector data, each with a different SelectivityVector. For varchar input, however, computeIsAsciiForInputs is called on that vector data, and this function may modify it.
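To make the suspected hazard concrete, here is a minimal C++ sketch. It is not Velox code: SharedColumn, ensureFlagCapacity and growConcurrently are hypothetical names standing in for a vector whose cached per-row flags are grown lazily. It assumes a mutex is an acceptable mitigation and does not claim to be the fix Velox should adopt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a column that lazily caches per-row metadata.
// None of these names come from Velox.
class SharedColumn {
 public:
  // Grows the cached flag storage to cover at least `rows` rows. The lock
  // serializes concurrent callers; without it, two threads could reallocate
  // the same std::vector at once, which is a data race.
  void ensureFlagCapacity(std::size_t rows) {
    std::lock_guard<std::mutex> guard(mutex_);
    if (flags_.size() < rows) {
      flags_.resize(rows, 0);
    }
  }

  std::size_t flagCapacity() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return flags_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::vector<std::uint8_t> flags_;
};

// Starts `numThreads` threads that all grow the same column, each asking
// for a slightly different size, and returns the final capacity.
inline std::size_t growConcurrently(
    SharedColumn& column, std::size_t numThreads, std::size_t baseRows) {
  std::vector<std::thread> threads;
  for (std::size_t i = 0; i < numThreads; ++i) {
    threads.emplace_back(
        [&column, baseRows, i] { column.ensureFlagCapacity(baseRows + i); });
  }
  for (auto& t : threads) {
    t.join();
  }
  return column.flagCapacity();
}
```

With sizes 98 and 99, as in the two stacks above, the unguarded equivalent of ensureFlagCapacity is exactly the pattern where both threads end up inside std::vector::resize on the same object.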

If multiple driver threads process the same batch of vector data, the data should be read-only, right?
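One way to read the "read-only" expectation is sketched below: every derived property is computed before the batch is handed to consumers, and consumers only ever see a pointer-to-const. FrozenStringBatch, SharedBatch and freeze are hypothetical names used for illustration, and std::string stands in for Velox's StringView; this is a sketch of the idea, not a description of how Velox vectors work.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical immutable batch: the ASCII flag is computed once in the
// constructor, so readers never need to mutate the object afterwards.
class FrozenStringBatch {
 public:
  explicit FrozenStringBatch(std::vector<std::string> values)
      : values_(std::move(values)), allAscii_(computeAllAscii(values_)) {}

  const std::string& valueAt(std::size_t row) const {
    assert(row < values_.size());
    return values_[row];
  }
  std::size_t size() const { return values_.size(); }
  bool allAscii() const { return allAscii_; }

 private:
  // Returns true when no value contains a byte >= 0x80.
  static bool computeAllAscii(const std::vector<std::string>& values) {
    for (const auto& value : values) {
      for (unsigned char c : value) {
        if (c >= 0x80) {
          return false;
        }
      }
    }
    return true;
  }

  const std::vector<std::string> values_;  // declared first, initialized first
  const bool allAscii_;
};

// Consumers hold a pointer-to-const, so the compiler rejects writes.
using SharedBatch = std::shared_ptr<const FrozenStringBatch>;

inline SharedBatch freeze(std::vector<std::string> values) {
  return std::make_shared<FrozenStringBatch>(std::move(values));
}
```

The trade-off is that the ASCII scan is paid up front for the whole batch, even if a downstream operator never needs it.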

The LocalExchange[ROUND_ROBIN] after the Limit operator distributes the same batch of data to different drivers. Each driver processes only a few rows of that batch, as marked by its SelectivityVector. Before calling strpos, Expr first computes whether the varchar data contains only ASCII characters, which decides whether to invoke call (UTF-8) or callAscii; the callAscii implementation performs better. The problem is that multiple threads call computeIsAsciiForInputs on the same SimpleVector, which causes the crash.
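For readers unfamiliar with why the ASCII check exists at all, the following sketch shows the general idea of an ASCII fast path versus a UTF-8 path for a strpos-like function. The function names (isAscii, strposAscii, strposUtf8, strpos) are illustrative and are not the Velox implementations; the sketch assumes 1-based positions with 0 meaning "not found", and valid UTF-8 input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Returns true when every byte is below 0x80, i.e. the string is pure ASCII.
inline bool isAscii(const std::string& s) {
  for (unsigned char c : s) {
    if (c >= 0x80) {
      return false;
    }
  }
  return true;
}

// ASCII fast path: byte offsets equal character offsets, so a plain byte
// search is enough. Returns a 1-based position, or 0 when not found.
inline std::int64_t strposAscii(const std::string& s, const std::string& needle) {
  const std::size_t pos = s.find(needle);
  return pos == std::string::npos ? 0 : static_cast<std::int64_t>(pos) + 1;
}

// General path: find the byte offset, then count the code points before it
// by skipping UTF-8 continuation bytes (those of the form 10xxxxxx).
inline std::int64_t strposUtf8(const std::string& s, const std::string& needle) {
  const std::size_t pos = s.find(needle);
  if (pos == std::string::npos) {
    return 0;
  }
  std::int64_t chars = 0;
  for (std::size_t i = 0; i < pos; ++i) {
    if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) {
      ++chars;
    }
  }
  return chars + 1;
}

// Dispatch: pay for the ASCII scan once, then take the cheaper path when
// the input allows it.
inline std::int64_t strpos(const std::string& s, const std::string& needle) {
  return isAscii(s) ? strposAscii(s, needle) : strposUtf8(s, needle);
}
```

The result of the ASCII scan is worth caching per vector, which is presumably why it is stored on the vector itself; that cached state is the shared mutable data involved in the crash.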

System information

Velox System Info v0.0.2
Commit: 336ab79319d72fa97b39b8d36a769c75ec6d2276
CMake Version: 3.22.3
System: Linux-3.10.0-327..x86_64
Arch: x86_64
C++ Compiler: /opt/rh/devtoolset-9/root/bin/g++
C++ Compiler Version: 9.3.1
C Compiler: /opt/rh/devtoolset-9/root/bin/gcc
C Compiler Version: 9.3.1
CMake Prefix Path: /usr/local;/usr;/;/usr/local;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

No response

@mayunlei mayunlei added bug Something isn't working triage Newly created issue that needs attention. labels Jun 7, 2023