Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Double free issue when using a cursor #133

Open
nwrbs opened this issue Jul 6, 2018 · 24 comments
Open

Double free issue when using a cursor #133

nwrbs opened this issue Jul 6, 2018 · 24 comments
Assignees
Labels
investigation Not sure what the problem is, yet. Not even sure if it's a bug or a feature, yet.

Comments

@nwrbs
Copy link

nwrbs commented Jul 6, 2018

I'm afraid this issue is probably scant on details, but here it goes.
I'm getting an error in a program I wrote using node-lmdb. The error only occurs when I run the program, not when I debug the program, so I'm unable to tell where exactly in the program the error is occurring. The backtrace seems to indicate the issue is in CursorWrap...

I have 2 functions that I call 2 times each:

convertStringKeyTable(env, dbiA);
convertStringKeyTable(env, dbiB);
convertIntKeyTable(env, dbiC);
convertIntKeyTable(env, dbiD);

Both functions access the same env, but different dbi's. I can run each function individually, and I have no problem. The error only occurs when I call all four functions in a row. I can run any combination of the 4 (ie the first 3, the last 3, etc) and no error occurs. It is only when I call all four that the issue occurs.

The message from node is below. Is there anything more I can do to see where the error is coming from please let me know.

*** Error in `node': double free or corruption (top): 0x000000000279ff30 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81499)[0x7f686c880499]
../node_modules/node-lmdb/build/Release/node-lmdb.node(_ZN10CursorWrap5closeERKN3Nan20FunctionCallbackInfoIN2v85ValueEEE+0x64)[0x7f686a5edb24]
../node_modules/node-lmdb/build/Release/node-lmdb.node(+0x219c7)[0x7f686a5ed9c7]
node(_ZN2v88internal25FunctionCallbackArguments4CallEPFvRKNS_20FunctionCallbackInfoINS_5ValueEEEE+0x193)[0xa5ff63]
node[0xad710c]
node(_ZN2v88internal21Builtin_HandleApiCallEiPPNS0_6ObjectEPNS0_7IsolateE+0xaf)[0xad7d5f]
[0x22b24cb842fd]
@nwrbs
Copy link
Author

nwrbs commented Jul 6, 2018

I think I have located the possible cause.
Reuse of a cursor variable:
The functions contain the following:

let cursor = new lmdb.Cursor(txn, tblDbi, { keyIsBuffer: true });
...
cursor.close();
...
cursor = new lmdb.Cursor(txnTmp, dbiTmpTbl, { keyIsBuffer: true });
...
cursor.close();

If I create a separate cursor variable ("cursor2) for the 2nd use of a cursor, the issue does not occur. It's only if I reuse the 1st declaration of "cursor"

@Venemo
Copy link
Owner

Venemo commented Jul 6, 2018

@nwbrad Thanks for reporting this. Can you please run node-lmdb inside gdb to see a better backtrace? Also, do you use the latest version?

If you are unfamiliar with gdb, this is how to launch it:

gdb --args node your-app.js

Inside gdb use r to run your app and when it crashes use bt to produce a backtrace. Please paste the backtrace here.

@Venemo Venemo self-assigned this Jul 6, 2018
@Venemo
Copy link
Owner

Venemo commented Jul 6, 2018

I would also appreciate it if you could give me a test case that reproduces this.

@nwrbs
Copy link
Author

nwrbs commented Jul 6, 2018

below is the backtrace.

*** Error in `/usr/bin/node': double free or corruption (top): 0x000000000226de70 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81499)[0x7ffff6c48499]
/test/node_modules/node-lmdb/build/Release/node-lmdb.node(_ZN10CursorWrap5closeERKN3Nan20FunctionCallbackInfoIN2v85ValueEEE+0x64)[0x7ffff49b5b24]
/test/node_modules/node-lmdb/build/Release/node-lmdb.node(+0x219c7)[0x7ffff49b59c7]
/usr/bin/node(_ZN2v88internal25FunctionCallbackArguments4CallEPFvRKNS_20FunctionCallbackInfoINS_5ValueEEEE+0x193)[0xa5ff63]
/usr/bin/node[0xad710c]
/usr/bin/node(_ZN2v88internal21Builtin_HandleApiCallEiPPNS0_6ObjectEPNS0_7IsolateE+0xaf)[0xad7d5f]
[0x133a2f9842fd]

@Venemo
Copy link
Owner

Venemo commented Jul 7, 2018

@nwbrad Can you please repeat that but using a debug build? So that we can see line numbers in there. The backtrace itself is enough, I can't use the memory map.

@nwrbs
Copy link
Author

nwrbs commented Jul 7, 2018

I ran debuginfo-install nodejs-8.11.3-1nodesource.x86_64 to install debug source, but I got an error message: Could not find debuginfo for main pkg: 2:nodejs-8.11.3-1nodesource.x86_64
Everything thing else on the install ran fine. Any idea on where to get the debuginfo for nodejs-8.11.3-1nodesource.x86_64

@Venemo
Copy link
Owner

Venemo commented Jul 8, 2018

@nwbrad What I would like to see is line numbers in node-lmdb. For that you don't need the nodejs debuginfo, it is enough to make a debug build of node-lmdb.

@nwrbs
Copy link
Author

nwrbs commented Jul 8, 2018

I don't think this worked like expected because I don't see any line numbers for node-lmdb.
I installed and built nodejs from source and then ran the following:

gdb --args node_g taxonomy_convert.js
#0  0x00007ffff6bfd277 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff6bfe968 in __GI_abort () at abort.c:90
#2  0x00007ffff6c3fd37 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7ffff6d51d58 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007ffff6c48499 in malloc_printerr (ar_ptr=0x7ffff6f8d760 <main_arena>, ptr=<optimized out>, str=0x7ffff6d51e40 "double free or corruption (top)", action=3)
    at malloc.c:5025
#4  _int_free (av=0x7ffff6f8d760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3847
#5  0x00007ffff49b5b24 in CursorWrap::close(Nan::FunctionCallbackInfo<v8::Value> const&) ()
   from /node_modules/node-lmdb/build/Release/node-lmdb.node
#6  0x00007ffff49b59c7 in Nan::imp::FunctionCallbackWrapper () from /node_modules/node-lmdb/build/Release/node-lmdb.node
#7  0x00000000016bfc62 in v8::internal::FunctionCallbackArguments::Call (this=0x7fffffffbac0, 
    f=0x7ffff49b5950 <_ZN3Nan3impL23FunctionCallbackWrapperERKN2v820FunctionCallbackInfoINS1_5ValueEEE>) at ../deps/v8/src/api-arguments.cc:25
#8  0x0000000001775a93 in v8::internal::(anonymous namespace)::HandleApiCallHelper<false> (isolate=0x3818b70, function=..., new_target=..., fun_data=..., receiver=..., args=...)
    at ../deps/v8/src/builtins/builtins-api.cc:112
#9  0x0000000001773e21 in v8::internal::Builtin_Impl_HandleApiCall (args=..., isolate=0x3818b70) at ../deps/v8/src/builtins/builtins-api.cc:142
#10 0x0000000001773b93 in v8::internal::Builtin_HandleApiCall (args_length=4, args_object=0x7fffffffbec8, isolate=0x3818b70) at ../deps/v8/src/builtins/builtins-api.cc:130
#11 0x00002051c90043c4 in ?? ()
#12 0x0000006600000000 in ?? ()
#13 0x00002051c9004301 in ?? ()
#14 0x00007fffffffbe80 in ?? ()
#15 0x0000000000000006 in ?? ()
#16 0x00007fffffffbf70 in ?? ()
#17 0x00002051c90f153f in ?? ()
#18 0x0000074cf66022d1 in ?? ()
#19 0x000029731aa077e9 in ?? ()
#20 0x0000000400000000 in ?? ()
#21 0x0000226624a23661 in ?? ()
#22 0x0000074cf66022d1 in ?? ()
#23 0x0000074cf66022d1 in ?? ()
#24 0x0000074cf66022d1 in ?? ()
#25 0x0000074cf66022d1 in ?? ()
#26 0x0000226624a23631 in ?? ()
#27 0x000034fbf0544271 in ?? ()
#28 0x0000226624a23611 in ?? ()
#29 0x000029731aa077e9 in ?? ()
#30 0x0000000000000000 in ?? ()

@Venemo
Copy link
Owner

Venemo commented Jul 8, 2018

I installed and built nodejs from source

But you don't need to. You only need a debug build of node-lmdb, not node.

git clone git@github.com:Venemo/node-lmdb.git
cd node-lmdb
node-gyp configure --debug
node-gyp build --debug

Then use the resulting binaries from this build.

@nwrbs
Copy link
Author

nwrbs commented Jul 8, 2018

Ok. Tried above and got the following for line 3

node-gyp configure --debug
gyp info it worked if it ends with ok
gyp info using node-gyp@3.7.0
gyp info using node@8.11.3 | linux | x64
gyp http GET https://nodejs.org/download/release/v8.11.3/node-v8.11.3-headers.tar.gz
gyp http 200 https://nodejs.org/download/release/v8.11.3/node-v8.11.3-headers.tar.gz
gyp http GET https://nodejs.org/download/release/v8.11.3/SHASUMS256.txt
gyp http 200 https://nodejs.org/download/release/v8.11.3/SHASUMS256.txt
gyp info spawn /usr/bin/python2
gyp info spawn args [ '/usr/lib/node_modules/node-gyp/gyp/gyp_main.py',
gyp info spawn args   'binding.gyp',
gyp info spawn args   '-f',
gyp info spawn args   'make',
gyp info spawn args   '-I',
gyp info spawn args   '/nodelmdb/node-lmdb/build/config.gypi',
gyp info spawn args   '-I',
gyp info spawn args   '/usr/lib/node_modules/node-gyp/addon.gypi',
gyp info spawn args   '-I',
gyp info spawn args   '/home/xyz/.node-gyp/8.11.3/include/node/common.gypi',
gyp info spawn args   '-Dlibrary=shared_library',
gyp info spawn args   '-Dvisibility=default',
gyp info spawn args   '-Dnode_root_dir=/home/xyz/.node-gyp/8.11.3',
gyp info spawn args   '-Dnode_gyp_dir=/usr/lib/node_modules/node-gyp',
gyp info spawn args   '-Dnode_lib_file=/home/xyz/.node-gyp/8.11.3/<(target_arch)/node.lib',
gyp info spawn args   '-Dmodule_root_dir=/nodelmdb/node-lmdb',
gyp info spawn args   '-Dnode_engine=v8',
gyp info spawn args   '--depth=.',
gyp info spawn args   '--no-parallel',
gyp info spawn args   '--generator-output',
gyp info spawn args   'build',
gyp info spawn args   '-Goutput_dir=.' ]
module.js:549
    throw err;
    ^

Error: Cannot find module 'nan'
    at Function.Module._resolveFilename (module.js:547:15)
    at Function.Module._load (module.js:474:25)
    at Module.require (module.js:596:17)
    at require (internal/module.js:11:18)
    at [eval]:1:1
    at ContextifyScript.Script.runInThisContext (vm.js:50:33)
    at Object.runInThisContext (vm.js:139:38)
    at Object.<anonymous> ([eval]-wrapper:6:22)
    at Module._compile (module.js:652:30)
    at evalScript (bootstrap_node.js:466:27)
gyp: Call to 'node -e "require('nan')"' returned exit status 1 while in binding.gyp. while trying to load binding.gyp
gyp ERR! configure error 
gyp ERR! stack Error: `gyp` failed with exit code: 1
gyp ERR! stack     at ChildProcess.onCpExit (/usr/lib/node_modules/node-gyp/lib/configure.js:345:16)
gyp ERR! stack     at emitTwo (events.js:126:13)
gyp ERR! stack     at ChildProcess.emit (events.js:214:7)
gyp ERR! stack     at Process.ChildProcess._handle.onexit (internal/child_process.js:198:12)
gyp ERR! System Linux 3.10.0-862.6.3.el7.x86_64
gyp ERR! command "/usr/bin/node" "/usr/bin/node-gyp" "configure" "--debug"
gyp ERR! cwd /nodelmdb/node-lmdb
gyp ERR! node -v v8.11.3
gyp ERR! node-gyp -v v3.7.0
gyp ERR! not ok 

@nwrbs
Copy link
Author

nwrbs commented Jul 8, 2018

I think I have it now:

gdb) bt
#0  0x00007ffff6bfd277 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff6bfe968 in __GI_abort () at abort.c:90
#2  0x00007ffff6c3fd37 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7ffff6d51d58 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007ffff6c48499 in malloc_printerr (ar_ptr=0x7ffff6f8d760 <main_arena>, ptr=<optimized out>, str=0x7ffff6d51e40 "double free or corruption (top)", action=3)
    at malloc.c:5025
#4  _int_free (av=0x7ffff6f8d760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3847
#5  0x00007ffff499b78e in mdb_cursor_close (mc=0x226c1b0) at ../dependencies/lmdb/libraries/liblmdb/mdb.c:8550
#6  0x00007ffff49b1690 in CursorWrap::close (info=...) at ../src/cursor.cpp:94
#7  0x00007ffff49a3a83 in Nan::imp::FunctionCallbackWrapper (info=...) at ../node_modules/nan/nan_callbacks_12_inl.h:174
#8  0x0000000000a5ff63 in v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) ()
#9  0x0000000000ad710c in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) ()
#10 0x0000000000ad7d5f in v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) ()
#11 0x000008c5422842fd in ?? ()
#12 0x0000006600000000 in ?? ()
#13 0x000008c542284241 in ?? ()
#14 0x00007fffffffc1a0 in ?? ()
#15 0x0000000000000006 in ?? ()
#16 0x00007fffffffc290 in ?? ()
#17 0x000008c54233d1d6 in ?? ()
#18 0x00001f6cc1e822d1 in ?? ()
#19 0x0000049b6d104099 in ?? ()
#20 0x0000000400000000 in ?? ()
#21 0x00000baf842c2c11 in ?? ()
#22 0x00001f6cc1e822d1 in ?? ()
#23 0x00001f6cc1e822d1 in ?? ()
#24 0x00001f6cc1e822d1 in ?? ()
#25 0x00001f6cc1e822d1 in ?? ()
#26 0x00000baf842c2be1 in ?? ()
#27 0x000002c11475b111 in ?? ()
#28 0x00000baf842c2bc1 in ?? ()
#29 0x0000049b6d104099 in ?? ()
#30 0x0000000000000000 in ?? ()

@Venemo
Copy link
Owner

Venemo commented Jul 9, 2018

Thank you @nwbrad the relevant line seems to be CursorWrap::close (info=...) at ../src/cursor.cpp:94 which tells us that there is a double free somehow in there.

Can you give me a small javascript testcase, so that I can experiment with it?

@nwrbs
Copy link
Author

nwrbs commented Jul 9, 2018

Below is the function in which the error occurs. All logic related to other things has been removed, so it just shows the LMDB logic I used. The issue appears to occur when I reuse the Cursor variable. If I create a second cursor variable and close the cursors before I commit the transactions, the issue does not appear to happen...
I run the function on multiple tables and the issue only happens with one table that has more rows than my other tables. As a result, the issue may not be reproducible.

convertTable(env, tblDbi, tblName, dataType) {
let dbiTmpTbl = this.envTmp.openDbi({
name: 'tmpTbl',
keyIsBuffer: true,
create: true
});

let txnTmp = this.envTmp.beginTxn();
let txn = env.beginTxn();

let keyBufX = null;
let valBuf = null;
let valBufX = null;
let iCnt = 0;

let cursor = new lmdb.Cursor(txn, tblDbi, { keyIsBuffer: true });

for (let key = cursor.goToFirst(); key !== null; key = cursor.goToNext()) {
    iCnt++;
    try {
        valBuf = cursor.getCurrentBinary();

        if (dataType === 's'){
            txnTmp.putBinary(dbiTmpTbl, keyBufX, valBufX, { keyIsBuffer: true });
        } else {
            txnTmp.putBinary(dbiTmpTbl, keyBufX, valBuf, { keyIsBuffer: true });
        }
        txn.del(tblDbi, key);

    } catch (e) {
        console.log(e);
    }
}

txnTmp.commit();
txn.commit();
cursor.close();

txnTmp = this.envTmp.beginTxn();
txn = env.beginTxn();

cursor = new lmdb.Cursor(txnTmp, dbiTmpTbl, { keyIsBuffer: true });

for (let key = cursor.goToFirst(); key !== null; key = cursor.goToNext()) {
    try {
        valBuf = cursor.getCurrentBinary();
        if (dataType === 's'){
            txn.putBinary(tblDbi, key, valBuf, { keyIsBuffer: true });
        } else {
            txn.putBinary(tblDbi, key, valJSONBuf, { keyIsBuffer: true });
        }
    } catch (e) {
        console.log(e);
    }
}

txn.commit();
txnTmp.commit();
cursor.close();

dbiTmpTbl.drop(true);

@Venemo
Copy link
Owner

Venemo commented Jul 11, 2018

Not sure what would cause this but I'll investigate. Sorry for the slow reply.

@nwrbs
Copy link
Author

nwrbs commented Jul 12, 2018

It does seem a bit odd how it occurs. Thanks for the help.

@Venemo
Copy link
Owner

Venemo commented Jul 13, 2018

@nwbrad Does it only occur with one specific database, though?

@nwrbs
Copy link
Author

nwrbs commented Jul 13, 2018

Sort of. It is one specific Environment, I tried multiple copies of the Environment, restored from backup and tried older versions, all the same result. Nothing fancy about the Environment. It has databases that originally used Int Keys and String Keys to store either strings or numbers. The app that has the issue access LMDB through using binary/buffer access methods to convert utf16 string to utf8 strings and in some databases converts values from number values to utf8 strings. (Now all my programs use the binary/buffer process with LMDB. As I started using C and Java to access the same LMDB the utf16 and null terminator started causing issues. Plus I wanted to support 64 bit integers.)

The environment in question has 4 databases in it and seems to work fine other than this issue.
I'll call the database with the issue Database X the others as A , B and C respectively.
If I run the conversion on just X, I have no problem.
If I run the conversion A, B and C have no problem.
If I run the conversion on A, B, C and X then the problem occurs.

@Venemo
Copy link
Owner

Venemo commented Jul 14, 2018

@nwbrad Can you give me some version of this environment (possibly with some dummy data) where I can reproduce this problem?

@nwrbs
Copy link
Author

nwrbs commented Jul 16, 2018

I won't have time this week to do it, but I will try the week after next.

@Venemo
Copy link
Owner

Venemo commented Jul 19, 2018

Thanks @nwbrad I'm looking forward to seeing it.

@Venemo Venemo added the investigation Not sure what the problem is, yet. Not even sure if it's a bug or a feature, yet. label Aug 3, 2018
@Venemo Venemo changed the title *** Error in `node': double free or corruption (top): 0x000000000279ff30 *** Double free issue when using a cursor Aug 3, 2018
@Venemo
Copy link
Owner

Venemo commented Aug 3, 2018

@nwbrad Any news about this?

If you can give me a dataset and some code with which the issue is reproducible I am fairly sure I can fix the problem. However without that I would be just shooting in the dark.

@nwrbs
Copy link
Author

nwrbs commented Aug 3, 2018

Sorry, I'm caught up in a few other things I need to get done. I probably won't be able to look at this for a couple of weeks. If you want to close it and I can reopen it when I have time to create data for it, that would be okay.

@Venemo
Copy link
Owner

Venemo commented Aug 3, 2018

@nwbrad No problem, we can keep this open for a few more weeks. In the meantime if any other user encounters the same or similar issue, they can chime in too.

@KBInmarsat
Copy link

Five years late to the party but for what it's worth I had a similar problem and found that closing any cursor before commiting the transaction fixed the issue for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
investigation Not sure what the problem is, yet. Not even sure if it's a bug or a feature, yet.
Projects
None yet
Development

No branches or pull requests

3 participants