[regression] Undefined symbols in sdk-nightly-1.37.2-2017_02_03_23_19-64bit #4960

Closed
trzecieu opened this issue Feb 20, 2017 · 15 comments

@trzecieu
Contributor

trzecieu commented Feb 20, 2017

Hi,
sdk-nightly-1.37.2-2017_02_03_23_19-64bit introduced undefined-symbol errors into our compilation process, and every later release appears to have the same problem.
All the symbols come from our game implementation.

The tested project was compiled in 'debug' mode. Please let me know how I can provide more information.

@kripken
Member

kripken commented Feb 23, 2017

Strange, I'm not aware of changes around that time that seem like they could influence that. Bisecting to the specific commit that changed things for you would help. If that's not possible a testcase that reproduces the issue would be good.

@artur-matuszewski

This still holds true for sdk-nightly-1.37.3-2017_02_23_15_27-64bit and building in 'release' mode as well.
The worst part is that it looks to be non-deterministic.
When compiling the same code with the same SDK multiple times in a row, some of the builds will succeed, some will fail, and the failures might be on different symbols missing. But all missing symbols are from our project that is being compiled.
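Because the failure is intermittent, a failure rate is easier to compare across SDK versions than a single pass/fail result. A minimal sketch of a repeat-build harness, assuming the build can be started as one command; `run_builds`, `classify`, and the `extra_env` parameter are hypothetical names for illustration and are not part of emscripten:

```python
import os
import subprocess
from collections import Counter


def classify(returncode, output):
    """Bucket one build attempt by what its combined output contains."""
    # "unresolved symbol" is the text emitted in the warnings seen in this issue.
    if "unresolved symbol" in output:
        return "unresolved symbol"
    if returncode != 0:
        return "other failure"
    return "ok"


def run_builds(command, attempts, extra_env=None):
    """Run `command` `attempts` times and count the outcome of each run."""
    env = dict(os.environ, **(extra_env or {}))
    tally = Counter()
    for _ in range(attempts):
        proc = subprocess.run(
            command,
            env=env,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # warnings usually go to stderr
            universal_newlines=True,
        )
        tally[classify(proc.returncode, proc.stdout)] += 1
    return tally
```

`command` should include whatever clean step the project needs, so that each attempt really relinks. `extra_env` allows the same loop to be rerun with a changed environment for comparison.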

@juj
Collaborator

juj commented Feb 24, 2017

Does building with the environment variable EMCC_CORES=1 work around the issue?

@artur-matuszewski

After the last few hours of testing, I think I can say it is related to linking ASM.js and WASM targets in parallel. Our setup looks more or less like this:

add_library("sample_lib" ...)

add_executable("sample" ...)
target_link_libraries("sample" "sample_lib")

add_executable("sample_wasm" ...)
target_link_libraries("sample_wasm" "sample_lib")
set_target_properties("sample_wasm" PROPERTIES LINK_FLAGS "-s WASM=1")

This works fine for small sample projects. It also mostly works fine on bigger projects on a machine with excess power. However when building our big projects on a VM (4 cores, 8 GB) this setup has a ~60% failure rate with random symbols reported missing.
Adding a dependency between the targets, as follows, seems to remove the issue (at least a couple of successful builds in a row so far).

add_dependencies("sample_wasm" "sample")

This issue is present in all recent nightlies, including sdk-nightly-1.37.3-2017_02_23_22_38-64bit, but did not occur on sdk-nightly-1.37.1-2017_01_19_18_44-64bit.

@artur-matuszewski

Setting EMCC_CORES=1 does not prevent this issue. Unresolved symbols still occur.

@trzecieu
Contributor Author

trzecieu commented Mar 6, 2017

Hi,
Sorry, it is taking us some time to narrow this down.
What I can say is that starting with version sdk-nightly-1.37.1-2017_01_28_03_57-64bit we began to see:

warning: unresolved symbol: $emscriptenWebGLGetHeapForType
warning: unresolved symbol: $emscriptenWebGLGetShiftForType
warning: unresolved symbol: $emscriptenWebGLGetHeapForType
warning: unresolved symbol: $emscriptenWebGLGetShiftForType

Then, starting from sdk-nightly-1.37.2-2017_02_03_23_19-64bit, we see genuinely missing symbols.

Because of the non-deterministic nature of this issue, I'm not yet 100% sure that the version mentioned is the one that introduced this behavior.

@artur-matuszewski

The issue still persists in the latest nightlies as well as the latest incoming (as of this morning, March 7th, 7:30 CET).

The workaround described above (preventing linking ASM.js and WASM in parallel) does not completely remove the problem, but drastically reduces the failure rate. It failed once with random symbols missing in a period of a week or so.

The random missing symbols happen when linking ASM.js.

@juj
Collaborator

juj commented Mar 7, 2017

I suppose the VM client OS is 64-bit? How much swap space does it have, can running out of swap space be safely ruled out? Have you ever seen the issue outside that VM environment? Which VM software and host/client OS are being used?

The call stack text file I got via email suggests that an earlier llvm link/output run has written out a corrupted object file, which causes the later llvm-nm call to crash.
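One cheap way to look for such a corrupted file is to check whether each extracted object still starts with an LLVM bitcode magic number. A minimal sketch, assuming the objects are LLVM bitcode files; `looks_like_bitcode` and `find_suspect_objects` are hypothetical helper names, not emscripten functions:

```python
import os

# First four bytes of a raw LLVM bitcode file: 'B', 'C', 0xC0, 0xDE.
RAW_BITCODE_MAGIC = b"BC\xc0\xde"
# First four bytes of the bitcode wrapper header (0x0B17C0DE, little-endian).
WRAPPER_MAGIC = b"\xde\xc0\x17\x0b"


def looks_like_bitcode(path):
    """Return True if the file starts with a known bitcode magic number."""
    with open(path, "rb") as f:
        head = f.read(4)
    return head in (RAW_BITCODE_MAGIC, WRAPPER_MAGIC)


def find_suspect_objects(directory):
    """List every .o file under `directory` that fails the magic check."""
    suspects = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            path = os.path.join(root, name)
            if name.endswith(".o") and not looks_like_bitcode(path):
                suspects.append(path)
    return suspects
```

This is only a first filter: a file that is truncated, or overwritten while another process is reading it, can still begin with valid magic bytes, so a clean result does not prove the objects are intact.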

@trzecieu
Contributor Author

trzecieu commented Mar 7, 2017

Hi @juj, in my case it was native Ubuntu Linux 16.04; I also reproduced this on Fedora 25. Both machines have 32GB of RAM.
It's a test project, not as complex as a regular game.

@artur-matuszewski

@juj the issues are reproduced on multiple machines (both VM and bare metal), running out of memory can be safely ruled out as the memory usage never goes above 50%.

I am testing our big project on various versions of 64-bit Ubuntu Linux (14.04 LTS, 15.04, and 16.04.1 LTS).
The VM where I test most often is a 64-bit Ubuntu 15.04 running in a VirtualBox with 4 cores, 8 GB ram and 4 GB swap on a 64-bit Windows 7 host. The bare metal is a 64-bit Ubuntu 16.04.1 LTS with 8 cores and 32 GB ram (in this case memory usage never goes above 20% during build). The disk usage in the /tmp partition does not go above 50% either, so it is not running out of disk space as far as I can tell.

The call stack I mailed you was running on bare metal.

juj added a commit to juj/emscripten that referenced this issue Mar 7, 2017
…ss from subprocesses. Clean up temporary directory created for llvm-ar at process exit. Fix race condition in llvm-ar temporary directory creation. Fixes emscripten-core#4960.
@juj
Collaborator

juj commented Mar 7, 2017

I was able to reproduce this by building another big project multiple times in parallel, and I think I found the issue. The above PR should provide a fix.
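The commit message mentions a race in creating the temporary directory used for llvm-ar extraction. The general pattern can be sketched as follows. This is an illustration only, not the actual emscripten code: both function names are hypothetical, and the fixed naming scheme is an assumption inferred from the shape of the `/tmp/..._libresources.a.archive_contents/` paths in the error logs in this issue.

```python
import atexit
import os
import shutil
import tempfile


def racy_extract_dir(archive_path, tmp_root):
    """Directory name derived only from the archive path.

    Two processes that extract the same archive at the same time get the
    same directory and can overwrite each other's object files.
    """
    name = archive_path.replace(os.sep, "_") + ".archive_contents"
    path = os.path.join(tmp_root, name)
    os.makedirs(path, exist_ok=True)
    return path


def private_extract_dir(archive_path, tmp_root):
    """Directory that is unique per call and removed at process exit."""
    prefix = os.path.basename(archive_path) + "_"
    # mkdtemp creates the directory atomically with a random component.
    path = tempfile.mkdtemp(prefix=prefix, suffix=".archive_contents", dir=tmp_root)
    atexit.register(shutil.rmtree, path, ignore_errors=True)
    return path
```

With the first variant, two link steps that run in parallel and share a static library also share its extraction directory, so one process can read an object file while the other is still rewriting it. That would be consistent with both the randomly missing symbols and the llvm-nm crashes reported here.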

@trzecieu
Contributor Author

trzecieu commented Mar 7, 2017

Hi,
In order to speed up testing I took the latest nightly - sdk-nightly-1.37.3-2017_03_07_04_07-64bit.

Test configuration:
SDK: sdk-nightly-1.37.3-2017_03_07_04_07-64bit
OS: Native Linux Ubuntu 16.04, 32GB of RAM
Parallel asm.js and wasm.js linkage, explained here: #4960 (comment)

Test unmodified nightly, 10 times

  • 3 times - unresolved symbol, each time with a different set of symbols.
  • 1 time:
/home/piotr/Projects/ff-sample/mobile/emsdk/clang/nightly-e1.37.3-2017_03_07_04_07/llvm-nm: /tmp/_home_piotr_Projects_ff-sample_mobile_projects_emscripten_release_wasm_v2_ff_resources_libresources.a.archive_contents/FFSomeRandomClassImplementation.cpp.o: The file was not recognized as a valid object file.
  • 1 time:
#0 0x00000000004fb3b8 (/home/piotr/Projects/ff-sample/mobile/emsdk/clang/nightly-e1.37.3-2017_03_07_04_07/llvm-nm+0x4fb3b8)
#1 0x00000000004f9bde (/home/piotr/Projects/ff-sample/mobile/emsdk/clang/nightly-e1.37.3-2017_03_07_04_07/llvm-nm+0x4f9bde)
#2 0x00000000004f9d2c (/home/piotr/Projects/ff-sample/mobile/emsdk/clang/nightly-e1.37.3-2017_03_07_04_07/llvm-nm+0x4f9d2c)
#3 0x00007f1b9d1c7390 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x11390)
#4 0x00007f1b9c495435 (/lib/x86_64-linux-gnu/libc.so.6+0x14d435)
#5 0x00000000005add54 (/home/piotr/Projects/ff-sample/mobile/emsdk/clang/nightly-e1.37.3-2017_03_07_04_07/llvm-nm+0x5add54)
#6 0x000000000051ebdc (/home/piotr/Projects/ff-sample/mobile/emsdk/clang/nightly-e1.37.3-2017_03_07_04_07/llvm-nm+0x51ebdc)
#7 0x0000000000508ead (/home/piotr/Projects/ff-sample/mobile/emsdk/clang/nightly-e1.37.3-2017_03_07_04_07/llvm-nm+0x508ead)
#8 0x0000000000513da8 (/home/piotr/Projects/ff-sample/mobile/emsdk/clang/nightly-e1.37.3-2017_03_07_04_07/llvm-nm+0x513da8)
#9 0x000000000051e51c (/home/piotr/Projects/ff-sample/mobile/emsdk/clang/nightly-e1.37.3-2017_03_07_04_07/llvm-nm+0x51e51c)
#10 0x000000000051e5e3 (/home/piotr/Projects/ff-sample/mobile/emsdk/clang/nightly-e1.37.3-2017_03_07_04_07/llvm-nm+0x51e5e3)
#11 0x00000000004b05de (/home/piotr/Projects/ff-sample/mobile/emsdk/clang/nightly-e1.37.3-2017_03_07_04_07/llvm-nm+0x4b05de)
#12 0x00000000004b7022 (/home/piotr/Projects/ff-sample/mobile/emsdk/clang/nightly-e1.37.3-2017_03_07_04_07/llvm-nm+0x4b7022)
#13 0x00000000004ab4ef (/home/piotr/Projects/ff-sample/mobile/emsdk/clang/nightly-e1.37.3-2017_03_07_04_07/llvm-nm+0x4ab4ef)
#14 0x000000000041cbdb (/home/piotr/Projects/ff-sample/mobile/emsdk/clang/nightly-e1.37.3-2017_03_07_04_07/llvm-nm+0x41cbdb)
#15 0x000000000040e25c _Unwind_Backtrace (/home/piotr/Projects/ff-sample/mobile/emsdk/clang/nightly-e1.37.3-2017_03_07_04_07/llvm-nm+0x40e25c)
#16 0x00007f1b9c368830 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x20830)
#17 0x0000000000412fd9 _Unwind_Backtrace (/home/piotr/Projects/ff-sample/mobile/emsdk/clang/nightly-e1.37.3-2017_03_07_04_07/llvm-nm+0x412fd9)
Stack dump:
0.	Program arguments: /home/piotr/Projects/ff-sample/mobile/emsdk/clang/nightly-e1.37.3-2017_03_07_04_07/llvm-nm /tmp/_home_piotr_Projects_ff-sample_mobile_projects_emscripten_release_wasm_v2_ff_scene_libscene.a.archive_contents/FFSomeRandomClassImplementation.cpp.o 
  • 5 times: the build succeeded and both the ASM and WASM versions work

Test of patched nightly version with #5004

  • 10 successful builds 👍

In the meantime, while I stress-test this solution with more samples, @artur-matuszewski might try to confirm another case with a larger project and sequential linkage.


Edit: text correction. At the moment, Artur has not confirmed that it works for his use case.

@trzecieu
Contributor Author

trzecieu commented Mar 8, 2017

Hi @juj
Some more test data:

Test configuration

SDK: 'sdk-incoming-64bit' compiled with --build=RelWithDebInfo
emcc: emcc (Emscripten gcc/clang-like replacement) 1.37.3 (commit eac8e0b0142ae5466a79b8aec070521dfd75ea76)
OS: Native Linux Ubuntu 16.04, 32GB of RAM
Compilation:

Unmodified SDK

  • 30 samples in total
  • 10 samples with missing symbols, each time different set of symbols
  • 1 sample with llvm error like below (likely it was the last sample)
#0 0x000000000052e2f8 llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/piotr/Projects/ff-sample/mobile/emsdk/clang/fastcomp/src/lib/Support/Unix/Signals.inc:406:0
#1 0x000000000052c94e llvm::sys::RunSignalHandlers() /home/piotr/Projects/ff-sample/mobile/emsdk/clang/fastcomp/src/lib/Support/Signals.cpp:45:0
#2 0x000000000052cac2 SignalHandler(int) /home/piotr/Projects/ff-sample/mobile/emsdk/clang/fastcomp/src/lib/Support/Unix/Signals.inc:246:0
#3 0x00007f308c490390 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x11390)
#4 0x00007f308b544435 /build/glibc-t3gR2i/glibc-2.23/string/../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:136:0
#5 0x00000000005ec960 (anonymous namespace)::RawMemoryObject::readBytes(unsigned char*, unsigned long, unsigned long) const /home/piotr/Projects/ff-sample/mobile/emsdk/clang/fastcomp/src/lib/Support/StreamingMemoryObject.cpp:63:0
#6 0x0000000000555c83 llvm::SimpleBitstreamCursor::fillCurWord() /home/piotr/Projects/ff-sample/mobile/emsdk/clang/fastcomp/src/include/llvm/Bitcode/BitstreamReader.h:248:0
#7 0x0000000000555c83 llvm::SimpleBitstreamCursor::Read(unsigned int) /home/piotr/Projects/ff-sample/mobile/emsdk/clang/fastcomp/src/include/llvm/Bitcode/BitstreamReader.h:284:0
#8 0x0000000000555c83 llvm::SimpleBitstreamCursor::ReadVBR(unsigned int) /home/piotr/Projects/ff-sample/mobile/emsdk/clang/fastcomp/src/include/llvm/Bitcode/BitstreamReader.h:303:0
#9 0x0000000000555c83 llvm::BitstreamCursor::readRecord(unsigned int, llvm::SmallVectorImpl<unsigned long>&, llvm::StringRef*) /home/piotr/Projects/ff-sample/mobile/emsdk/clang/fastcomp/src/lib/Bitcode/Reader/BitstreamReader.cpp:180:0
#10 0x000000000053a586 (anonymous namespace)::BitcodeReader::parseAttributeGroupBlock() /home/piotr/Projects/ff-sample/mobile/emsdk/clang/fastcomp/src/lib/Bitcode/Reader/BitcodeReader.cpp:1524:0
#11 0x0000000000548518 (anonymous namespace)::BitcodeReader::parseModule(unsigned long, bool) /home/piotr/Projects/ff-sample/mobile/emsdk/clang/fastcomp/src/lib/Bitcode/Reader/BitcodeReader.cpp:3589:0
#12 0x0000000000554946 parseBitcodeInto /home/piotr/Projects/ff-sample/mobile/emsdk/clang/fastcomp/src/lib/Bitcode/Reader/BitcodeReader.cpp:4073:0
#13 0x0000000000554946 getBitcodeModuleImpl(std::unique_ptr<llvm::DataStreamer, std::default_delete<llvm::DataStreamer> >, llvm::StringRef, (anonymous namespace)::BitcodeReader*, llvm::LLVMContext&, bool, bool) /home/piotr/Projects/ff-sample/mobile/emsdk/clang/fastcomp/src/lib/Bitcode/Reader/BitcodeReader.cpp:6556:0
#14 0x00000000005549fa getLazyBitcodeModuleImpl(std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >&&, llvm::LLVMContext&, bool, bool) /usr/include/c++/5/bits/unique_ptr.h:235:0
#15 0x0000000000554ab3 llvm::getLazyBitcodeModule(std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >&&, llvm::LLVMContext&, bool) /home/piotr/Projects/ff-sample/mobile/emsdk/clang/fastcomp/src/lib/Bitcode/Reader/BitcodeReader.cpp:6600:0
#16 0x00000000004e0cc2 llvm::ErrorOr<std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> > >::getError() const /home/piotr/Projects/ff-sample/mobile/emsdk/clang/fastcomp/src/include/llvm/Support/ErrorOr.h:170:0
#17 0x00000000004e0cc2 llvm::object::IRObjectFile::create(llvm::MemoryBufferRef, llvm::LLVMContext&) /home/piotr/Projects/ff-sample/mobile/emsdk/clang/fastcomp/src/lib/Object/IRObjectFile.cpp:323:0
#18 0x00000000004e93cd llvm::object::SymbolicFile::createSymbolicFile(llvm::MemoryBufferRef, llvm::sys::fs::file_magic, llvm::LLVMContext*) /home/piotr/Projects/ff-sample/mobile/emsdk/clang/fastcomp/src/lib/Object/SymbolicFile.cpp:38:0
#19 0x00000000004db957 llvm::object::createBinary(llvm::MemoryBufferRef, llvm::LLVMContext*) /home/piotr/Projects/ff-sample/mobile/emsdk/clang/fastcomp/src/lib/Object/Binary.cpp:66:0
#20 0x000000000041f3a6 llvm::Expected<std::unique_ptr<llvm::object::Binary, std::default_delete<llvm::object::Binary> > >::operator bool() /home/piotr/Projects/ff-sample/mobile/emsdk/clang/fastcomp/src/include/llvm/Support/Error.h:685:0
#21 0x000000000041f3a6 dumpSymbolNamesFromFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) /home/piotr/Projects/ff-sample/mobile/emsdk/clang/fastcomp/src/tools/llvm-nm/llvm-nm.cpp:1086:0
#22 0x000000000040fcbc void (*std::for_each<__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)>(__gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, __gnu_cxx::__normal_iterator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)))(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) /usr/include/c++/5/bits/stl_algo.h:3766:0
#23 0x000000000040fcbc main /home/piotr/Projects/ff-sample/mobile/emsdk/clang/fastcomp/src/tools/llvm-nm/llvm-nm.cpp:1417:0
#24 0x00007f308b417830 __libc_start_main /build/glibc-t3gR2i/glibc-2.23/csu/../csu/libc-start.c:325:0
#25 0x0000000000415069 _start (/home/piotr/Projects/ff-sample/mobile/emsdk/clang/fastcomp/build_incoming_64/bin/llvm-nm+0x415069)
Stack dump:
0.	Program arguments: /home/piotr/Projects/ff-sample/mobile/emsdk/clang/fastcomp/build_incoming_64/bin/llvm-nm /tmp/_home_piotr_Projects_ff-sample_mobile_projects_emscripten_release_wasm_v2_ff_resources_libresources.a.archive_contents/FFMeshData.cpp.o 
wrote symbol map file to build/enginesample.js.symbols

Please note that the error rate in this test is lower than with the nightly build.

Patched SDK with #5004

  • 200 samples in total
  • 0 samples with missing symbols
  • 0 samples with llvm error

poke @kripken:
Just in case, I checked the sizes of the *.wasm builds and found an inconsistency:

  • 5 builds differ in size by 20-100 bytes. Full application size is 1792597B
    Unfortunately I have neither the wast nor the base asm.js code for those builds.
  • these 5 builds run just fine in the browser, but something may be non-deterministic.
  • example diff of a wast build (wast generated from the wasm by the wasm-dis tool):
513,515c513,515
<   (export "_testSetjmp" (func $4282))
<   (export "_saveSetjmp" (func $4278))
<   (export "_free" (func $3870))
---
>   (export "_testSetjmp" (func $4283))
>   (export "_saveSetjmp" (func $4279))
>   (export "_free" (func $3871))

Later in the file, the numbering of all functions is shifted by 1.

  • I haven't noticed this inconsistency in a separate ASM.js build.
  • Once I have more data, I will create a new issue.
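To collect that data, the outputs of repeated builds can be grouped by size and content hash, so that any build whose output differs stands out. A minimal sketch, assuming each build's output file has been copied to its own path; `digest` and `group_identical` are hypothetical helper names:

```python
import hashlib
import os
from collections import defaultdict


def digest(path):
    """SHA-256 of a file, read in chunks so large binaries are fine."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def group_identical(paths):
    """Map (size in bytes, sha256) to the list of paths with that content."""
    groups = defaultdict(list)
    for path in paths:
        groups[(os.path.getsize(path), digest(path))].append(path)
    return dict(groups)
```

A fully deterministic build yields exactly one group. More than one group identifies which outputs to keep, together with their intermediate files, for a bug report.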

@artur-matuszewski

Since #5004 is already merged, I tested the latest incoming version on our big project on bare metal (described in my previous comment). 20 successful builds with no random errors nor missing symbols. So I can confirm that this either fixes this issue, or significantly reduces the failure rates. I'll set up a longer test in a VM and report back if I run into any issues.

@trzecieu
Contributor Author

trzecieu commented Mar 8, 2017

I think this case is closed. I opened WebAssembly/binaryen#936 to investigate the wasm build corruption.
Thank you!

@trzecieu trzecieu closed this as completed Mar 8, 2017