Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exit when compiling library build with RustlerPrecompiled on OTP 23 (NIF version 2.14) #23

Closed
whatyouhide opened this issue Jun 22, 2022 · 7 comments

Comments

@whatyouhide
Copy link
Contributor

It's me, again! After dealing with #21, I’m now left with a problem that seems a bit harder to debug.

I’m trying to use nimble_lz4 (the library I built with rustler_precompiled) as a dependency. I can't get it to compile in an Alpine container.

The smallest case I could reproduce is with this Dockerfile:

FROM hexpm/elixir:1.13.4-erlang-23.3.4.9-alpine-3.14.2

COPY reproduce_bug.exs reproduce_bug.exs

RUN mix do local.hex --force, local.rebar --force
RUN elixir reproduce_bug.exs

and this reproduce_bug.exs file:

Mix.install([{:nimble_lz4, "~> 0.1.2"}])

The Docker build exits with exit code 139, which is usually segmentation fault. The output of the Docker build is in the details below.

Output of the Docker build
#1 [internal] load build definition from Dockerfile
#1 sha256:871067c4caa77dfbe0a654d2a01a9bdbe6389ee58db65b3ba8ddc23fb03086b4
#1 transferring dockerfile: 305B done
#1 DONE 0.0s

#2 [internal] load .dockerignore
#2 sha256:5a334d37c1da119701ceb248e83bd4879f882e86589b5268143a39eca073ed5c
#2 transferring context: 2B done
#2 DONE 0.0s

#3 [internal] load metadata for docker.io/hexpm/elixir:1.13.4-erlang-23.3.4.9-alpine-3.14.2
#3 sha256:b8e862e530da8bdbcc6bcc9c374ed15293bcad31cc3f9afb487fd342a51a2c1c
#3 DONE 0.4s

#4 [1/5] FROM docker.io/hexpm/elixir:1.13.4-erlang-23.3.4.9-alpine-3.14.2@sha256:df3e9bf760df89c644d28d9c91e8d25ba52ab7759b54b5cbae2301c4f17a0cb9
#4 sha256:524ba9af531aef5ed83ba2f537f6c271bc5f417402597cddfbae2ad06d01299c
#4 DONE 0.0s

#5 [2/5] WORKDIR /app
#5 sha256:4e298d723636cf44229207b8e9ff0a96c22570110c3f318ea81e18c16c43eb2b
#5 CACHED

#6 [internal] load build context
#6 sha256:ee20b71ca3f1b3bde3d0b1065def2f21b7a46120426d8c5ab2160076f3bf0eae
#6 transferring context: 38B done
#6 DONE 0.0s

#7 [3/5] COPY reproduce_bug.exs reproduce_bug.exs
#7 sha256:6d990d326d70d25618512e982c831aba45312e3428ccfe28715ba175c6eb66af
#7 DONE 0.0s

#8 [4/5] RUN mix do local.hex --force, local.rebar --force
#8 sha256:04809ac8f07a861001ddb2ce81e115835ff1456736d437602c091953b22d8625
#8 1.135 * creating /root/.mix/archives/hex-1.0.1
#8 1.629 * creating /root/.mix/rebar
#8 2.250 * creating /root/.mix/rebar3
#8 DONE 2.3s

#9 [5/5] RUN elixir reproduce_bug.exs
#9 sha256:8c1177a76ba6d217ea4d63b6d2a9df18894ba31cf9e275882cf7ee6633b16f10
#9 0.687 Resolving Hex dependencies...
#9 0.785 Dependency resolution completed:
#9 0.785 New:
#9 0.785   castore 0.1.17
#9 0.785   jason 1.3.0
#9 0.785   nimble_lz4 0.1.2
#9 0.785   rustler 0.25.0
#9 0.785   rustler_precompiled 0.5.1
#9 0.785   toml 0.6.2
#9 0.787 * Getting nimble_lz4 (Hex package)
#9 0.874 * Getting rustler (Hex package)
#9 0.902 * Getting rustler_precompiled (Hex package)
#9 0.980 * Getting castore (Hex package)
#9 0.985 * Getting jason (Hex package)
#9 0.989 * Getting toml (Hex package)
#9 1.056 ==> jason
#9 1.056 Compiling 10 files (.ex)
#9 2.272 Generated jason app
#9 2.289 ==> castore
#9 2.289 Compiling 1 file (.ex)
#9 2.300 Generated castore app
#9 2.331 ==> toml
#9 2.331 Compiling 10 files (.ex)
#9 2.825 Generated toml app
#9 2.849 ==> rustler
#9 2.849 Compiling 7 files (.ex)
#9 2.977 Generated rustler app
#9 2.996 ==> rustler_precompiled
#9 2.996 Compiling 4 files (.ex)
#9 3.222 Generated rustler_precompiled app
#9 3.243 ==> nimble_lz4
#9 3.243 Compiling 1 file (.ex)
#9 3.273 
#9 3.273 08:04:42.533 [debug] Downloading NIF from https://github.com/whatyouhide/nimble_lz4/releases/download/v0.1.2/libnimblelz4-v0.1.2-nif-2.15-aarch64-unknown-linux-musl.so.tar.gz
#9 4.245 
#9 4.245 08:04:43.506 [debug] NIF cached at /root/.cache/rustler_precompiled/precompiled_nifs/libnimblelz4-v0.1.2-nif-2.15-aarch64-unknown-linux-musl.so.tar.gz and extracted to /root/.cache/mix/installs/elixir-1.13.4-erts-11.2.2.8/7e245c35d90a0fec919d11f60ee80a7b/_build/dev/lib/nimble_lz4/priv/native/libnimblelz4-v0.1.2-nif-2.15-aarch64-unknown-linux-musl.so
#9 ERROR: executor failed running [/bin/sh -c elixir reproduce_bug.exs]: exit code: 139
------
 > [5/5] RUN elixir reproduce_bug.exs:
------
executor failed running [/bin/sh -c elixir reproduce_bug.exs]: exit code: 139
                                                                                                                                                                                                                                                                                    
andrea ~/Code/xandra (andrea/public-lz4 *+)  
→ docker build . --no-cache -f Dockerfile --progress=plain -t tmp-test-to-remove && docker run --rm -it tmp-test-to-remove /bin/sh
#1 [internal] load build definition from Dockerfile
#1 sha256:1315230573096377824b74f209ac2ab9c5f62e7b29284278955bac9b3a8c2fd4
#1 transferring dockerfile: 236B 0.0s done
#1 DONE 0.0s

#2 [internal] load .dockerignore
#2 sha256:1e5862993a999d639eafcedfa8fb4942f13918a31611bb91d365e51db2594ab9
#2 transferring context: 2B done
#2 DONE 0.0s

#3 [internal] load metadata for docker.io/hexpm/elixir:1.13.4-erlang-23.3.4.9-alpine-3.14.2
#3 sha256:b8e862e530da8bdbcc6bcc9c374ed15293bcad31cc3f9afb487fd342a51a2c1c
#3 DONE 0.9s

#8 [1/5] FROM docker.io/hexpm/elixir:1.13.4-erlang-23.3.4.9-alpine-3.14.2@sha256:df3e9bf760df89c644d28d9c91e8d25ba52ab7759b54b5cbae2301c4f17a0cb9
#8 sha256:524ba9af531aef5ed83ba2f537f6c271bc5f417402597cddfbae2ad06d01299c
#8 DONE 0.0s

#4 [2/5] WORKDIR /app
#4 sha256:4e298d723636cf44229207b8e9ff0a96c22570110c3f318ea81e18c16c43eb2b
#4 CACHED

#9 [internal] load build context
#9 sha256:4f0e5f38fe6ec4187f5c7d50d9e489e04a3229b0450540b0a22b550c74a6ea3c
#9 transferring context: 85B 0.0s done
#9 DONE 0.0s

#5 [3/5] COPY reproduce_bug.exs reproduce_bug.exs
#5 sha256:72a1e0ead5620ca66bcbb5d56cbc19b9905ea27db07f6d209ec6f374e15b8543
#5 DONE 0.0s

#6 [4/5] RUN mix do local.hex --force, local.rebar --force
#6 sha256:9036ac68e8469719963de7ce1859ad18fbd4e0caedbca18b293bf692e5ba3444
#6 1.099 * creating /root/.mix/archives/hex-1.0.1
#6 1.576 * creating /root/.mix/rebar
#6 2.263 * creating /root/.mix/rebar3
#6 DONE 2.3s

#7 [5/5] RUN elixir reproduce_bug.exs
#7 sha256:d500fa3e278df5ea36ca6be8d8f0f536c059fb104cfc36f4f9548a712ac7d24d
#7 0.910 Resolving Hex dependencies...
#7 1.011 Dependency resolution completed:
#7 1.012 New:
#7 1.012   castore 0.1.17
#7 1.012   jason 1.3.0
#7 1.012   nimble_lz4 0.1.2
#7 1.012   rustler 0.25.0
#7 1.012   rustler_precompiled 0.5.1
#7 1.012   toml 0.6.2
#7 1.013 * Getting nimble_lz4 (Hex package)
#7 1.146 * Getting rustler (Hex package)
#7 1.177 * Getting rustler_precompiled (Hex package)
#7 1.184 * Getting castore (Hex package)
#7 1.190 * Getting jason (Hex package)
#7 1.196 * Getting toml (Hex package)
#7 1.276 ==> jason
#7 1.276 Compiling 10 files (.ex)
#7 2.433 Generated jason app
#7 2.453 ==> castore
#7 2.453 Compiling 1 file (.ex)
#7 2.467 Generated castore app
#7 2.502 ==> toml
#7 2.502 Compiling 10 files (.ex)
#7 3.039 Generated toml app
#7 3.065 ==> rustler
#7 3.065 Compiling 7 files (.ex)
#7 3.191 Generated rustler app
#7 3.210 ==> rustler_precompiled
#7 3.210 Compiling 4 files (.ex)
#7 3.436 Generated rustler_precompiled app
#7 3.458 ==> nimble_lz4
#7 3.458 Compiling 1 file (.ex)
#7 3.489 
#7 3.489 08:07:35.243 [debug] Downloading NIF from https://github.com/whatyouhide/nimble_lz4/releases/download/v0.1.2/libnimblelz4-v0.1.2-nif-2.15-aarch64-unknown-linux-musl.so.tar.gz
#7 4.231 
#7 4.231 08:07:35.987 [debug] NIF cached at /root/.cache/rustler_precompiled/precompiled_nifs/libnimblelz4-v0.1.2-nif-2.15-aarch64-unknown-linux-musl.so.tar.gz and extracted to /root/.cache/mix/installs/elixir-1.13.4-erts-11.2.2.8/7e245c35d90a0fec919d11f60ee80a7b/_build/dev/lib/nimble_lz4/priv/native/libnimblelz4-v0.1.2-nif-2.15-aarch64-unknown-linux-musl.so
#7 ERROR: executor failed running [/bin/sh -c elixir reproduce_bug.exs]: exit code: 139

Interestingly enough, this seems to work if I bump the OTP version (using the hexpm/elixir:1.13.4-erlang-24.2-alpine-3.14.2 Docker image). Is this a bug in OTP, or is something going on with RustlerPrecompiled as far as you can tell?

@philss
Copy link
Owner

philss commented Jun 22, 2022

@whatyouhide this is unfortunate :/

I think this might be a bug with this OTP version on systems with musl - I tested with a Debian container and it works fine.
Testing with other libs (html5ever and explorer) also give me a segfault :/

I will try to debug this later today.

@whatyouhide
Copy link
Contributor Author

Ah @philss, that's great to know that this is not an isolated issue. Thanks for investigating 💟

@philss
Copy link
Owner

philss commented Jun 22, 2022

@whatyouhide I made some progress in the debugging, but I couldn't find the root cause yet.

What I found:

  • compiling the NIF from source works good for that target, using the same image - this can be an quickfix for getting your system running. The only drawback is that you need to install the Rust toolchain and wait a little bit more.
  • compiling Erlang from source, from the same version, and installing it made the program at least fail with an error message. The error said that it couldn't find ld-linux-x86-64.so.2, and it was needed by the NIF. So I had to run this to make it work: ln -s /lib/libc.musl-x86_64.so.1 /lib/ld-linux-x86-64.so.2

My hypothesis is that something in the build step of the Erlang image broke the load of NIFs for that OTP version. But I'm not sure if this is correct, because the build for version 24 is identical.
Maybe this is related to stripped binaries.

I'm going to try building from source with that Dockerfile, but without stripping the binaries.

@philss
Copy link
Owner

philss commented Jun 23, 2022

Building from scratch does not help. It still have a segfault. But at least I could get more details about the error.
Running gdb gives me this backtrace:

warning: Could not load shared library symbols for /root/.cache/mix/installs/elixir-1.13.4-erts-11.2.2.8/7e245c35d90a0fec919d11f60ee80a7b/_build/dev/lib/nimble_lz4/priv/native/libnimblelz4-v0.1.2-nif-2.15-x86_64-unknown-linux-musl.so.
Do you need "set solib-search-path" or "set sysroot"?
--Type <RET> for more, q to quit, c to continue without paging--
Core was generated by `/usr/local/lib/erlang/erts-11.2.2.8/bin/beam.smp -- -root /usr/local/lib/erlang'.
Program terminated with signal SIGSEGV, Segmentation fault.

warning: Section `.reg-xstate/94' in core file too small.
#0  0x00007f7b7718980b in strlen () from /lib/ld-musl-x86_64.so.1
[Current thread is 1 (LWP 94)]
(gdb) backtrace
#0  0x00007f7b7718980b in strlen () from /lib/ld-musl-x86_64.so.1
#1  0x000056469785044d in erts_printf_format (fn=fn@entry=0x564697852130 <erts_write_ds>, arg=arg@entry=0x7f7b36b00230,
    fmt=fmt@entry=0x5646978b0f48 "That '%T' NIF library needs %s or newer. Either try to recompile the NIF lib or use a newer erts runtime.", ap=ap@entry=0x7f7b2e40f670)
    at common/erl_printf_format.c:770
#2  0x0000564697852e17 in erts_vdsprintf (dsbufp=dsbufp@entry=0x7f7b36b00230,
    format=0x5646978b0f48 "That '%T' NIF library needs %s or newer. Either try to recompile the NIF lib or use a newer erts runtime.", format@entry=0x7f7b2e40f668 "",
    arglist=arglist@entry=0x7f7b2e40f670) at common/erl_printf.c:471
#3  0x000056469779dfb4 in load_nif_error (p=p@entry=0x7f7b30a40f98, atom=atom@entry=0x5646978b1248 <bad_lib> "bad_lib", format=0x7f7b2e40f668 "") at beam/erl_nif.c:4162
#4  0x00005646977a7ba7 in erts_load_nif (c_p=0x7f7b30a40f98, I=<optimized out>, filename=<optimized out>, args=15) at beam/erl_nif.c:4432
#5  0x00005646975f5f7d in process_main (x_reg_array=0x7f7b2fe16580, f_reg_array=0x8080808080808080) at x86_64-pc-linux-musl/opt/smp/beam_cold.h:491
#6  0x00005646976027bc in sched_thread_func (vesdp=0x7f7b2fec5280) at beam/erl_process.c:8560
#7  0x000056469784dc93 in thr_wrapper (vtwd=0x7ffdb433d790) at pthread/ethread.c:122
#8  0x00007f7b7718c1bb in ?? () from /lib/ld-musl-x86_64.so.1
#9  0x0000000000000000 in ?? ()

Perhaps the machine/container that is compiling the NIF (at GitHub Actions CI server) is generating an outdated structure, or something like that. Tomorrow I will try to test releasing the NIF with another container.

@whatyouhide
Copy link
Contributor Author

That's awesome @philss, thank you so much for investigating. I will have a little bit more time over the weekend to dedicate to this, so let me know how things will be looking by then! 💟

@philss
Copy link
Owner

philss commented Jun 24, 2022

@whatyouhide I found the problem! It was a missing configuration for cross 😞

I totally missed the section about passing down environment variables.
In my mind it was being passed automatically from the host, but this is not true.

We need the RUSTLER_NIF_VERSION to be set in order to generate the correct lib binary. This explains why it works fine for OTP 24 and 25, because we use the latest NIF version in case that var is missing.

I'm going to update the docs and send PRs to the projects using rustler_precompiled.

philss added a commit to rusterlium/html5ever_elixir that referenced this issue Jun 24, 2022
This is important because `cross` don't have access by default to the
env vars in the system.

This is related to philss/rustler_precompiled#23
philss added a commit to rusterlium/html5ever_elixir that referenced this issue Jun 24, 2022
This is important because `cross` don't have access by default to the
env vars in the system.

This is related to philss/rustler_precompiled#23
philss added a commit to philss/nimble_lz4 that referenced this issue Jun 24, 2022
This is important because `cross` don't have access by default to the
env vars in the system.

This is related to philss/rustler_precompiled#23
philss added a commit to elixir-explorer/explorer that referenced this issue Jun 24, 2022
This is important because `cross` don't have access by default to the
env vars in the system.

This is related to philss/rustler_precompiled#23
philss added a commit to elixir-explorer/explorer that referenced this issue Jun 24, 2022
This is important because `cross` don't have access by default to the
env vars in the system.

This is related to philss/rustler_precompiled#23
@whatyouhide
Copy link
Contributor Author

Aaaaah, that's fantastic! Great catch. And thanks for the PR to nimble_lz4 💟

whatyouhide pushed a commit to whatyouhide/nimble_lz4 that referenced this issue Jun 25, 2022
This is important because `cross` don't have access by default to the
env vars in the system.

This is related to philss/rustler_precompiled#23
philss added a commit to philss/mjml_nif that referenced this issue Jun 26, 2022
This is important because `cross` don't have access by default to the
env vars in the system.

This is related to philss/rustler_precompiled#23
philss added a commit to philss/wireguardex that referenced this issue Jun 27, 2022
This change is necessary in order to use the correct version of NIF for
the target that is being built with `cross`.
By default, `cross` won´t  pass down the env vars from the host machine.

See philss/rustler_precompiled#23 for further details.
philss added a commit to philss/qrusty that referenced this issue Jun 27, 2022
This is important because `cross` don't have access by default to the
env vars of the system.

This is related to philss/rustler_precompiled#23
nbw pushed a commit to nbw/qrusty that referenced this issue Jun 28, 2022
This is important because `cross` don't have access by default to the
env vars of the system.

This is related to philss/rustler_precompiled#23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants