Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle (either support correctly, or forbid) non-fPIC shared objects #1004

Closed
nyh opened this issue Oct 21, 2018 · 6 comments
Closed

Handle (either support correctly, or forbid) non-fPIC shared objects #1004

nyh opened this issue Oct 21, 2018 · 6 comments

Comments

@nyh
Copy link
Contributor

nyh commented Oct 21, 2018

This issue comes from this thread by @wkozaczuk: https://groups.google.com/d/msg/osv-dev/F7OAss6WNb0/SbIo2jPqBwAJ

OSv needs to relocate a shared object to an arbitrary memory offset we loaded it. According to my current understanding, both position-dependent (no -fPIC) and position-independent (-fPIC) code can be relocated. But, to relocate position-dependent code, one needs to modify the executable code (text segment) itself, while position-independent code uses a separate relocation table (PLT).

The former (position-dependent-code relocation) has a bunch of downsides - most importantly it means a multi-process OS like Linux can't share multiple copies of the same executable code in memory. This is why it is discouraged. But it doesn't mean it can't work!

We may be able support non-fPIC shared objects by noticing the DT_TEXTREL marker exists, and then map all the segments with forced write permissions, do all the relocations up-front, and then in fix_permissions(), mprotect() all the segments with the desired final permissions - not just the relro segment.

Alternatively, we can decide not to bother with this, and insist that -fPIC be always used (as it is anyway recommended) when building code for OSv. In that case, we should check for the existence of DT_TEXTREL and refuse to load such object, stating that -fPIC should have been used (we already have other cases where we throw invalid_elf_error exceptions when we don't like the executable).

@nyh
Copy link
Contributor Author

nyh commented Oct 21, 2018

Is reproduced by

$ scripts/build image=graalvm-example
$ scripts/run.py
$ readelf -d apps/graalvm-example/libhello.so | grep TEXTREL
0x0000000000000016 (TEXTREL)            0x0

I don't know enough about this package to understand how libhello.so got compiled and where the non-fPIC code got in. @wkozaczuk maybe you can ask the Graalvm developers about that?

@wkozaczuk
Copy link
Collaborator

I am leaning towards implementing its because:

  1. Seems pretty easy
  2. OSv aims to be Linux compatible and Linux supports

However given we would want to create a unit test I wonder how one creates non-fPIC 64-bit shared library which would be needed by such test?

@nyh Yes I will open an issue with graalvm developers to see exactly why libhello.so is non-fPIC and where it comes from.

@wkozaczuk
Copy link
Collaborator

OK. Based on this article it looks like you can create non-fPIC 64-bit share libraries if you compile for "large model", whatever that means: https://eli.thegreenplace.net/2012/01/03/understanding-the-x64-code-models/

So this works without -fPIC:
-mcmodel=large

@nyh
Copy link
Contributor Author

nyh commented Oct 21, 2018

Interesting observation. Indeed sometimes trying to build a shared object without -fPIC fails, claiming unsupported relocations during the linking, but then with -mcmodel=large it succeeds and agrees to build a non-fPIC shared object:

When I do the following patch to apps/rogue/GET

diff --git a/rogue/GET b/rogue/GET
index fdf6273..ca56f74 100755
--- a/rogue/GET
+++ b/rogue/GET
@@ -18,7 +18,7 @@ patch -p1 < ../../ncurses-5.9-gcc-5.patch
 # fails to read the extended part (#if XNAMES). Try to think why. Is this
 # an OSv specific bug? But for now, rogue works fine if we just don't support
 # these extended entries.
-CFLAGS="-fPIC -O2" ./configure --prefix=/usr --without-debug --without-tests --without-shared --without-cxx-binding --without-ada --disable-tcap-names
+CFLAGS="-mcmodel=large -O2" ./configure --prefix=/usr --without-debug --without-tests --without-shared --without-cxx-binding --without-ada --disable-tcap-names
 make
 cd ..
 
@@ -30,9 +30,10 @@ tar zxvf master.tar.gz
 ln -s rogue-master rogue5.4.4
 
 cd rogue5.4.4
-CFLAGS="-fPIC -O2 -I../ncurses-5.9/include" ./configure --prefix=/usr
+CFLAGS="-mcmodel=large -O2 -I../ncurses-5.9/include" ./configure --prefix=/usr
 cat >> Makefile << END
 LDFLAGS=-pie
+#LDFLAGS=-shared
 LIBS=../ncurses-5.9/lib/libncurses.a
 END
 make

the result is a PIE excutable which runs correctly on Linux:

TERM=vt100 apps/rogue/upstream/rogue-master/rogue

But fails on OSv, exactly the way described in the mailing list thread:

$ scripts/run.py
OSv v0.52.0
eth0: 192.168.122.15
page fault outside application, addr: 0x0000100001435000
[registers]
RIP: 0x000000000039d1ff <elf::object::arch_relocate_rela(unsigned int, unsigned int, void*, long)+351>
RFL: 0x0000000000010202  CS:  0x0000000000000008  SS:  0x0000000000000010
RAX: 0x0000000000000001  RBX: 0xffffa00000d42c00  RCX: 0x0000100001435c8d  RDX: 0x0000000000000000
RSI: 0x0000000000000000  RDI: 0xffffa00000d42c00  RBP: 0x00002000000ff570  R8:  0x000010000143dd40
R9:  0x0000000000000008  R10: 0xffffa00002240140  R11: 0xffffa0000186a9b0  R12: 0x000000000003dd40
R13: 0x0000000000000008  R14: 0xffffa00002240140  R15: 0xffffa0000186a9b0  RSP: 0x00002000000ff530
Aborted

[backtrace]
0x000000000033fca8 <???+3407016>
0x000000000034118a <mmu::vm_fault(unsigned long, exception_frame*)+458>
0x0000000000399773 <page_fault+115>
0x00000000003985f6 <???+3769846>
0x0000000000351954 <elf::object::relocate_rela()+148>
0x0000000000352b87 <elf::object::relocate()+199>
0x00000000003560ea <elf::program::load_object(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<std::shared_ptr<elf::object>, std::allocator<std::shared_ptr<elf::object> > >&)+1866>
0x00000000003569f4 <elf::program::get_library(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, bool)+324>
...

@nyh
Copy link
Contributor Author

nyh commented Oct 21, 2018

I am leaning towards implementing its because:

1. Seems pretty easy

2. OSv aims to be Linux compatible and Linux supports

However given we would want to create a unit test I wonder how one creates non-fPIC 64-bit shared library which would be needed by such test?

Excellent. I think if you can't link the test compiled without -fPIC regularly, the -mcmodel=large which you suggested above, and I tested too, may indeed do exactly what you need.

@wkozaczuk
Copy link
Collaborator

Created related issue oracle/graal#750 on graalvm project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants