pydarshan can't load libdarshan-util-*.so from pypi on Cori&Perlmutter (NERSC/LBL) #656

Closed
bebosudo opened this issue Feb 14, 2022 · 10 comments

Comments

@bebosudo

Installing pydarshan from pypi on Cori or Perlmutter causes pydarshan to fail at import time:

[chiusole@cori10:cori]$ python3 -c 'import darshan'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/global/homes/c/chiusole/.local/lib/python3.6/site-packages/darshan/__init__.py", line 19, in <module>
    from darshan.report import DarshanReport
  File "/global/homes/c/chiusole/.local/lib/python3.6/site-packages/darshan/report.py", line 10, in <module>
    import darshan.backend.cffi_backend as backend
  File "/global/homes/c/chiusole/.local/lib/python3.6/site-packages/darshan/backend/cffi_backend.py", line 47, in <module>
    libdutil = find_utils(ffi, libdutil)
  File "/global/homes/c/chiusole/.local/lib/python3.6/site-packages/darshan/discover_darshan.py", line 216, in find_utils
    raise RuntimeError('Could not find libdarshan-util.so! Is darshan-util installed? Please ensure one of the the following: 1) export LD_LIBRARY_PATH=<path-to-libdarshan-util.so>, or 2) darshan-parser can found using the PATH variable, or 3) pkg-config can resolve pkg-config --path darshan-util, or 4) install a wheel that includes darshan-utils via pip.')
RuntimeError: Could not find libdarshan-util.so! Is darshan-util installed? Please ensure one of the the following: 1) export LD_LIBRARY_PATH=<path-to-libdarshan-util.so>, or 2) darshan-parser can found using the PATH variable, or 3) pkg-config can resolve pkg-config --path darshan-util, or 4) install a wheel that includes darshan-utils via pip.

This can be worked around by manually adding the directory that contains the libdarshan-util-*.so and libz-*.so* libraries to LD_LIBRARY_PATH:

[chiusole@cori10:cori]$ LD_LIBRARY_PATH="$HOME/.local/lib/python3.6/site-packages/darshan.libs/:$LD_LIBRARY_PATH" python3 -c 'import darshan'; echo $?
0

The problem seems to be around this block:

if libdutil is None:
    try:
        darshan_path = discover_darshan_wheel()
        import glob
        library_path = glob.glob(f'{darshan_path}/libdarshan-util*.so')[0]
        logger.debug(f"Attempting library_path={library_path} in case of binary wheel.")
        save = os.getcwd()
        os.chdir(darshan_path)
        libdutil = ffi.dlopen(library_path)
        os.chdir(save)
    except:
        libdutil = None
if libdutil is None:

By adding a breakpoint at line 195 I can see that ffi.dlopen is complaining about not finding libz-eb09ad1d.so.1.2.3:

[chiusole@cori10:cori]$ python3 -c 'import darshan'
> /global/homes/c/chiusole/.local/lib/python3.6/site-packages/darshan/discover_darshan.py(196)find_utils()
-> libdutil = ffi.dlopen(library_path)
(Pdb) l
191  	            library_path = glob.glob(f'{darshan_path}/libdarshan-util*.so')[0]
192  	            logger.debug(f"Attempting library_path={library_path} in case of binary wheel.")
193  	            save = os.getcwd()
194  	            os.chdir(darshan_path)
195  	            import pdb; pdb.set_trace()
196  ->	            libdutil = ffi.dlopen(library_path)
197  	            os.chdir(save)
198  	        except:
199  	            libdutil = None
200
201  	    if libdutil is None:
(Pdb) library_path
'/global/u1/c/chiusole/.local/lib/python3.6/site-packages/darshan.libs/libdarshan-util-0ca4caca.so'
(Pdb) next
OSError: cannot load library '/global/u1/c/chiusole/.local/lib/python3.6/site-packages/darshan.libs/libdarshan-util-0ca4caca.so': libz-eb09ad1d.so.1.2.3: cannot open shared object file: No such file or directory.  Additionally, ctypes.util.find_library() did not manage to locate a library called '/global/u1/c/chiusole/.local/lib/python3.6/site-packages/darshan.libs/libdarshan-util-0ca4caca.so'

Even though it's in the same directory as libdarshan-util-0ca4caca.so:

[chiusole@cori10:cori]$ ll $HOME/.local/lib/python3.6/site-packages/darshan.libs/{libdarshan-util-0ca4caca.so,libz-eb09ad1d.so.1.2.3}
-rwxrwx--x 1 chiusole chiusole 530K Feb 14 15:18 /global/homes/c/chiusole/.local/lib/python3.6/site-packages/darshan.libs/libdarshan-util-0ca4caca.so*
-rwxrwx--x 1 chiusole chiusole  90K Feb 14 15:18 /global/homes/c/chiusole/.local/lib/python3.6/site-packages/darshan.libs/libz-eb09ad1d.so.1.2.3*
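
The bare `except:` in the block quoted above discards the OSError, which is why the import ends with the generic "Could not find libdarshan-util.so!" message instead of the missing-libz error. In that block, `os.chdir(save)` is also skipped whenever `ffi.dlopen` raises. A minimal sketch of how the real cause could be surfaced follows; `dlopen_in_dir` is a hypothetical helper, not part of pydarshan, and its `dlopen` argument stands in for `ffi.dlopen`:

```python
import logging
import os

logger = logging.getLogger(__name__)


def dlopen_in_dir(dlopen, library_path):
    """Call dlopen(library_path) from the library's own directory.

    Hypothetical helper: `dlopen` is any callable such as cffi's ffi.dlopen.
    Returns the loaded library, or None if loading failed.
    """
    saved_cwd = os.getcwd()
    try:
        os.chdir(os.path.dirname(library_path) or ".")
        return dlopen(library_path)
    except OSError as exc:
        # Log the loader's message (e.g. a missing dependency) instead of hiding it.
        logger.warning("dlopen failed for %s: %s", library_path, exc)
        return None
    finally:
        # Restore the working directory even when dlopen raises.
        os.chdir(saved_cwd)
```

With this shape, the "libz-eb09ad1d.so.1.2.3: cannot open shared object file" text would appear in the log without needing a debugger.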

This is not related to conda: all commands above were executed using the system Python 3.6 on Cori. Here are the steps:

% ssh cori
$ python3 -m ensurepip --user
$ python3 -m pip install --user -U pip
$ python3 -m pip -V
pip 21.3.1 from /global/homes/c/chiusole/.local/lib/python3.6/site-packages/pip (python 3.6)

$ python3 -m pip install --user darshan
$ python3 -c 'import darshan'

On a different Linux box, running CentOS 7.3 and Python 3.6, darshan from PyPI works fine:

$ python3 -m pip install --user darshan
$ python3 -c 'import darshan'
> /home/c/chiusole/.local/lib/python3.6/site-packages/darshan/discover_darshan.py(196)find_utils()
-> libdutil = ffi.dlopen(library_path)
(Pdb) library_path
'/home/c/chiusole/.local/lib/python3.6/site-packages/darshan.libs/libdarshan-util-0ca4caca.so'
(Pdb) next
/home/c/chiusole/.local/lib/python3.6/site-packages/darshan/discover_darshan.py(197)find_utils()
(Pdb) libdutil
<cffi.api._make_ffi_library.<locals>.FFILibrary object at 0x7fffe9b92160>

Cori and Perlmutter are both running SLES 15. Let me know if you need other details.
CC: @glennklockwood

@kevin-harms

So, the root cause is that this code opens the libdarshan-util.so library directly, but that library depends on libz. libz is not loaded explicitly, so the dynamic loader has to find it on its own, via LD_LIBRARY_PATH, ld.so.conf, an RPATH/RUNPATH entry, or some other means.
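
One possible in-process workaround, sketched here under assumptions, is to load the vendored libz explicitly before `import darshan`. It relies on the assumption that the dynamic loader can satisfy a later dependency from a library with the same soname that is already mapped into the process. `preload_vendored_libs` is a hypothetical helper, not part of pydarshan:

```python
import ctypes
import glob
import os


def preload_vendored_libs(libs_dir, pattern="libz*.so*", loader=ctypes.CDLL):
    """Load every library in libs_dir matching pattern; return the handles.

    Assumption: once a dependency is already loaded, the dynamic loader can
    reuse it when a later library names it as a dependency.
    """
    handles = []
    for path in sorted(glob.glob(os.path.join(libs_dir, pattern))):
        handles.append(loader(path))
    return handles
```

It would be called with the `darshan.libs` directory from the traceback above, before importing darshan, keeping the returned handles referenced for the lifetime of the process.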

@kevin-harms

@bebosudo Can you provide an ldd of the libdarshan-util-0ca4caca.so file?

@bebosudo
Author

Hi @kevin-harms, here is the ldd output for both .so files from PyPI:

[chiusole@cori12:cori]$ ldd ~/.conda/envs/pydarshantest/lib/python3.9/site-packages/darshan.libs/*
~/.conda/envs/pydarshantest/lib/python3.9/site-packages/darshan.libs/libdarshan-util-0ca4caca.so:
	linux-vdso.so.1 (0x00002aaaaaad3000)
	libz-eb09ad1d.so.1.2.3 => not found
	libc.so.6 => /lib64/libc.so.6 (0x00002aaaaaf04000)
	/lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)

~/.conda/envs/pydarshantest/lib/python3.9/site-packages/darshan.libs/libz-eb09ad1d.so.1.2.3:
	linux-vdso.so.1 (0x00002aaaaaad3000)
	libc.so.6 => /lib64/libc.so.6 (0x00002aaaaaeea000)
	/lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)

@tylerjereddy
Collaborator

tylerjereddy commented Feb 15, 2022

auditwheel doesn't seem too upset with the darshan wheels, although I don't think we're actually compliant with the manylinux2010 standard, which may cause some issues. For example:

auditwheel show darshan-3.3.1.0-cp36-cp36m-manylinux2010_x86_64.whl


darshan-3.3.1.0-cp36-cp36m-manylinux2010_x86_64.whl is consistent with
the following platform tag: "manylinux_2_5_x86_64".

The wheel references external versioned symbols in these
system-provided shared libraries: libc.so.6 with versions
{'GLIBC_2.3.4', 'GLIBC_2.2.5', 'GLIBC_2.4'}

The following external shared libraries are required by the wheel:
{
    "libc.so.6": "/lib/x86_64-linux-gnu/libc-2.27.so",
    "libpthread.so.0": "/lib/x86_64-linux-gnu/libpthread-2.27.so"
}

2_5 is an alias for manylinux1 according to the standard here: https://www.python.org/dev/peps/pep-0600/#legacy-manylinux-tags
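
To make the comparison between filename tags and auditwheel's verdict concrete, here is a small sketch that rewrites the legacy aliases from PEP 600 into their `manylinux_X_Y` form. The function names are illustrative only and do not come from auditwheel or pip:

```python
# Legacy manylinux aliases as listed in PEP 600.
LEGACY_MANYLINUX = {
    "manylinux1": "manylinux_2_5",
    "manylinux2010": "manylinux_2_12",
    "manylinux2014": "manylinux_2_17",
}


def normalize_platform_tag(tag):
    """Rewrite a legacy tag such as 'manylinux2010_x86_64' to its PEP 600 form."""
    for legacy, modern in LEGACY_MANYLINUX.items():
        prefix = legacy + "_"
        if tag.startswith(prefix):
            return modern + "_" + tag[len(prefix):]
    return tag


def wheel_platform_tags(filename):
    """Return the normalized platform tags encoded in a wheel filename."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    platform_field = stem.split("-")[-1]  # last dash-separated field
    return [normalize_platform_tag(t) for t in platform_field.split(".")]
```

For the darshan wheel above, the filename tag normalizes to `manylinux_2_12_x86_64`, while auditwheel reports `manylinux_2_5_x86_64`.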

Let me check if a SciPy wheel that claims to be 2014 compliant matches the expected 2_17 alias:

auditwheel show scipy-1.8.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

scipy-1.8.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
is consistent with the following platform tag:
"manylinux_2_17_x86_64".

I can reproduce the issue locally with ldd darshan.libs/libdarshan-util-0ca4caca.so:

	linux-vdso.so.1 (0x00007ffdfd3e9000)
	libz-eb09ad1d.so.1.2.3 => not found
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7a8d892000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f7a8deb4000)

It looks like vendoring libz doesn't really get you much with the manylinux standard: pypa/auditwheel#152

And it appears that later versions of the standard do allow linking instead of vendoring perhaps: pypa/auditwheel#334

See also pypa/auditwheel#161.

If I try to repair our wheel:

INFO:auditwheel.main_repair:Repairing darshan-3.3.1.0-cp36-cp36m-manylinux2010_x86_64.whl
Traceback (most recent call last):
  File "/home/tyler/venv_py_389/bin/auditwheel", line 8, in <module>
    sys.exit(main())
  File "/home/tyler/venv_py_389/lib/python3.8/site-packages/auditwheel/main.py", line 59, in main
    rval = args.func(args, p)
  File "/home/tyler/venv_py_389/lib/python3.8/site-packages/auditwheel/main_repair.py", line 161, in execute
    out_wheel = repair_wheel(
  File "/home/tyler/venv_py_389/lib/python3.8/site-packages/auditwheel/repair.py", line 74, in repair_wheel
    raise ValueError(
ValueError: Cannot repair wheel, because required library "libz-eb09ad1d.so.1.2.3" could not be located

But this seems to work:

export LD_LIBRARY_PATH=/home/tyler/darshan.libs:$LD_LIBRARY_PATH
auditwheel repair darshan-3.3.1.0-cp36-cp36m-manylinux2010_x86_64.whl

INFO:auditwheel.main_repair:Repairing darshan-3.3.1.0-cp36-cp36m-manylinux2010_x86_64.whl
INFO:auditwheel.wheeltools:Previous filename tags: manylinux2010_x86_64
INFO:auditwheel.wheeltools:New filename tags: manylinux2010_x86_64, manylinux_2_5_x86_64, manylinux1_x86_64
INFO:auditwheel.wheeltools:Previous WHEEL info tags: cp36-cp36m-manylinux2010_x86_64
INFO:auditwheel.wheeltools:New WHEEL info tags: cp36-cp36m-manylinux2010_x86_64, cp36-cp36m-manylinux_2_5_x86_64, cp36-cp36m-manylinux1_x86_64
INFO:auditwheel.main_repair:
Fixed-up wheel written to /home/tyler/wheelhouse/darshan-3.3.1.0-cp36-cp36m-manylinux2010_x86_64.manylinux_2_5_x86_64.manylinux1_x86_64.whl

unzip darshan-3.3.1.0-cp36-cp36m-manylinux2010_x86_64.manylinux_2_5_x86_64.manylinux1_x86_64.whl

ldd darshan.libs/libdarshan-util-0ca4caca.so

	linux-vdso.so.1 (0x00007ffd75fec000)
	libz-eb09ad1d-39c737f1.so.1.2.3 => /home/tyler/wheelhouse/darshan.libs/./libz-eb09ad1d-39c737f1.so.1.2.3 (0x00007fecc0977000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fecc0586000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fecc0dc2000)

So, the auditwheel repair fix appears to work at the location of the delivery if we really need to do that (it will use patchelf).

Some conclusions might be:

  • we probably want to check more carefully, and enforce more formally, that our shipped binaries match the manylinux standard version they are labelled with; this might happen in CI with, e.g., https://cibuildwheel.readthedocs.io/en/stable/ ; @jakobluettgau and I discussed this briefly before, and it would take some bandwidth to set up
  • even in a less formal wheel build pipeline, it probably wouldn't be that hard to have some kind of CI check to verify that ldd resolves everything it needs from an unpacked wheel binary
  • if NumPy and SciPy, at the base of the ecosystem, have bumped up to manylinux2014 (2_17 standard), I can't think of a reason that we shouldn't do that as well (and it may come with libz benefits noted above, but would need to double check).
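
The ldd-based CI check from the second bullet could look roughly like the sketch below. It only parses the textual `=> not found` lines that ldd printed earlier in this thread; the function names are hypothetical, and a real pipeline would need to unpack the wheel first:

```python
import subprocess


def unresolved_dependencies(ldd_output):
    """Return names of shared libraries that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip().startswith("not found"):
            missing.append(name)
    return missing


def check_library(path):
    """Run ldd on one shared object and return its unresolved dependencies."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return unresolved_dependencies(result.stdout)
```

A CI job would fail if `check_library` returns a non-empty list for any `.so` file in the unpacked wheel.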

@kevin-harms

So do we need to link libdarshan-util with -rpath so that it can be repaired during installation? Is there some general support in pypi deployment to handle loading of non-system library dependencies?

@tylerjereddy
Collaborator

Normally you should vendor non-system libs, which we already do for libz. I think auditwheel should be setting RPATH.

When I look at what other libraries do for shared objects that depend on vendored libs, SciPy, for example, points its shared objects at the scipy.libs folder in site-packages:

objdump -x ./linalg/_fblas.cpython-38-x86_64-linux-gnu.so | grep RPATH

RPATH $ORIGIN/../../scipy.libs

If we look at the result of my detailed auditwheel workflow above that repairs the missing link, we can see the new path looks sensible: objdump -x libdarshan-util-0ca4caca.so | grep RPATH

RPATH $ORIGIN/.

(libz is right there in the same folder, so looks good..)

While the original version we ship without auditwheel repair..:

objdump -x libdarshan-util-0ca4caca.so | grep RPATH

Produces no RPATH result whatsoever.

So, to me, the conclusion is the same: we should follow the standard Python wheel build procedures more closely, and RPATH will then be set to a path relative to $ORIGIN that resolves the libz we ship with the library. We already have darshan-util/pydarshan/devel/build-wheels.sh, which uses auditwheel repair, but I don't think our release process is standardized on, e.g., a CI service, so I can't really comment on what did or did not actually happen at release time.

Do other Python libraries do this? Yes--here is the auditwheel repair line used by CI for much of the scientific python wheel building ecosystem: https://github.com/multi-build/multibuild/blob/devel/manylinux_utils.sh#L24

And, yes, cibuildwheel that Jakob and I discussed also uses auditwheel per the docs: https://cibuildwheel.readthedocs.io/en/stable/
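
The RPATH values in the objdump output above can be read as "look for dependencies relative to the directory of this shared object". A short sketch of that interpretation, assuming POSIX paths and the `RPATH`/`RUNPATH` line layout that `objdump -x` printed above (helper names are illustrative):

```python
import os
import re

_RPATH_LINE = re.compile(r"^\s*(?:RPATH|RUNPATH)\s+(\S.*)$")


def rpath_entries(objdump_output):
    """Extract RPATH/RUNPATH entries from `objdump -x` output."""
    entries = []
    for line in objdump_output.splitlines():
        match = _RPATH_LINE.match(line)
        if match:
            entries.extend(p for p in match.group(1).strip().split(":") if p)
    return entries


def resolve_origin(entry, library_path):
    """Expand $ORIGIN to the directory containing library_path."""
    origin = os.path.dirname(os.path.abspath(library_path))
    expanded = entry.replace("${ORIGIN}", origin).replace("$ORIGIN", origin)
    return os.path.normpath(expanded)
```

An empty list from `rpath_entries` corresponds to the unrepaired wheel, where the loader has no hint that libz sits next to libdarshan-util.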

@shanedsnyder
Contributor

Thanks to @jakobluettgau for digging into this in #665. We've updated our wheels for PyDarshan to be based on a newer manylinux version that seems to prefer system libz. He and I have both tested and things appear to be working with these changes.

@bebosudo, would you mind trying out the new updates to ensure we've got this cleaned up? We temporarily put the release on test.pypi so you can test it before we release a new version with the changes. You should be able to pip install as follows:

pip install -i https://test.pypi.org/simple/ darshan==3.3.1.1

Then maybe just make sure you aren't seeing anything bomb when importing darshan?

@bebosudo
Author

I confirm that build 3.3.1.1 from the test pypi repo works:

$ python3 -m ensurepip --user
$ python3 -m pip install -U --user pip
$ python3 -m pip -V
pip 21.3.1 from /global/homes/c/chiusole/.local/lib/python3.6/site-packages/pip (python 3.6)

$ python3 -m pip install darshan==3.3.1.0
$ python3 -m pip install -i https://test.pypi.org/simple/ darshan==3.3.1.1
$ python3
Python 3.6.12 (default, Nov 25 2020, 20:33:10) [GCC] on linux
>>> import darshan
>>> report = darshan.DarshanReport('/path/to/just-a-log-file.darshan', read_all=False)
>>> report.read_all_generic_records()
>>> darshan.enable_experimental()
>>> report.summarize()
>>> report.summary
{'agg_ioops': {'STDIO': {'STDIO_OPENS': 1, 'STDIO_FDOPENS': 0, ...}}}
>>> ^D

$ ldd ~/.local/lib/python3.6/site-packages/darshan.libs/libdarshan-util-9e5e50e2.so.0.0.0
	linux-vdso.so.1 (0x00002aaaaaad3000)
	libz.so.1 => /lib64/libz.so.1 (0x00002aaaaacd3000)
	libc.so.6 => /lib64/libc.so.6 (0x00002aaaaaeea000)
	/lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)

Does this mean that now pydarshan depends on libz being provided by the system?

@tylerjereddy
Collaborator

Does this mean that now pydarshan depends on libz being provided by the system?

That sounds right, or rather that is the decision the Python packaging ecosystem appears to have made for the newer wheel standard the team is testing there. I believe the reason is that there's a tendency to ignore the vendored version of libz, even when packaged right in the wheel because of typical preload settings, so it was a bit useless anyway if you believe the upstream discussions.
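
Since the newer wheels rely on a system-provided libz, a quick way to confirm that one is visible on a given machine might be the following sketch. It uses only the standard library; `system_zlib_version` is a hypothetical helper name, and `zlibVersion` is the zlib C function that returns the library's version string:

```python
import ctypes
import ctypes.util


def system_zlib_version():
    """Return the version string of the system libz, or None if not found."""
    name = ctypes.util.find_library("z")
    if name is None:
        return None
    lib = ctypes.CDLL(name)
    lib.zlibVersion.restype = ctypes.c_char_p  # const char *zlibVersion(void)
    return lib.zlibVersion().decode("ascii")
```

A `None` result would suggest that importing the manylinux2014 wheel could fail on that system for lack of libz.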

shanedsnyder added a commit that referenced this issue Mar 23, 2022
…ad-libdarshan-util

Update PyDarshan wheel building process to manylinux2014 (fix for #656)
@shanedsnyder
Contributor

Just double checked to make sure this is still resolved for our latest 3.4.0.0 release of pydarshan, and all looks well:

ssnyder@cori08:~/software/darshan/darshan-dev> python3 -m pip install darshan         
Defaulting to user installation because normal site-packages is not writeable
Collecting darshan
  Downloading darshan-3.4.0.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (876 kB)
...
Successfully installed darshan-3.4.0.0 mako-1.1.6 scipy-1.5.4 seaborn-0.11.2
ssnyder@cori08:~/software/darshan/darshan-dev> python3 -c 'import darshan'
ssnyder@cori08:~/software/darshan/darshan-dev> ldd ~/.local/lib/python3.6/site-packages/darshan.libs/libdarshan-util-4b1b7d69.so.0.0.0 
	linux-vdso.so.1 (0x0000155555551000)
	libz.so.1 => /lib64/libz.so.1 (0x0000155555117000)
	libc.so.6 => /lib64/libc.so.6 (0x0000155554d5c000)
	/lib64/ld-linux-x86-64.so.2 (0x000015555532e000)

I'm going to close this for now, but please let us know if you have more issues.
