Problems when instrumenting MPI applications with HDF5 at runtime #989
Comments
Could you try our latest release (3.4.5) and see if you still have the issue? We reworked something in our HDF5 module that I think may resolve this issue.
I repeated the installation process as described above and now a different but similar error occurred:
Hmm, maybe there's something still not quite right with how Darshan's HDF5 module interacts with HDF5 libraries at runtime. We've seen similar issues that we've tried to address in recent releases, but maybe we need to rethink things again. I'll see if I can reproduce this with DLIO and think more about it. I think you could probably avoid the issue entirely by modifying your setting of LD_PRELOAD to additionally reference the HDF5 library:
Thanks for your help!
So, DLIO installs h5py, which is compiled against a specific HDF5 library, and you compile Darshan against a specific HDF5 library as well. I suspect the HDF5 version h5py uses and the version Darshan wants are different, which is causing this issue. See the docs on how to make sure you install h5py with the correct HDF5. The main idea is to make sure that what h5py was compiled with matches what Darshan was compiled with.
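A rough way to check the mismatch described above is to compare the HDF5 version h5py reports with the one Darshan was configured against. This is only a sketch: `same_minor_series` and `check_h5py_against` are hypothetical helper names, and treating a shared major.minor series (e.g. 1.14.x) as "matching" is an assumption, not something Darshan documents. `h5py.version.hdf5_version` is the version string h5py itself exposes.

```python
def same_minor_series(version_a, version_b):
    """Return True when two 'major.minor.release' strings share major.minor."""
    return version_a.split(".")[:2] == version_b.split(".")[:2]


def check_h5py_against(darshan_hdf5_version):
    """Compare h5py's HDF5 version with the one Darshan was built against.

    Returns None when h5py is not installed, otherwise True/False.
    The Darshan-side version must be supplied by hand (e.g. copied from
    the configure output); this sketch does not query Darshan.
    """
    try:
        import h5py
    except ImportError:
        return None
    return same_minor_series(h5py.version.hdf5_version, darshan_hdf5_version)
```

Calling `check_h5py_against("1.14.3")` (the version here is just a placeholder) in the same environment that runs DLIO would show whether the two sides agree on the release series.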
I updated h5py as per the link you provided and reinstalled DLIO, then the version of HDF5 used by the package updated to the one used by Darshan:
However, in spite of this, the error persisted:
Does ldd on libdarshan.so show an HDF5 .so, and if so, is it the same as the one you need? If not, you can LD_PRELOAD the HDF5 .so as well, before darshan.so.
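The two steps suggested here (look for HDF5 in the ldd output, then preload HDF5 ahead of Darshan) could be sketched like this. The function names and every path shown are hypothetical placeholders, not taken from the reporter's setup.

```python
def hdf5_entries(ldd_output):
    """Return the lines of `ldd` output that mention an HDF5 shared library."""
    return [line.strip() for line in ldd_output.splitlines() if "libhdf5" in line]


def preload_value(hdf5_lib, darshan_lib):
    """Build an LD_PRELOAD value that lists HDF5 before Darshan."""
    return ":".join([hdf5_lib, darshan_lib])


if __name__ == "__main__":
    import subprocess

    # Placeholder paths: substitute the real install locations.
    darshan = "/path/to/libdarshan.so"
    hdf5 = "/path/to/libhdf5.so"

    result = subprocess.run(["ldd", darshan], capture_output=True, text=True)
    if not hdf5_entries(result.stdout):
        # No HDF5 dependency recorded in libdarshan.so, so preload it explicitly.
        print("LD_PRELOAD=" + preload_value(hdf5, darshan))
```

An empty list from `hdf5_entries` corresponds to the "no hdf5 in the ldd output" case, which is when the explicit preload is worth trying.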
No, there is no hdf5 in the ldd output:
However, after adding the path to HDF5 in
I think if you compile Darshan with HDF5, the HDF5 .so should be linked to Darshan. Maybe it is still a bug. @shanedsnyder thoughts?
I'll have to dig into it more, but you may be on to something @hariharan-devarajan -- some improper linking of HDF5 could be leading to this error. It is a little tricky though, in that we really don't want the HDF5 library Darshan is using to override what the user wants. E.g., if Darshan was built against a 1.12.x version of HDF5, but the user is trying to build an app against a newer 1.14.x version, then we obviously need to be careful that the 1.12.x libraries aren't used at runtime. I think that's part of the reason that I'll leave the issue open so I don't forget to investigate. In the meantime, being careful to set
Additionally, consider incorrect linking at runtime. I think you need ABI compatibility, using libtool, to ensure they match. In general, if you use the C interface of HDF5, a version mismatch won't break things, but I think you should be linking Darshan with the one it was compiled with; otherwise, it confuses people about which version is needed (or was compiled with) by Darshan. As I remember, HDF5 also has macros to let you do a check at runtime as well. I believe this would need some work to make sure the stack has a consistent view of the libraries to be loaded/needed.
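For the runtime check mentioned above, HDF5's C API has `H5get_libversion`, which reports the version of whichever library actually got loaded. A minimal sketch of calling it through ctypes follows; `loaded_hdf5_version` is a hypothetical helper, and note the assumption that the libhdf5 found by the system loader is the same one the application uses, which may not hold when h5py bundles its own copy.

```python
import ctypes
import ctypes.util


def loaded_hdf5_version(libname="hdf5"):
    """Return (major, minor, release) of the HDF5 library the loader finds.

    Returns None when no such library can be located or loaded.
    """
    path = ctypes.util.find_library(libname)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    major, minor, release = ctypes.c_uint(), ctypes.c_uint(), ctypes.c_uint()
    # herr_t H5get_libversion(unsigned *majnum, unsigned *minnum, unsigned *relnum)
    status = lib.H5get_libversion(
        ctypes.byref(major), ctypes.byref(minor), ctypes.byref(release)
    )
    if status < 0:
        return None
    return (major.value, minor.value, release.value)
```

Comparing this tuple with the version Darshan reported at configure time would show whether the stack has the consistent view of the libraries discussed here.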
When attempting to instrument DLIO at runtime as follows:
I get the following error:
I installed Darshan as follows:
And in the output I got:
Which means that during installation HDF5 is recognized by the installer (otherwise how would it know the version?)
Next is the output of ldd libdarshan.so, which may prove useful:
I will note that running DLIO + HDF5 without Darshan does not cause any problems:
I also tried running Darshan with a simple program using HDF5 (code here) and had no problems doing so. So the issue may be related to the fact that Darshan does not track H5FDperform_init.