Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent file locking for HDF5 files #1688

Closed
wants to merge 1 commit into from
Closed

Conversation

amarkpayne
Copy link
Member

@amarkpayne amarkpayne commented Aug 13, 2019

Motivation or Problem

There appears to be an issue with saving HDF5 files, which we now use to save the filter tensors (thanks to @mjohnson541 for discovering this). On remote servers, it is possible to get the following error on current master:

Making seed mechanism...
Traceback (most recent call last):
 File “RMG-Py/rmg.py”, line 185, in <module>
   main()
 File “RMG-Py/rmg.py”, line 179, in main
   rmg.execute(**kwargs)
 File “RMG-Py/rmgpy/rmg/main.py”, line 711, in execute
   self.makeSeedMech(firstTime=True)
 File “RMG-Py/rmgpy/rmg/main.py”, line 1350, in makeSeedMech
   raise e
IOError: Unable to create file (unable to lock file, errno = 37, error message = ‘No locks available’)
dbolsm(). more than (i1)=itmax iterations solving bounded least squares problem.
error number =    22, message level =     1
i1 =      110
/var/spool/slurm/spool/jobxxxx/slurm_script: line 33: deactivate: No such file or directory

I found this issue on the h5py Github, which suggests that this is a problem with the code trying to lock the file when writing, which causes problems for NFS mounted file systems (i.e. most network mounted drives on servers). The only suggested fix is to export and environment variable with export HDF5_USE_FILE_LOCKING=FALSE, though this fix does not appear to work for everyone.

Description of Changes

Use os.environ to set this environment variable whenever rmgpy.rmg.main is imported, which contains the code that saves the filter tensors as HDF5 files. This way, we do not have to tell people to add this on to their job submission scripts.

Testing

@mjohnson541 has confirmed that this fix removed this issue for him

@codecov
Copy link

codecov bot commented Aug 13, 2019

Codecov Report

Merging #1688 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1688      +/-   ##
==========================================
+ Coverage   41.66%   41.66%   +<.01%     
==========================================
  Files         176      176              
  Lines       30215    30216       +1     
  Branches     6256     6256              
==========================================
+ Hits        12588    12589       +1     
  Misses      16703    16703              
  Partials      924      924
Impacted Files Coverage Δ
rmgpy/rmg/main.py 21.9% <100%> (+0.05%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3846fcc...ea80df7. Read the comment docs.

@mliu49
Copy link
Contributor

mliu49 commented Aug 13, 2019

I previously had similar issues with the HDF5 files for the machine learning thermo model, except that it couldn't open them due to no locks being available. At the time, I addressed it by downgrading HDF5 from 1.10 to 1.8 which solved the issue.

I'm wondering what changed about file locking between 1.10 and 1.8, whether it's actually related to NFS, and if there are any downsides to disabling file locks.

@amarkpayne
Copy link
Member Author

Potentially relevant:
https://forum.hdfgroup.org/t/turn-off-file-locking/3809
https://support.hdfgroup.org/HDF5/doc/ADGuide/Changes.html

The last link mentions that HDF5 1.10.0 (the next release after 1.8.16) includes H5FDlock and
H5FDunlock as newly added features. Could this be the file locking feature? Their forums suggest that they did not used to support file locking.

As for if there are any downsides of not using file locking, I believe this is not an issue because our code for this is not in a parallelized section where more than one thread could be trying to write to the file. I could be wrong though.

@amarkpayne
Copy link
Member Author

As discussed offline, this change and adding h5py explicitly as a dependency have been incorporated into #1686. Therefore, this PR is now closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants