Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix WSL permisson bug for renaming directories #1796

Merged
merged 3 commits into from
Dec 10, 2019
Merged

Conversation

amarkpayne
Copy link
Member

Motivation or Problem

fixes #1786

As noted in the issue this fixes, the Windows Subsystem for Linux sometimes throws a permission error
when trying to renaming non-empty directories, even when the user has the permissions set properly on the directories and files to do so. This is problematic, because we rename the seed directory folder temporarily every time we save a new seed mechanism so that the previous seed mechanism is not lost in case we encounter an error when saving the new one. Renaming the directory does not require copying large amounts of data, so this is the preferred solution in general, but as noted can cause problems for WSL users.

There appear to be several issues opened over a few years complaining about this issue if I am not mistaken, so it appears to be a reoccurring problem. Given this, it is best to handle this case ourselves.

Description of Changes

A try/except statement was added that avoids renaming the directory in the event of a permission error and instead copies the directory to a new location, which does not throw a permission error as far as I can tell in WSL. Note that this solution is much slower than renaming, so it should only be done as a last resort, which is exactly what we have implemented here.

Testing

See the issue. We have tested this branch to confirm that at the very least the WSL bug goes away. It might be worth double checking that the files are copied over and then later deleted as desired.

@amarkpayne amarkpayne requested review from ajocher and alongd October 30, 2019 03:46
@amarkpayne amarkpayne self-assigned this Oct 30, 2019
@amarkpayne amarkpayne requested review from alongd and removed request for alongd October 30, 2019 03:47
@amarkpayne
Copy link
Member Author

@ajocher do you have time to be the reviewer/merger for this PR? I wanted someone who was familiar with the filter tensor/seed mech stuff. @alongd I am tagging you because I think you have run into similar issues in Windows before.

@codecov
Copy link

codecov bot commented Oct 30, 2019

Codecov Report

Merging #1796 into master will increase coverage by 0.02%.
The diff coverage is 36%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1796      +/-   ##
==========================================
+ Coverage    44.2%   44.22%   +0.02%     
==========================================
  Files          83       83              
  Lines       21529    21539      +10     
  Branches     5644     5644              
==========================================
+ Hits         9517     9526       +9     
+ Misses      10959    10943      -16     
- Partials     1053     1070      +17
Impacted Files Coverage Δ
rmgpy/rmg/main.py 22.12% <36%> (+0.49%) ⬆️
rmgpy/data/statmech.py 42.2% <0%> (ø) ⬆️
rmgpy/rmg/pdep.py 12.21% <0%> (ø) ⬆️
arkane/kinetics.py 12.14% <0%> (ø) ⬆️
rmgpy/data/kinetics/family.py 49.06% <0%> (ø) ⬆️
rmgpy/statmech/ndTorsions.py 59.78% <0%> (ø) ⬆️
rmgpy/yml.py 15.71% <0%> (ø) ⬆️
arkane/sensitivity.py 10% <0%> (ø) ⬆️
rmgpy/data/kinetics/database.py 50.61% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0d63c33...5839d70. Read the comment docs.

@@ -1268,7 +1268,16 @@ def make_seed_mech(self, first_time=False):
if os.path.exists(seed_dir): # This is a seed from a previous RMG run. Delete it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why you delete the seed from a previous RMG run while you move the seed from a previous iteration to a temporary directory in case you run into errors. Probably I am not fully aware how restart and seed works, but what happens if I want to use a seed from a previous RMG run and do not copy it out of the seed folder? Will it be deleted at this point and my seed from the previous RMG run is gone for good? In this case wouldn't it be good to also just move the seed to a temporary directory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! Currently, if the user restarts from the seed in this folder, we make a copy in a new folder called previous_restart so that the seed doesn't get lost. See below.

RMG-Py/rmgpy/rmg/main.py

Lines 488 to 499 in 6483418

# Copy the restart files to a separate folder so that the job does not overwrite it
restart_dir = os.path.join(self.output_directory, 'previous_restart')
core_restart = os.path.join(restart_dir, 'restart')
edge_restart = os.path.join(restart_dir, 'restart_edge')
filters_restart = os.path.join(restart_dir, 'filters')
util.make_output_subdirectory(self.output_directory, 'previous_restart')
shutil.copytree(self.core_seed_path, core_restart)
shutil.copytree(self.edge_seed_path, edge_restart)
os.mkdir(filters_restart)
shutil.copyfile(self.filters_path, os.path.join(filters_restart, 'filters.h5'))
shutil.copyfile(self.species_map_path, os.path.join(filters_restart, 'species_map.yml'))

If the user instead were to run a new RMG job from this folder the seed from the previous job would be lost. This was the behavior before I implemented the restart stuff, but I am willing to consider other options if you think this is worth while.

@@ -1268,7 +1268,16 @@ def make_seed_mech(self, first_time=False):
if os.path.exists(seed_dir): # This is a seed from a previous RMG run. Delete it
shutil.rmtree(seed_dir)
else: # This is a seed from the previous iteration. Move it to a temporary directory in case we run into errors
os.rename(seed_dir, os.path.join(temp_seed_dir))
try:
os.rename(seed_dir, temp_seed_dir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered the tempdirectory option? I am curious if it would have advantages in this case. https://docs.python.org/3/library/tempfile.html#tempfile.tempdir

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had not, thanks for the suggestion! It looks interesting, but I am not sure if I can move existing files inside of a temporary directory. Since we already have the previous seed files in a non-temporary place, it should be much faster to rename/move the files (which does not actually move the data on the disk) rather than copying. So if we can move these files into the temporary directory without making a copy then it might be interesting to implement. The only added advantage I think is just the automatic cleanup, but it would be cleaner code.

@amarkpayne
Copy link
Member Author

I have refactored how we handle the first_time code in saving the seed mechanism. I need to verify still that this change doesn't break functionality but I think it is at least ready for a first review. @mliu49 do you also have time to take a quick look?

os.rename(seed_dir, os.path.join(temp_seed_dir))
try:
os.rename(seed_dir, os.path.join(temp_seed_dir))
except PermissionError: # The Windows Subsystem for Linux (WSL) can have problems with renaming
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if the permissions error is with overwriting or renaming? For example, could the issue potentially be resolved by deleting the temp_seed_dir before renaming?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I have seen I believe this is an issue with os.rename itself in the WSL, and occurs regardless of if we are trying to overwrite an existing directory. In fact, in this function the renaming is not being used to overwrite an existing directory. If I can find the links I based this on I will post them here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find the original issues I was looking at, but these links seem relevant:

https://github.com/microsoft/WSL/issues/1529
https://github.com/microsoft/WSL/issues/3395
https://github.com/microsoft/vscode-remote-release/issues/208
https://github.com/Microsoft/WSL/issues/3738
https://code.visualstudio.com/docs/remote/wsl#_i-see-eaccess-permission-denied-error-trying-to-rename-a-folder-in-the-open-workspace
https://github.com/Microsoft/WSL/issues/2097

I am putting these in a code block so that this PR doesn't get linked to all of these issues

with open(os.path.join(self.output_directory, 'restart_from_seed.py'), 'w') as f:
f.write('restartFromSeed(path=\'seed\')\n\n')
with open(self.input_file, 'r') as input_file:
f.write(''.join(input_file.readlines()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you do f.write(input_file.read())?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks for the catch!

Comment on lines 1275 to 1281
causes RMG to make a seed mechanism out of the current chem_annotated.inp and species_dictionary.txt
this seed mechanism is outputted in a seed folder within the run directory and automatically
added to as the (or replaces the current) 'Seed' thermo and kinetics libraries in database

`initialize_seed_mech` should be called once before this function is ever called.

This also writes the filter tensors to the `filters` sub-folder for restarting an RMG job from a seed mechanism
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you reword this docstring? It's written a bit weirdly and isn't really accurate.

I think it should start with "This method saves a seed mechanism..." or "Save a seed mechanism...". It is also not generating it from the chem_annotated.inp and species_dictionary.txt but rather the actual reaction model.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated! I should have updated this the last time I modified this function in a PR

WSL can occasionally throw permission errors when trying to rename non-empty directories, even when the user owns all of the files and directories.
Move all first_time only functions to a separate function that is called during the job initialization. This improves performance and readability
@mliu49
Copy link
Contributor

mliu49 commented Dec 10, 2019

I'm not sure if this is related to the changes in this PR, but I just tested with the superminimal example, and running the restart job leads to the following error:

Traceback (most recent call last):
  File "../../rmg.py", line 111, in <module>
    main()
  File "../../rmg.py", line 105, in main
    rmg.execute(**kwargs)
  File "/home/mjliu/Code/RMG-Py/rmgpy/rmg/main.py", line 651, in execute
    self.initialize(**kwargs)
  File "/home/mjliu/Code/RMG-Py/rmgpy/rmg/main.py", line 615, in initialize
    self.initialize_reaction_threshold_and_react_flags()
  File "/home/mjliu/Code/RMG-Py/rmgpy/rmg/main.py", line 1562, in initialize_reaction_threshold_and_react_flags
    unimolecular_threshold_restart = f.get('unimolecular_threshold').value
AttributeError: 'NoneType' object has no attribute 'value'

@amarkpayne
Copy link
Member Author

I'm not sure if this is related to the changes in this PR, but I just tested with the superminimal example, and running the restart job leads to the following error:

Traceback (most recent call last):
  File "../../rmg.py", line 111, in <module>
    main()
  File "../../rmg.py", line 105, in main
    rmg.execute(**kwargs)
  File "/home/mjliu/Code/RMG-Py/rmgpy/rmg/main.py", line 651, in execute
    self.initialize(**kwargs)
  File "/home/mjliu/Code/RMG-Py/rmgpy/rmg/main.py", line 615, in initialize
    self.initialize_reaction_threshold_and_react_flags()
  File "/home/mjliu/Code/RMG-Py/rmgpy/rmg/main.py", line 1562, in initialize_reaction_threshold_and_react_flags
    unimolecular_threshold_restart = f.get('unimolecular_threshold').value
AttributeError: 'NoneType' object has no attribute 'value'

This probably means that the filters were never saved. Which iteration was the restart file from (i.e. when you ran the super-minimal job the first time, how many iterations did you let it go through before you stopped the job?)

I had tested this on the minimal example and it seemed to work, so I am surprised that the super-minimal job is having this error. I might need to re-think how I test the restart functionality

@amarkpayne
Copy link
Member Author

Quick update: I am able to replicate the issue on my end. I ran the job to completion and then restarted the job, which should work. I am looking into what is going on now.

@amarkpayne
Copy link
Member Author

I think I have figured it out. The super-minimal job does not use filtering, so the filters are never saved. This should be handled by the try/except block in 1561-1593 of main.py, but the error thrown is an AttributeError instead of a key error. I am pretty sure that this would have been tested when we did the original restart PR, but perhaps it was overlooked. Additionally, it is possible that the behavior of h5py has changed so that it throws a different error.

Either way, this issue is not a result of this PR, but it should be an easy fix. @mliu49 would you prefer me to include the necessary changes in this PR or a separate one? I am planning on adding a commit with the fix, along with some functional tests for restarting the minimal job and the super-minimal job

@mliu49
Copy link
Contributor

mliu49 commented Dec 10, 2019

Please open a separate PR. If you've confirmed that this PR does not change behavior, then I think we can go ahead and merge this.

@amarkpayne
Copy link
Member Author

I just checked current master and can confirm that this issue also affects it, so this error is not a result of the changes in this PR. I'll open up an issue and a separate PR to fix this. Thanks for the catch @mliu49 !

@mliu49 mliu49 merged commit cafae8d into master Dec 10, 2019
@mliu49 mliu49 deleted the wsl_permisson_bug branch December 10, 2019 18:51
This was referenced Dec 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WSL permission denied when renaming seed folder
3 participants