-
Notifications
You must be signed in to change notification settings - Fork 258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Broken symlinks for run_dir cause rt.sh to not remove symlink and error out when trying to create new link. #2148
Comments
Hey, @BrianCurtis-NOAA , I tried to replicate this on Hercules. I forced a failure first, so the run_dir directory with symlink was in the test/ directory, then ran another test using ./rt.sh -a epic -e -l rt.conf . The new run seemed to create a new run_dir with symlink , then once completed I see the removal of run_dir, with the symlink successfully deleted afterwards. Is this the expected sequence of events or am I missing something? |
Run a test and then move the actual rt_XX to a different name. Then try to re-run the rt.sh. |
Ok, I think I replicated the error.
|
I still see this error:
What's the purpose of this run_dir? We didn't have that directory until recently and everything worked just fine. |
It's a link in the ufs-weather-model/tests dir directly to the FV3_RT/rt_##### dir The rt.sh is supposed to remove that and create a new one on each call to rt.sh Lines 1043 to 1048 in 47c0099
Any idea why it didn't seem to. |
I see it's a link to rt_####### directory but why it is there, why do we need it, how is it used? |
It's here to provide dev's a direct route to all test outputs (err, out etc..). It's convenience, IMO. Why shouldn't we have it, why should we be concerned it's there? If it's because we can error out rt.sh, then I can fix the code once we figure out why it didn't remove the run_dir. Maybe the combination of -L and -d are not quite right? |
Then why, for dev's convenience, we do not have a direct route to the baseline directory or input data directory or spack module files used etc. It's not necessary, it complicates scripts, it introduces potential errors. Like this one for example. |
the rt_###### dir changes each run, and there's tons of information there when things fail. There's no need for a dev to be in any of those other places you mention when things fail. I disagree that we should ignore a convenient tool because we're worried it might cause an error now and then. |
We can add a python script to safely handle directory sym link issue:
|
Why do we need python to remove a symbolic link? How did people remove symbolic links before there was python. Simple rm will do the job:
Please do not complicate things even further. |
@zach1221 We can go with Dusan's solution. Using rm -rf is generally unsafe, with symlinks it won't follow into those so it shouldn't pose a problem here. You will want to remove the if sections of the code and just keep Dusan's part. |
Why is rm -rf unsafe? |
Its necessary in code to use -rf like this, but in general I don't like using rm -rf with env vars in case the env var mysteriously gets changed before reaching that point and a directory gets removed on accident. |
@BrianCurtis-NOAA right, there is a chance to delete source directory with -rf when target link is deleted. |
If it matters, the reason I had the if statements was for just this reason. It's safe to use just rm because there's little risk if an env var gets changed, nothing resursive is done. I was trying to verify that it was a directory and a symlink before trying to use rm to be extra safe. |
So should I still add it? @BrianCurtis-NOAA you mean remove the below from lines 1044 - 1046? |
Yeah, no need to keep that if/fi section. |
I do not understand this. Are you saying that this line: rm -rf "${PATHRT}/run_dir" can remove source directory? Can you give exact sequence of commands how this can happen. |
Recommended workaround to this issue was committed with WM PR #2213 . |
Description
Broken symlinks for run_dir cause rt.sh to not remove symlink and error out when trying to create new link.
To Reproduce:
Fix?
Try using -L check along with -e to weed out if the symlink is broken.
The text was updated successfully, but these errors were encountered: