change how user "debug" files in debug_files.tar.gz are handled #8719

Open
belforte opened this issue Sep 26, 2024 · 1 comment

belforte commented Sep 26, 2024

Currently we put debug_files.tar.gz, which contains the user's config and script exe, in the S3 cache to help operators debug problems and to create a new task to submit in crab recovery.

But in order to make those files visible in the UI, we need a URL which points to the text version. We currently do this by placing the files in WEB_DIR/debug and fetching them via the scheduler.

Also, for historical reasons, the TW sends both the tarball and the explicit files to the scheduler, which is a duplication of content.

Can we do something better, simpler, and easier to document and understand?

Since those files are small, the duplication in sending from TW to scheduler is not a big worry, but the confusion and lack of documentation are bad.

belforte (Member Author) commented:

Moving my thoughts here from #8699 (comment):

About the debug directory in the tarball:
the reason to expand debug_files.tar.gz in DagmanCreator is to make the files available to the CRABServer UI

def extractMonitorFiles(self, inputFiles, **kw):

so that in the UI they can be fetched from the scheduler [1].
OTOH this could be done in AdjustSites, when WEB_DIR is prepared, instead of both copying debug_files.tar.gz there and creating a symlink for the debug/ directory:
## Copy the debug folder. It might not be available if an older (<3.3.1607) crabclient is used.
if os.path.isfile(os.path.join(".", "debug_files.tar.gz")):
    shutil.copy2(os.path.join(".", "debug_files.tar.gz"), os.path.join(path, "debug_files.tar.gz"))
## Make all the necessary symbolic links in the web directory.
sourceLinks = ["debug",
And leave for a future optimization to simply put the files in S3, like we do for twlog, without creating debug_files.tar.gz. That will require changes to crab recovery and crab getsandbox too, besides of course task_info.js, where the files will be downloaded from S3, not from the scheduler.

Having a tar saves the client repeated calls to crabserver (to get pre-signed URLs) and to S3. The idea was that when CRABClient is executed from sites with a large RTT, the fewer HTTP calls, the better.
Maybe do this in the TW? DagmanCreator could upload the files to S3 instead of adding them to InputFiles.tar.gz.
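A minimal sketch of that idea, with all names hypothetical: getPresignedUrl and put are injected stand-ins (in real code put could be requests.put), not actual CRAB or crabserver APIs. It also makes visible the per-file round trip to crabserver that the tarball currently avoids:

```python
def uploadDebugFiles(taskname, filenames, getPresignedUrl, put):
    """Upload each debug file to the S3 cache via a pre-signed URL.

    Hypothetical sketch: getPresignedUrl(taskname, filename) -> URL is a
    stand-in for asking crabserver for a pre-signed upload URL; put(url, data=...)
    is a stand-in for an HTTP PUT (e.g. requests.put).
    """
    uploaded = []
    for fname in filenames:
        # one crabserver round trip per file: the RTT cost the tarball avoids
        url = getPresignedUrl(taskname, fname)
        with open(fname, "rb") as fh:
            resp = put(url, data=fh.read())
        resp.raise_for_status()  # surface S3 upload failures early
        uploaded.append(fname)
    return uploaded
```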

Maybe it is better to settle on the final solution and do all the changes in one go, avoiding code put there "for a few months" which is hard to remove.

[1]

function displayConfigAndPSet(errHandler) {
    if (userWebDir === "") {
        errHandler(new TaskInfoUndefinedError());
        return;
    } else if (userWebDir === "None") {
        // If user webdir wasn't created at all
        errHandler(new UserWebDirUndefinedError());
        return;
    } else if (proxiedWebDirUrl === "") {
        // In case proxy api returned empty or failed
        // Set links, show error and don't load anything else.
        $("#task-config-link").attr("href", userWebDir + "/debug_files.tar.gz");
        $("#task-pset-link").attr("href", userWebDir + "/debug_files.tar.gz");
        errHandler(new ProxyNotFoundErrorError);
        return;