stdout can't be viewed as job progresses #209
Comments
Josh - do we currently export the API for condor_tail? That should allow dynamic fetching of the stdout. Scott - if there was a standard API from the map to send percent-complete notifications to the submit host, would you be in a place to use it?
@bbockelm we don't, and I think that's probably the right solution for this specific problem (wanting to look at the raw stdout of a single job). @stsievert, does that describe your use case? Do you only want to use this from the CLI, or do you want to be able to retrieve live stdout programmatically? If you'd like to look at stdout/stderr from multiple jobs, you could set the map option
Quick brainstorming on a more generic "progress tracker" API: I think we could provide execute-side functions that look like this:
and we could do the tqdm trick to wrap it up as an iterator:
and could do
Yeah, that's perfect. Viewing the last couple of lines from the job would work, something along the lines of this:
(base) [stsievert@submit2 exp-cifar10]$ htmap tail foo 0
iteration 45 out of 100, loss: 0.015
iteration 46 out of 100, loss: 0.01
My immediate use case is with the CLI: for debugging purposes, I only really care about the output of one job, and don't need the output of many jobs. I can see a
def execute(N=100):
    for k in range(N):
        loss = ...
        htmap.update_progress(k, N, msg=f"iter={k}, loss={loss}")
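A rough sketch of how the execute-side helper and the tqdm-style wrapper discussed above might fit together. Nothing here is an existing htmap API: `update_progress`, `track`, and the JSON-lines progress file are hypothetical names and assumptions made purely for illustration, and the step of getting the records back to the submit host is not shown.

```python
import json
import time
from pathlib import Path

# Hypothetical location where the execute side records progress.
# A real implementation would still need to ship this to the submit host.
PROGRESS_FILE = Path("_progress.jsonl")


def update_progress(current, total, msg="", path=PROGRESS_FILE):
    """Append one progress record (hypothetical helper, not an htmap API)."""
    record = {"time": time.time(), "current": current, "total": total, "msg": msg}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record


def track(iterable, total=None, path=PROGRESS_FILE):
    """Yield items from `iterable`, recording progress after each one is consumed."""
    if total is None:
        total = len(iterable)  # assumes a sized iterable such as range(N)
    for done, item in enumerate(iterable, start=1):
        yield item
        # Runs when the caller asks for the next item, i.e. after the loop body.
        update_progress(done, total, path=path)
```

Inside a mapped function, the wrapper would be used as `for k in track(range(N)): ...`, while `update_progress` could still be called directly when a custom message is wanted.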
What's your issue?
I have launched several (supposedly) short jobs. On EC2 with a modern NVIDIA GPU, they take around 40 minutes. I have launched these jobs on HTCondor, and specified a GPU that's less modern. The jobs apparently take at least 120 minutes on this lower-capability GPU.
I'd like some idea of the job progress, and am printing some items to stdout to view the progress. So, let's view the output of one of the running jobs:
(base) [stsievert@submit2 exp-cifar10]$ htmap stdout adadamp 0 # hangs...
This hangs indefinitely. This means I can't monitor the progress of any one component; I have to wait for that component to complete.
What would resolve your issue?
If the job's stdout could be viewed before the job has completed.
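As a rough illustration of the "last couple of lines" behavior requested in the comments, here is a minimal sketch that assumes the running job's stdout is already available as a local file. Fetching that file from the execute node, which is the part condor_tail would handle, is not shown. The name `tail_lines` is hypothetical.

```python
from collections import deque


def tail_lines(path, n=10):
    """Return the last `n` lines of a text file (hypothetical helper)."""
    with open(path, "r", errors="replace") as f:
        # A bounded deque keeps only the most recent n lines while streaming,
        # so the whole file never has to be held in memory.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```

Calling `tail_lines(stdout_path, n=2)` on a partially written output file would return whatever the two most recent complete or partial lines are at that moment.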