Let uncurl download progress reach git-annex #328
Never mind -- I figured out an example myself.
My interim conclusion after the devcall discussion today is to amend the special remote super main used in [...]. This would be an approach that minimizes undesired interactions with other components. It makes the adjustment only in a special remote process, and it does not require any yet-to-be-invented approach to bypass git-annex. It also reduces the likelihood of more than one progress report being ongoing in such a process.
FWIW uncurl does

    progress_id = self._get_progress_id(from_url, to_path)
    ...
    self._progress_report_start(
        progress_id,
        ('Download %s to %s', from_url, to_path),
        'downloading',
        expected_size,
    )
    ...
    for chunk in requests_tee(r, fp):
        self._progress_report_update(
            progress_id, ('Downloaded chunk',), len(chunk))

whereas the datalad special remote does

    pbar = ui.get_progressbar(label=url, fill_text=filepath, total=target_size)
    ...
    for chunk in stream:
        ...
        pbar.update(total)

so, in a loop. I tried to dig deeper into logger-based progress bar handling, but my foo was not good enough for a rapid Aha moment.
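For orientation, here is a minimal sketch of how logger-based progress reporting can carry byte counts through plain Python logging. It assumes the `dlm_progress`, `dlm_progress_update` and `dlm_progress_increment` record attributes that appear in the patch further down this thread; the logger name, `CollectingHandler` and `download()` are hypothetical stand-ins and not part of datalad or uncurl.

```python
import logging


class CollectingHandler(logging.Handler):
    """Hypothetical handler: keeps a running byte count per progress ID."""

    def __init__(self):
        super().__init__()
        self.totals = {}

    def emit(self, record):
        pid = record.dlm_progress
        update = getattr(record, 'dlm_progress_update', None)
        if update is None:
            # start or end record: no byte count attached
            self.totals.setdefault(pid, 0)
        elif getattr(record, 'dlm_progress_increment', False):
            # update is a delta relative to the previous report
            self.totals[pid] = self.totals.get(pid, 0) + update
        else:
            # update is an absolute position
            self.totals[pid] = update


def download(lgr, chunks, pid='demo-download'):
    """Pretend download that reports each chunk as a progress log record."""
    lgr.info('Start download', extra={'dlm_progress': pid})
    for chunk in chunks:
        lgr.info(
            'Downloaded chunk',
            extra={
                'dlm_progress': pid,
                'dlm_progress_update': len(chunk),
                'dlm_progress_increment': True,
            },
        )
    lgr.info('Finished download', extra={'dlm_progress': pid})


lgr = logging.getLogger('demo.progress')  # hypothetical logger name
lgr.setLevel(logging.INFO)
lgr.propagate = False

handler = CollectingHandler()
# only let records that carry a progress ID reach the handler
handler.addFilter(lambda record: hasattr(record, 'dlm_progress'))
lgr.addHandler(handler)

lgr.info('ordinary message')  # filtered out, never reaches emit()
download(lgr, [b'abcd', b'ef', b'ghi'])
print(handler.totals)
```

The point of the sketch is only that the producer (the download loop) and the consumer (the handler) are decoupled by the logger, so a different consumer can be attached without touching the download code.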
Here is a sketch of how any progress logging compliant with http://docs.datalad.org/en/stable/design/progress_reporting.html can be reported to git-annex within a special remote process.

diff --git a/datalad/customremotes/main.py b/datalad/customremotes/main.py
index a2931ff04..d1a522c8c 100644
--- a/datalad/customremotes/main.py
+++ b/datalad/customremotes/main.py
@@ -45,6 +45,48 @@ def setup_parser(remote_name, description):
     return parser
 
 
+def only_progress_logrecords(record):
+    return hasattr(record, 'dlm_progress')
+
+
+class AnnexProgressHandler(logging.Handler):
+    def __init__(self, annexremote):
+        super().__init__()
+        self.annexremote = annexremote
+        self._ptrackers = {}
+
+    def close(self):
+        self._ptrackers = {}
+        super().close()
+
+    def emit(self, record):
+        if not hasattr(record, 'dlm_progress'):
+            # a filter should have been used to prevent this call
+            return
+
+        maint = getattr(record, 'dlm_progress_maint', None)
+        if maint in ('clear', 'refresh'):
+            return
+        pid = getattr(record, 'dlm_progress')
+        update = getattr(record, 'dlm_progress_update', None)
+        if pid not in self._ptrackers:
+            # this is new
+            prg = getattr(record, 'dlm_progress_initial', 0)
+            self._ptrackers[pid] = prg
+            self.annexremote.send_progress(prg)
+        elif update is None:
+            # not an update -> done
+            self._ptrackers.pop(pid)
+        else:
+            prg = self._ptrackers[pid]
+            if getattr(record, 'dlm_progress_increment', False):
+                prg += update
+            else:
+                prg = update
+            self._ptrackers[pid] = prg
+            self.annexremote.send_progress(prg)
+
+
 def _main(args, cls):
     """Unprotected portion"""
     assert(cls is not None)
@@ -52,6 +94,15 @@ def _main(args, cls):
     master = Master()
     remote = cls(master)
     master.LinkRemote(remote)
+
+    # we add an additional handler to the logger to deal with
+    # progress reports
+    dlroot_lgr = logging.getLogger('datalad')
+    phandler = AnnexProgressHandler(remote)
+    phandler.addFilter(only_progress_logrecords)
+    dlroot_lgr.addHandler(phandler)
+
+    # run the remote
     master.Listen()
     # cleanup
     if hasattr(remote, 'stop'):
A dedicated log handler is attached to both the datalad "root" logger and the special remote class instance. The handler gets a filter that only lets it see progress log records (this is something the main progress handler should also be doing). In the generic main() of special remotes, this log setup modification is added as an additional log output -- no interference with any existing setup.

This sketch is limited/incomplete in that it assumes only a single progress tracker is in use (although it is already set up to support more than one). With multiple concurrent trackers, it would send rubbish progress logs to git-annex. That being said, with this change we see git-annex native progress reporting with git annex get and datalad native progress reporting with datalad get.

Closes datalad#328
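One way the single-tracker limitation could be addressed is to forward only one tracker at a time, so that values from concurrent trackers are never interleaved. The following is a minimal sketch of that policy, not what the patch does: `FakeRemote`, `SingleTrackerProgressHandler`, `report()` and the logger name are hypothetical, and the only thing assumed about the real remote object is the `send_progress()` method already used in the patch.

```python
import logging


class FakeRemote:
    """Hypothetical stand-in for the special remote; records what would be sent."""

    def __init__(self):
        self.sent = []

    def send_progress(self, nbytes):
        self.sent.append(nbytes)


class SingleTrackerProgressHandler(logging.Handler):
    """Track all progress IDs, but forward only one 'active' tracker."""

    def __init__(self, annexremote):
        super().__init__()
        self.annexremote = annexremote
        self._ptrackers = {}
        self._active = None  # ID of the tracker currently forwarded

    def emit(self, record):
        if not hasattr(record, 'dlm_progress'):
            return
        if getattr(record, 'dlm_progress_maint', None) in ('clear', 'refresh'):
            return
        pid = record.dlm_progress
        update = getattr(record, 'dlm_progress_update', None)
        if pid not in self._ptrackers:
            # first record of a new tracker
            prg = getattr(record, 'dlm_progress_initial', 0)
        elif update is None:
            # no update on a known tracker -> it is done
            del self._ptrackers[pid]
            if self._active == pid:
                self._active = None
            return
        elif getattr(record, 'dlm_progress_increment', False):
            prg = self._ptrackers[pid] + update
        else:
            prg = update
        self._ptrackers[pid] = prg
        if self._active is None:
            # nothing is being forwarded right now: adopt this tracker
            self._active = pid
        if pid == self._active:
            self.annexremote.send_progress(prg)


remote = FakeRemote()
lgr = logging.getLogger('demo.annexprogress')  # hypothetical logger name
lgr.setLevel(logging.INFO)
lgr.propagate = False
lgr.addHandler(SingleTrackerProgressHandler(remote))


def report(pid, update=None):
    """Emit a progress record; an update is always sent as an increment."""
    extra = {'dlm_progress': pid}
    if update is not None:
        extra.update(dlm_progress_update=update, dlm_progress_increment=True)
    lgr.info('progress', extra=extra)


report('A')        # A starts and becomes the active tracker
report('B')        # B starts concurrently and is tracked but not forwarded
report('A', 100)
report('B', 7)
report('A', 50)
report('A')        # A ends
report('B', 3)     # B is adopted and forwarded from here on
print(remote.sent)
```

Whether pinning to one tracker is the right policy depends on which tracker actually corresponds to the transfer git-annex asked for; summing all trackers or matching on the target path would be alternatives with their own trade-offs.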
I am pulling a file via https and datalad get from the ICF data store (800M). Not a single progress update. Pulling the same file from the same URL with datalad download (not download-url!) works fine and shows progress.

This is a serious issue, because progress reporting is necessary to avoid triggering annex stall detection -- which could easily happen for large-file downloads.