-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: improve uproot.futures
compatibility with concurrent.futures
#983
Conversation
I started looking at this in #956, before realising it would be a lot more work than I wanted for that PR. I'm glad you're looking at this now! I felt at the time that it would be helpful for @jpivarski and @ioanaif to weigh in on how important is it to provide backwards compatibility in uproot's executors. Can we break them in a minor version release? If we can't break them, then I think we would want to introduce an adaptor to make the uproot executors look like Regarding the import concurrent.futures
import threading
import functools
import urllib.request
def execute_with_resource(func, args, kwargs, local, create_resource):
try:
resource = local.value
except AttributeError:
resource = local.value = create_resource()
return func(*args, **kwargs, resource=resource)
def submit_with_resource(executor, func, local, create_resource, /, *args, **kwargs):
return executor.submit(
execute_with_resource, func, args, kwargs, local, create_resource
)
def read_n_bytes(resource, n_bytes):
return resource.read(n_bytes)
if __name__ == "__main__":
url = "https://github.com/scikit-hep/scikit-hep-testdata/raw/v0.4.33/src/skhep_testdata/data/uproot-issue121.root"
executor = concurrent.futures.ThreadPoolExecutor()
locals = [
threading.local() for _ in range(executor._max_workers)
] # TODO: abstract this!
open_url = functools.partial(urllib.request.urlopen, url)
tasks = [
submit_with_resource(executor, read_n_bytes, local, open_url, n_bytes=512)
for local in locals
]
for future in concurrent.futures.as_completed(tasks):
print(future.result()) This wouldn't work with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure (don't remember) whether the num_workers
versus max_workers
was a mistake—all the way back to Python 3.5, the name of that argument is max_workers
. However, this implementation of ThreadPoolExecutor
always creates exactly as many workers as requested, so the "num" and "max" concepts happen to be the same. It becomes an implementation detail that the executor never makes fewer than the requested number of workers.
The notify
used to be a lambda function; it only became formalized as a function-generating-a-function because I was adopting that style to have fewer barriers to pickling (even though Sources
can't be pickled, anyway). When a notify
that needs to take an argument is needed, it would have just been a different lambda function in the original scheme. But I like your solution of making it take an optional first argument better. Dignifying it with a docstring makes it more of an established thing than I originally had in mind, but that's fine.
We can change uproot.source.futures
interfaces as we would internal code. They haven't been explicitly marked as private (with underscores), but users should not be using Uproot's reimplementation or modification of standard library features, anyway. It definitely didn't come up in our scans of user code, but if someone has been using them and complains about the change, that's a good chance to point them toward the standard library.
We would, at some point, release a minor version of Uproot with non-public features properly hidden with underscores. The original idea was to make Uproot a Swiss army knife, with useful tools like the ones that get at low-level TBasket details, but now that it's been in production for a while and we've seen how people are using it in the user code scans, we can draw a clearer line between public and private.
Co-authored-by: Angus Hollands <goosey15@gmail.com>
The
uproot.source.chunk.notifier
method is used as a callback for a future but it's not directly compatible with theconcurrent.futures.Future.add_done_callback
api as it does not take afuture
argument. I modified it to add an optional argument so it's compatible (previously one could wrap it in a function to achieve the compatibility, but I think this makes more sense).The
uproot.futures.ThreadPoolExecutor
does not have the same constructor arguments as theconcurrent.futures.ThreadPoolExecutor
. uproot's hasnum_workers
and concurrent's hasmax_workers
instead. I guess when uproot's was created the argument was callednum_workers
then it got renamed (altough its not just a renaming, it means different things).I think this PR can also serve as an opportunity for a discussion regarding the
uproot.futures
module. I understand that it was originally motivated by the need to support Python 2 but now that this is deprecated maybe it makes sense to replace the usage of some of these classes such asuproot.futures.Future
oruproot.futures.ThreadPoolExecutor
by theconcurrent
equivalent?