Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improve uproot.futures compatibility with concurrent.futures #983

Merged
merged 7 commits into from
Oct 12, 2023

Conversation

lobis
Copy link
Collaborator

@lobis lobis commented Oct 10, 2023

  • The uproot.source.chunk.notifier method is used as a callback for a future but it's not directly compatible with the concurrent.futures.Future.add_done_callback api as it does not take a future argument. I modified it to add an optional argument so it's compatible (previously one could wrap it in a function to achieve the compatibility, but I think this makes more sense).

  • The uproot.futures.ThreadPoolExecutor does not have the same constructor arguments as the concurrent.futures.ThreadPoolExecutor. uproot's has num_workers and concurrent's has max_workers instead. I guess when uproot's was created the argument was called num_workers then it got renamed (altough its not just a renaming, it means different things).

I think this PR can also serve as an opportunity for a discussion regarding the uproot.futures module. I understand that it was originally motivated by the need to support Python 2 but now that this is deprecated maybe it makes sense to replace the usage of some of these classes such as uproot.futures.Future or uproot.futures.ThreadPoolExecutor by the concurrent equivalent?

@lobis lobis mentioned this pull request Oct 10, 2023
@agoose77
Copy link
Collaborator

I started looking at this in #956, before realising it would be a lot more work than I wanted for that PR. I'm glad you're looking at this now!

I felt at the time that it would be helpful for @jpivarski and @ioanaif to weigh in on how important is it to provide backwards compatibility in uproot's executors. Can we break them in a minor version release? If we can't break them, then I think we would want to introduce an adaptor to make the uproot executors look like concurrent.futures

Regarding the ResourceThreadPoolExecutors, I didn't get as far as figuring out the best model to align with concurrent.futures. Upon reflection, I think we could try something that effectively functools.partials the func, but with a callable argument factory e.g.:

import concurrent.futures
import threading
import functools
import urllib.request


def execute_with_resource(func, args, kwargs, local, create_resource):
    try:
        resource = local.value
    except AttributeError:
        resource = local.value = create_resource()

    return func(*args, **kwargs, resource=resource)


def submit_with_resource(executor, func, local, create_resource, /, *args, **kwargs):
    return executor.submit(
        execute_with_resource, func, args, kwargs, local, create_resource
    )


def read_n_bytes(resource, n_bytes):
    return resource.read(n_bytes)


if __name__ == "__main__":
    url = "https://github.com/scikit-hep/scikit-hep-testdata/raw/v0.4.33/src/skhep_testdata/data/uproot-issue121.root"

    executor = concurrent.futures.ThreadPoolExecutor()

    locals = [
        threading.local() for _ in range(executor._max_workers)
    ]  # TODO: abstract this!

    open_url = functools.partial(urllib.request.urlopen, url)

    tasks = [
        submit_with_resource(executor, read_n_bytes, local, open_url, n_bytes=512)
        for local in locals
    ]
    for future in concurrent.futures.as_completed(tasks):
        print(future.result())

This wouldn't work with ProcessPoolExecutor, because the threading.local isn't pickleable. But, in any case, I think that kind of executor-agnostic abstraction would be good.

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure (don't remember) whether the num_workers versus max_workers was a mistake—all the way back to Python 3.5, the name of that argument is max_workers. However, this implementation of ThreadPoolExecutor always creates exactly as many workers as requested, so the "num" and "max" concepts happen to be the same. It becomes an implementation detail that the executor never makes fewer than the requested number of workers.

The notify used to be a lambda function; it only became formalized as a function-generating-a-function because I was adopting that style to have fewer barriers to pickling (even though Sources can't be pickled, anyway). When a notify that needs to take an argument is needed, it would have just been a different lambda function in the original scheme. But I like your solution of making it take an optional first argument better. Dignifying it with a docstring makes it more of an established thing than I originally had in mind, but that's fine.

We can change uproot.source.futures interfaces as we would internal code. They haven't been explicitly marked as private (with underscores), but users should not be using Uproot's reimplementation or modification of standard library features, anyway. It definitely didn't come up in our scans of user code, but if someone has been using them and complains about the change, that's a good chance to point them toward the standard library.

We would, at some point, release a minor version of Uproot with non-public features properly hidden with underscores. The original idea was to make Uproot a Swiss army knife, with useful tools like the ones that get at low-level TBasket details, but now that it's been in production for a while and we've seen how people are using it in the user code scans, we can draw a clearer line between public and private.

Co-authored-by: Angus Hollands <goosey15@gmail.com>
@lobis lobis marked this pull request as ready for review October 11, 2023 17:38
@lobis lobis changed the base branch from main to main-v510 October 12, 2023 13:16
@lobis lobis merged commit 9f5e4c7 into main-v510 Oct 12, 2023
@lobis lobis deleted the futures-api branch October 12, 2023 13:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants