Skip to content
This repository has been archived by the owner on Jun 21, 2022. It is now read-only.

uproot.lazyarray doesn't crash/warn if given multiple branches #306

Closed
kratsg opened this issue Jul 19, 2019 · 7 comments
Closed

uproot.lazyarray doesn't crash/warn if given multiple branches #306

kratsg opened this issue Jul 19, 2019 · 7 comments
Assignees

Comments

@kratsg
Copy link

kratsg commented Jul 19, 2019

I have a piece of code that extracts branches to use based on the selection strings passed in, doing something like:

branches_selection = formulate.from_auto(strformat_chars.sub('', selection_string)).variables
branches_weights = formulate.from_auto(strformat_chars.sub('', weight_string)).variables

I find that when I do this

tree = uproot.lazyarray(
    files,
    tree_name,
    itertools.chain(branches_selection, branches_weights),
)
print(tree)

(as formulate.from_x gives you a set), I get the strangest cryptic error:

  File "/Users/kratsg/.virtualenvs/optimization/lib/python3.7/site-packages/numexpr/necompiler.py", line 744, in getArguments
    a = local_dict[name]
  File "/Users/kratsg/.virtualenvs/optimization/lib/python3.7/site-packages/awkward/array/chunked.py", line 310, in __getitem__
    chunks.append(chunk[where])
  File "/Users/kratsg/.virtualenvs/optimization/lib/python3.7/site-packages/awkward/array/virtual.py", line 369, in __getitem__
    return self.array[where]
  File "/Users/kratsg/.virtualenvs/optimization/lib/python3.7/site-packages/awkward/array/virtual.py", line 295, in array
    return self.materialize()
  File "/Users/kratsg/.virtualenvs/optimization/lib/python3.7/site-packages/awkward/array/virtual.py", line 326, in materialize
    array = self._util_toarray(self._generator(*self._args, **self._kwargs), self.DEFAULTTYPE)
  File "/Users/kratsg/.virtualenvs/optimization/lib/python3.7/site-packages/uproot/tree.py", line 1823, in __call__
    return tree[branchname].lazyarray(interpretation=tree.interpretations[branchname], entrysteps=self.entrysteps, entrystart=None, entrystop=None, flatten=self.flatten, awkwardlib=awkward, cache=None, basketcache=self.basketcache, keycache=self.keycache, executor=self.executor, persistvirtual=self.persistvirtual)
KeyError: b'bjets_n'

yet all the files have that branch... Looking at the API, it only accepts one branch. I was somewhat surprised that it just accepted arbitrary iterables.

@jpivarski
Copy link
Member

On the one hand, the function you want is named lazyarrays with an "s". That's the one that takes a list of branch names. (Though if you give it only one string, it would interpret that as a singleton list.)

There is an issue with lazyarray (singular) if it fails with a weird error when you give it an iterable that isn't a string. The argument simply needs to be sanitized. A string is the only legal argument type.

@HDembinski
Copy link
Member

I just stumbled over the same issue. The difference lazyarrays vs lazyarray is difficult to spot.

@HDembinski
Copy link
Member

Possible improvements in order of my personal preference:

  1. lazyarray detects that the user passes a list of branches and calls the lazyarrays implementation. The original lazyarrays would become deprecated.
  2. lazyarray detects that the user passes a list of branches and throws an InvalidArgument error, telling the user to use lazyarrays.

@jpivarski
Copy link
Member

The original array/arrays were distinguished because you want to know if you're going to get a container (usually dict) back so that you can unpack it—you don't want the type to depend on a value like a * in the name of the branch that matches more than one branch. (Wildcards in the names are as effective as passing a list.) Scripts that work in one circumstance would break in others, seemingly mysteriously.

The idea behind the lazyarray/lazyarrays distinction was the same, though it's an awkward-array structure to unpack, rather than a Python structure to unpack. lazyarrays always returns a Table of whatever, even if it's only one thing, because if it's two things on some TTrees, you don't want to have to write your script to detect that.

I guess this means I'm convincing myself of your option 2: when requesting a lazyarray, you have no business specifying more than one branch name. If that's not true of array as well, it should be. Both lazyarray and array should point you to the pluralized forms if you pass a list. I see that's what I concluded on Jul 19, but haven't fixed it yet.

@jpivarski jpivarski self-assigned this Sep 26, 2019
@HDembinski
Copy link
Member

Makes sense! Option 2 is also fine. :)

@HDembinski
Copy link
Member

...and btw uproot is awesome.

jpivarski added a commit that referenced this issue Sep 27, 2019
… TTree.array, TTree.lazyarray, or uproot.lazyarray (all the non-pluralized methods), then a useful error message is raised.
@jpivarski
Copy link
Member

PR #350, will be part of 3.10.2.

jpivarski added a commit that referenced this issue Sep 27, 2019
Fixes issue #306: better error message.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants