This repository has been archived by the owner on Jun 21, 2022. It is now read-only.

Reading custom classes in BDSIM Trees #371

Closed
chernals opened this issue Oct 7, 2019 · 7 comments

Comments

@chernals

chernals commented Oct 7, 2019

I am attempting to process BDSIM ROOT output files (see http://www.pp.rhul.ac.uk/bdsim/manual/) using uproot to take advantage of your wonderful library and simplify our present workflow (pyroot + root_numpy with bits of rootpy and root_pandas).

So far so good, except for custom classes. I tried to follow issue #124, but this case seems to be different.

The custom class is correctly shown with a TBrowser; see screenshot:

[Screenshot 2019-10-07 14 41 17: TBrowser view of the custom class]

Using show within uproot reports this:

DRIFT_0. TStreamerInfo asdtype("[(' fBits', '>u8'), (' fUniqueID', '>u8'), ('n', '>i4'), ('z', '>f4'), ('modelID', '>i4'), ('S', '>f4')]")

The interpretation appears to be wrong.
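To see why this interpretation cannot be right, here is a minimal sketch (assuming NumPy, with the dtype copied verbatim from the `show` output above). An `asdtype` interpretation describes a fixed-width record in which every entry occupies exactly the same number of bytes, and it only lists scalar fields, so it leaves no room for variable-length std::string or std::vector members.

```python
import numpy as np

# dtype copied verbatim from the `show` output for the DRIFT_0. branch
dt = np.dtype([(' fBits', '>u8'), (' fUniqueID', '>u8'), ('n', '>i4'),
               ('z', '>f4'), ('modelID', '>i4'), ('S', '>f4')])

# a fixed-width record: every entry would be exactly this many bytes
print(dt.itemsize)  # 32  (8 + 8 + 4 + 4 + 4 + 4)

# only scalar fields appear; nothing here can hold a string or a vector
print(dt.names)
```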

The class that is fed into branches such as 'DRIFT_0' is this one: https://bitbucket.org/jairhul/bdsim/src/master/include/BDSOutputROOTEventSampler.hh .

I must admit that I do not know enough about ROOT to correctly diagnose what's happening, so any pointers would be appreciated.

An example file is shared on Dropbox.

jpivarski added a commit that referenced this issue Oct 7, 2019
@jpivarski
Member

Yes, the interpretation was wrong. Actually, it should have failed outright because std::string, std::vector<float>, and std::vector<int> hadn't been implemented inside of unsplit class objects. There was supposed to be a NotImplementedError, but that code path got circumvented, and since I had never had a file like this, I had never had a chance to test it.

But the file you provided allowed me to do one better: I've now implemented these types in unsplit classes. You can now read in an array of these objects:

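```python
import numpy
import uproot  # uproot 3.x, version 3.10.6 or later for these member types

t = uproot.open("tests/samples/output.root")["Event"]
# the DRIFT_0. branch holds unsplit objects, so .array() returns an ObjectArray
obj = t["DRIFT_0."].array()[0]
assert obj._samplerName == b'DRIFT_0'   # std::string member, read as bytes
assert obj._n == 1
assert obj._energy[0] == numpy.array([2.3371024], dtype=numpy.float32)[0]   # std::vector<float> member
```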

The leading underscore is because these are raw data members from the C++ class; in principle, they could be private. uproot has a mechanism to provide Python methods and properties for class objects, which usually appear without an underscore (e.g. TTree._fName vs TTree.name, where name is a property). The uproot-methods project exists to adorn classes from ROOT with Python methods and properties, but I doubt you want to get into all of that.
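As an illustration of the idea only (this is not the uproot-methods API), a plain-Python wrapper can expose the raw, underscore-prefixed data members as properties. The class name `SamplerView` is hypothetical, and a `SimpleNamespace` stands in for a real object read from the `DRIFT_0.` branch.

```python
from types import SimpleNamespace


class SamplerView:
    """Hypothetical wrapper exposing raw C++ data members as Python properties.

    An illustration only; not part of uproot or uproot-methods.
    """

    def __init__(self, raw):
        self._raw = raw  # object whose data members carry a leading underscore

    @property
    def samplerName(self):
        # raw std::string members come back as bytes, e.g. b'DRIFT_0'
        return self._raw._samplerName.decode("utf-8")

    @property
    def n(self):
        return self._raw._n


# stand-in for one object read from the DRIFT_0. branch
raw = SimpleNamespace(_samplerName=b"DRIFT_0", _n=1)
view = SamplerView(raw)
print(view.samplerName, view.n)  # DRIFT_0 1
```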

What I've implemented only covers std::string and std::vectors of numerical types, which are simpler than vectors of other types, but this seems to cover a lot of the ROOT files people have produced.
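A rough sketch of what reading one `std::vector<float>` member of an unsplit object involves, assuming the conventional ROOT streamer layout: a 4-byte byte count flagged with `0x40000000`, a 2-byte version, a 4-byte element count, then big-endian values. The helper names are hypothetical and this is not uproot's generated code; `encode_vector_f4` exists only to build test bytes. Because each vector's length is known only after its header has been read, the objects have to be walked one at a time.

```python
import struct

import numpy as np

K_BYTE_COUNT_MASK = 0x40000000  # flag set on ROOT byte-count fields (assumed layout)


def encode_vector_f4(values, version=9):
    """Hypothetical encoder used only to produce test bytes; version is arbitrary."""
    payload = struct.pack(">i", len(values)) + np.asarray(values, dtype=">f4").tobytes()
    header_version = struct.pack(">h", version)
    byte_count = struct.pack(">I", (len(header_version) + len(payload)) | K_BYTE_COUNT_MASK)
    return byte_count + header_version + payload


def read_vector_f4(data, pos=0):
    """Read one std::vector<float> starting at pos; return (values, next_pos)."""
    (byte_count,) = struct.unpack_from(">I", data, pos)
    end = pos + 4 + (byte_count & ~K_BYTE_COUNT_MASK)  # where this vector's bytes stop
    pos += 6  # skip byte count (4 bytes) and version (2 bytes)
    (n,) = struct.unpack_from(">i", data, pos)
    pos += 4
    values = np.frombuffer(data, dtype=">f4", count=n, offset=pos)
    if pos + 4 * n != end:
        raise ValueError("byte count does not match the element count")
    return values, end


data = encode_vector_f4([]) + encode_vector_f4([1.5, 2.0])
first, pos = read_vector_f4(data)
second, pos = read_vector_f4(data, pos)
print(first.tolist(), second.tolist())  # [] [1.5, 2.0]
```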

I hope this helps to simplify your workflow!

@jpivarski
Member

Oops; forgot to make the PR. Now it's PR #372. Once it has passed its tests, it will be uproot version 3.10.6.

@chernals
Author

chernals commented Oct 7, 2019

@jpivarski Thanks a lot for looking into that so quickly, this is very much appreciated!

I tested the branch and it works as advertised.

However, I am now a bit puzzled: since the class has the "//|| Don't split the header" comment, does that mean I will only get an ObjectArray?

Obviously, given the way these data are produced, I was expecting a jagged array, for example, when reading the members "x", "y", etc.

Could you confirm that this is not related to uproot, and that our code would have to be modified to allow this? Maybe by simply removing the "don't split" comment?

One more question: how come the TBrowser shows it as split, even though it is not split? Is it just making it look split?

@jpivarski
Member

In ROOT, splitting is an internal implementation detail. I don't think a class split into many branches will look different in the TBrowser than a class in one branch, unsplit. In uproot, splitting is visible to the user because the interface lets you pick which branches you want to read.

If a class is not split, it will not be readable in uproot as a JaggedArray of its components. "Split" means that the data for all x values (per basket) are contiguous on disk and the data for all y values are in another contiguous branch. "Not split" means that all the data for one object in one event is contiguous on disk: x, y, z, etc. before we see the next event's x. Uproot reads split data into JaggedArrays by minimally processing the branch, which is why it's fast. If the data are unsplit, there's no alternative but to read whole objects, one at a time, in (auto-generated) Python code.

If uproot were to present split and unsplit data with the same interface, it would hide a huge performance difference that could be hard to understand. Therefore, it presents split data as JaggedArrays and unsplit data as ObjectArrays.
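The difference between the two layouts can be mimicked with plain NumPy. This is a conceptual sketch with made-up data, not uproot's reader: in the split layout each member is one contiguous array plus shared offsets, so a jagged column is just a slice away; in the unsplit layout each event is a self-contained record, so extracting one member means visiting every record in Python.

```python
import numpy as np

# made-up events, each with a variable number of particles
events = [
    {"x": [1.0, 2.0], "y": [10.0, 20.0]},
    {"x": [], "y": []},
    {"x": [3.0], "y": [30.0]},
]

# split layout: one contiguous array per member, plus offsets shared by x and y
counts = np.array([len(e["x"]) for e in events])
offsets = np.concatenate([[0], np.cumsum(counts)])
x_content = np.array([v for e in events for v in e["x"]], dtype=np.float64)
y_content = np.array([v for e in events for v in e["y"]], dtype=np.float64)


def split_x(i):
    """x values of event i: a slice of one array, no per-object parsing."""
    return x_content[offsets[i]:offsets[i + 1]]


# whole-column operations need no loop over events
print(x_content.mean())  # 2.0

# unsplit layout: one record per event holding x, then y, before the next event
unsplit_records = [(e["x"], e["y"]) for e in events]


def unsplit_x_column(records):
    """Collect x from every record: one Python-level step per object."""
    return [np.asarray(rec[0], dtype=np.float64) for rec in records]
```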

If you want the convenience of working with JaggedArrays, there's a function in awkward-array for doing that conversion, but first you have to make it look like JSON:

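```python
import awkward  # awkward-array 0.x, as used with uproot 3
import uproot

# object_array: the ObjectArray of unsplit sampler objects
object_array = uproot.open("tests/samples/output.root")["Event"]["DRIFT_0."].array()

# turn each object into a dict of its data members, then let awkward build jagged columns
jagged_table = awkward.fromiter([{n: getattr(x, "_" + n) for n in x._members()}
                                 for x in object_array])
jagged_table.layout   # look at the structure that was made
jagged_table["x"]     # or jagged_table.x
jagged_table["y"]     # or jagged_table.y
jagged_table["z"]     # or jagged_table.z
```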

This is ironic because the object-oriented view of the world (i.e. not jagged arrays) was created for convenience!
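For readers who want to see what such a conversion produces without depending on awkward-array, here is a minimal sketch that turns a list of objects into one offsets array and one flat content array per member, which is the structure of a jagged column. `SimpleNamespace` objects stand in for the real ones, `to_jagged_columns` is a hypothetical helper, and, unlike `x._members()` above, the member names are passed in explicitly.

```python
from types import SimpleNamespace

import numpy as np


def to_jagged_columns(objects, members):
    """Hypothetical helper: list of objects -> {member: (offsets, content)}.

    Assumes each listed member is a sequence of numbers stored under "_" + name.
    """
    columns = {}
    for name in members:
        per_object = [np.asarray(getattr(obj, "_" + name), dtype=np.float64)
                      for obj in objects]
        counts = [len(values) for values in per_object]
        offsets = np.concatenate([[0], np.cumsum(counts)]).astype(np.int64)
        content = (np.concatenate(per_object) if per_object
                   else np.empty(0, dtype=np.float64))
        columns[name] = (offsets, content)
    return columns


# stand-ins for objects read from an unsplit branch
objects = [
    SimpleNamespace(_x=[0.1, 0.2], _y=[1.0, 2.0]),
    SimpleNamespace(_x=[], _y=[]),
    SimpleNamespace(_x=[0.3], _y=[3.0]),
]
cols = to_jagged_columns(objects, ["x", "y"])
offsets, content = cols["x"]
print(offsets.tolist(), content.tolist())  # [0, 2, 2, 3] [0.1, 0.2, 0.3]
```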

@chernals
Author

chernals commented Oct 7, 2019

@jpivarski Thanks again, that makes a lot of sense.

I will then investigate whether I can split it directly...

@chernals
Author

chernals commented Oct 7, 2019

For the record, I am now splitting the TBranch and it works as expected :)

@jpivarski
Member

Great! I'm glad you found a solution that works better for you.

However, I'm also glad that you took the initial detour because I've wanted an example of a file like that to implement nested std::vectors anyway.
