Fixes #465 by explicitly trimming STL vector size (some ROOT files have unused/padding/junk in each event after the vector's serialized data). #466

jpivarski · 2020-03-17T01:36:07Z

@tamasgal This should fix your issue. As described here, the STL vector includes information about its length that we had been ignoring, because ROOT also gives the position of the start of each event, so normally the STL vector size is redundant. But you've found a case where there are unused bytes/padding/junk written to the file after the STL vector itself, which makes the STL vector size not redundant.

Now uproot.asjagged has an additional parameter (sizeat) for the byte position of an integer that specifies each jagged subarray's size, and STL vectors set that parameter. (Other jagged arrays do not, because they don't have that information.) In most cases it's still redundant, so we check that it agrees with the sizes inferred from the ROOT event starts and only trim the data if they disagree.
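A minimal sketch of the check described above, for a basket of std::vector<double> entries. This is hypothetical illustration code, not uproot's implementation: the name read_stl_vectors, the default sizeat=6 (assuming a 4-byte byte count and a 2-byte version precede the size), and the big-endian int32 size field are all assumptions.

```python
import struct

import numpy as np


def read_stl_vectors(data, starts, sizeat=6, dtype=">f8"):
    """Hypothetical sketch: read one std::vector<T> per event from raw basket bytes.

    data   -- bytes of the basket
    starts -- byte offset where each event begins, followed by the end offset
    sizeat -- assumed position (within each event) of the big-endian int32 vector size
    Returns the per-event arrays and the indices of events whose trailing bytes were dropped.
    """
    dtype = np.dtype(dtype)
    arrays, trimmed = [], []
    for i, (start, stop) in enumerate(zip(starts[:-1], starts[1:])):
        # Size written by the std::vector itself.
        (declared,) = struct.unpack(">i", data[start + sizeat:start + sizeat + 4])
        first = start + sizeat + 4  # elements follow the size field
        # Size implied by where the next event starts.
        inferred = (stop - first) // dtype.itemsize
        if declared != inferred:
            # Disagreement: trust the std::vector size and ignore the extra bytes.
            trimmed.append(i)
        if declared == 0:
            arrays.append(np.empty(0, dtype=dtype))
        else:
            arrays.append(np.frombuffer(data, dtype=dtype, count=declared, offset=first))
    return arrays, trimmed
```

When the two sizes agree they select the same slice; the `trimmed` list only records the events where they disagreed and trailing bytes were dropped.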


jpivarski · 2020-03-17T01:36:28Z

Oh, I forgot to say: please stress-test this. Thanks!

tamasgal · 2020-03-17T09:02:14Z

Thanks Jim! I'd like to add tests if it's ok...

tamasgal · 2020-03-17T11:27:23Z

I'm afraid this goes deeper down the rabbit hole...

The fix now nicely cuts off the uninitialised values, but the shape is still not right.

In [4]: import uproot

In [5]: f = uproot.open("usr-nested.root")

In [6]: f['E']['Evt']['mc_trks']['mc_trks.usr_names'].array()
Out[6]: <ObjectArray [[b'bx', b'by', b'ichan', b'cc'] [b'bx', b'by', b'ichan', b'cc'] [b'bx', b'by', b'ichan', b'cc'] ... [b'bx', b'by', b'ichan', b'cc'] [b'bx', b'by', b'ichan', b'cc'] [b'bx', b'by', b'ichan', b'cc']] at 0x000119099c10>

In [7]: f['E']['Evt']['mc_trks']['mc_trks.usr'].array()[0]
Out[7]: array([0.048692, 0.058846, 3.      , 2.      ])

In this case, the [0]-th entry of both usr_names and usr should contain the same number of sub-arrays as any other attribute in mc_trks, like mc_trks.pos.x:

In [11]: f['E']['Evt']['mc_trks']['mc_trks.pos.x'].array()[0]
Out[11]:
array([32.263, 32.263, 32.263, 32.263, 32.263, 32.263, 32.263, 32.263,
       32.263, 32.263, 32.263, 32.263, 32.263, 32.263, 32.263, 32.263,
       32.263, 32.263, 32.263, 32.263, 32.263])

So the correct output of f['E']['Evt']['mc_trks']['mc_trks.usr'].array()[0] would be 21 sub-items:

>>> f['E']['Evt']['mc_trks']['mc_trks.usr'].array()[0]
[[0.048692, 0.058846, 3.0, 2.0], [63.2413], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]

and f['E']['Evt']['mc_trks']['mc_trks.usr_names'].array()[0] would likewise be a nested list:

>>> f['E']['Evt']['mc_trks']['mc_trks.usr_names'].array()[0]
[[b'bx', b'by', b'ichan', b'cc'], [b'energy_lost_in_can'], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]

At least in the usr_names field, I can see energy_lost_in_can using uproot.asdebug:

In [15]: f['E']['Evt']['mc_trks']['mc_trks.usr_names'].array(uproot.asdebug)[0]
Out[15]:
array([ 64,   0,   0, 120,   0,   9,   0,   0,   0,   4,   2,  98, 120,
         2,  98, 121,   5, 105,  99, 104,  97, 110,   2,  99,  99,   0,
         0,   0,   1,  18, 101, 110, 101, 114, 103, 121,  95, 108, 111,
       115, 116,  95, 105, 110,  95,  99,  97, 110,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0], dtype=uint8)

In [16]: f['E']['Evt']['mc_trks']['mc_trks.usr_names'].array(uproot.asdebug)[0].tostring()
Out[16]: b'@\x00\x00x\x00\t\x00\x00\x00\x04\x02bx\x02by\x05ichan\x02cc\x00\x00\x00\x01\x12energy_lost_in_can\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

Since we expect 19 missing entries (zero-length strings) after energy_lost_in_can and the scheme is always [length][string], it seems that 4 bytes are used to encode an empty string. Maybe it's just padding? Then the first byte would be \x00 and the remaining \x00\x00\x00 would be padding. The trailing \x00 bytes in the usr_names field are exactly 76 bytes, which is divisible by 19, giving 4 😉

In [20]: b = b"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\
    ...: x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
    ...: \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0
    ...: 0\x00\x00\x00\x00\x00\x00\x00\x00\x00"

In [21]: len(b)
Out[21]: 76

In [22]: len(b)/19
Out[22]: 4.0

One question is, how to interpret this as a nested list?
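One way to answer this, sketched under assumptions: if each track's names are stored as a big-endian int32 element count followed by that many strings (1-byte length, then the characters), then each 4-byte zero word is not padding but an element count of 0, one per track with no usr_names. The hypothetical parser below also assumes a 4-byte byte count (with the 0x40000000 flag bit) and a 2-byte version at the start of the event, and that lengths of 255 or more use a 0xFF marker followed by a 4-byte length; it is not uproot's code.

```python
import struct


def read_root_string(buf, pos):
    """Read one string: assumed 1-byte length, or 255 followed by a 4-byte big-endian length."""
    n = buf[pos]
    pos += 1
    if n == 255:
        (n,) = struct.unpack(">i", buf[pos:pos + 4])
        pos += 4
    return bytes(buf[pos:pos + n]), pos + n


def read_nested_names(buf):
    """Hypothetical parser: split one event's usr_names bytes into one list per track."""
    (bytecount,) = struct.unpack(">I", buf[:4])
    end = 4 + (bytecount & ~0x40000000)  # clear the assumed byte-count flag bit
    pos = 6  # skip the 4-byte byte count and the 2-byte version
    tracks = []
    while pos < end:
        (count,) = struct.unpack(">i", buf[pos:pos + 4])  # int32 number of names for this track
        pos += 4
        names = []
        for _ in range(count):
            name, pos = read_root_string(buf, pos)
            names.append(name)
        tracks.append(names)
    return tracks
```

Applied to the 124 bytes in Out[16], this returns 21 lists: the four names, [b'energy_lost_in_can'], and 19 empty lists.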

tamasgal · 2020-03-17T13:25:56Z

Btw. another similar structure is the trks.fitinf attribute of the Trk class (which also derives from AAObject) and is a list of lists. That one is parsed perfectly fine in uproot.

This is from the same file, which contains 636 tracks in the first event and therefore 636 sub-arrays in the fitinf attribute:

[ins] In [7]: import uproot

[ins] In [8]: f = uproot.open("usr-nested.root")

[ins] In [9]: f['E']['Evt']['trks']['trks.fitinf'].array()[0][:2]
Out[9]:
[[0.048538346479761546,
  0.03306135757929911,
  -38.38983929899999,
  17.0,
  0.0,
  0.0,
  2.1429282348347063,
  62.0,
  4.954644469809049,
  4.954644469809049,
  6.779622631909124],
 [0.036081671195352216,
  0.025045466802941046,
  -32.477731698672045,
  17.0,
  0.0,
  0.0,
  3.266832183594016,
  12.0,
  8.289712676040777,
  8.289712676040777,
  15.719224788428178]]

[ins] In [11]: len(f['E']['Evt']['trks']['trks.fitinf'].array()[0])
Out[11]: 636

jpivarski · 2020-03-17T13:38:26Z

Well, I don't see anything in the C structs or the ROOT metadata that suggests that f['E']['Evt']['mc_trks']['mc_trks.usr'] ought to be nested (more than a single jagged array across events, i.e. a single 1-D array in each event).

But also, the two branches you're looking at have the same jagged structure (correcting for the ObjectArray with an awkward.fromiter):

>>> mc_trks_usr_names = awkward.fromiter(f['E']['Evt']['mc_trks']['mc_trks.usr_names'].array())
>>> mc_trks_usr = f['E']['Evt']['mc_trks']['mc_trks.usr'].array()
>>> numpy.array_equal(mc_trks_usr_names.counts, mc_trks_usr.counts)
True

As a demonstration of this, we can match a string name to each numerical value—they're one-to-one:

>>> named_pairs = awkward.JaggedArray.zip(mc_trks_usr_names, mc_trks_usr)
>>> named_pairs
<JaggedArray [[(b'bx', 0.048692) (b'by', 0.058846) (b'ichan', 3.0) (b'cc', 2.0)] [(b'bx', 0.146713) (b'by', 0.400233) (b'ichan', 3.0) (b'cc', 2.0)] [(b'bx', 0.134258) (b'by', 0.077309) (b'ichan', 3.0) (b'cc', 2.0)] ... [(b'bx', 0.0) (b'by', 0.0) (b'ichan', 5.284424934e-315) (b'cc', 1.3742287462714573e-122)] [(b'bx', 5.304989477e-315) (b'by', 5e-324) (b'ichan', 175.476) (b'cc', 0.384255)] [(b'bx', 3.0) (b'by', 2.0) (b'ichan', 2.655811498e-314) (b'cc', -2.0868684099274518e-254)]] at 0x7f15ee0ad850>

tamasgal · 2020-03-17T13:40:39Z

This is very weird 😕 The data is definitely nested.

I was always confused about this . (dot) in the branch name, which might be used as a hint that something is nested? At least I only see this . in branch names for the "children" of the Evt class (branch).

What do you think?

Edit, I mean that it's ['mc_trks']['mc_trks.usr'] and not ['mc_trks']['usr']

jpivarski · 2020-03-17T13:48:50Z

> Btw. another similar structure is the trks.fitinf attribute of the Trk class (which also derives from AAObject) and is a list of lists. That one is parsed perfectly fine in uproot.

The surprising thing is that ROOT is not trimming the serialized std::vector before writing it. The std::vector is valid, it says what its length is, and we know where each event starts. It's possible that this is not considered a bug—maybe it's faster or parallelized writing at the expense of larger files. It's even possible that ROOT developers don't know this is happening, since these files would read back correctly in ROOT (assuming ROOT looks at the std::vector header to deserialize it, which is almost certain). The only symptom would be that the files are a little larger.

It showed up as a bug in uproot because I made a strong assumption that the serialized std::vectors would entirely fill the basket. This PR weakens that assumption and introduces an additional check. (The std::vector size and the ROOT event positions appeared to be redundant information, and if they differ, which one is correct? From your example, it turns out that the std::vector size is the one that's correct.)

The fact that it's never come up before suggests that this is a rare feature-or-bug. Therefore, the fact that it happens in some branches and not others is also not too surprising.

tamasgal · 2020-03-17T13:55:31Z

I see... What do you suggest? How do we proceed? Should I fix this upstream and write my own method to parse usr_names and derive the substructure of the corresponding usr from that information, or should this be handled in uproot?

jpivarski · 2020-03-17T14:03:45Z

> This is very weird 😕 The data is definitely nested.
>
> I was always confused about this . (dot) in the branch name, which might be used as a hint that something is nested? At least I only see this . in branch names for the "children" of the Evt class (branch).
>
> What do you think?
>
> Edit, I mean that it's ['mc_trks']['mc_trks.usr'] and not ['mc_trks']['usr']

After it's been written to the file, the dots in the branch names are just a naming convention. When I say I see no evidence that they should be nested, I mean that they're not variable-length lists of variable-length objects (there is only one level of "variable length").

Nesting of same-length or fixed-length objects is a matter of convention: whether you consider record_x, record_y, and record_z to be three arrays, or records to be an array of containers with three fields, depends only on how you interpret the dot:

>>> import numpy
>>> import awkward
>>> record_x = numpy.arange(10) + 0.1
>>> record_y = numpy.arange(10) + 0.2
>>> record_z = numpy.arange(10) + 0.3
>>> records = awkward.Table({"x": record_x, "y": record_y, "z": record_z})
>>> records.tolist()
[{'x': 0.1, 'y': 0.2, 'z': 0.3}, {'x': 1.1, 'y': 1.2, 'z': 1.3}, {'x': 2.1, 'y': 2.2, 'z': 2.3}, {'x': 3.1, 'y': 3.2, 'z': 3.3}, {'x': 4.1, 'y': 4.2, 'z': 4.3}, {'x': 5.1, 'y': 5.2, 'z': 5.3}, {'x': 6.1, 'y': 6.2, 'z': 6.3}, {'x': 7.1, 'y': 7.2, 'z': 7.3}, {'x': 8.1, 'y': 8.2, 'z': 8.3}, {'x': 9.1, 'y': 9.2, 'z': 9.3}]
>>> records.x
array([0.1, 1.1, 2.1, 3.1, 4.1, 5.1, 6.1, 7.1, 8.1, 9.1])
>>> records.y
array([0.2, 1.2, 2.2, 3.2, 4.2, 5.2, 6.2, 7.2, 8.2, 9.2])
>>> records.z
array([0.3, 1.3, 2.3, 3.3, 4.3, 5.3, 6.3, 7.3, 8.3, 9.3])
>>> record_x
array([0.1, 1.1, 2.1, 3.1, 4.1, 5.1, 6.1, 7.1, 8.1, 9.1])
>>> record_y
array([0.2, 1.2, 2.2, 3.2, 4.2, 5.2, 6.2, 7.2, 8.2, 9.2])
>>> record_z
array([0.3, 1.3, 2.3, 3.3, 4.3, 5.3, 6.3, 7.3, 8.3, 9.3])

Uproot reads branches as completely separate objects. If they're supposed to be in a hierarchy of nested fixed-size objects (like the record above, which has the same fixed size for x, y, and z), then you can put them together into a Table (or in Awkward 1, a RecordArray).

If the inner data have variable sizes, then that's a problem you need Uproot to resolve. Fixed-size things are so fluidly combinable and separable that maybe that shouldn't be a rigid concept anymore, but variable-sized nesting is an irreducible thing.
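To make the distinction concrete, here is a small NumPy sketch (plain arrays, not the uproot or awkward API): the five non-empty usr values of the first event can be nested as 21 per-track lists or as one list of five, and nothing in the flat values decides between them. Only per-list counts do. The helper name nest is hypothetical.

```python
import numpy as np


def nest(flat, counts):
    """Hypothetical helper: split flat values into variable-length sublists."""
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() != len(flat):
        raise ValueError("counts must add up to the number of flat values")
    # Split points are the running totals, except the last one (the end).
    return [piece.tolist() for piece in np.split(np.asarray(flat), np.cumsum(counts)[:-1])]


# The five non-empty usr values of the first event, as quoted in this thread.
flat = [0.048692, 0.058846, 3.0, 2.0, 63.2413]

by_track = nest(flat, [4, 1] + [0] * 19)  # 21 sublists: the expected per-track shape
as_one = nest(flat, [5])                  # one sublist of five, equally consistent with the flat data
```

Fixed-size nesting needs no such counts (every record has the same shape), which is why it can be reassembled freely after reading.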

Have we been talking past each other on the use of the word "nested"?

jpivarski · 2020-03-17T14:06:02Z

> I see... What do you suggest? How do we proceed? Should I fix this upstream and write my own method to parse usr_names and derive the substructure of the corresponding usr from that information, or should this be handled in uproot?

Other than the wrong std::vector deserialization, which is fixed by this PR, what needs to be fixed?

tamasgal · 2020-03-17T14:26:32Z

OK, maybe we are talking past each other regarding the nested thing ;)

> Other than the wrong std::vector deserialization, which is fixed by this PR, what needs to be fixed?

The problem is that only the first element of the list of lists comes out correctly now, and it is still at the top level rather than nested.

In fact, usr and usr_names are variable-length lists of lists.

Let me try to summarise better. The following array gives the number of tracks in each event (it's a bit unusual that this branch lists numbers, but we got used to it):

In [5]: f['E']['Evt']['mc_trks'].array()
Out[5]:
array([21, 13,  2, 18,  5, 10,  2, 10, 13, 16, 19, 14, 17,  8,  2, 10,  2,
        2,  2,  2,  2, 11, 13,  2, 14, 14, 16, 12, 12, 14,  2, 12,  2, 21,
        2,  2, 18,  2,  2,  2,  2, 11, 11,  2, 16,  2, 13, 13,  2, 14,  7,
        2,  2, 12, 10, 10,  2,  9,  2, 10,  2, 12, 10,  2,  2, 16, 12,  2,
        2, 21,  9,  2, 21, 10, 17, 11, 10, 10,  2, 13,  9,  2, 10, 15, 14,
        2, 19,  2, 10, 22,  2,  2, 12, 15,  2,  2, 11, 11, 15, 12, 10,  2,
       18, 10, 13, 19, 16,  2, 19,  2, 10, 19,  9, 17,  2,  2,  2, 16,  2,
       15,  2, 16,  2,  2,  2, 26,  2,  2,  2, 13, 15, 17,  2,  2,  2,  2,
       18, 10,  8,  2, 22, 10,  2,  2,  2,  2,  9, 10, 14,  2,  2, 14,  2,
        2, 15,  2,  2,  9, 12, 16, 17,  2, 18, 21, 17,  2,  2, 12,  2, 13,
       15,  9,  2,  2,  2, 20,  2,  2, 21, 20, 14,  8, 15,  2, 16,  2, 26,
       17,  2,  2,  2,  2,  2,  2,  2, 11,  2,  2,  2,  2,  2,  2,  2,  8,
       14,  2, 14, 10,  2,  2,  2,  2,  8,  2, 17,  2, 19,  2,  2,  2,  2,
       11,  2,  2, 15,  2,  2,  2,  2, 16, 22,  2,  2, 15,  2,  2, 19, 11,
        2,  2,  2,  2, 11, 18, 21,  2,  2,  2,  2,  2,  2, 23,  2,  2,  2,
        2,  2,  2,  2,  2, 17, 16,  2, 11,  2,  2,  2,  2, 12, 40,  2,  2,
        2,  2, 18,  2,  2, 15,  2, 22,  2, 26,  2,  2,  2, 25,  2, 18,  2,
        2, 22,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,
        2], dtype=int32)

Here you see that the first event has 21 tracks, the second 13 and so on.

Now, each attribute of mc_trks needs to have length 21 in the first event, 13 in the second, and so on. For usr and usr_names, these entries need to be lists of lists (what I call nested lists).

Before the fix, the first entry (for the first event) had a length of 15 instead of 21, which seemed kind of random, and it was a flat list of values. The first 4 entries were in fact the values one would expect in the first sub-array of the entry ([0][0]).
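Assuming mc_trks.usr is laid out like usr_names (a 6-byte header, then for each track an int32 count followed by that many big-endian doubles), the length of 15 is not random: 6 + (4 + 4×8) + (4 + 8) + 19×4 = 130 bytes, and a singly jagged reading that skips the 6-byte header plus one 4-byte count is left with 120 bytes, i.e. 15 doubles. The sketch below rebuilds such bytes with struct to check the arithmetic; the layout and the version number 9 are assumptions.

```python
import struct

import numpy as np

# usr values of the 21 tracks of the first event, as quoted in this thread.
tracks = [[0.048692, 0.058846, 3.0, 2.0], [63.2413]] + [[]] * 19

# Assumed per-track layout: big-endian int32 count, then that many big-endian doubles.
body = b"".join(struct.pack(">i%dd" % len(t), len(t), *t) for t in tracks)
# Assumed header: byte count (with the 0x40000000 flag) and a 2-byte version (9 is a placeholder).
event = struct.pack(">IH", 0x40000000 | (len(body) + 2), 9) + body
print(len(event))  # 130

# Singly jagged reading: skip the 6-byte header and ONE int32 count, then read doubles.
misread = np.frombuffer(event[10:], dtype=">f8")
print(len(misread))  # 15
print(misread[:4])   # the four values that belong to the first track
```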

After the fix, the first four values are correct, but the result is still not nested and the remaining 20 elements (sub-arrays) are missing:

In [6]: f['E']['Evt']['mc_trks']['mc_trks.usr'].array()[0]
Out[6]: array([0.048692, 0.058846, 3.      , 2.      ])

In [7]: f['E']['Evt']['mc_trks']['mc_trks.usr_names'].array()[0]
Out[7]: [b'bx', b'by', b'ichan', b'cc']

The expected output is the following:

>>> f['E']['Evt']['mc_trks']['mc_trks.usr'].array()[0]
[[0.048692, 0.058846, 3.0, 2.0], [63.2413], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]

>>> f['E']['Evt']['mc_trks']['mc_trks.usr_names'].array()[0]
[[b'bx', b'by', b'ichan', b'cc'], [b'energy_lost_in_can'], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]

Each should contain 21 sub-arrays for the first event.

The last expected output is especially easy to check, since energy_lost_in_can does not show up in the parsed result. However, you can see it in the debug output (note Out[16]):

In [15]: f['E']['Evt']['mc_trks']['mc_trks.usr_names'].array(uproot.asdebug)[0]
Out[15]:
array([ 64,   0,   0, 120,   0,   9,   0,   0,   0,   4,   2,  98, 120,
         2,  98, 121,   5, 105,  99, 104,  97, 110,   2,  99,  99,   0,
         0,   0,   1,  18, 101, 110, 101, 114, 103, 121,  95, 108, 111,
       115, 116,  95, 105, 110,  95,  99,  97, 110,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0], dtype=uint8)

In [16]: f['E']['Evt']['mc_trks']['mc_trks.usr_names'].array(uproot.asdebug)[0].tostring()
Out[16]: b'@\x00\x00x\x00\t\x00\x00\x00\x04\x02bx\x02by\x05ichan\x02cc\x00\x00\x00\x01\x12energy_lost_in_can\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

jpivarski · 2020-03-17T14:40:40Z

Oh, then I have again misunderstood your intent! Instead of

>>> f['E']['Evt']['mc_trks']['mc_trks.usr'].interpretation
asjagged(asdtype('>f8'), 10, 6)
>>> f['E']['Evt']['mc_trks']['mc_trks.usr_names'].interpretation
asgenobj(STLVector(STLString()))

you want the interpretation to be

>>> branch = f['E']['Evt']['mc_trks']['mc_trks.usr']
>>> branch.array(uproot.asgenobj(uproot.SimpleArray(uproot.STLVector(uproot.asdtype(">f8"))), branch._context, 6))[0]
[[0.048692, 0.058846, 3.0, 2.0], [63.2413], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]
>>> branch = f['E']['Evt']['mc_trks']['mc_trks.usr_names']
>>> branch.array(uproot.asgenobj(uproot.SimpleArray(uproot.STLVector(uproot.STLString())), branch._context, 6))[0]
[[b'bx', b'by', b'ichan', b'cc'], [b'energy_lost_in_can'], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]

so the stuff after the first vector is not "random junk" or "padding" or anything—you were looking for more data. In that case, this PR is entirely mistaken and I'll be closing it, rather than merging it. If the above is right, then your real problem is why the interpretation thought it was singly jagged when it's doubly jagged.

However, when I look at the C++ snippets you've given me, I would also assume that they're supposed to be singly jagged. As I understood it, you have an Evt struct that contains one std::vector: singly jagged. Where is the std::vector inside an std::vector?

tamasgal · 2020-03-17T15:23:36Z

Yes! Sorry, that might be due to my poor vocabulary. Indeed, I meant doubly jagged ;)

OK, so yes, indeed the C++ implementation is confusing.

The superclass for every class is basically this (the AAObject):

struct AAObject : public TObject
{
  std::vector<double>      usr;              ///< user data
  std::vector<std::string> usr_names;        ///< user keys
...
...

  TObject* any;                              ///< Pointer to "any" user data.

  ClassDef(AAObject, 6)
};

And then we have Trk and Hit etc. like this:

struct Trk: public AAObject
{
  int    id;                          ///< track identifier
  Vec    pos;                         ///< position of the track at time t
  Vec    dir;                         ///< track direction
  double t;                           ///< track time (when the particle is at pos )
  double E;                           ///< Energy (either MC truth or reconstructed)

  double len;                         ///< length, if applicable
  double lik;                         ///< likelihood or lambda value (for aafit, lambda)
  int    type;                        ///< MC: particle type in PDG encoding.
  int               rec_type;         ///< identifier for the overall fitting algorithm/chain/strategy
  std::vector<int>  rec_stages;       ///< list of identifiers of successful fitting stages resulting in this track

  int status;                         ///< MC status code
  int mother_id;                      ///< MC id of the parent particle 

  std::vector<double> fitinf;         ///< place to store additional fit info, for jgandalf, see JFitParameters.hh
  std::vector<int>    hit_ids;        ///< list of associated hit-ids (corresponds to Hit::id).
  std::vector<double> error_matrix;   ///< (5x5) error covariance matrix (stored as linear vector)
  std::string    comment;             ///< use as you like
...
...

And finally the Evt class, which stores std::vectors of these Trk and Hit instances:

struct Evt: public AAObject
{
  int id;                       ///< offline event identifier 
  int det_id;                   ///< detector identifier from DAQ
  int mc_id;                    ///< identifier of the MC event (as found in ascii or antcc file).
  
  int run_id;                   ///< DAQ run identifier
  int mc_run_id;                ///< MC  run identifier
  
  int frame_index;              ///< from the raw data
  ULong64_t trigger_mask;       ///< trigger mask from raw data (i.e. the trigger bits)
  ULong64_t trigger_counter;    ///< trigger counter
  unsigned int overlays;        ///< number of overlaying triggered events
  TTimeStamp t;                 ///< UTC time of the start of the timeslice the event came from
  
  //hits and tracks
  std::vector<Hit> hits;        ///< list of hits
  std::vector<Trk> trks;        ///< list of reconstructed tracks (can be several because of prefits,showers, etc).   
...

jpivarski · 2020-03-17T16:04:36Z

I see, so the Evt has a std::vector of Trk, and each Trk has the std::vectors usr and usr_names. That's where the two levels of std::vector come in.
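In array terms, a sketch of what that means (plain NumPy, not how uproot or awkward store it internally): a doubly jagged branch such as mc_trks.usr needs two offset arrays, one for tracks per event (the counts that the mc_trks branch shows) and one for values per track. The function names and the two-event input are hypothetical toy data.

```python
import numpy as np


def to_offsets(events):
    """Flatten events -> tracks -> values into two offset arrays plus flat content."""
    track_counts = [len(tracks) for tracks in events]
    value_counts = [len(values) for tracks in events for values in tracks]
    outer = np.concatenate([[0], np.cumsum(track_counts)]).astype(np.int64)
    inner = np.concatenate([[0], np.cumsum(value_counts)]).astype(np.int64)
    content = np.array([v for tracks in events for values in tracks for v in values], dtype=np.float64)
    return outer, inner, content


def from_offsets(outer, inner, content):
    """Rebuild the doubly nested lists from the two offset arrays."""
    events = []
    for e in range(len(outer) - 1):
        tracks = []
        for t in range(outer[e], outer[e + 1]):
            tracks.append(content[inner[t]:inner[t + 1]].tolist())
        events.append(tracks)
    return events


# Toy input: two events with three and two tracks.
events = [[[1.0, 2.0], [3.0], []], [[4.0], []]]
outer, inner, content = to_offsets(events)  # outer = [0, 3, 5], inner = [0, 2, 3, 3, 4, 4]
```

A singly jagged interpretation keeps only one of these two offset arrays, which is why it cannot represent the per-track lists.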

I'm closing this PR because it is the wrong fix: the problem is not that there are "junk bytes" after a single serialized std::vector, it's that the interpretation is supposed to be a SimpleArray of STLVector. The error is either in TTree::_attachstreamers (maybe we descended the streamers incorrectly and assigned the wrong level node?) or in uproot.interp.auto.interpret (the massive function that assigns interpretations from whatever metadata it finds).

Commit a76458e: Fixes #465 by explicitly trimming STL vector size (some ROOT files have unused/padding/junk in each event after the vector's serialized data).

jpivarski closed this Mar 17, 2020

jpivarski deleted the issue-465 branch March 17, 2020 16:04

jpivarski mentioned this pull request Mar 17, 2020 in #467, "[WIP] Try to fix #465 again; this time focusing on the interpretation." (Merged)
