This repository has been archived by the owner on Jun 21, 2022. It is now read-only.

Merge pull request #337 from scikit-hep/ttree-works
Extend and append methods added
jpivarski authored Sep 17, 2019
2 parents 3c303fe + 6e5a070 commit 8808ff1
Showing 5 changed files with 674 additions and 20 deletions.
12 changes: 4 additions & 8 deletions .travis.yml
@@ -11,10 +11,6 @@ env:
- PYVER=2.7 NPY="numpy==1.14.5"
- PYVER=2.7 NPY="numpy>=1.15"

- PYVER=3.4 NPY="numpy==1.13.1"
- PYVER=3.4 NPY="numpy==1.14.5"
- PYVER=3.4 NPY="numpy>=1.15"

- PYVER=3.5 NPY="numpy==1.13.1"
- PYVER=3.5 NPY="numpy==1.14.5"
- PYVER=3.5 NPY="numpy>=1.15"
@@ -37,7 +33,7 @@ install:
- export PATH="$HOME/miniconda/bin:$PATH"
- hash -r
- conda config --add channels conda-forge;
- conda install --quiet --yes -c conda-forge/label/mamba-alpha mamba
# conda install --quiet --yes -c conda-forge/label/mamba-alpha mamba
- conda config --set always_yes yes --set changeps1 no
# Create the conda testing environment
# FIXME: Mamba decides to upgrade Python here so pin it again
@@ -47,13 +43,13 @@ install:
- if [[ "${PYVER}" = pypy* ]]; then
conda create --quiet --yes -n testenv ${PYVER};
elif [ "${PYVER}" = "2.7" ] || [ "${PYVER}" = "3.6" ] || [ "${PYVER}" = "3.7" ]; then
mamba create --quiet --yes -n testenv python=${PYVER} pip;
conda create --quiet --yes -n testenv python=${PYVER} pip;
else
conda create --quiet --yes -n testenv python=${PYVER};
fi
- source activate testenv
- if [ "${PYVER}" = "2.7" ] || [ "${PYVER}" = "3.6" ] || [ "${PYVER}" = "3.7" ]; then
mamba install --quiet --yes python=${PYVER} pip root;
conda install --quiet --yes python=${PYVER} pip root;
source activate testenv;
fi
- pip install --upgrade setuptools-scm
@@ -68,7 +64,7 @@ install:
- pip install pandas
# pyopenssl is for deployment
- if [[ ${PYVER} != pypy* ]] ; then
mamba install -c anaconda python=${PYVER} pyopenssl;
conda install -c anaconda python=${PYVER} pyopenssl;
fi
- wget -O tests/samples/Event.root http://scikit-hep.org/uproot/examples/Event.root

150 changes: 149 additions & 1 deletion README.rst
@@ -4403,7 +4403,155 @@ which may have come from other libraries.
Writing TTrees
--------------

Coming this summer!
As of now, uproot can write TTrees whose branches are basic types
(integers and floating-point numbers).

Basic usage:

.. code-block:: python3

    import uproot
    import numpy

    with uproot.recreate("example.root") as f:
        f["t"] = uproot.newtree({"branch": "int32"})
        f["t"].extend({"branch": numpy.array([1, 2, 3, 4, 5])})

You can specify the branches in your TTree explicitly:

.. code-block:: python3

    t = uproot.newtree({"branch1": int,
                        "branch2": numpy.int32,
                        "branch3": uproot.newbranch(numpy.float64, title="This is the title")})

``uproot.newtree()`` takes a Python dictionary as an argument, where each
key is the name of a branch and each value is either a branch object or
the type of the branch.

We can specify the title, the flushsize and the compression while
creating the tree.

This is an example of how you would add a title to your tree:

.. code-block:: python3

    tree = uproot.newtree(branchdict, title="TTree Title")

The title of a branch is specified in the same way as the title of a
tree:

.. code-block:: python3

    b = uproot.newbranch("int32", title="This is the title")

Writing baskets
~~~~~~~~~~~~~~~

Assume there are two branches in the TTree, ``branch1`` and ``branch2``.

The suggested interface for writing baskets to the TTree is the extend
method:

.. code-block:: python3

    f["t"].extend({"branch1": numpy.array([1, 2, 3, 4, 5]), "branch2": [6, 7, 8, 9, 10]})

The extend method takes a dictionary where each key is the name of a
branch and each value is a numpy array or a list of data to be written
to that branch.

Remember to add entries to all the branches, and make sure that the
number of entries added to each branch is the same!

You can set the flush parameter of the extend method to True or False:

.. code-block:: python3

    f["t"].extend({"branch1": numpy.array([1, 2, 3, 4, 5]), "branch2": [6, 7, 8, 9, 10]}, flush=True)

By default, flush is True, which means that the values are immediately
flushed to the file.
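
The requirement that every branch receives the same number of entries can be checked before calling extend. The following is a minimal sketch in plain Python: ``check_equal_lengths`` is a hypothetical helper, not part of uproot, and it assumes each value supports ``len()`` (as lists and numpy arrays do).

```python
def check_equal_lengths(data):
    """Return the common number of entries, or raise ValueError.

    `data` maps branch names to sequences, the same shape that is
    passed to extend. Hypothetical helper, not part of uproot.
    """
    lengths = {name: len(values) for name, values in data.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(
            "branches have different numbers of entries: %r" % (lengths,))
    # An empty dictionary has no entries to write.
    return next(iter(lengths.values()), 0)


num_entries = check_equal_lengths({"branch1": [1, 2, 3, 4, 5],
                                   "branch2": [6, 7, 8, 9, 10]})
```

Calling such a check first turns a mismatch into an early, readable error instead of a file with unevenly filled branches.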

You can choose not to flush the baskets immediately by setting
``flush=False``:

.. code-block:: python3

    f["t"].extend({"branch1": numpy.array([1, 2, 3, 4, 5]), "branch2": [1, 2, 3, 4, 5]}, flush=False)

The baskets are then added to a buffer, which is flushed to the file
depending on the flush size set by the user.
The flush size can be set at the branch level and at the tree level.

To set it at the branch level:

.. code-block:: python3

    t = uproot.newbranch("int32", flushsize="10 KB")

and to set it at the tree level:

.. code-block:: python3

    tree = uproot.newtree({"demoflush": t}, flushsize=1000)

You can also use the append method if you only need to add a single
value to the end of the current basket buffer:

.. code-block:: python3

    f["t"].append({"branch1": 1, "branch2": 2})

Make sure to add entries to every branch, as with the extend method.

The append method does not provide a way to explicitly flush data to the
file. The data is added to the end of the buffer and is flushed based on
the branch and tree flush sizes.
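
If many single entries are collected before writing, one option is to transpose them into the dictionary-of-lists shape that extend accepts and write them in one call. A minimal sketch in plain Python, where ``rows_to_columns`` is a hypothetical helper (not part of uproot) and every row is assumed to contain exactly the same branch names:

```python
def rows_to_columns(rows):
    """Transpose a list of {branch: value} rows into {branch: [values]}.

    Hypothetical helper, not part of uproot. Raises ValueError if a row
    is missing a branch or has an extra one, because every branch must
    receive the same number of entries.
    """
    if not rows:
        return {}
    names = set(rows[0])
    columns = {name: [] for name in rows[0]}
    for row in rows:
        if set(row) != names:
            raise ValueError("row has different branches: %r" % (row,))
        for name, value in row.items():
            columns[name].append(value)
    return columns


columns = rows_to_columns([{"branch1": 1, "branch2": 2},
                           {"branch1": 3, "branch2": 4}])
```

The resulting dictionary has the same shape as the argument in the extend examples above, so it could be passed as ``f["t"].extend(columns)``.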

**Low level interface**

If you want, you can write a basket to a single branch. Remember to add
an equal number of baskets to the other branches as well, because ROOT
assumes that all branches have the same number of baskets and will not
read non-uniform baskets.

.. code-block:: python3

    f["t"]["branch1"].newbasket([1, 2, 3])

Then add a basket with 3 entries to branch2 as well:

.. code-block:: python3

    f["t"]["branch2"].newbasket([91, 92, 93])

Compression
~~~~~~~~~~~

By default, the baskets of all the branches are compressed depending on
the compression set for the file.

You can specify the compression of all the branches if you want it to be
separate from the compression specified for the entire file by using the uproot.newtree() method.

You can also specify the compression of each branch individually by using the uproot.newbranch() method.

.. code-block:: python3

    b1 = uproot.newbranch("i4", compression=uproot.ZLIB(5))
    b2 = uproot.newbranch("i8", compression=uproot.LZMA(4))
    b3 = uproot.newbranch("f4")
    branchdict = {"branch1": b1, "branch2": b2, "branch3": b3}
    tree = uproot.newtree(branchdict, compression=uproot.LZ4(4))
    with uproot.recreate("example.root", compression=uproot.LZMA(5)) as f:
        f["t"] = tree
        f["t"].extend({"branch1": [1]*1000, "branch2": [2]*1000, "branch3": [3]*1000})
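
The description above suggests a precedence: a compression set on a branch applies to that branch, otherwise the one set on the tree is used, otherwise the one set on the file. The sketch below models that reading in plain Python, with strings standing in for compression objects. ``effective_compression`` is a hypothetical function, and the precedence itself is an assumption drawn from the text, not a statement about uproot's implementation.

```python
def effective_compression(branch=None, tree=None, file=None):
    """Return the most specific setting that is not None.

    Hypothetical model of the assumed precedence: branch, then tree,
    then file. Not part of uproot.
    """
    for setting in (branch, tree, file):
        if setting is not None:
            return setting
    return None


# Settings mirroring the example above, written as plain strings.
file_level = "LZMA(5)"
tree_level = "LZ4(4)"
resolved = {
    "branch1": effective_compression("ZLIB(5)", tree_level, file_level),
    "branch2": effective_compression("LZMA(4)", tree_level, file_level),
    "branch3": effective_compression(None, tree_level, file_level),
}
```

Under this assumed model, only ``branch3`` falls back to the tree-level setting, and the file-level setting would apply only if neither the branch nor the tree specified one.
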
Acknowledgements
================
182 changes: 180 additions & 2 deletions tests/test_write.py
@@ -1660,10 +1660,10 @@ def test_tree_move_compress(tmp_path):
assert branch.GetCompressionAlgorithm() == 1
assert branch.GetCompressionLevel() == 4

def test_user_interface1(tmp_path):
def test_tree_renames(tmp_path):
filename = join(str(tmp_path), "example.root")

b = newbranch(">i4")
b = uproot.newbranch(">i4")
branchdict = {"intBranch": b}
tree = uproot.newtree(branchdict)
a = numpy.array([1], dtype=">i4")
@@ -1677,3 +1677,181 @@ def test_user_interface1(tmp_path):
treedata = tree.array("intBranch")
for i in range(19):
assert a[0] == treedata[i]

def test_ttree_extend_flush_true(tmp_path):
    filename = join(str(tmp_path), "example.root")

    b = uproot.newbranch(">i4")
    branchdict = {"intBranch": b, "intBranch2": b}
    tree = uproot.newtree(branchdict)
    with uproot.recreate(filename) as f:
        f["t"] = tree
        basket_add = {"intBranch": numpy.array([1, 2, 3, 4, 5]), "intBranch2": numpy.array([6, 7, 8, 9, 10])}
        f["t"].extend(basket_add)

    f = ROOT.TFile.Open(filename)
    tree = f.Get("t")
    branch1 = tree.AsMatrix(["intBranch"])
    branch2 = tree.AsMatrix(["intBranch2"])
    branch1_test = numpy.array([1, 2, 3, 4, 5], dtype=">i4")
    branch2_test = numpy.array([6, 7, 8, 9, 10], dtype=">i4")
    for i in range(5):
        assert branch1[i] == branch1_test[i]
        assert branch2[i] == branch2_test[i]

def test_ttree_extend_flush_false(tmp_path):
    filename = join(str(tmp_path), "example.root")

    b = uproot.newbranch(">i4")
    branchdict = {"intBranch": b}
    tree = uproot.newtree(branchdict, flushsize=5)
    with uproot.recreate(filename) as f:
        f["t"] = tree
        basket_add = {"intBranch": numpy.array([1, 2])}
        f["t"].extend(basket_add, flush=False)

    f = ROOT.TFile.Open(filename)
    tree = f.Get("t")
    branch1 = tree.AsMatrix(["intBranch"])
    branch1_test = numpy.array([1, 2], dtype=">i4")
    for i in range(2):
        assert branch1[i] == branch1_test[i]

def test_ttree_extend_flush_false_readback_and_proper_close(tmp_path):
    filename = join(str(tmp_path), "example.root")

    b = uproot.newbranch(">i4")
    branchdict = {"intBranch": b}
    tree = uproot.newtree(branchdict, flushsize=9)
    with uproot.recreate(filename) as f:
        f["t"] = tree
        basket_add = {"intBranch": numpy.array([1, 2])}
        f["t"].extend(basket_add, flush=False)
        assert f["t"]["intBranch"].numbaskets == 0

    f = ROOT.TFile.Open(filename)
    tree = f.Get("t")
    branch1 = tree.AsMatrix(["intBranch"])
    branch1_test = numpy.array([1, 2], dtype=">i4")
    for i in range(2):
        assert branch1[i] == branch1_test[i]

def test_ttree_extend_flush_false_multibranch_same_type(tmp_path):
    filename = join(str(tmp_path), "example.root")

    b = uproot.newbranch(">i4")
    branchdict = {"intBranch": b, "intBranch2": b}
    tree = uproot.newtree(branchdict, flushsize=9)
    with uproot.recreate(filename) as f:
        f["t"] = tree
        basket_add = {"intBranch": numpy.array([1, 2]), "intBranch2": numpy.array([11, 12])}
        f["t"].extend(basket_add, flush=False)

    f = ROOT.TFile.Open(filename)
    tree = f.Get("t")
    branch1 = tree.AsMatrix(["intBranch"])
    branch2 = tree.AsMatrix(["intBranch2"])
    branch1_test = numpy.array([1, 2], dtype=">i4")
    branch2_test = numpy.array([11, 12], dtype=">i4")
    for i in range(2):
        assert branch1[i] == branch1_test[i]
        assert branch2[i] == branch2_test[i]

def test_ttree_extend_flush_false_multibranch_diff_type(tmp_path):
    filename = join(str(tmp_path), "example.root")

    b = uproot.newbranch(">i4")
    b2 = uproot.newbranch(">i8")
    branchdict = {"intBranch": b, "intBranch2": b2}
    tree = uproot.newtree(branchdict, flushsize=9)
    with uproot.recreate(filename) as f:
        f["t"] = tree
        basket_add = {"intBranch": numpy.array([1, 2]), "intBranch2": numpy.array([11, 12])}
        f["t"].extend(basket_add, flush=False)

    f = ROOT.TFile.Open(filename)
    tree = f.Get("t")
    branch1 = tree.AsMatrix(["intBranch"])
    branch2 = tree.AsMatrix(["intBranch2"])
    branch1_test = numpy.array([1, 2], dtype=">i4")
    branch2_test = numpy.array([11, 12], dtype=">i8")
    for i in range(2):
        assert branch1[i] == branch1_test[i]
        assert branch2[i] == branch2_test[i]

def test_ttree_extend_flush_false_multibasket(tmp_path):
    filename = join(str(tmp_path), "example.root")

    b = uproot.newbranch(">i4")
    branchdict = {"intBranch": b}
    tree = uproot.newtree(branchdict, flushsize=5)
    with uproot.recreate(filename) as f:
        f["t"] = tree
        basket_add = {"intBranch": numpy.array([1, 2])}
        for i in range(3):
            f["t"].extend(basket_add, flush=False)

    f = ROOT.TFile.Open(filename)
    tree = f.Get("t")
    branch1 = tree.AsMatrix(["intBranch"])
    branch1_test = numpy.array([1, 2]*3, dtype=">i4")
    for i in range(6):
        assert branch1[i] == branch1_test[i]

def test_ttree_extend_flush_false_multibasket_multibranch_same_type(tmp_path):
    filename = join(str(tmp_path), "example.root")

    b = uproot.newbranch(">i4")
    branchdict = {"intBranch": b, "intBranch2": b}
    tree = uproot.newtree(branchdict, flushsize=5)
    with uproot.recreate(filename) as f:
        f["t"] = tree
        basket_add = {"intBranch": numpy.array([1, 2]), "intBranch2": numpy.array([11, 12])}
        for i in range(3):
            f["t"].extend(basket_add, flush=False)

    f = ROOT.TFile.Open(filename)
    tree = f.Get("t")
    branch1 = tree.AsMatrix(["intBranch"])
    branch2 = tree.AsMatrix(["intBranch2"])
    branch1_test = numpy.array([1, 2] * 3, dtype=">i4")
    branch2_test = numpy.array([11, 12] * 3, dtype=">i4")
    for i in range(6):
        assert branch1[i] == branch1_test[i]
        assert branch2[i] == branch2_test[i]

def test_ttree_extend_flush_false_multibasket_multibranch_diff_type(tmp_path):
    filename = join(str(tmp_path), "example.root")

    b = uproot.newbranch(">i4")
    b2 = uproot.newbranch(">i8")
    branchdict = {"intBranch": b, "intBranch2": b2}
    tree = uproot.newtree(branchdict, flushsize=5)
    with uproot.recreate(filename) as f:
        f["t"] = tree
        basket_add = {"intBranch": numpy.array([1, 2]), "intBranch2": numpy.array([11, 12])}
        for i in range(3):
            f["t"].extend(basket_add, flush=False)

    f = ROOT.TFile.Open(filename)
    tree = f.Get("t")
    branch1 = tree.AsMatrix(["intBranch"])
    branch2 = tree.AsMatrix(["intBranch2"])
    branch1_test = numpy.array([1, 2] * 3, dtype=">i4")
    branch2_test = numpy.array([11, 12] * 3, dtype=">i8")
    for i in range(6):
        assert branch1[i] == branch1_test[i]
        assert branch2[i] == branch2_test[i]

def test_ttree_extend_flush_false_diff_flush(tmp_path):
    filename = join(str(tmp_path), "example.root")

    b = uproot.newbranch(">i4", flushsize=5)
    branchdict = {"intBranch": b}
    tree = uproot.newtree(branchdict, flushsize=12)
    with uproot.recreate(filename) as f:
        f["t"] = tree
        f["t"].extend({"intBranch": numpy.array([1])}, flush=False)
        assert f["t"]["intBranch"].numbaskets == 0
        f["t"].extend({"intBranch": numpy.array([2, 3])}, flush=False)
        assert f["t"]["intBranch"].numbaskets == 2