scikit-hep · jpivarski · Dec 11, 2020
diff --git a/docs-src/_toc.yml b/docs-src/_toc.yml
@@ -20,8 +20,8 @@
       title: "Arrow and Parquet"
     - file: how-to-convert-pandas
       title: "Pandas"
-    - file: how-to-convert-arrayset
-      title: "Generic array-sets"
+    - file: how-to-convert-buffers
+      title: "Generic buffers"
 
 - file: how-to-create
   title: "Creating new arrays"

diff --git a/docs-src/how-to-convert-arrayset.md → docs-src/how-to-convert-buffers.md b/docs-src/how-to-convert-arrayset.md → docs-src/how-to-convert-buffers.md
@@ -11,10 +11,10 @@ kernelspec:
   name: python3
 ---
 
-Generic array-sets
-==================
+Generic buffers
+===============
 
-Most of the conversion functions target a particular library: NumPy, Arrow, Pandas, or Python itself. As a catch-all for other storage formats, Awkward Arrays can be converted to and from "array-sets," sets of named arrays with a schema that can be used to reconstruct the original array. This section will demonstrate how an array-set can be used to store an Awkward Array in an HDF5 file, which ordinarily wouldn't be able to represent nested, irregular data structures.
+Most of the conversion functions target a particular library: NumPy, Arrow, Pandas, or Python itself. As a catch-all for other storage formats, Awkward Arrays can be converted to and from sets of named buffers. The buffers are not (usually) intelligible on their own; the length of the array and a JSON document are needed to reconstitute the original structure. This section will demonstrate how these buffers can be used to store an Awkward Array in an HDF5 file, which ordinarily wouldn't be able to represent nested, irregular data structures.
 
 ```{code-cell} ipython3
 import awkward as ak
@@ -23,8 +23,8 @@ import h5py
 import json
 ```
 
-From Awkward to an array-set
-----------------------------
+From Awkward to buffers
+-----------------------
 
 Consider the following complex array:
 
@@ -37,18 +37,17 @@ ak_array = ak.Array([
 ak_array
 ```
 
-The [ak.to_arrayset](https://awkward-array.readthedocs.io/en/latest/_auto/ak.to_arrayset.html) function decomposes it into a set of one-dimensional arrays (a zero-copy operation).
+The [ak.to_buffers](https://awkward-array.readthedocs.io/en/latest/_auto/ak.to_buffers.html) function decomposes it into a set of one-dimensional arrays (a zero-copy operation).
 
 ```{code-cell} ipython3
-form, container, num_partitions = ak.to_arrayset(ak_array)
+form, length, container = ak.to_buffers(ak_array)
 ```
 
 The pieces needed to reconstitute this array are:
 
    * the [Form](https://awkward-array.readthedocs.io/en/latest/ak.forms.Form.html), which defines how structure is built from one-dimensional arrays,
-   * the one-dimensional arrays in the `container` (a [MutableMapping](https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes)),
-   * the number of partitions, if any,
-   * the length of the original array or lengths of all partitions ([ak.partitions](https://awkward-array.readthedocs.io/en/latest/_auto/ak.partitions.html)) are needed if we wish to read it back _lazily_ (more on that below).
+   * the length of the original array or lengths of all of its partitions ([ak.partitions](https://awkward-array.readthedocs.io/en/latest/_auto/ak.partitions.html)),
+   * the one-dimensional arrays in the `container` (a [MutableMapping](https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes)).
 
 The [Form](https://awkward-array.readthedocs.io/en/latest/ak.forms.Form.html) is like an Awkward [Type](https://awkward-array.readthedocs.io/en/latest/ak.types.Type.html) in that it describes how the data are structured, but with more detail: it includes distinctions such as the difference between [ListArray](https://awkward-array.readthedocs.io/en/latest/ak.layout.ListArray.html) and [ListOffsetArray](https://awkward-array.readthedocs.io/en/latest/ak.layout.ListOffsetArray.html), as well as the integer types of structural [Indexes](https://awkward-array.readthedocs.io/en/latest/ak.layout.Index.html).
 
@@ -58,48 +57,42 @@ It is usually presented as JSON, and has a compact JSON format (when [Form.tojso
 form
 ```
 
-This `container` is a new dict, but it could have been a user-specified [MutableMapping](https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes).
+In this case, the `length` is just an integer. It would be a list of integers if `ak_array` were partitioned.
 
 ```{code-cell} ipython3
-container
+length
 ```
 
-This array has no partitions.
+This `container` is a new dict, but it could have been a user-specified [MutableMapping](https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes) if passed into [ak.to_buffers](https://awkward-array.readthedocs.io/en/latest/_auto/ak.to_buffers.html) as an argument.
 
 ```{code-cell} ipython3
-num_partitions is None
-```
-
-This is also what we find from [ak.partitions](https://awkward-array.readthedocs.io/en/latest/_auto/ak.partitions.html).
-
-```{code-cell} ipython3
-ak.partitions(ak_array) is None
+container
 ```
 
-From array-set to Awkward
--------------------------
+From buffers to Awkward
+-----------------------
 
-The function that reverses [ak.to_arrayset](https://awkward-array.readthedocs.io/en/latest/_auto/ak.to_arrayset.html) is [ak.from_arrayset](https://awkward-array.readthedocs.io/en/latest/_auto/ak.from_arrayset.html). Its first three arguments are `form`, `container`, and `num_partitions`.
+The function that reverses [ak.to_buffers](https://awkward-array.readthedocs.io/en/latest/_auto/ak.to_buffers.html) is [ak.from_buffers](https://awkward-array.readthedocs.io/en/latest/_auto/ak.from_buffers.html). Its first three arguments are `form`, `length`, and `container`.
 
 ```{code-cell} ipython3
-ak.from_arrayset(form, container, num_partitions)
+ak.from_buffers(form, length, container)
 ```
 
 Saving Awkward Arrays to HDF5
 -----------------------------
 
-The [h5py](https://www.h5py.org/) library presents each group in an HDF5 file as a [MutableMapping](https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes), which we can use as a container for an array-set. We must also save the `form`, `num_partitions`, and `length` as metadata for the array to be retrievable.
+The [h5py](https://www.h5py.org/) library presents each group in an HDF5 file as a [MutableMapping](https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes), which we can use as a container for the buffers. We must also save the `form` and `length` as metadata for the array to be retrievable.
 
 ```{code-cell} ipython3
 file = h5py.File("/tmp/example.hdf5", "w")
 group = file.create_group("awkward")
 group
 ```
 
-We can fill this `group` as a `container` by passing it in to [ak.to_arrayset](https://awkward-array.readthedocs.io/en/latest/_auto/ak.to_arrayset.html).
+We can fill this `group` as a `container` by passing it in to [ak.to_buffers](https://awkward-array.readthedocs.io/en/latest/_auto/ak.to_buffers.html).
 
 ```{code-cell} ipython3
-form, container, num_partitions = ak.to_arrayset(ak_array, container=group)
+form, length, container = ak.to_buffers(ak_array, container=group)
 ```
 
 ```{code-cell} ipython3
@@ -115,7 +108,7 @@ container.keys()
 Here's one.
 
 ```{code-cell} ipython3
-np.asarray(container["node0-offsets"])
+np.asarray(container["part0-node0-offsets"])
 ```
 
 Now we need to add the other information to the group as metadata. Since HDF5 accepts string-valued metadata, we can put it all in as JSON or numbers.
@@ -126,38 +119,27 @@ group.attrs["form"]
 ```
 
 ```{code-cell} ipython3
-group.attrs["num_partitions"] = json.dumps(num_partitions)
-group.attrs["num_partitions"]
-```
-
-```{code-cell} ipython3
-group.attrs["partition_lengths"] = json.dumps(ak.partitions(ak_array))
-group.attrs["partition_lengths"]
-```
-
-```{code-cell} ipython3
-group.attrs["length"] = len(ak_array)
+group.attrs["length"] = json.dumps(length)   # JSON-encode it because it might be a list
 group.attrs["length"]
 ```
 
 Reading Awkward Arrays from HDF5
 --------------------------------
 
-With that, we can reconstitute the array by supplying [ak.from_arrayset](https://awkward-array.readthedocs.io/en/latest/_auto/ak.from_arrayset.html) the right arguments from the group and metadata.
+With that, we can reconstitute the array by supplying [ak.from_buffers](https://awkward-array.readthedocs.io/en/latest/_auto/ak.from_buffers.html) the right arguments from the group and metadata.
 
 The group can't be used as a `container` as-is, since subscripting it returns `h5py.Dataset` objects, rather than arrays.
 
 ```{code-cell} ipython3
-reconstituted = ak.from_arrayset(
+reconstituted = ak.from_buffers(
     ak.forms.Form.fromjson(group.attrs["form"]),
+    json.loads(group.attrs["length"]),
     {k: np.asarray(v) for k, v in group.items()},
 )
 reconstituted
 ```
 
-Like [ak.from_parquet](https://awkward-array.readthedocs.io/en/latest/_auto/ak.from_parquet.html), [ak.from_arrayset](https://awkward-array.readthedocs.io/en/latest/_auto/ak.from_arrayset.html) has the option to read lazily, only accessing record fields and partitions that are accessed.
-
-To do so, we need to pass `lazy=True`, but also the total length of the array (if not partitioned) or the lengths of all the partitions (if partitioned).
+Like [ak.from_parquet](https://awkward-array.readthedocs.io/en/latest/_auto/ak.from_parquet.html), [ak.from_buffers](https://awkward-array.readthedocs.io/en/latest/_auto/ak.from_buffers.html) has the option to read lazily, loading only the record fields and partitions that are actually accessed.
 
 ```{code-cell} ipython3
 class LazyGet:
@@ -168,11 +150,11 @@ class LazyGet:
         print(key)
         return np.asarray(self.group[key])
 
-lazy = ak.from_arrayset(
+lazy = ak.from_buffers(
     ak.forms.Form.fromjson(group.attrs["form"]),
+    json.loads(group.attrs["length"]),
     LazyGet(group),
     lazy=True,
-    lazy_lengths = group.attrs["length"],
 )
 ```
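
To make the "form, length, container" triple concrete, here is a minimal pure-Python sketch of the same idea for a single level of nesting. It is not Awkward's implementation: `to_buffers_sketch` and `from_buffers_sketch` are hypothetical helpers, the toy form JSON is not Awkward's Form schema, and the `part0-node1-data` key is an assumed name modeled on the `part0-node0-offsets` key shown above.

```python
import json


def to_buffers_sketch(nested):
    """Split a list of lists of numbers into flat buffers plus metadata."""
    offsets = [0]
    content = []
    for sublist in nested:
        content.extend(sublist)
        offsets.append(len(content))  # running end position of each sublist

    # Toy "form": records which buffers make up the structure (hypothetical schema).
    form = json.dumps(
        {
            "class": "ListOffsetArray",
            "offsets_key": "part0-node0-offsets",
            "content_key": "part0-node1-data",  # assumed key name
        }
    )
    container = {
        "part0-node0-offsets": offsets,
        "part0-node1-data": content,
    }
    return form, len(nested), container


def from_buffers_sketch(form, length, container):
    """Rebuild the nested list; the flat buffers alone would not be enough."""
    spec = json.loads(form)
    offsets = container[spec["offsets_key"]]
    content = container[spec["content_key"]]
    return [content[offsets[i]:offsets[i + 1]] for i in range(length)]


original = [[1.1, 2.2, 3.3], [], [4.4, 5.5]]
form, length, container = to_buffers_sketch(original)
roundtrip = from_buffers_sketch(form, length, container)
```

Because the container is only subscripted by key, any mapping would do here, which is the property that lets an HDF5 group stand in for the dict.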
 

diff --git a/docs-src/how-to-convert.md b/docs-src/how-to-convert.md
@@ -20,4 +20,4 @@ Converting arrays
    * **[ROOT via Uproot](how-to-convert-uproot)**
    * **[Arrow and Parquet](how-to-convert-arrow)**
    * **[Pandas](how-to-convert-pandas)**
-   * **[Generic array-sets](how-to-convert-arrayset)**
+   * **[Generic buffers](how-to-convert-buffers)**
diff --git a/src/awkward/_util.py b/src/awkward/_util.py
@@ -761,7 +761,9 @@ def apply(inputs, depth, user):
                 outcontent = apply(nextinputs, depth + 1, user)
                 assert isinstance(outcontent, tuple)
 
-                return tuple(ak.layout.RegularArray(x, maxsize, maxlen) for x in outcontent)
+                return tuple(
+                    ak.layout.RegularArray(x, maxsize, maxlen) for x in outcontent
+                )
 
             elif not all_same_offsets(nplike, inputs):
                 fcns = [
@@ -1695,3 +1697,31 @@ def union_to_record(unionarray, anonymous):
             )
 
         return ak.layout.RecordArray(all_fields, all_names, len(unionarray))
+
+
+def adjust_old_pickle(form, container, num_partitions, behavior):
+    def key_format(**v):
+        if num_partitions is None:
+            if v["attribute"] == "data":
+                return "{form_key}".format(**v)
+            else:
+                return "{form_key}-{attribute}".format(**v)
+
+        else:
+            if v["attribute"] == "data":
+                return "{form_key}-part{partition}".format(**v)
+            else:
+                return "{form_key}-{attribute}-part{partition}".format(**v)
+
+    return ak.operations.convert.from_buffers(
+        form,
+        None,
+        container,
+        partition_start=0,
+        key_format=key_format,
+        lazy=False,
+        lazy_cache="new",
+        lazy_cache_key=None,
+        highlevel=False,
+        behavior=behavior,
+    )
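
The `key_format` closure above can be exercised on its own to see which buffer names an old pickle is expected to contain. The sketch below copies its logic into a hypothetical factory, `make_old_key_format`, so that it runs without Awkward installed; the keyword names `form_key`, `attribute`, and `partition` are the ones the closure itself reads.

```python
def make_old_key_format(num_partitions):
    """Return a key_format callable mirroring the old array-set naming scheme."""

    def key_format(**v):
        if num_partitions is None:
            # Unpartitioned: data buffers are named by form_key alone.
            if v["attribute"] == "data":
                return "{form_key}".format(**v)
            return "{form_key}-{attribute}".format(**v)
        # Partitioned: the partition number is a suffix, not a prefix.
        if v["attribute"] == "data":
            return "{form_key}-part{partition}".format(**v)
        return "{form_key}-{attribute}-part{partition}".format(**v)

    return key_format


unpartitioned = make_old_key_format(None)
partitioned = make_old_key_format(2)

old_keys = [
    unpartitioned(form_key="node0", attribute="offsets", partition=0),
    unpartitioned(form_key="node1", attribute="data", partition=0),
    partitioned(form_key="node0", attribute="offsets", partition=1),
    partitioned(form_key="node1", attribute="data", partition=1),
]
```

Compared with the new `part0-node0-offsets` style shown in the documentation diff, the old names put the partition last and omit it entirely for unpartitioned arrays, which is why old pickles need their own `key_format`.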
diff --git a/src/awkward/highlevel.py b/src/awkward/highlevel.py
@@ -1386,16 +1386,24 @@ def numba_type(self):
         return numba.typeof(self._numbaview)
 
     def __getstate__(self):
-        form, container, num_partitions = ak.to_arrayset(self)
+        form, length, container = ak.operations.convert.to_buffers(self._layout)
         if self._behavior is ak.behavior:
             behavior = None
         else:
             behavior = self._behavior
-        return form, container, num_partitions, behavior
+        return form, length, container, behavior
 
     def __setstate__(self, state):
-        form, container, num_partitions, behavior = state
-        layout = ak.from_arrayset(form, container, num_partitions, highlevel=False)
+        if isinstance(state[1], dict):
+            form, container, num_partitions, behavior = state
+            layout = ak._util.adjust_old_pickle(
+                form, container, num_partitions, behavior
+            )
+        else:
+            form, length, container, behavior = state
+            layout = ak.operations.convert.from_buffers(
+                form, length, container, highlevel=False, behavior=behavior
+            )
         if self.__class__ is Array:
             self.__class__ = ak._util.arrayclass(layout, behavior)
         self.layout = layout
@@ -1975,17 +1983,25 @@ def numba_type(self):
         return numba.typeof(self._numbaview)
 
     def __getstate__(self):
-        form, container, num_partitions = ak.to_arrayset(self._layout.array)
+        form, length, container = ak.operations.convert.to_buffers(self._layout.array)
         if self._behavior is ak.behavior:
             behavior = None
         else:
             behavior = self._behavior
-        return form, container, num_partitions, behavior, self._layout.at
+        return form, length, container, behavior, self._layout.at
 
     def __setstate__(self, state):
-        form, container, num_partitions, behavior, at = state
-        array = ak.from_arrayset(form, container, num_partitions, highlevel=False)
-        layout = ak.layout.Record(array, at)
+        if isinstance(state[1], dict):
+            form, container, num_partitions, behavior, at = state
+            layout = ak._util.adjust_old_pickle(
+                form, container, num_partitions, behavior
+            )
+        else:
+            form, length, container, behavior, at = state
+            layout = ak.operations.convert.from_buffers(
+                form, length, container, highlevel=False, behavior=behavior
+            )
+        layout = ak.layout.Record(layout, at)
         if self.__class__ is Record:
             self.__class__ = ak._util.recordclass(layout, behavior)
         self.layout = layout
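
The backward-compatibility check in both `__setstate__` methods relies on the position of the container in the pickled tuple: old states hold a dict in slot 1, new states hold the length there. A minimal sketch of that dispatch, using a hypothetical `classify_state` helper and placeholder values rather than real forms or layouts:

```python
def classify_state(state):
    """Tell old (form, container, num_partitions, ...) states from new ones."""
    if isinstance(state[1], dict):
        # Old layout: the container dict comes second.
        form, container, num_partitions, behavior = state[:4]
        return "old", {"form": form, "container": container, "length": None}
    # New layout: the length (an int, or a list of ints if partitioned) comes second.
    form, length, container, behavior = state[:4]
    return "new", {"form": form, "container": container, "length": length}


# Placeholder states; a Record state carries a fifth element, `at`.
old_array_state = ("<form>", {"node0-offsets": [0, 3]}, None, None)
new_array_state = ("<form>", 3, {"part0-node0-offsets": [0, 3]}, None)
new_record_state = ("<form>", 3, {"part0-node0-offsets": [0, 3]}, None, 1)

kinds = [
    classify_state(old_array_state)[0],
    classify_state(new_array_state)[0],
    classify_state(new_record_state)[0],
]
```

The check works because a length is never a dict, so the two tuple layouts cannot be confused even though both have the same number of elements.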