docs: refactor documentation (#95)
* refactor

* refactor

* update functions

* add notes
ianna authored Jun 4, 2024
1 parent 5e7a6ad commit 3558465
Showing 8 changed files with 2,184 additions and 41 deletions.
8 changes: 7 additions & 1 deletion docs/make.jl
@@ -9,7 +9,13 @@ makedocs(;
pages=[
"Introduction" => "index.md",
"Example Usage" => "exampleusage.md",
"Reference Guide" => "api.md",
"API" => Any[
"Types" => "types.md",
"Functions" => "functions.md",
hide("Indexing" => "indexing.md"),
hide("Internals" => "internals.md"),
],
hide("Reference Guide" => "api.md"),
"LICENSE" => "LICENSE.md",
],
repo="https://github.com/JuliaHEP/AwkwardArray.jl/blob/{commit}{path}#L{line}",
48 changes: 8 additions & 40 deletions docs/src/api.md
@@ -1,48 +1,16 @@
## List of functions
# Public Documentation

Every `Content` subclass has the following built-in functions:
Documentation for `AwkwardArray.jl`'s public interface.

* `Base.length`
* `Base.size` (1-tuple of `length`)
* `Base.firstindex`, `Base.lastindex` (1-based or inherited from its index)
* `Base.getindex`: select by `Int` (single item), `UnitRange{Int}` (slice), and `Symbol` (record field)
* `Base.iterate`
* `Base.(==)` (equality defined by values: a `ListOffsetArray` and a `ListArray` may be considered the same)
* `Base.push!`
* `Base.append!`
* `Base.show`
See the Internals section of the manual for internal package docs covering all submodules.

They also have the following functions for manipulating and checking structure:
## Index

* `AwkwardArray.parameters_of`: gets all parameters
* `AwkwardArray.has_parameter`: returns true if a parameter exists
* `AwkwardArray.get_parameter`: returns a parameter or raises an error
* `AwkwardArray.with_parameter`: returns a copy of this node with a specified parameter
* `AwkwardArray.copy`: shallow-copy of the array, allowing properties to be replaced
* `AwkwardArray.is_valid`: verifies that the structure adheres to Awkward Array's protocol
```@index
Pages = ["api.md"]
```

They have the following functions for filling an array:
## Public Interface

* `AwkwardArray.end_list!`: closes off a `ListType` array (`ListOffsetArray`, `ListArray`, or `RegularArray`) in the manner of Python's [ak.ArrayBuilder](https://awkward-array.org/doc/main/reference/generated/ak.ArrayBuilder.html) (no `begin_list` is necessary)
* `AwkwardArray.end_record!`: closes off a `RecordArray`
* `AwkwardArray.end_tuple!`: closes off a `TupleArray`
* `AwkwardArray.push_null!`: pushes a missing value onto `OptionType` arrays (`IndexedOptionArray`, `ByteMaskedArray`, `BitMaskedArray`, or `UnmaskedArray`)
* `AwkwardArray.push_dummy!`: pushes an unspecified value onto the array (used by `ByteMaskedArray` and `BitMaskedArray`, which need to have a placeholder in memory behind each `missing` value)

`RecordArray` and `TupleArray` have the following for selecting fields (as opposed to rows):

* `AwkwardArray.slot`: gets a `RecordArray` or `TupleArray` field, to avoid conflicts with `Base.getindex` for `TupleArrays` (both use integers to select a field)
* `AwkwardArray.Record`: scalar representation of an item from a `RecordArray`
* `AwkwardArray.Tuple`: scalar representation of an item from a `TupleArray` (note: not the same as `Base.Tuple`)

`UnionArray` has the following for dealing with specializations:

* `AwkwardArray.Specialization`: selects a `UnionArray` specialization for `push!`, `append!`, etc.

Finally, all `Content` subclasses can be converted with the following:

* `AwkwardArray.layout_for`: returns an appropriately-nested `Content` type for a given Julia type (`DataType`)
* `AwkwardArray.from_iter`: converts Julia data into an Awkward Array
* `AwkwardArray.to_vector`: converts an Awkward Array into Julia data
* `AwkwardArray.from_buffers`: constructs an Awkward Array from a Form (JSON), length, and buffers for zero-copy passing from Python
* `AwkwardArray.to_buffers`: deconstructs an Awkward Array into a Form (JSON), length, and buffers for zero-copy passing to Python
66 changes: 66 additions & 0 deletions docs/src/functions.md
@@ -0,0 +1,66 @@
```@meta
CurrentModule = AwkwardArray
```
## List of [`Content`](@ref) functions

Every [`Content`](@ref) subclass has the following built-in functions:

* [`Base.length`](@ref)
* [`Base.size`](@ref) (1-tuple of `length`)
* [`Base.firstindex`](@ref), [`Base.lastindex`](@ref) (1-based or inherited from its index)
* [`Base.getindex`](@ref): select by `Int` (single item), `UnitRange{Int}` (slice), and `Symbol` (record field)
* [`Base.iterate`](@ref)
* [`Base.:(==)`](@ref) (equality defined by values: a [`ListOffsetArray`](@ref) and a [`ListArray`](@ref) may be considered the same)
* [`Base.push!`](@ref)
* [`Base.append!`](@ref)
* [`Base.show`](@ref)
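
As a rough illustration of the built-in functions listed above, the following sketch builds the same three lists in two layouts and exercises a few of the functions. The `ListArray` argument order (`starts`, `stops`, `content`) is an assumption made for this example and is not confirmed on this page.

```julia
using AwkwardArray

content = AwkwardArray.PrimitiveArray([1.1, 2.2, 3.3, 4.4, 5.5])

# Offsets [0, 3, 3, 5] describe three lists of lengths 3, 0, and 2.
offset_lists = AwkwardArray.ListOffsetArray([0, 3, 3, 5], content)

# The same lists, described by separate starts and stops (assumed argument order).
start_stop_lists = AwkwardArray.ListArray([0, 3, 3], [3, 3, 5], content)

n_lists = length(offset_lists)      # number of lists, not number of floats
first_list = offset_lists[1]        # select one item by Int
first_two = offset_lists[1:2]       # slice by UnitRange{Int}

# Iterating visits one list at a time.
list_lengths = [length(list) for list in offset_lists]

# Equality is defined by values, so the two layouts should compare equal.
same_values = offset_lists == start_stop_lists
```

Both arrays share the same `content` here, which keeps the sketch short; only the list boundaries are described differently.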

They also have the following functions for manipulating and checking structure:

* [`AwkwardArray.parameters_of`](@ref): gets all parameters
* [`AwkwardArray.has_parameter`](@ref): returns true if a parameter exists
* [`AwkwardArray.get_parameter`](@ref): returns a parameter or raises an error
* [`AwkwardArray.with_parameter`](@ref): returns a copy of this node with a specified parameter
* [`AwkwardArray.copy`](@ref): shallow-copy of the array, allowing properties to be replaced
* [`AwkwardArray.is_valid`](@ref): verifies that the structure adheres to Awkward Array's protocol

They have the following functions for filling an array:

* [`AwkwardArray.end_list!`](@ref): closes off a [`ListType`](@ref) array ([`ListOffsetArray`](@ref), [`ListArray`](@ref), or [`RegularArray`](@ref)) in the manner of Python's [ak.ArrayBuilder](https://awkward-array.org/doc/main/reference/generated/ak.ArrayBuilder.html) (no `begin_list` is necessary)
* [`AwkwardArray.end_record!`](@ref): closes off a [`RecordArray`](@ref)
* [`AwkwardArray.end_tuple!`](@ref): closes off a [`TupleArray`](@ref)
* [`AwkwardArray.push_null!`](@ref): pushes a missing value onto [`OptionType`](@ref) arrays ([`IndexedOptionArray`](@ref), [`ByteMaskedArray`](@ref), [`BitMaskedArray`](@ref), or [`UnmaskedArray`](@ref))
* [`AwkwardArray.push_dummy!`](@ref): pushes an unspecified value onto the array (used by [`ByteMaskedArray`](@ref) and [`BitMaskedArray`](@ref), which need to have a placeholder in memory behind each `missing` value)
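
The builder-style filling described above might look like the following minimal sketch. It assumes that a `ListOffsetArray` can start from a single offset `[0]` with empty content, and that the inner array is reachable through a `content` field; neither detail is stated on this page.

```julia
using AwkwardArray

# Start from an empty list array: a single offset (0) and no content yet.
array = AwkwardArray.ListOffsetArray([0], AwkwardArray.PrimitiveArray(Float64[]))

# Fill the inner content, then close the list; there is no begin_list step.
push!(array.content, 1.1)
push!(array.content, 2.2)
AwkwardArray.end_list!(array)    # closes [1.1, 2.2]

AwkwardArray.end_list!(array)    # closes an empty list

push!(array.content, 3.3)
AwkwardArray.end_list!(array)    # closes [3.3]

# Convert back to plain Julia data to inspect what was built.
filled = AwkwardArray.to_vector(array)
```

Each `end_list!` call only records where the current list stops, which is why an empty list needs no pushes at all.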

[`RecordArray`](@ref) and [`TupleArray`](@ref) have the following for selecting fields (as opposed to rows):

* [`AwkwardArray.slot`](@ref): gets a [`RecordArray`](@ref) or [`TupleArray`](@ref) field, to avoid conflicts with [`Base.getindex`](@ref) for `TupleArrays` (both use integers to select a field)
* [`AwkwardArray.Record`](@ref): scalar representation of an item from a [`RecordArray`](@ref)
* [`AwkwardArray.SlotRecord`](@ref): scalar representation of an item from a [`TupleArray`](@ref) (note: not the same as `Base.Tuple`)

[`UnionArray`](@ref) has the following for dealing with specializations:

* [`AwkwardArray.Specialization`](@ref): selects a [`UnionArray`](@ref) specialization for [`push!`](@ref), [`append!`](@ref), etc.

Finally, all [`Content`](@ref) subclasses can be converted with the following:

* [`AwkwardArray.layout_for`](@ref): returns an appropriately-nested [`Content`](@ref) type for a given Julia type (`DataType`)
* [`AwkwardArray.from_iter`](@ref): converts Julia data into an Awkward Array
* [`AwkwardArray.to_vector`](@ref): converts an Awkward Array into Julia data
* [`AwkwardArray.from_buffers`](@ref): constructs an Awkward Array from a Form (JSON), length, and buffers for zero-copy passing from Python
* [`AwkwardArray.to_buffers`](@ref): deconstructs an Awkward Array into a Form (JSON), length, and buffers for zero-copy passing to Python
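
A minimal round-trip sketch of the two Julia-side conversions above, assuming nested `Vector{Vector{Float64}}` input (the empty list is written as `Float64[]` so that the element type stays concrete):

```julia
using AwkwardArray

data = [[1.1, 2.2, 3.3], Float64[], [4.4, 5.5]]

# from_iter chooses a layout for the Julia data and fills it.
array = AwkwardArray.from_iter(data)

# to_vector goes the other way, back to nested Julia vectors.
round_trip = AwkwardArray.to_vector(array)
```

`from_buffers` and `to_buffers` are not shown because they need a Form and buffers, typically supplied from the Python side.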


## Array functions

```@autodocs
Modules = [AwkwardArray]
Public = true
Order = [:function]
```

# Index

```@index
Pages = ["functions.md"]
```
13 changes: 13 additions & 0 deletions docs/src/indexing.md
@@ -0,0 +1,13 @@
```@meta
CurrentModule = AwkwardArray
```

# Types

```@index
Pages = ["indexing.md"]
```

## Indexing

FIXME
13 changes: 13 additions & 0 deletions docs/src/internals.md
@@ -0,0 +1,13 @@
```@meta
CurrentModule = AwkwardArray
```

# Types

```@index
Pages = ["internals.md"]
```

## Internals

FIXME
68 changes: 68 additions & 0 deletions docs/src/types.md
@@ -0,0 +1,68 @@
```@meta
CurrentModule = AwkwardArray
```

# Types

## Array layout classes

In Python, we make a distinction between high-level `ak.Array` (for data analysts) and low-level `Content` memory layouts (for downstream developers). In Julia, it's more advantageous to expose the concrete type details to all users, particularly for defining functions with multiple dispatch. Thus, there is no `ak.Array` equivalent.

The layout classes (subclasses of `AwkwardArray.Content`) are:

| Julia class | corresponding Python | corresponding Arrow | description |
|:--|:--|:--|:--|
| [`PrimitiveArray`](@ref) | [NumpyArray](https://awkward-array.org/doc/main/reference/generated/ak.contents.NumpyArray.html) | [primitive](https://arrow.apache.org/docs/format/Columnar.html#fixed-size-primitive-layout) | one-dimensional array of booleans, numbers, date-times, or time-differences |
| [`EmptyArray`](@ref) | [EmptyArray](https://awkward-array.org/doc/main/reference/generated/ak.contents.EmptyArray.html) | _(none)_ | length-zero array with unknown type (usually derived from untyped sources) |
| [`ListOffsetArray`](@ref) | [ListOffsetArray](https://awkward-array.org/doc/main/reference/generated/ak.contents.ListOffsetArray.html) | [list](https://arrow.apache.org/docs/format/Columnar.html#variable-size-list-layout) | variable-length lists defined by an index of `offsets` |
| [`ListArray`](@ref) | [ListArray](https://awkward-array.org/doc/main/reference/generated/ak.contents.ListArray.html) | _(none)_ | variable-length lists defined by more general `starts` and `stops` indexes |
| [`RegularArray`](@ref) | [RegularArray](https://awkward-array.org/doc/main/reference/generated/ak.contents.RegularArray.html) | [fixed-size](https://arrow.apache.org/docs/format/Columnar.html#fixed-size-list-layout) | lists of uniform `size` |
| [`RecordArray`](@ref) | [RecordArray](https://awkward-array.org/doc/main/reference/generated/ak.contents.RecordArray.html) with `fields` | [struct](https://arrow.apache.org/docs/format/Columnar.html#struct-layout) | struct-like records with named fields of different types |
| [`TupleArray`](@ref) | [RecordArray](https://awkward-array.org/doc/main/reference/generated/ak.contents.RecordArray.html) with `fields=None` | _(none)_ | tuples of unnamed fields of different types |
| [`IndexedArray`](@ref) | [IndexedArray](https://awkward-array.org/doc/main/reference/generated/ak.contents.IndexedArray.html) | [dictionary](https://arrow.apache.org/docs/format/Columnar.html#dictionary-encoded-layout) | data that are lazily filtered, duplicated, and/or rearranged by an integer `index` |
| [`IndexedOptionArray`](@ref) | [IndexedOptionArray](https://awkward-array.org/doc/main/reference/generated/ak.contents.IndexedOptionArray.html) | _(none)_ | same but negative values in the `index` correspond to `Missing` values |
| [`ByteMaskedArray`](@ref) | [ByteMaskedArray](https://awkward-array.org/doc/main/reference/generated/ak.contents.ByteMaskedArray.html) | _(none)_ | possibly-missing data, defined by a byte `mask` |
| [`BitMaskedArray`](@ref) (only `lsb_order = true`) | [BitMaskedArray](https://awkward-array.org/doc/main/reference/generated/ak.contents.BitMaskedArray.html) | [bitmaps](https://arrow.apache.org/docs/format/Columnar.html#validity-bitmaps) | same, defined by a `BitVector` |
| [`UnmaskedArray`](@ref) | [UnmaskedArray](https://awkward-array.org/doc/main/reference/generated/ak.contents.UnmaskedArray.html) | same | in-principle missing data, but none are actually missing so no mask |
| [`UnionArray`](@ref) | [UnionArray](https://awkward-array.org/doc/main/reference/generated/ak.contents.UnionArray.html) | [dense union](https://arrow.apache.org/docs/format/Columnar.html#dense-union) | data of different types in the same array |

Any node in the data-type tree can carry `Dict{String,Any}` metadata as `parameters`, as well as a `behavior::Symbol` that can be used to define specialized behaviors. For instance, arrays of strings (constructed with `StringOffsetArray`, `StringArray`, or `StringRegularArray`) are defined by `behavior = :string` (instead of `behavior = :default`).

## Types specification

```@autodocs
Modules = [AwkwardArray]
Public = true
Order = [:type]
```

## Examples

```julia
julia> using AwkwardArray: StringOffsetArray

julia> array = StringOffsetArray()
0-element ListOffsetArray{Vector{Int64}, PrimitiveArray{UInt8, Vector{UInt8}, :char}, :string}

julia> append!(array, ["one", "two", "three", "four", "five"])
5-element ListOffsetArray{Vector{Int64}, PrimitiveArray{UInt8, Vector{UInt8}, :char}, :string}:
"one"
"two"
"three"
"four"
"five"

julia> array[3]
"three"

julia> typeof(array[3])
String
```

Most applications of `behavior` apply to `RecordArrays` (e.g. [Vector](https://github.com/scikit-hep/vector) in Python).

## Index

```@index
Pages = ["types.md"]
```
4 changes: 4 additions & 0 deletions src/AwkwardArray.jl
@@ -1,3 +1,7 @@
"""
Main module for `AwkwardArray.jl` -- an implementation of the Awkward Array data structures in Julia.
"""

module AwkwardArray

import JSON