Future Data Items

This page describes a feature planned for a future version of Nion Swift.

Data Items Version 2

The second version of data items has expanded capabilities, incorporating ideas from HDF5, Pandas, HyperSpy, xarray, and other libraries.

As before, data items should be fully mappable to both HDF5 and a simpler JSON + NumPy directory structure or zip file.

The new data items can support more complex organization. Key features include:

Numeric value structures: scalar, complex, rgb, rgba, vector, tensor.
General value structures including strings and timestamps.
Data dimensions 0d, 1d, 2d, and 3d.
Data organized into dimension sets, 1d, 2d, or 3d each.
Data stored contiguously or sparsely.
Datum dimension set corresponds to the final enclosed dimension set.
Collection/sequence dimension sets correspond to enclosing dimension sets.
Hierarchical organization into arrays, lists, structures and dictionaries.
Sub-views of data within hierarchy.
Intensity scales specified by formula.
Dimension scales specified by formula or coordinates. Shareable.
Arbitrary number of intensity scales attached to datum dimension set.
Arbitrary number of dimension scales attached to each dimension.
Calibrations adhering to a unit standard.
Reference frames as list of dimension scales attached to dimension sets. Shareable.
Numeric data types, strings, timestamps, and references within data item.
Efficient conversions to various Python structures: numpy, pandas, xarray.
Optionally include schema at various levels of organization.
- Schema can separately describe recommended displays and reductions.
Attachable storage handler, ndata, hdf5, zarr, etc.
- Supports partial paging to memory/disk
Fully observable (data, properties, insert/remove, etc.)
The data item is an interface; storage and memory drivers provide the implementation.
Storage and memory drivers should be able to:
- Be local or remote, with optimizations
- Optimize slicing and reduction
- Provide asynchronous access, which may not complete immediately
- Pipeline updates (partial data updates)
- Use the GPU
- Specify a primary in-memory storage mechanism (NumPy array, pandas table, h5py memory map, GPU, etc.)
Improved definition of formal vs informal attributes.
Formal attributes:
- Units (nm, ms, etc.), dimension scales, quantity type (length, time, etc.), reference frames
- Domain (time or space vs frequency)
- Provenance
- Validity/timestamp of arbitrary data slices
  - Part of data from one scan, part from another.
- Timestamps and timezone.
API
- Fall back to old API when possible.
- Improved indexing (xarray data.loc[calibrated]) (TBD).
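
As a rough illustration of two of the ideas listed above (dimension sets, and dimension scales given by formula or by coordinates), here is a minimal sketch. It is not the Nion Swift API: the names DimensionScale, DimensionSet, and split_into_sets are hypothetical, and the shape and scale values are made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class DimensionScale:
    """Hypothetical scale mapping an index along one dimension to a calibrated value."""
    units: str
    formula: Optional[Callable[[int], float]] = None  # value computed on demand
    coordinates: Optional[Sequence[float]] = None     # value stored explicitly

    def value_at(self, index: int) -> float:
        if self.coordinates is not None:
            return self.coordinates[index]
        if self.formula is not None:
            return self.formula(index)
        raise ValueError("scale has neither a formula nor coordinates")


@dataclass(frozen=True)
class DimensionSet:
    """Hypothetical group of 1 to 3 dimensions with a role such as 'sequence' or 'datum'."""
    role: str
    shape: Tuple[int, ...]

    def __post_init__(self) -> None:
        if not 1 <= len(self.shape) <= 3:
            raise ValueError("a dimension set holds 1, 2, or 3 dimensions")


def split_into_sets(shape: Sequence[int], layout: Sequence[Tuple[str, int]]) -> List[DimensionSet]:
    """Split a flat shape into dimension sets, outermost first.

    layout lists (role, dimension_count) pairs; the last pair is the
    innermost set, which the text above calls the datum dimension set.
    """
    sets: List[DimensionSet] = []
    start = 0
    for role, count in layout:
        sets.append(DimensionSet(role, tuple(shape[start:start + count])))
        start += count
    if start != len(shape):
        raise ValueError("layout does not cover every dimension of the shape")
    return sets


# Assumed example: a sequence of 10 scans, each a 16x16 collection of 512-point spectra.
sets = split_into_sets((10, 16, 16, 512), [("sequence", 1), ("collection", 2), ("datum", 1)])

# One scale generated by formula, one given as explicit coordinates.
by_formula = DimensionScale(units="eV", formula=lambda i: 0.5 * i - 1.0)
by_coordinates = DimensionScale(units="s", coordinates=(0.0, 0.1, 0.25, 0.7))
```

A formula-based scale stores nothing per index, which is why it could be shared across many data items cheaply; an explicit coordinate list handles irregular sampling at the cost of storage.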

Proposed Migration

Merge data item and data and metadata objects
Define better terminology and use it
Deprecate old methods and eliminate use within Nion libraries
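
One possible way to carry out the deprecation step is to keep old accessors as thin shims that emit a DeprecationWarning from the standard warnings module. The sketch below assumes a merged object; the class DataItem and the property data_and_metadata are hypothetical stand-ins for illustration, not the actual Nion Swift classes or attributes.

```python
import warnings


class DataItem:
    """Hypothetical merged data item holding both data and metadata."""

    def __init__(self, data, metadata=None):
        self.data = data
        self.metadata = dict(metadata or {})

    @property
    def data_and_metadata(self):
        """Old-style accessor (hypothetical name) kept only as a shim.

        After the merge the data item itself plays this role, so the shim
        warns and returns the item.
        """
        warnings.warn(
            "data_and_metadata is deprecated; use the data item directly",
            DeprecationWarning,
            stacklevel=2,
        )
        return self


item = DataItem([1, 2, 3], {"title": "example"})

# Record warnings so remaining uses of the old accessor can be found and removed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = item.data_and_metadata
```

Running a library's test suite with deprecation warnings promoted to errors is one way to confirm that internal use of the old methods has been eliminated.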

HyperSpy

How is calibration supported?
Indexing is interesting
Possible data ordering issue
Not sure if it supports 5D data

Pandas

Only supports 1D Series and 2D DataFrame
No support for calibration info.

TensorFlow

Ragged lists

xarray

DataArray is very close to DataAndMetadata
Does not support data loading/unloading
Does not support generated coordinates -- coordinates are explicit arrays
Missing intensity calibration?
Datasets share common dimensions.
Includes dimension names, coordinates
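
To make the note about generated coordinates concrete, the sketch below shows a coordinate that is computed from a linear calibration instead of being stored as an array, together with a lookup by calibrated value in the spirit of label-based indexing. It uses only plain Python; LinearScale and its methods are hypothetical names, the numbers are invented, and a non-zero step is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearScale:
    """Hypothetical generated coordinate: value = offset + step * index."""
    offset: float
    step: float  # assumed non-zero
    units: str

    def to_calibrated(self, index: int) -> float:
        """Calibrated value at an index, computed without a stored array."""
        return self.offset + self.step * index

    def to_index(self, value: float, length: int) -> int:
        """Nearest index for a calibrated value, clamped to the valid range."""
        index = round((value - self.offset) / self.step)
        return max(0, min(length - 1, index))


# Assumed example: a 1024-channel spectrum starting at -5 eV with 0.25 eV per channel.
energy = LinearScale(offset=-5.0, step=0.25, units="eV")
spectrum = [float(i) for i in range(1024)]  # placeholder intensities

# Select the channel nearest to 10 eV by calibrated value.
channel = energy.to_index(10.0, len(spectrum))
intensity = spectrum[channel]
```

Because the scale is two numbers and a unit, it costs the same for a 10-point axis as for a 10-million-point axis, which is the main difference from an explicit coordinate array.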

zarr

Storage
