
Merge pull request #12 from akleeman/DataView
Stephan's sprintbattical
akleeman committed Feb 21, 2014
2 parents 303b890 + 9488463 commit 6e0b12c
Showing 28 changed files with 4,217 additions and 2,162 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -33,3 +33,5 @@ nosetests.xml
.mr.developer.cfg
.project
.pydevproject

doc/_build
97 changes: 76 additions & 21 deletions README.md
@@ -1,21 +1,76 @@
scidata
=======

Objects for holding self-describing scientific data in Python. The goal of this project is to
provide a Common Data Model (http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/CDM/)
allowing users to read, write, and manipulate netCDF-like data without worrying about where the
data source lives. Datasets that are too large to fit in memory, served from an OpenDAP server,
streamed, or stored as NetCDF3, NetCDF4, grib (?), or HDF5 can all be inspected and manipulated
using the same methods.

Of course, there are already several packages in Python that offer similar functionality (netCDF4,
scipy.io, pupynere, iris, ...), but each of those packages has its own shortcomings:

netCDF4
Doesn't allow streaming. If you want to create a new object it needs to live on disk.
scipy.io / pupynere
Only works with NetCDF3 and doesn't support DAP, making it difficult to work with large datasets.
iris
is REALLY close to what this project will provide, but iris strays further from the CDM
than I would like. (If you read and then write a netcdf file using iris, all global
attributes are pushed down to variable-level attributes.)
# xray: transparently manipulate scientific datasets in Python

**xray** is a Python package for working with aligned sets of homogeneous,
n-dimensional arrays. It implements flexible array operations and dataset
manipulation for in-memory datasets within the [Common Data Model][cdm] widely
used for self-describing scientific data (netCDF, OpenDAP, etc.).

***Warning: xray is still in its early development phase. Expect the API to
change.***

## Main Features

- A `DatasetArray` object that is compatible with NumPy's ndarray and ufuncs
  but keeps ancillary variables and metadata intact.
- Array broadcasting based on dimension names and coordinate indices
instead of only shapes.
- Flexible split-apply-combine functionality with the `Array.groupby` method
(patterned after [pandas][pandas]).
- Fast label-based indexing and (limited) time-series functionality built on
[pandas][pandas].
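
The idea of matching axes by dimension name rather than by position can be sketched with plain
NumPy. The helper below is purely illustrative (it is not xray's actual API): it takes two arrays
together with hypothetical dimension-name tuples, inserts size-1 axes for the names each array
lacks, and reorders axes so that ordinary NumPy broadcasting lines the arrays up by name.

```python
import numpy as np

def broadcast_by_dims(x, x_dims, y, y_dims):
    # Hypothetical helper (not xray's real implementation): align two
    # plain numpy arrays on the union of their named dimensions so that
    # arithmetic matches axes by *name* rather than by position.
    all_dims = list(x_dims) + [d for d in y_dims if d not in x_dims]

    def expand(a, dims):
        # Append size-1 axes for the dimensions this array lacks,
        # then transpose so every array shares the same axis order.
        missing = [d for d in all_dims if d not in dims]
        a = a.reshape(a.shape + (1,) * len(missing))
        current = list(dims) + missing
        return a.transpose([current.index(d) for d in all_dims])

    return expand(x, x_dims), expand(y, y_dims)

t_y = np.arange(6).reshape(2, 3)   # dims ('time', 'lat')
y_t = np.arange(6).reshape(3, 2)   # dims ('lat', 'time'), transposed layout
a, b = broadcast_by_dims(t_y, ('time', 'lat'), y_t, ('lat', 'time'))
print((a + b).shape)  # (2, 3): 'time' and 'lat' matched by name, not position
```

With positional broadcasting the `(2, 3)` and `(3, 2)` arrays above would simply fail to add;
matching by name makes the operation unambiguous.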

## Design Goals

- Provide a data analysis toolkit as fast and powerful as pandas but
designed for working with datasets of aligned, homogeneous N-dimensional
arrays.
- Whenever possible, build on top of and interoperate with pandas and the
rest of the awesome [scientific python stack][scipy].
- Provide a uniform API for loading and saving scientific data in a variety
of formats (including streaming data).
- Use metadata according to [conventions][cf] when appropriate, but don't
strictly enforce them. Conflicting attributes (e.g., units) should be
silently dropped instead of causing errors. The onus is on the user to
make sure that operations make sense.
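
The "silently drop conflicting attributes" policy can be illustrated with a small hypothetical
helper (this is a sketch of the stated behavior, not xray's actual code):

```python
def merge_attrs(a, b):
    # Hypothetical sketch of the policy above: keep attributes that agree
    # or appear on only one side, and silently drop any key whose values
    # conflict instead of raising an error.
    merged = {}
    for key in set(a) | set(b):
        if key in a and key in b and a[key] != b[key]:
            continue  # conflicting values (e.g., units): drop silently
        merged[key] = a.get(key, b.get(key))
    return merged

print(merge_attrs({'units': 'm', 'name': 'height'}, {'units': 'km'}))
# 'units' is dropped because the two values disagree; 'name' survives
```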

## Prior Art

- [Iris][iris] (supported by the UK Met office) is a similar package
designed for working with geophysical datasets in Python. Iris provided
much of the inspiration for xray (e.g., xray's `DatasetArray` is largely
based on the Iris `Cube`), but it has several limitations that led us to
build xray instead of extending Iris:
1. Iris has essentially one first-class object (the `Cube`) on which it
attempts to build all functionality (`Coord` supports a much more
limited set of functionality). xray has its equivalent of the Cube
(the `DatasetArray` object), but it is only a thin wrapper on the more
primitive building blocks of Dataset and Array objects.
2. Iris has a strict interpretation of [CF conventions][cf], which,
although a principled choice, we have found to be impractical for
everyday uses. With Iris, every quantity has physical (SI) units, all
coordinates have cell-bounds, and all metadata (units, cell-bounds and
other attributes) is required to match before merging or doing
operations on multiple cubes. This means that a lot of time with
Iris is spent figuring out why cubes are incompatible and explicitly
removing possibly conflicting metadata.
3. Iris can be slow and complex. Strictly interpreting metadata requires
a lot of work, and (in our experience) it can be difficult to build
mental models of how Iris functions work. Moreover, it means that a lot
of logic (e.g., constraint handling) uses non-vectorized operations. For
example, extracting all times within a range can be surprisingly slow
(e.g., 0.3 seconds in Iris vs. 3 milliseconds in xray to select along a
time dimension with 10,000 elements).
- [pandas][pandas] is fast and powerful but oriented around working with
tabular datasets. pandas has experimental N-dimensional panels, but they
don't support aligned math with other objects. We believe the
`DatasetArray`/`Cube` model is better suited to working with scientific
datasets. We use pandas internally in xray to support fast indexing.
- [netCDF4-python][nc4] provides xray's primary interface for working with
netCDF and OpenDAP datasets.
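
The speed difference mentioned above comes from vectorized label-based indexing. A rough sketch of
the same kind of operation using plain pandas (which xray builds on internally for indexing):

```python
import numpy as np
import pandas as pd

# Selecting a date range from a 10,000-element hourly time index.
# pandas resolves the labels by binary search on the sorted index,
# so the selection is vectorized rather than an element-by-element scan.
times = pd.date_range('2000-01-01', periods=10000, freq='h')
values = pd.Series(np.arange(10000), index=times)

subset = values['2000-02-01':'2000-02-02']  # label-based slice, end inclusive
print(len(subset))  # 48 hourly values across the two selected days
```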

[pandas]: http://pandas.pydata.org/
[cdm]: http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/CDM/
[cf]: http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.6/cf-conventions.html
[scipy]: http://scipy.org/
[nc4]: http://netcdf4-python.googlecode.com/svn/trunk/docs/netCDF4-module.html
[iris]: http://scitools.org.uk/iris/
177 changes: 177 additions & 0 deletions doc/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,177 @@
# Makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build

# User-friendly check for sphinx-build
ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
endif

# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .

.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext

help:
	@echo "Please use \`make <target>' where <target> is one of"
	@echo "  html       to make standalone HTML files"
	@echo "  dirhtml    to make HTML files named index.html in directories"
	@echo "  singlehtml to make a single large HTML file"
	@echo "  pickle     to make pickle files"
	@echo "  json       to make JSON files"
	@echo "  htmlhelp   to make HTML files and a HTML help project"
	@echo "  qthelp     to make HTML files and a qthelp project"
	@echo "  devhelp    to make HTML files and a Devhelp project"
	@echo "  epub       to make an epub"
	@echo "  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
	@echo "  latexpdf   to make LaTeX files and run them through pdflatex"
	@echo "  latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
	@echo "  text       to make text files"
	@echo "  man        to make manual pages"
	@echo "  texinfo    to make Texinfo files"
	@echo "  info       to make Texinfo files and run them through makeinfo"
	@echo "  gettext    to make PO message catalogs"
	@echo "  changes    to make an overview of all changed/added/deprecated items"
	@echo "  xml        to make Docutils-native XML files"
	@echo "  pseudoxml  to make pseudoxml-XML files for display purposes"
	@echo "  linkcheck  to check all external links for integrity"
	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"

clean:
	rm -rf $(BUILDDIR)/*

html:
	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

dirhtml:
	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."

singlehtml:
	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
	@echo
	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."

pickle:
	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
	@echo
	@echo "Build finished; now you can process the pickle files."

json:
	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
	@echo
	@echo "Build finished; now you can process the JSON files."

htmlhelp:
	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
	@echo
	@echo "Build finished; now you can run HTML Help Workshop with the" \
	      ".hhp project file in $(BUILDDIR)/htmlhelp."

qthelp:
	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
	@echo
	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/scidata.qhcp"
	@echo "To view the help file:"
	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/scidata.qhc"

devhelp:
	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
	@echo
	@echo "Build finished."
	@echo "To view the help file:"
	@echo "# mkdir -p $$HOME/.local/share/devhelp/scidata"
	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/scidata"
	@echo "# devhelp"

epub:
	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
	@echo
	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."

latex:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo
	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
	@echo "Run \`make' in that directory to run these through (pdf)latex" \
	      "(use \`make latexpdf' here to do that automatically)."

latexpdf:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through pdflatex..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

latexpdfja:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through platex and dvipdfmx..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

text:
	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
	@echo
	@echo "Build finished. The text files are in $(BUILDDIR)/text."

man:
	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
	@echo
	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."

texinfo:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo
	@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
	@echo "Run \`make' in that directory to run these through makeinfo" \
	      "(use \`make info' here to do that automatically)."

info:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo "Running Texinfo files through makeinfo..."
	make -C $(BUILDDIR)/texinfo info
	@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."

gettext:
	$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
	@echo
	@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."

changes:
	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
	@echo
	@echo "The overview file is in $(BUILDDIR)/changes."

linkcheck:
	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
	@echo
	@echo "Link check complete; look for any errors in the above output " \
	      "or in $(BUILDDIR)/linkcheck/output.txt."

doctest:
	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
	@echo "Testing of doctests in the sources finished, look at the " \
	      "results in $(BUILDDIR)/doctest/output.txt."

xml:
	$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
	@echo
	@echo "Build finished. The XML files are in $(BUILDDIR)/xml."

pseudoxml:
	$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
	@echo
	@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."
