
Merge pull request #12 from akleeman/DataView
Stephan's sprintbattical
akleeman committed Feb 21, 2014
2 parents 303b890 + 9488463 commit 6e0b12c
Showing 28 changed files with 4,217 additions and 2,162 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -33,3 +33,5 @@ nosetests.xml
.mr.developer.cfg
.project
.pydevproject

doc/_build
97 changes: 76 additions & 21 deletions README.md
@@ -1,21 +1,76 @@
scidata
=======

Objects for holding self-describing scientific data in Python. The goal of this project is to
provide a Common Data Model (http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/CDM/)
allowing users to read, write, and manipulate netCDF-like data without worrying about where the
data source lives. Datasets that are too large to fit in memory, served from an OpenDAP server,
streamed, or stored as NetCDF3, NetCDF4, grib (?), or HDF5 can all be inspected and manipulated
using the same methods.

Of course, there are already several packages in Python that offer similar functionality (netCDF4,
scipy.io, pupynere, iris, ...), but each of those packages has its own shortcomings:

netCDF4
Doesn't allow streaming. If you want to create a new object it needs to live on disk.
scipy.io / pupynere
Only works with NetCDF3 and doesn't support DAP, making it difficult to work with large datasets.
iris
is REALLY close to what this project will provide, but iris strays further from the CDM
than I would like. (If you read and then write a netcdf file using iris, all global
attributes are pushed down to variable-level attributes.)
# xray: transparently manipulate scientific datasets in Python

**xray** is a Python package for working with aligned sets of homogeneous,
n-dimensional arrays. It implements flexible array operations and dataset
manipulation for in-memory datasets within the [Common Data Model][cdm] widely
used for self-describing scientific data (netCDF, OpenDAP, etc.).

***Warning: xray is still in its early development phase. Expect the API to
change.***

## Main Features

- A `DatasetArray` object that is compatible with NumPy's ndarray and ufuncs
  but keeps ancillary variables and metadata intact.
- Array broadcasting based on dimension names and coordinate indices
instead of only shapes.
- Flexible split-apply-combine functionality with the `Array.groupby` method
(patterned after [pandas][pandas]).
- Fast label-based indexing and (limited) time-series functionality built on
[pandas][pandas].
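
The idea of matching axes by dimension name rather than by position can be sketched with plain
NumPy. The helper below is purely illustrative (it is not xray's actual API): it takes two arrays
together with hypothetical dimension-name tuples, inserts size-1 axes for the names each array
lacks, and reorders axes so that ordinary NumPy broadcasting lines the arrays up by name.

```python
import numpy as np

def broadcast_by_dims(x, x_dims, y, y_dims):
    # Hypothetical helper (not xray's real implementation): align two
    # plain numpy arrays on the union of their named dimensions so that
    # arithmetic matches axes by *name* rather than by position.
    all_dims = list(x_dims) + [d for d in y_dims if d not in x_dims]

    def expand(a, dims):
        # Append size-1 axes for the dimensions this array lacks,
        # then transpose so every array shares the same axis order.
        missing = [d for d in all_dims if d not in dims]
        a = a.reshape(a.shape + (1,) * len(missing))
        current = list(dims) + missing
        return a.transpose([current.index(d) for d in all_dims])

    return expand(x, x_dims), expand(y, y_dims)

t_y = np.arange(6).reshape(2, 3)   # dims ('time', 'lat')
y_t = np.arange(6).reshape(3, 2)   # dims ('lat', 'time'), transposed layout
a, b = broadcast_by_dims(t_y, ('time', 'lat'), y_t, ('lat', 'time'))
print((a + b).shape)  # (2, 3): 'time' and 'lat' matched by name, not position
```

With positional broadcasting the `(2, 3)` and `(3, 2)` arrays above would simply fail to add;
matching by name makes the operation unambiguous.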

## Design Goals

- Provide a data analysis toolkit as fast and powerful as pandas but
designed for working with datasets of aligned, homogeneous N-dimensional
arrays.
- Whenever possible, build on top of and interoperate with pandas and the
rest of the awesome [scientific python stack][scipy].
- Provide a uniform API for loading and saving scientific data in a variety
of formats (including streaming data).
- Use metadata according to [conventions][cf] when appropriate, but don't
strictly enforce them. Conflicting attributes (e.g., units) should be
silently dropped instead of causing errors. The onus is on the user to
make sure that operations make sense.
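
The "silently drop conflicting attributes" policy can be illustrated with a small hypothetical
helper (this is a sketch of the stated behavior, not xray's actual code):

```python
def merge_attrs(a, b):
    # Hypothetical sketch of the policy above: keep attributes that agree
    # or appear on only one side, and silently drop any key whose values
    # conflict instead of raising an error.
    merged = {}
    for key in set(a) | set(b):
        if key in a and key in b and a[key] != b[key]:
            continue  # conflicting values (e.g., units): drop silently
        merged[key] = a.get(key, b.get(key))
    return merged

print(merge_attrs({'units': 'm', 'name': 'height'}, {'units': 'km'}))
# 'units' is dropped because the two values disagree; 'name' survives
```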

## Prior Art

- [Iris][iris] (supported by the UK Met office) is a similar package
designed for working with geophysical datasets in Python. Iris provided
much of the inspiration for xray (e.g., xray's `DatasetArray` is largely
based on the Iris `Cube`), but it has several limitations that led us to
build xray instead of extending Iris:
1. Iris has essentially one first-class object (the `Cube`) on which it
attempts to build all functionality (`Coord` supports a much more
limited set of functionality). xray has its equivalent of the Cube
(the `DatasetArray` object), but it is only a thin wrapper on the more
primitive building blocks of Dataset and Array objects.
2. Iris has a strict interpretation of [CF conventions][cf], which,
although a principled choice, we have found to be impractical for
everyday uses. With Iris, every quantity has physical (SI) units, all
coordinates have cell-bounds, and all metadata (units, cell-bounds and
other attributes) is required to match before merging or doing
operations on multiple cubes. This means that a lot of time with
Iris is spent figuring out why cubes are incompatible and explicitly
removing possibly conflicting metadata.
3. Iris can be slow and complex. Strictly interpreting metadata requires
a lot of work, and (in our experience) it can be difficult to build
mental models of how Iris functions work. Moreover, it means that a lot
of logic (e.g., constraint handling) uses non-vectorized operations. For
example, extracting all times within a range can be surprisingly slow
(e.g., 0.3 seconds in Iris vs. 3 milliseconds in xray to select along a
time dimension with 10,000 elements).
- [pandas][pandas] is fast and powerful but oriented around working with
tabular datasets. pandas has experimental N-dimensional panels, but they
don't support aligned math with other objects. We believe the
`DatasetArray`/`Cube` model is better suited to working with scientific
datasets. We use pandas internally in xray to support fast indexing.
- [netCDF4-python][nc4] provides xray's primary interface for working with
netCDF and OpenDAP datasets.
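
The speed difference mentioned above comes from vectorized label-based indexing. A rough sketch of
the same kind of operation using plain pandas (which xray builds on internally for indexing):

```python
import numpy as np
import pandas as pd

# Selecting a date range from a 10,000-element hourly time index.
# pandas resolves the labels by binary search on the sorted index,
# so the selection is vectorized rather than an element-by-element scan.
times = pd.date_range('2000-01-01', periods=10000, freq='h')
values = pd.Series(np.arange(10000), index=times)

subset = values['2000-02-01':'2000-02-02']  # label-based slice, end inclusive
print(len(subset))  # 48 hourly values across the two selected days
```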

[pandas]: http://pandas.pydata.org/
[cdm]: http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/CDM/
[cf]: http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.6/cf-conventions.html
[scipy]: http://scipy.org/
[nc4]: http://netcdf4-python.googlecode.com/svn/trunk/docs/netCDF4-module.html
[iris]: http://scitools.org.uk/iris/
177 changes: 177 additions & 0 deletions doc/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,177 @@
# Makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build

# User-friendly check for sphinx-build
ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
endif

# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .

.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext

help:
	@echo "Please use \`make <target>' where <target> is one of"
	@echo "  html       to make standalone HTML files"
	@echo "  dirhtml    to make HTML files named index.html in directories"
	@echo "  singlehtml to make a single large HTML file"
	@echo "  pickle     to make pickle files"
	@echo "  json       to make JSON files"
	@echo "  htmlhelp   to make HTML files and a HTML help project"
	@echo "  qthelp     to make HTML files and a qthelp project"
	@echo "  devhelp    to make HTML files and a Devhelp project"
	@echo "  epub       to make an epub"
	@echo "  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
	@echo "  latexpdf   to make LaTeX files and run them through pdflatex"
	@echo "  latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
	@echo "  text       to make text files"
	@echo "  man        to make manual pages"
	@echo "  texinfo    to make Texinfo files"
	@echo "  info       to make Texinfo files and run them through makeinfo"
	@echo "  gettext    to make PO message catalogs"
	@echo "  changes    to make an overview of all changed/added/deprecated items"
	@echo "  xml        to make Docutils-native XML files"
	@echo "  pseudoxml  to make pseudoxml-XML files for display purposes"
	@echo "  linkcheck  to check all external links for integrity"
	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"

clean:
	rm -rf $(BUILDDIR)/*

html:
	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

dirhtml:
	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."

singlehtml:
	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
	@echo
	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."

pickle:
	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
	@echo
	@echo "Build finished; now you can process the pickle files."

json:
	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
	@echo
	@echo "Build finished; now you can process the JSON files."

htmlhelp:
	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
	@echo
	@echo "Build finished; now you can run HTML Help Workshop with the" \
	      ".hhp project file in $(BUILDDIR)/htmlhelp."

qthelp:
	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
	@echo
	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/scidata.qhcp"
	@echo "To view the help file:"
	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/scidata.qhc"

devhelp:
	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
	@echo
	@echo "Build finished."
	@echo "To view the help file:"
	@echo "# mkdir -p $$HOME/.local/share/devhelp/scidata"
	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/scidata"
	@echo "# devhelp"

epub:
	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
	@echo
	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."

latex:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo
	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
	@echo "Run \`make' in that directory to run these through (pdf)latex" \
	      "(use \`make latexpdf' here to do that automatically)."

latexpdf:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through pdflatex..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

latexpdfja:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through platex and dvipdfmx..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

text:
	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
	@echo
	@echo "Build finished. The text files are in $(BUILDDIR)/text."

man:
	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
	@echo
	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."

texinfo:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo
	@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
	@echo "Run \`make' in that directory to run these through makeinfo" \
	      "(use \`make info' here to do that automatically)."

info:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo "Running Texinfo files through makeinfo..."
	make -C $(BUILDDIR)/texinfo info
	@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."

gettext:
	$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
	@echo
	@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."

changes:
	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
	@echo
	@echo "The overview file is in $(BUILDDIR)/changes."

linkcheck:
	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
	@echo
	@echo "Link check complete; look for any errors in the above output " \
	      "or in $(BUILDDIR)/linkcheck/output.txt."

doctest:
	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
	@echo "Testing of doctests in the sources finished, look at the " \
	      "results in $(BUILDDIR)/doctest/output.txt."

xml:
	$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
	@echo
	@echo "Build finished. The XML files are in $(BUILDDIR)/xml."

pseudoxml:
	$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
	@echo
	@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."
