Merge pull request #12 from akleeman/DataView
Stephan's sprintbattical
Showing 28 changed files with 4,217 additions and 2,162 deletions.
.gitignore
@@ -33,3 +33,5 @@ nosetests.xml
.mr.developer.cfg
.project
.pydevproject

doc/_build
README.md
@@ -1,21 +1,76 @@
scidata
=======

Objects for holding self-describing scientific data in python. The goal of this project is to
provide a Common Data Model (http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/CDM/)
allowing users to read, write and manipulate netcdf-like data without worrying about where the data
source lives. A dataset that is too large to fit in memory, served from an OpenDAP server, streamed,
or stored as NetCDF3, NetCDF4, grib (?), HDF5 and others can all be inspected and manipulated using
the same methods.

Of course there are already several packages in python that offer similar functionality (netCDF4,
scipy.io, pupynere, iris, ...) but each of those packages has its own shortcomings:

netCDF4
    Doesn't allow streaming. If you want to create a new object it needs to live on disk.
scipy.io / pupynere
    Only works with NetCDF3 and doesn't support DAP, making it difficult to work with large datasets.
iris
    is REALLY close to what this project will provide, but iris strays further from the CDM
    than I would like. (If you read and then write a netcdf file using iris, all global attributes
    are pushed down to variable-level attributes.)
# xray: transparently manipulate scientific datasets in Python

**xray** is a Python package for working with aligned sets of homogeneous,
n-dimensional arrays. It implements flexible array operations and dataset
manipulation for in-memory datasets within the [Common Data Model][cdm] widely
used for self-describing scientific data (netCDF, OpenDAP, etc.).

***Warning: xray is still in its early development phase. Expect the API to
change.***

## Main Features

- A `DatasetArray` object that is compatible with NumPy's ndarray and ufuncs
  but keeps ancillary variables and metadata intact.
- Array broadcasting based on dimension names and coordinate indices
  instead of only shapes.
- Flexible split-apply-combine functionality with the `Array.groupby` method
  (patterned after [pandas][pandas]).
- Fast label-based indexing and (limited) time-series functionality built on
  [pandas][pandas] (see the pandas sketch after this list).
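The groupby and label-based indexing above are explicitly patterned after pandas, so a plain pandas snippet is the closest concrete reference. The sketch below uses pandas only, with made-up data and no xray objects, to show what label-based time selection and split-apply-combine look like there.

```python
# Label-based time selection and split-apply-combine in pandas, the library
# the feature list above points to. Made-up data; this does not use xray.
import numpy as np
import pandas as pd

times = pd.date_range("2000-01-01", periods=365, freq="D")
temps = pd.Series(20 + 5 * np.random.randn(365), index=times)

january = temps.loc["2000-01"]                            # select a month by label
monthly_means = temps.groupby(temps.index.month).mean()   # split-apply-combine

print(january.head())
print(monthly_means)
```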
## Design Goals

- Provide a data analysis toolkit as fast and powerful as pandas but
  designed for working with datasets of aligned, homogeneous N-dimensional
  arrays.
- Whenever possible, build on top of and interoperate with pandas and the
  rest of the awesome [scientific python stack][scipy].
- Provide a uniform API for loading and saving scientific data in a variety
  of formats (including streaming data).
- Use metadata according to [conventions][cf] when appropriate, but don't
  strictly enforce them. Conflicting attributes (e.g., units) should be
  silently dropped instead of causing errors; a toy sketch of this policy
  follows the list. The onus is on the user to make sure that operations
  make sense.
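The attribute-handling policy in the last bullet can be written down in a few lines. The following is only a toy sketch of the described behavior (keep attributes that agree, silently drop ones that conflict), not xray's actual merge code.

```python
# Toy sketch of the policy described above: when combining two attribute
# dicts, keep keys that agree and silently drop keys whose values conflict.
# This illustrates the stated behavior; it is not xray's implementation.
def merge_attrs(a: dict, b: dict) -> dict:
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            del merged[key]      # conflicting attribute: drop it, don't raise
        else:
            merged[key] = value
    return merged

print(merge_attrs({"units": "degC", "source": "station"},
                  {"units": "K", "source": "station"}))
# {'source': 'station'}  -- "units" conflicted and was dropped
```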
## Prior Art

- [Iris][iris] (supported by the UK Met Office) is a similar package
  designed for working with geophysical datasets in Python. Iris provided
  much of the inspiration for xray (e.g., xray's `DatasetArray` is largely
  based on the Iris `Cube`), but it has several limitations that led us to
  build xray instead of extending Iris:
  1. Iris has essentially one first-class object (the `Cube`) on which it
     attempts to build all functionality (`Coord` supports a much more
     limited set of functionality). xray has its equivalent of the Cube
     (the `DatasetArray` object), but it is only a thin wrapper on the more
     primitive building blocks of `Dataset` and `Array` objects.
  2. Iris has a strict interpretation of [CF conventions][cf], which,
     although a principled choice, we have found to be impractical for
     everyday uses. With Iris, every quantity has physical (SI) units, all
     coordinates have cell-bounds, and all metadata (units, cell-bounds and
     other attributes) is required to match before merging or doing
     operations on multiple cubes. This means that a lot of time with
     Iris is spent figuring out why cubes are incompatible and explicitly
     removing possibly conflicting metadata.
  3. Iris can be slow and complex. Strictly interpreting metadata requires
     a lot of work, and (in our experience) it can be difficult to build
     mental models of how Iris functions work. Moreover, it means that a
     lot of logic (e.g., constraint handling) uses non-vectorized
     operations. For example, extracting all times within a range can be
     surprisingly slow (e.g., 0.3 seconds with Iris vs. 3 milliseconds with
     xray to select along a time dimension with 10000 elements).
- [pandas][pandas] is fast and powerful but oriented around working with
  tabular datasets. pandas has experimental N-dimensional panels, but they
  don't support aligned math with other objects. We believe the
  `DatasetArray`/`Cube` model is better suited to working with scientific
  datasets. We use pandas internally in xray to support fast indexing.
- [netCDF4-python][nc4] provides xray's primary interface for working with
  netCDF and OpenDAP datasets (see the sketch after this list).
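Since netCDF4-python is named as xray's primary I/O layer, a short netCDF4-python snippet gives a feel for what that layer looks like on its own. The file path and variable name below are hypothetical; any local netCDF file or OpenDAP URL would do.

```python
# Reading a variable and its attributes with netCDF4-python (the library the
# last bullet names as xray's netCDF/OpenDAP interface). "example.nc" and
# "temperature" are hypothetical; substitute a real file or OpenDAP URL.
import netCDF4

nc = netCDF4.Dataset("example.nc")     # local file path or OpenDAP URL
temp = nc.variables["temperature"]     # handle onto one variable
data = temp[:]                         # read the values into a NumPy array

print(temp.dimensions, temp.shape)
print({name: temp.getncattr(name) for name in temp.ncattrs()})
nc.close()
```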
[pandas]: http://pandas.pydata.org/
[cdm]: http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/CDM/
[cf]: http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.6/cf-conventions.html
[scipy]: http://scipy.org/
[nc4]: http://netcdf4-python.googlecode.com/svn/trunk/docs/netCDF4-module.html
[iris]: http://scitools.org.uk/iris/
doc/Makefile
@@ -0,0 +1,177 @@
# Makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS   =
SPHINXBUILD  = sphinx-build
PAPER        =
BUILDDIR     = _build

# User-friendly check for sphinx-build
ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
endif

# Internal variables.
PAPEROPT_a4     = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS   = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS  = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .

.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext

help:
	@echo "Please use \`make <target>' where <target> is one of"
	@echo "  html       to make standalone HTML files"
	@echo "  dirhtml    to make HTML files named index.html in directories"
	@echo "  singlehtml to make a single large HTML file"
	@echo "  pickle     to make pickle files"
	@echo "  json       to make JSON files"
	@echo "  htmlhelp   to make HTML files and a HTML help project"
	@echo "  qthelp     to make HTML files and a qthelp project"
	@echo "  devhelp    to make HTML files and a Devhelp project"
	@echo "  epub       to make an epub"
	@echo "  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
	@echo "  latexpdf   to make LaTeX files and run them through pdflatex"
	@echo "  latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
	@echo "  text       to make text files"
	@echo "  man        to make manual pages"
	@echo "  texinfo    to make Texinfo files"
	@echo "  info       to make Texinfo files and run them through makeinfo"
	@echo "  gettext    to make PO message catalogs"
	@echo "  changes    to make an overview of all changed/added/deprecated items"
	@echo "  xml        to make Docutils-native XML files"
	@echo "  pseudoxml  to make pseudoxml-XML files for display purposes"
	@echo "  linkcheck  to check all external links for integrity"
	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"

clean:
	rm -rf $(BUILDDIR)/*

html:
	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

dirhtml:
	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."

singlehtml:
	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
	@echo
	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."

pickle:
	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
	@echo
	@echo "Build finished; now you can process the pickle files."

json:
	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
	@echo
	@echo "Build finished; now you can process the JSON files."

htmlhelp:
	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
	@echo
	@echo "Build finished; now you can run HTML Help Workshop with the" \
	      ".hhp project file in $(BUILDDIR)/htmlhelp."

qthelp:
	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
	@echo
	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/scidata.qhcp"
	@echo "To view the help file:"
	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/scidata.qhc"

devhelp:
	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
	@echo
	@echo "Build finished."
	@echo "To view the help file:"
	@echo "# mkdir -p $$HOME/.local/share/devhelp/scidata"
	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/scidata"
	@echo "# devhelp"

epub:
	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
	@echo
	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."

latex:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo
	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
	@echo "Run \`make' in that directory to run these through (pdf)latex" \
	      "(use \`make latexpdf' here to do that automatically)."

latexpdf:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through pdflatex..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

latexpdfja:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through platex and dvipdfmx..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

text:
	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
	@echo
	@echo "Build finished. The text files are in $(BUILDDIR)/text."

man:
	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
	@echo
	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."

texinfo:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo
	@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
	@echo "Run \`make' in that directory to run these through makeinfo" \
	      "(use \`make info' here to do that automatically)."

info:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo "Running Texinfo files through makeinfo..."
	make -C $(BUILDDIR)/texinfo info
	@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."

gettext:
	$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
	@echo
	@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."

changes:
	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
	@echo
	@echo "The overview file is in $(BUILDDIR)/changes."

linkcheck:
	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
	@echo
	@echo "Link check complete; look for any errors in the above output " \
	      "or in $(BUILDDIR)/linkcheck/output.txt."

doctest:
	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
	@echo "Testing of doctests in the sources finished, look at the " \
	      "results in $(BUILDDIR)/doctest/output.txt."

xml:
	$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
	@echo
	@echo "Build finished. The XML files are in $(BUILDDIR)/xml."

pseudoxml:
	$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
	@echo
	@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."