Commit

small changes.
dcherian committed Sep 28, 2019
1 parent 8df0ca4 commit 0b6045b
Showing 3 changed files with 13 additions and 13 deletions.
16 changes: 8 additions & 8 deletions doc/dask.rst
@@ -87,10 +87,8 @@ for the full disclaimer). By default, :py:meth:`~xarray.open_mfdataset` will chunk each
netCDF file into a single Dask array; again, supply the ``chunks`` argument to
control the size of the resulting Dask arrays. In more complex cases, you can
open each file individually using :py:meth:`~xarray.open_dataset` and merge the result, as
-described in :ref:`combining data`. If you have a distributed cluster running,
-passing the keyword argument ``parallel=True`` to :py:meth:`~xarray.open_mfdataset`
-will speed up the reading of large multi-file datasets by executing those read tasks
-in parallel using ``dask.delayed``.
+described in :ref:`combining data`. Passing the keyword argument ``parallel=True`` to :py:meth:`~xarray.open_mfdataset` will speed up the reading of large multi-file datasets by
+executing those read tasks in parallel using ``dask.delayed``.

You'll notice that printing a dataset still shows a preview of array values,
even if they are actually Dask arrays. We can do this quickly with Dask because
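
A minimal sketch of the pattern the revised passage describes; the file glob, ``combine`` choice, and chunk sizes here are hypothetical:

```python
import xarray as xr

# parallel=True dispatches the per-file open calls via dask.delayed,
# and ``chunks`` controls the size of the resulting Dask arrays.
ds = xr.open_mfdataset(
    "data/temperature_*.nc",   # hypothetical multi-file dataset
    combine="by_coords",       # merge files on their coordinate values
    chunks={"time": 100},
    parallel=True,
)
```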
@@ -157,6 +155,12 @@ explicit conversion step. One notable exception is indexing operations: to
enable label based indexing, xarray will automatically load coordinate labels
into memory.

+.. tip::
+
+    By default, Dask uses its multi-threaded scheduler, which distributes work across
+    multiple cores and allows for processing some datasets that do not fit into memory.
+    For running across a cluster, `set up the distributed scheduler <https://docs.dask.org/en/latest/setup.html>`_.
+
The easiest way to convert an xarray data structure from lazy Dask arrays into
*eager*, in-memory NumPy arrays is to use the :py:meth:`~xarray.Dataset.load` method:

@@ -417,7 +421,3 @@ With analysis pipelines involving both spatial subsetting and temporal resampling

6. The dask `diagnostics <https://docs.dask.org/en/latest/understanding-performance.html>`_ can be
useful in identifying performance bottlenecks.

-7. Installing the optional `bottleneck <https://github.com/kwgoodman/bottleneck>`_ library
-   will result in greatly reduced memory usage when using :py:meth:`~xarray.Dataset.rolling`
-   on dask arrays.
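
A sketch of how the diagnostics mentioned in item 6 might be wired up; the file pattern and the reduction are illustrative:

```python
import xarray as xr
from dask.diagnostics import Profiler, ProgressBar

ds = xr.open_mfdataset("data/*.nc", chunks={"time": 100})  # hypothetical files

# These local diagnostics apply to dask's local schedulers; the
# distributed scheduler ships its own dashboard instead.
with ProgressBar(), Profiler() as prof:
    result = ds.mean(dim="time").compute()

# prof.results now holds per-task timings, useful for spotting bottlenecks.
```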
6 changes: 3 additions & 3 deletions doc/index.rst
@@ -11,11 +11,11 @@ intuitive, more concise, and less error-prone developer experience.
The package includes a large and growing library of domain-agnostic functions
for advanced analytics and visualization with these data structures.

-Xarray is particularly tailored to working with netCDF_ files, which were the
+Xarray is inspired by and borrows heavily from pandas_, the popular data
+analysis package focused on labelled tabular data.
+It is particularly tailored to working with netCDF_ files, which were the
source of xarray's data model, and integrates tightly with dask_ for parallel
computing.
-It is inspired by and borrows heavily from pandas_, the popular data
-analysis package focused on labelled tabular data.

.. _NumPy: http://www.numpy.org
.. _pandas: http://pandas.pydata.org
4 changes: 2 additions & 2 deletions xarray/core/dataset.py
Expand Up @@ -613,7 +613,7 @@ def sizes(self) -> Mapping[Hashable, int]:
"""
return self.dims

def load(self: T, **kwargs) -> T:
def load(self, **kwargs) -> "Dataset":
"""Manually trigger loading and/or computation of this dataset's data
from disk or a remote source into memory and return this dataset.
Unlike compute, the original dataset is modified and returned.
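
For context, a minimal sketch contrasting the two annotation styles in this change; the class body is illustrative, not xarray's actual implementation:

```python
from typing import TypeVar

T = TypeVar("T")

class Dataset:
    # Old style: annotating ``self`` with a TypeVar lets type checkers
    # infer the instance's own type (including subclasses) as the return.
    def load_generic(self: T, **kwargs) -> T:
        return self

    # New style: a plain string forward reference to the class itself.
    def load(self, **kwargs) -> "Dataset":
        return self
```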
@@ -771,7 +771,7 @@ def _dask_postpersist(dsk, info, *args):

        return Dataset._construct_direct(variables, *args)

-    def compute(self: T, **kwargs) -> T:
+    def compute(self, **kwargs) -> "Dataset":
        """Manually trigger loading and/or computation of this dataset's data
        from disk or a remote source into memory and return a new dataset.
        Unlike load, the original dataset is left unaltered.
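
A short usage sketch of the distinction the two docstrings spell out; the file name and chunks are hypothetical:

```python
import xarray as xr

ds = xr.open_dataset("example-data.nc", chunks={"time": 100})

computed = ds.compute()  # new in-memory dataset; ds still holds Dask arrays
ds.load()                # computes in place; ds itself now holds the results
```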
