Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: indexing with broadcasting #1473

Closed
wants to merge 128 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
128 commits
Select commit Hold shift + click to select a range
105bd64
Implemented `_broadcast_indexes` in Variable.py
fujiisoup Jul 7, 2017
23b4fe0
Diagonal indexing for Variable.
fujiisoup Jul 10, 2017
726ba5d
update _broadcast_indexes. update tests.
fujiisoup Jul 12, 2017
df7011f
Support basic boolean indexing.
fujiisoup Jul 15, 2017
f9232cb
tests for dask-based Variable
fujiisoup Jul 16, 2017
17b6465
Explicitly mark xfail flags
fujiisoup Jul 16, 2017
33c51d3
orthogonal indexing for dask.
fujiisoup Jul 16, 2017
03a336f
Refactor DaskArrayAdapter
shoyer Jul 16, 2017
d5af395
Merge pull request #1 from shoyer/indexing_broadcasting
fujiisoup Jul 17, 2017
866de91
Added MissingDimensionsError. Improve DaskIndexingAdapter, Variable._…
fujiisoup Jul 17, 2017
08e7444
use `np.arange(*slice.indices(size))` rather than `np.arange(size)[sl…
fujiisoup Jul 17, 2017
84afc98
Merge branch 'master' into indexing_broadcasting
fujiisoup Jul 17, 2017
7b33269
Add orthogonalize_indexers
fujiisoup Jul 20, 2017
50ea56e
A bug fix.
fujiisoup Jul 20, 2017
bac0089
Working with LazilyIndexedArray
fujiisoup Jul 20, 2017
1206c28
Fix in LazilyIndexedArray.
fujiisoup Jul 20, 2017
c2747be
add @requires_dask in test_variable
fujiisoup Jul 20, 2017
0671f39
rename orthogonalize_indexers -> unbroadcast_indexers
fujiisoup Jul 21, 2017
ffccff1
Wrap LazilyIndexedArray so that it accepts broadcasted-indexers
fujiisoup Jul 21, 2017
becf539
small rename
fujiisoup Jul 21, 2017
1ae4b4c
Another small fix
fujiisoup Jul 21, 2017
1967bf5
Remove unused function.
fujiisoup Jul 21, 2017
c2eeff3
Added _broadcast_indexes_1vector
fujiisoup Jul 23, 2017
df12c04
Minor fix
fujiisoup Jul 23, 2017
5ba367d
Avoid doubly wrapping by LazilyIndexedArray
fujiisoup Jul 23, 2017
d25c1f1
General orthogonal indexing for dask array.
fujiisoup Jul 23, 2017
0115994
Added base class IndexableArrayAdapter
fujiisoup Jul 24, 2017
1b4e854
Deprecate _unbroadcast_indexers and support IndexerTuple classes
fujiisoup Jul 25, 2017
36d052f
removed unintended prints.
fujiisoup Jul 25, 2017
9bd53ca
Some clean up.
fujiisoup Jul 25, 2017
563cafa
Some small fix.
fujiisoup Jul 25, 2017
1712060
Care for boolean array.
fujiisoup Jul 25, 2017
884423a
Always map boolean index to integer array.
fujiisoup Jul 25, 2017
c2e6f42
Takes care of boolean index in test_indexing
fujiisoup Jul 25, 2017
002eafa
replace self.assertTrue by assert
fujiisoup Jul 25, 2017
eedfb3f
Fix based on shoyer's comments.
fujiisoup Jul 29, 2017
bb2e515
Added `to_tuple()` method to IndexerTuple class.
fujiisoup Jul 29, 2017
5983a67
Removed: 'orthogonal_indexer', 'canonicalize_indexer'
fujiisoup Jul 29, 2017
7a5ff79
update IndexVariable.__getitem__
fujiisoup Jul 29, 2017
0b559bc
Made to_tuple function.
fujiisoup Jul 30, 2017
bad828e
BASIC_INDEXING_TYPES
fujiisoup Jul 30, 2017
a821a2b
Removed unused function from tests.
fujiisoup Jul 30, 2017
6550880
assert -> raise
fujiisoup Jul 30, 2017
464e711
Update Dataset.isel
fujiisoup Jul 30, 2017
7dd171d
Use `merge_variables` in checking the consistency.
fujiisoup Jul 30, 2017
e8f006b
Cleanup Dataset.__getitem__
shoyer Jul 30, 2017
a8ec82b
Add comment about why align() is unneeded
shoyer Jul 30, 2017
32749d4
Ensure correct tests are run in test_variable.py
shoyer Jul 31, 2017
31401d4
Support pointwise indexing with dask
shoyer Jul 31, 2017
f42ddfd
Merge pull request #2 from shoyer/indexing_broadcasting
fujiisoup Aug 5, 2017
8d96ad3
Add a vindex routine for np.ndarray
shoyer Aug 6, 2017
19f7204
Add an OrderedSet to xarray.core.utils
shoyer Aug 6, 2017
a8f60ba
Support dask and numpy vindex with one path
shoyer Aug 6, 2017
69f8570
Fix test failures
shoyer Aug 6, 2017
5eb00b7
working with `Dataset.sel`
fujiisoup Jul 30, 2017
d133766
Added more tests
fujiisoup Aug 6, 2017
631f6e9
Changes per review
shoyer Aug 7, 2017
72587de
Merge pull request #3 from shoyer/indexing_broadcasting
fujiisoup Aug 7, 2017
3231445
Restore `isel_points`. Remove automatic tuple conversion for `sel`
fujiisoup Aug 7, 2017
dd325c5
Some clean up
fujiisoup Aug 7, 2017
434a004
Supported indexing by a scalar Variable
fujiisoup Aug 9, 2017
d518f7a
Supported the indexing by DataArray with coordinates.
fujiisoup Aug 9, 2017
ba3cc88
Update DataArray.loc and DataArray.sel to use Dataset.loc and Dataset…
fujiisoup Aug 9, 2017
f63f3d5
Merge remote-tracking branch 'pydata/master' into indexing_broadcasting
fujiisoup Aug 21, 2017
aa10635
Added inhouse normalize_axis_index
fujiisoup Aug 21, 2017
fd73e82
Support an integer key for _advanced_indexer_subspaces
fujiisoup Aug 21, 2017
6202aff
Add warning for coordinate conflict.
fujiisoup Aug 27, 2017
f9746fd
Warning changes DeprecationWarning -> FutureWarning.
fujiisoup Aug 27, 2017
f78c932
fix related to pytest.warns
fujiisoup Aug 27, 2017
1c027cd
Another fix related to warning.
fujiisoup Aug 27, 2017
d11829f
Raise an Error for confusing indexing type
fujiisoup Aug 27, 2017
0777128
Minor fix
fujiisoup Aug 27, 2017
f580c99
Test for indexing by a scalar coordinate.
fujiisoup Aug 27, 2017
4ebe852
Modified test
fujiisoup Aug 29, 2017
20f5cb9
Remove too specialized errorning
fujiisoup Aug 29, 2017
ab08af8
Working with docs
fujiisoup Aug 29, 2017
92dded6
Found a bug in as_variable
fujiisoup Aug 29, 2017
a624424
Working with docs
fujiisoup Aug 29, 2017
a4cd724
Enable indexing IndexVariable by multi-dimensional Variable.
fujiisoup Aug 29, 2017
f66c9b6
Found a bug in indexing np.ndarray
fujiisoup Aug 30, 2017
24309c4
Added a test for boolean-DataArray indexing.
fujiisoup Aug 30, 2017
fd698de
Merge branch 'indexing_broadcasting' into indexing_broadcasting_doc
fujiisoup Aug 30, 2017
9cbaff9
Make sure assignment certainly works.
fujiisoup Aug 30, 2017
a5c7766
Added assignment section
fujiisoup Aug 30, 2017
f242166
pep8
fujiisoup Aug 31, 2017
bff18f0
Remove unused tests.
fujiisoup Aug 31, 2017
21c11c4
Add more docs.
fujiisoup Aug 31, 2017
1975f66
Api.rst changed
fujiisoup Aug 31, 2017
73ad94e
Merge branch 'indexing_broadcasting_doc' into indexing_broadcasting
fujiisoup Aug 31, 2017
b49f813
Add link in whats-new
fujiisoup Aug 31, 2017
1fd6b3a
Small format cleanup
fujiisoup Aug 31, 2017
46dd7c7
allow positional indexing with unsigned integer types
Aug 31, 2017
11f3e4f
Merge branch 'master' into indexing_broadcasting
fujiisoup Sep 1, 2017
1d3eddc
Catch up to the previous merge.
fujiisoup Sep 1, 2017
bcb25f1
Merge remote-tracking branch 'github/fix/1405' into indexing_broadcas…
fujiisoup Sep 1, 2017
7104964
workaround for daskarray with uint indexer.
fujiisoup Sep 1, 2017
173968b
Add a section about assignment, full indexing rules.
fujiisoup Sep 2, 2017
addb91a
Merge branch 'master' into indexing_broadcasting
fujiisoup Sep 3, 2017
7ad7d36
warning added for reindex for DataArray indexers.
fujiisoup Sep 4, 2017
91dd833
Move warning in alignment.reindex_variables.
fujiisoup Sep 4, 2017
118a5d8
+ Change API to attach non-dimensional coordinates.
fujiisoup Sep 5, 2017
dc9f8a6
Some clean up. Fix error in test_reindex_warning
fujiisoup Sep 5, 2017
5726c89
Enable vindex for PandasIndexAdapter.
fujiisoup Sep 5, 2017
523ecaa
Add deprecation warning for isel_points
fujiisoup Sep 6, 2017
a3a83db
Merge branch 'master' into indexing_broadcasting
fujiisoup Sep 6, 2017
765ae45
Add a sanity check for boolean vectorized indexing.
fujiisoup Sep 6, 2017
3deaf5c
Modify tests to take care of the sanity check related to boolean arra…
fujiisoup Sep 6, 2017
c8c8a12
Another follow up
fujiisoup Sep 6, 2017
a16a04b
pep8
fujiisoup Sep 6, 2017
1b34cd4
Clean up sanity checks in broadcast_indexers
fujiisoup Sep 7, 2017
24599a7
Fix unintended rename
fujiisoup Sep 11, 2017
d5d967b
indexing.rst edits
shoyer Sep 11, 2017
d0d6a6f
remove note about conflicts for now
shoyer Sep 17, 2017
4f08e2e
Apply coordinate conflict rule.
fujiisoup Sep 17, 2017
fbbe35c
Python 2 support
fujiisoup Sep 18, 2017
db23c93
Add tests for mindex selection.
fujiisoup Sep 18, 2017
031be9a
Drop coordinate of itself.
fujiisoup Sep 18, 2017
969f9cf
Clean up the coordinate dropping logic.
fujiisoup Sep 18, 2017
8608451
Merge pull request #4 from shoyer/indexing_broadcasting
fujiisoup Sep 22, 2017
b4e5b36
A small bug fix in coordinate dropping logic
fujiisoup Sep 22, 2017
9523039
Merge remote-tracking branch 'pydata/master' into indexing_broadcasting
fujiisoup Sep 27, 2017
dc60348
Fixes based on jhamman's comments.
fujiisoup Sep 27, 2017
8a62ad9
Improve test for warning.
fujiisoup Sep 27, 2017
6b96960
Merge branch 'master' into indexing_broadcasting
fujiisoup Oct 4, 2017
cb84154
Remove unused assert sentence.
fujiisoup Oct 4, 2017
9726531
Simplify rules for indexing conflicts
shoyer Oct 9, 2017
caa79fe
Merge pull request #5 from shoyer/indexing_broadcasting
fujiisoup Oct 9, 2017
170abc5
Better error-message for multiindex vectorized-selection.
fujiisoup Oct 10, 2017
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
88 changes: 33 additions & 55 deletions xarray/core/variable.py
Original file line number Diff line number Diff line change
Expand Up @@ -422,62 +422,40 @@ def _broadcast_indexes(self, key):
indexers: list of integer, array-like, or slice. This is aligned
along self.dims.
"""
if not utils.is_dict_like(key):
key = {self.dims[0]: key}
example_v = None
indexes = OrderedDict()
for k, v in key.items():
if not isinstance(v, (integer_types, slice, Variable)):
if not hasattr(key, 'ndim'): # convert list or tuple
v = np.array(v)
if example_v is None and isinstance(v, Variable):
example_v = v
indexes[k] = v

# When all the keys are array or integer, slice
if example_v is None:
# more than one (unlabelled) arrays
if len([v for k, v in indexes.items()
if not isinstance(v, (integer_types, slice))]) > 1:
raise IndexError("Indexing with multiple unlabelled arrays "
"is not allowed.")
# multi-dimensional unlabelled array
if any([v.ndim > 1 for k, v in indexes.items()
if not isinstance(v, integer_types)]):
raise IndexError("Indexing with a multi-dimensional unlabelled"
"array is not allowed.")
# convert the array into Variable
for k, v in indexes.items():
if not hasattr(v, 'dims'):
indexes[k] = type(self)([k], v)
example_v = v

for k, v in indexes.items():
# Found unlabelled array
if not isinstance(v, (Variable, integer_types, slice)):
if (v.ndim > example_v.ndim or
any([example_v.ndim != v.ndim for k, v
in indexes.items() if isinstance(v, Variable)])):
raise IndexError("Broadcasting failed because dimensions "
"does not match.")
else:
_, indexes[k], _ = _broadcast_compat_data(example_v, v)

index_tuple = tuple(indexes.get(d, slice(None)) for d in self.dims)
index_tuple = indexing.expanded_indexer(index_tuple, self.ndim)

# comput dims
dims = []
for i, d in enumerate(self.dims):
if d in indexes.keys():
if isinstance(v, Variable):
for d in v.dims:
if d not in dims:
dims.append(d)
else:
dims.append(d)
key = self._item_key_to_tuple(key) # key is a tuple
# key is a tuple of full size
key = indexing.expanded_indexer(key, self.ndim)
basic_indexing_types = integer_types + (slice,)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this a module level constant instead, since we access it in a few places.

if all([isinstance(k, basic_indexing_types) for k in key]):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid the unneeded extra [] inside all()

return self._broadcast_indexes_basic(key)
else:
return self._broadcast_indexes_advanced(key)

def _broadcast_indexes_basic(self, key):
dims = tuple(dim for k, dim in zip(key, self.dims)
if not isinstance(k, integer_types))
return dims, key

def _broadcast_indexes_advanced(self, key):
variables = []

return dims, index_tuple
for dim, value in zip(self.dims, key):
if isinstance(value, slice):
value = np.arange(self.sizes[dim])[value]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the slice.indices method here, to construct the desired array without indexing:

In [14]: x = slice(0, 10, 2)

In [15]: np.arange(*x.indices(5))
Out[15]: array([0, 2, 4])


try: # TODO we need our own Exception.
variable = as_variable(value, name=dim)
except ValueError as e:
if "cannot set variable" in str(e):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My current implementation for this exception handling is rather bad.
I want to change this to something like

except DimensionMismatchError:
    raise IndexError('...')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Can you switch this for as_variable?

Inherit from ValueError for backwards compatibility. Something like MissingDimensionsError should be more broadly useful for cases like this where we can't safely guess a dimension name.

raise IndexError("Unlabelled multi-dimensional array "
"cannot be used for indexing.")
else:
raise e
variables.append(variable)
variables = _broadcast_compat_variables(*variables)
dims = variables[0].dims # all variables have the same dims
key = tuple(variable.data for variable in variables)
return dims, key

def getitem2(self, key):
"""Return a new Array object whose contents are consistent with
Expand Down
107 changes: 64 additions & 43 deletions xarray/tests/test_variable.py
Original file line number Diff line number Diff line change
Expand Up @@ -709,75 +709,96 @@ def test_items(self):
v[range(10), range(11)] = 1
self.assertArrayEqual(v.values, np.ones((10, 11)))

def test_getitem2(self):
def test_getitem2_basic(self):
v = self.cls(['x', 'y'], [[0, 1, 2], [3, 4, 5]])

with self.assertRaisesRegexp(IndexError, "Indexing with multiple"):
v.getitem2(dict(x=[0, 1], y=[0, 1]))
v_new = v.getitem2(dict(x=0))
self.assertTrue(v_new.dims == ('y', ))
self.assertArrayEqual(v_new, v._data[0])

with self.assertRaisesRegexp(IndexError, "Indexing with a multi-"):
v.getitem2([[0, 1], [1, 2]])
v_new = v.getitem2(dict(x=0, y=slice(None)))
self.assertTrue(v_new.dims == ('y', ))
self.assertArrayEqual(v_new, v._data[0])

v_new = v.getitem2(dict(x=0, y=1))
self.assertTrue(v_new.dims == ())
self.assertArrayEqual(v_new, v._data[0, 1])

v_new = v.getitem2(dict(y=1))
self.assertTrue(v_new.dims == ('x', ))
self.assertArrayEqual(v_new, v._data[:, 1])

# tuple argument
v_new = v.getitem2((slice(None), 1))
self.assertTrue(v_new.dims == ('x', ))
self.assertArrayEqual(v_new, v._data[:, 1])

def test_getitem2_advanced(self):
v = self.cls(['x', 'y'], [[0, 1, 2], [3, 4, 5]])

# orthogonal indexing
v_new = v.getitem2(([0, 1], [1, 0]))
self.assertTrue(v_new.dims == ('x', 'y'))
self.assertArrayEqual(v_new, v._data[[0, 1]][:, [1, 0]])

dims, index_tuple = v._broadcast_indexes([0, 1])
self.assertTrue(dims == ['x', 'y'])
self.assertTrue(np.allclose(index_tuple[0], [0, 1]))
self.assertTrue(index_tuple[1] == slice(None, None, None))
v_new = v.getitem2([0, 1])
self.assertTrue(v_new.dims == ('x', 'y'))
self.assertArrayEqual(v_new, v._data[[0, 1]])

ind = Variable(['a', 'b'], [[0, 1, 1], [1, 1, 0]])
dims, index_tuple = v._broadcast_indexes(ind)
self.assertTrue(dims == ['a', 'b', 'y'])
self.assertTrue(np.allclose(index_tuple[0], [[0, 1, 1], [1, 1, 0]]))
self.assertTrue(index_tuple[1] == slice(None, None, None))
v_new = v.getitem2(ind)
self.assertTrue(v_new.dims == ('a', 'b', 'y'))
self.assertArrayEqual(v_new, v._data[([0, 1, 1], [1, 1, 0]), :])

ind = Variable(['a', 'b'], [[0, 1, 2], [2, 1, 0]])
dims, index_tuple = v._broadcast_indexes(dict(y=ind))
self.assertTrue(dims == ['x', 'a', 'b'])
self.assertTrue(len(index_tuple) == 2)
self.assertTrue(index_tuple[0] == slice(None, None, None))
self.assertTrue(np.allclose(index_tuple[1], [[0, 1, 2], [2, 1, 0]]))
v_new = v.getitem2(dict(y=ind))
self.assertTrue(v_new.dims == ('x', 'a', 'b'))
self.assertArrayEqual(v_new, v._data[:, ([0, 1, 2], [2, 1, 0])])

# with broadcast
# with mixed arguments
ind = Variable(['a'], [0, 1])
dims, index_tuple = v._broadcast_indexes(dict(x=[0, 1], y=ind))
self.assertTrue(dims == ['a'])
self.assertTrue(np.allclose(index_tuple[0], [0, 1]))
self.assertTrue(np.allclose(index_tuple[1], [0, 1]))
v_new = v.getitem2(dict(x=[0, 1], y=ind))
self.assertArrayEqual(v_new, v._data[[0, 1], [0, 1]])
self.assertTrue(v_new.dims == ('x', 'a'))
self.assertArrayEqual(v_new, v._data[[0, 1]][:, [0, 1]])

ind = Variable(['a', 'b'], [[0, 0], [1, 1]])
dims, index_tuple = v._broadcast_indexes(dict(x=[[1, 0], [1, 0]],
y=ind))
self.assertTrue(dims == ['a', 'b'])
self.assertTrue(np.allclose(index_tuple[0], [[1, 0], [1, 0]]))
self.assertTrue(np.allclose(index_tuple[1], [[0, 0], [1, 1]]))
v_new = v.getitem2(dict(x=[[1, 0], [1, 0]], y=ind))
self.assertArrayEqual(v_new,
v._data[([1, 0], [1, 0]), ([0, 0], [1, 1])])

# broadcast impossible case
with self.assertRaisesRegexp(IndexError, "Broadcasting failed "):
ind = Variable(['a'], [0, 1])
v.getitem2(dict(x=[[1, 0], [1, 0]], y=ind))
v_new = v.getitem2(dict(x=[1, 0], y=ind))
self.assertTrue(v_new.dims == ('x', 'a', 'b'))
self.assertArrayEqual(v_new, v._data[[1, 0]][:, ind])

# with integer
ind = Variable(['a', 'b'], [[0, 0], [1, 1]])
dims, index_tuple = v._broadcast_indexes(dict(x=0, y=ind))
self.assertTrue(dims == ['a', 'b'])
self.assertTrue(np.allclose(index_tuple[0], 0))
self.assertTrue(np.allclose(index_tuple[1], [[0, 0], [1, 1]]))
v_new = v.getitem2(dict(x=0, y=ind))
self.assertArrayEqual(v_new,
v._data[0, ([0, 0], [1, 1])])
self.assertTrue(v_new.dims == ('a', 'b'))
self.assertArrayEqual(v_new[0], v._data[0][[0, 0]])
self.assertArrayEqual(v_new[1], v._data[0][[1, 1]])

# with slice
ind = Variable(['a', 'b'], [[0, 0], [1, 1]])
v_new = v.getitem2(dict(x=slice(None), y=ind))
self.assertTrue(v_new.dims == ('x', 'a', 'b'))
self.assertArrayEqual(v_new, v._data[:, [[0, 0], [1, 1]]])

ind = Variable(['a', 'b'], [[0, 0], [1, 1]])
v_new = v.getitem2(dict(x=ind, y=slice(None)))
self.assertTrue(v_new.dims == ('a', 'b', 'y'))
self.assertArrayEqual(v_new, v._data[[[0, 0], [1, 1]], :])

ind = Variable(['a', 'b'], [[0, 0], [1, 1]])
v_new = v.getitem2(dict(x=ind, y=slice(None, 1)))
self.assertTrue(v_new.dims == ('a', 'b', 'y'))
self.assertArrayEqual(v_new, v._data[[[0, 0], [1, 1]], slice(None, 1)])

def test_getitem2_error(self):
v = self.cls(['x', 'y'], [[0, 1, 2], [3, 4, 5]])

with self.assertRaisesRegexp(IndexError, "Unlabelled multi-"):
v.getitem2([[0, 1], [1, 2]])

with self.assertRaisesRegexp(ValueError, "operands cannot be "):
ind_x = Variable(['a', 'b'], [[0, 0], [1, 1]])
ind_y = Variable(['a'], [0])
v.getitem2((ind_x, ind_y))

def test_isel(self):
v = Variable(['time', 'x'], self.d)
Expand Down