Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What was the bug?
TL;DR:
UnionDataset
doesn't work and has never worked.In
UnionDataset.__getitem__
, we were using the following check:The idea was that we shouldn't be trying to query from a dataset unless the dataset contains files that overlap with the query.
However, this doesn't work.
rtree.index.Index.intersection
returns a generator, which always evaluates to True. This resulted in bugs like #769.What was the fix?
The fix is simple: just convert the generator to a list. The list will evaluate to False if it's empty, and the dataset won't be sampled from because it has no overlap.
Why didn't the tests catch this?
Our existing tests were basically useless. Our
CustomGeoDatset.__getitem__
just returned the query, so even if the query didn't overlap at all with the dataset, it didn't raise an error. And ourUnionDataset
tests only tested the union of overlapping datasets, they never tested disparate geospatial locations.I changed the tests to actually check the rtree index for the query, and to test the union of two disjoint datasets. The tests fail without the fix, and pass with the fix. Hopefully the new and improved tests will be more useful.