UnionDataset: fix __getitem__ bug #786

Merged
merged 1 commit into from
Sep 25, 2022

adamjstewart
Collaborator

What was the bug?

TL;DR: UnionDataset doesn't work and has never worked.

In UnionDataset.__getitem__, we were using the following check:

if ds.index.intersection(tuple(query)):
    samples.append(ds[query])

The idea was that we shouldn't be trying to query from a dataset unless the dataset contains files that overlap with the query.

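However, this doesn't work. rtree.index.Index.intersection returns a generator, and a generator object is always truthy, even when it yields nothing. So the check passed for every dataset, whether or not it overlapped the query. This resulted in bugs like #769.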
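The truthiness problem can be reproduced without rtree at all. The snippet below is a minimal sketch: `intersection` here is a hypothetical stand-in that yields hits lazily, not the real rtree method.

```python
def intersection(hits):
    """Hypothetical stand-in for an index query that yields matching ids lazily."""
    for hit in hits:
        yield hit


# A query with no hits still produces a generator object.
no_hits = intersection([])

# The generator object itself is truthy, regardless of what it would yield.
generator_is_truthy = bool(no_hits)

# Materializing it gives an empty list, which is falsy.
list_is_truthy = bool(list(no_hits))

print(generator_is_truthy, list_is_truthy)
```

Python decides truthiness from `__bool__` or `__len__`, and generators define neither, so the object falls back to the default of being truthy.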

What was the fix?

The fix is simple: just convert the generator to a list. The list will evaluate to False if it's empty, and the dataset won't be sampled from because it has no overlap.
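As a rough illustration of the corrected check, here is a self-contained sketch. `ToyIndex` and `datasets_to_sample` are hypothetical names invented for this example; the toy index only mimics the lazy behaviour of an rtree index and is not the torchgeo implementation.

```python
from typing import Iterator, List, Tuple

Box = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy), simplified


class ToyIndex:
    """Hypothetical stand-in for a spatial index with a lazy intersection()."""

    def __init__(self, boxes: List[Box]) -> None:
        self.boxes = boxes

    def intersection(self, query: Box) -> Iterator[int]:
        qminx, qminy, qmaxx, qmaxy = query
        for i, (minx, miny, maxx, maxy) in enumerate(self.boxes):
            # Yield the id of every stored box that overlaps the query box.
            if minx <= qmaxx and maxx >= qminx and miny <= qmaxy and maxy >= qminy:
                yield i


def datasets_to_sample(indexes: List[ToyIndex], query: Box) -> List[int]:
    """Return the positions of the indexes that actually overlap the query."""
    selected = []
    for pos, index in enumerate(indexes):
        # list(...) materializes the generator, so an empty result is falsy.
        if list(index.intersection(query)):
            selected.append(pos)
    return selected


west = ToyIndex([(0, 0, 10, 10)])
east = ToyIndex([(100, 100, 110, 110)])  # disjoint from west

selected = datasets_to_sample([west, east], (2, 2, 4, 4))
print(selected)
```

With the bare generator in the `if`, both indexes would be selected for this query; with the list conversion, only the overlapping one is.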

Why didn't the tests catch this?

Our existing tests were basically useless. Our CustomGeoDataset.__getitem__ just returned the query, so even if the query didn't overlap with the dataset at all, it didn't raise an error. And our UnionDataset tests only tested the union of overlapping datasets; they never tested disparate geospatial locations.

I changed the tests to actually check the rtree index for the query, and to test the union of two disjoint datasets. The tests fail without the fix, and pass with the fix. Hopefully the new and improved tests will be more useful.
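The shape of such a test can be sketched as follows. Everything here is hypothetical and simplified: `ToyGeoDataset` uses 1-D extents instead of an rtree index, and `union_getitem` is a stand-in for the union logic, not the actual test code from this PR.

```python
class ToyGeoDataset:
    """Hypothetical test double: 1-D extents instead of a real rtree index."""

    def __init__(self, extent):
        self.extent = extent  # (start, stop)

    def overlaps(self, query):
        return self.extent[0] <= query[1] and query[0] <= self.extent[1]

    def __getitem__(self, query):
        # Unlike a double that simply echoes the query, this one validates it.
        if not self.overlaps(query):
            raise IndexError(f"query {query} not found in extent {self.extent}")
        return {"extent": self.extent, "query": query}


def union_getitem(datasets, query):
    """Sample only from the datasets that overlap the query."""
    samples = [ds[query] for ds in datasets if ds.overlaps(query)]
    if not samples:
        raise IndexError(f"query {query} not found in any dataset")
    return samples


ds1 = ToyGeoDataset((0, 10))
ds2 = ToyGeoDataset((100, 110))  # disjoint from ds1

# A query inside ds1 must be served by ds1 alone, without touching ds2.
samples = union_getitem([ds1, ds2], (2, 4))
print(samples)
```

Because the double raises on a non-overlapping query, a union that samples from every dataset unconditionally would fail on the disjoint pair instead of passing silently.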

@adamjstewart adamjstewart added this to the 0.3.2 milestone Sep 21, 2022
@github-actions github-actions bot added datasets Geospatial or benchmark datasets testing Continuous integration testing labels Sep 21, 2022
@calebrob6
Member

I confirmed this fix works today by patching it into my code that uses a UnionDataset with a semantic segmentation trainer.

@adamjstewart adamjstewart merged commit a6f0cc7 into main Sep 25, 2022
@adamjstewart adamjstewart deleted the fixes/union branch September 25, 2022 19:20
@adamjstewart adamjstewart modified the milestones: 0.3.2, 0.4.0 Jan 23, 2023
yichiac pushed a commit to yichiac/torchgeo that referenced this pull request Apr 29, 2023