L7 Irish: add new dataset/datamodule #1197

Merged Apr 12, 2023 (60 commits)

Changes from 48 commits

Commits
3871588
Landsat 7 Irish: add new dataset/datamodule
yichiac Mar 26, 2023
1211d33
# Changes to be committed:
yichiac Mar 28, 2023
9963dba
# Changes to be committed:
yichiac Mar 28, 2023
9e22c85
add: test_l7irish.py and data.py
yichiac Apr 1, 2023
ad8bfa8
# Changes to be committed:
yichiac Apr 1, 2023
ba3c8c6
modified: tests/datasets/test_l7irish.py,
yichiac Apr 1, 2023
63d29a3
added data.py, austral.tar.gz, test_l7irish.py
yichiac Apr 4, 2023
de91df1
remove comments in test_l7irish.py
yichiac Apr 4, 2023
6d65ec7
Merge branch 'main' into main
yichiac Apr 4, 2023
56dce86
resolve black and flake8 issues
yichiac Apr 4, 2023
00e91f9
Fixed _getitem
yichiac Apr 4, 2023
0b54298
Added L7 Irish datamodule
yichiac Apr 5, 2023
6f0c2b6
fix flake8 space error
yichiac Apr 5, 2023
fc9c788
fix black test error
yichiac Apr 5, 2023
ca2a826
chmod +x for data.py
yichiac Apr 5, 2023
094adf9
Update docs/api/datamodules.rst
yichiac Apr 5, 2023
12b699e
Update docs/api/datasets.rst
yichiac Apr 5, 2023
29bff42
Update docs/api/geo_datasets.csv
yichiac Apr 5, 2023
c6b78d9
Update torchgeo/datasets/l7irish.py
yichiac Apr 5, 2023
dd48635
Update torchgeo/datasets/l7irish.py
yichiac Apr 5, 2023
7064617
Resolved minor issues in l7irish.py
yichiac Apr 5, 2023
c839725
Improved _getitem and plot functions
yichiac Apr 5, 2023
9884f05
Added new artificial data with 5 scenes
yichiac Apr 5, 2023
2065e20
remove comments in l7irish.py
yichiac Apr 5, 2023
f6a67e0
Merge branch 'main' into datasets/l7irish
yichiac Apr 5, 2023
6394729
resolve black, flake8, and isort errors
yichiac Apr 5, 2023
0ca2043
add l7irish.yaml and refine test_segmentation.py
yichiac Apr 5, 2023
0c12781
modified l7irish.yaml
yichiac Apr 5, 2023
b81aa19
revert a change in .gitignore
yichiac Apr 5, 2023
218a776
add function test_rgb_bands_absent_plot()
yichiac Apr 6, 2023
222c8a9
resolve black test issue
yichiac Apr 6, 2023
30ab20f
Update torchgeo/datasets/l7irish.py
yichiac Apr 6, 2023
fd036d4
Update torchgeo/datasets/l7irish.py
yichiac Apr 6, 2023
61eeba5
Merge branch 'main' into datasets/l7irish
yichiac Apr 7, 2023
ce01caa
Update l7irish.py and create new test data
yichiac Apr 7, 2023
cbc7691
update l7irish.py for style tests
yichiac Apr 7, 2023
d4f069e
remove old test data
yichiac Apr 7, 2023
56473e9
Update tests/data/l7irish/data.py
yichiac Apr 7, 2023
a93ac58
Update torchgeo/datasets/l7irish.py
yichiac Apr 7, 2023
9b6c868
update data.py and l7irish.py
yichiac Apr 7, 2023
340d8b6
update md5s, citations, masks, and thermal bands
yichiac Apr 11, 2023
8cb6480
update mask mapping
yichiac Apr 11, 2023
4bfa372
update formatting
yichiac Apr 11, 2023
b4e5d17
update mask path
yichiac Apr 11, 2023
eb78c37
Merge branch 'main' into datasets/l7irish
yichiac Apr 11, 2023
e23ae78
Merge branch 'main' into datasets/l7irish
yichiac Apr 11, 2023
d4fa226
Merge branch 'main' into datasets/l7irish
yichiac Apr 11, 2023
65ea08c
Merge branch 'main' into datasets/l7irish
yichiac Apr 12, 2023
6b38751
Update torchgeo/datasets/l7irish.py
yichiac Apr 12, 2023
d7806ba
Update torchgeo/datasets/l7irish.py
yichiac Apr 12, 2023
64353ce
Update tests/data/l7irish/data.py
yichiac Apr 12, 2023
50db19d
Update docs/api/geo_datasets.csv
yichiac Apr 12, 2023
9dba933
Update tests/conf/l7irish.yaml
yichiac Apr 12, 2023
cfa20d3
resolve issues from comments
yichiac Apr 12, 2023
79cc501
Merge branch 'main' into datasets/l7irish
yichiac Apr 12, 2023
fe42927
Update L7 Irish link
yichiac Apr 12, 2023
01e1e80
Merge branch 'main' into datasets/l7irish
yichiac Apr 12, 2023
11149db
update mask data generation and review changes
yichiac Apr 12, 2023
796a607
Merge branch 'main' into datasets/l7irish
yichiac Apr 12, 2023
1107eaf
Fix checksums
adamjstewart Apr 12, 2023
5 changes: 5 additions & 0 deletions docs/api/datamodules.rst
@@ -11,6 +11,11 @@ Chesapeake Land Cover

.. autoclass:: ChesapeakeCVPRDataModule

L7 Irish
^^^^^^^^

.. autoclass:: L7IrishDataModule

L8 Biome
^^^^^^^^

5 changes: 5 additions & 0 deletions docs/api/datasets.rst
@@ -93,6 +93,11 @@ iNaturalist

.. autoclass:: INaturalist

L7 Irish
^^^^^^^^

.. autoclass:: L7Irish

L8 Biome
^^^^^^^^

1 change: 1 addition & 0 deletions docs/api/geo_datasets.csv
@@ -12,6 +12,7 @@ Dataset,Type,Source,Size (px),Resolution (m)
`GBIF`_,Points,Citizen Scientists,-,-
`GlobBiomass`_,Masks,Landsat,"45,000x45,000",100
`iNaturalist`_,Points,Citizen Scientists,-,-
`L7 Irish`_,"Imagery, Masks",Landsat,"8,400x7,500","15, 30, 60"
`LandCover.ai Geo`_,"Imagery, Masks",Aerial,"4,200--9,500",0.25--0.5
`Landsat`_,Imagery,Landsat,"8,900x8,900",30
`L8 Biome`_,"Imagery, Masks",Landsat,"8,900x8,900","15, 30"
21 changes: 21 additions & 0 deletions tests/conf/l7irish.yaml
@@ -0,0 +1,21 @@
experiment:
  task: "l7irish"
  module:
    loss: "ce"
    model: "unet"
    backbone: "resnet18"
    weights: null
    learning_rate: 1e-3
    learning_rate_schedule_patience: 6
    verbose: false
    in_channels: 9
    num_classes: 5
    num_filters: 1
    ignore_index: null
  datamodule:
    root: "tests/data/l7irish"
    download: true
    batch_size: 1
    patch_size: 32
    length: 5
    num_workers: 0
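
For orientation, a minimal sketch of how this test config can be consumed, assuming OmegaConf (which the trainer tests already use) and the repository root as the working directory; the keys simply mirror the file above:

import os

from omegaconf import OmegaConf

from torchgeo.datamodules import L7IrishDataModule

# Load the experiment config shipped with the tests.
conf = OmegaConf.load(os.path.join("tests", "conf", "l7irish.yaml"))

# The datamodule block maps onto L7IrishDataModule's keyword arguments;
# extra keys such as root and download are forwarded to the L7Irish dataset.
kwargs = OmegaConf.to_object(conf.experiment.datamodule)
datamodule = L7IrishDataModule(**kwargs)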
Binary file added tests/data/l7irish/austral.tar.gz
Binary file not shown.
(30 additional binary test files added, not shown.)
Binary file added tests/data/l7irish/boreal.tar.gz
Binary file not shown.
(20 additional binary test files added, not shown.)
117 changes: 117 additions & 0 deletions tests/data/l7irish/data.py
@@ -0,0 +1,117 @@
#!/usr/bin/env python3

# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

import hashlib
import os
import shutil
from typing import Dict, List, Union

import numpy as np
import rasterio
from rasterio import Affine
from rasterio.crs import CRS

SIZE = 36

np.random.seed(0)

FILENAME_HIERARCHY = Union[Dict[str, "FILENAME_HIERARCHY"], List[str]]

bands = [
    "B10.TIF",
    "B20.TIF",
    "B30.TIF",
    "B40.TIF",
    "B50.TIF",
    "B61.TIF",
    "B62.TIF",
    "B70.TIF",
    "B80.TIF",
]

filenames: FILENAME_HIERARCHY = {
    "austral": {"p226_r98": [], "p227_r98": [], "p231_r93_2": []},
    "boreal": {"p2_r27": [], "p143_r21_3": []},
}
prefixes = [
    "L71226098_09820011112",
    "L71227098_09820011103",
    "L71231093_09320010507",
    "L71002027_02720010604",
    "L71143021_02120010803",
]

for land_type, patches in filenames.items():
    for patch in patches:
        path, row = patch.split("_")[:2]
        path = path[1:].zfill(3)
        row = row[1:].zfill(3)
        key = path + row
        for prefix in prefixes:
            if key in prefix:
                for band in bands:
                    if band in ["B62.TIF", "B70.TIF", "B80.TIF"]:
                        prefix = prefix.replace(prefix[2], "2", 1)
                    filenames[land_type][patch].append(f"{prefix}_{band}")

        filenames[land_type][patch].append(f"L7_{patch}_newmask2015.TIF")


def create_file(path: str) -> None:
    dtype = "uint8"
    profile = {
        "driver": "GTiff",
        "dtype": dtype,
        "width": SIZE,
        "height": SIZE,
        "count": 1,
        "crs": CRS.from_epsg(32719),
        "transform": Affine(30.0, 0.0, 462884.99999999994, 0.0, -30.0, 4071915.0),
    }

    if path.endswith("B80.TIF"):
        profile["transform"] = Affine(
            15.0, 0.0, 462892.49999999994, 0.0, -15.0, 4071907.5
        )
        profile["width"] = profile["height"] = SIZE * 2

    if path.endswith("_newmask2015.TIF"):
        Z = np.random.randint(5, size=(SIZE, SIZE), dtype=dtype)
    else:
        Z = np.random.randn(SIZE, SIZE).astype(profile["dtype"])

    with rasterio.open(path, "w", **profile) as src:
        src.write(Z, 1)


def create_directory(directory: str, hierarchy: FILENAME_HIERARCHY) -> None:
    if isinstance(hierarchy, dict):
        # Recursive case
        for key, value in hierarchy.items():
            path = os.path.join(directory, key)
            os.makedirs(path, exist_ok=True)
            create_directory(path, value)
    else:
        # Base case
        for value in hierarchy:
            path = os.path.join(directory, value)
            create_file(path)


if __name__ == "__main__":
    create_directory(".", filenames)

    directories = ["austral", "boreal"]
    for directory in directories:
        filename = str(directory)

        # Create tarballs
        shutil.make_archive(filename, "gztar", ".", directory)

        # Compute checksums
        with open(f"{filename}.tar.gz", "rb") as f:
            md5 = hashlib.md5(f.read()).hexdigest()
            print(filename, md5)
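
As a sanity check, the checksums printed by this script should match the md5s that the test fixture below monkeypatches onto L7Irish; a small sketch of that comparison, assuming it is run from tests/data/l7irish right after regenerating the tarballs:

import hashlib

# Expected values copied from tests/datasets/test_l7irish.py.
expected = {
    "austral": "2aade4740a7a236aac17ddf01835ab6a",
    "boreal": "de6e8574af0bdc1e03c4d77e11c2671e",
}

for name, md5 in expected.items():
    with open(f"{name}.tar.gz", "rb") as f:
        actual = hashlib.md5(f.read()).hexdigest()
    assert actual == md5, f"{name}: {actual} != {md5}"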
94 changes: 94 additions & 0 deletions tests/datasets/test_l7irish.py
@@ -0,0 +1,94 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

import glob
import os
import shutil
from pathlib import Path

import matplotlib.pyplot as plt
import pytest
import torch
import torch.nn as nn
from _pytest.monkeypatch import MonkeyPatch
from rasterio.crs import CRS

import torchgeo.datasets.utils
from torchgeo.datasets import BoundingBox, IntersectionDataset, L7Irish, UnionDataset


def download_url(url: str, root: str, *args: str, **kwargs: str) -> None:
    shutil.copy(url, root)


class TestL7Irish:
    @pytest.fixture
    def dataset(self, monkeypatch: MonkeyPatch, tmp_path: Path) -> L7Irish:
        monkeypatch.setattr(torchgeo.datasets.l7irish, "download_url", download_url)
        md5s = {
            "austral": "2aade4740a7a236aac17ddf01835ab6a",
            "boreal": "de6e8574af0bdc1e03c4d77e11c2671e",
        }

        url = os.path.join("tests", "data", "l7irish", "{}.tar.gz")
        monkeypatch.setattr(L7Irish, "url", url)
        monkeypatch.setattr(L7Irish, "md5s", md5s)
        root = str(tmp_path)
        transforms = nn.Identity()
        return L7Irish(root, transforms=transforms, download=True, checksum=True)

    def test_getitem(self, dataset: L7Irish) -> None:
        x = dataset[dataset.bounds]
        assert isinstance(x, dict)
        assert isinstance(x["crs"], CRS)
        assert isinstance(x["image"], torch.Tensor)
        assert isinstance(x["mask"], torch.Tensor)

    def test_and(self, dataset: L7Irish) -> None:
        ds = dataset & dataset
        assert isinstance(ds, IntersectionDataset)

    def test_or(self, dataset: L7Irish) -> None:
        ds = dataset | dataset
        assert isinstance(ds, UnionDataset)

    def test_plot(self, dataset: L7Irish) -> None:
        x = dataset[dataset.bounds]
        dataset.plot(x, suptitle="Test")
        plt.close()

    def test_already_extracted(self, dataset: L7Irish) -> None:
        L7Irish(root=dataset.root, download=True)

    def test_already_downloaded(self, tmp_path: Path) -> None:
        pathname = os.path.join("tests", "data", "l7irish", "*.tar.gz")
        root = str(tmp_path)
        for tarfile in glob.iglob(pathname):
            shutil.copy(tarfile, root)
        L7Irish(root)

    def test_not_downloaded(self, tmp_path: Path) -> None:
        with pytest.raises(RuntimeError, match="Dataset not found"):
            L7Irish(str(tmp_path))

    def test_plot_prediction(self, dataset: L7Irish) -> None:
        x = dataset[dataset.bounds]
        x["prediction"] = x["mask"].clone()
        dataset.plot(x, suptitle="Prediction")
        plt.close()

    def test_invalid_query(self, dataset: L7Irish) -> None:
        query = BoundingBox(0, 0, 0, 0, 0, 0)
        with pytest.raises(
            IndexError, match="query: .* not found in index with bounds:"
        ):
            dataset[query]

    def test_rgb_bands_absent_plot(self, dataset: L7Irish) -> None:
        with pytest.raises(
            ValueError, match="Dataset doesn't contain some of the RGB bands"
        ):
            ds = L7Irish(root=dataset.root, bands=["B1", "B2", "B5"])
            x = ds[ds.bounds]
            ds.plot(x, suptitle="Test")
            plt.close()
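
Beyond the unit tests, the dataset is meant to be driven through a geospatial sampler; a rough usage sketch, where the root path, patch size, and batch size are placeholders rather than part of this PR:

from torch.utils.data import DataLoader

from torchgeo.datasets import L7Irish, stack_samples
from torchgeo.samplers import RandomGeoSampler

# Download the tarballs and build the spatial index (placeholder root).
ds = L7Irish(root="data/l7irish", download=True)

# Draw random fixed-size patches from the indexed scenes.
sampler = RandomGeoSampler(ds, size=256, length=100)
loader = DataLoader(ds, sampler=sampler, collate_fn=stack_samples, batch_size=4)

for batch in loader:
    images, masks = batch["image"], batch["mask"]  # stacked patch tensors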
2 changes: 2 additions & 0 deletions tests/trainers/test_segmentation.py
@@ -19,6 +19,7 @@
    ETCI2021DataModule,
    GID15DataModule,
    InriaAerialImageLabelingDataModule,
    L7IrishDataModule,
    L8BiomeDataModule,
    LandCoverAIDataModule,
    LoveDADataModule,
@@ -64,6 +65,7 @@ class TestSemanticSegmentationTask:
("etci2021", ETCI2021DataModule),
("gid15", GID15DataModule),
("inria", InriaAerialImageLabelingDataModule),
("l7irish", L7IrishDataModule),
("l8biome", L8BiomeDataModule),
("landcoverai", LandCoverAIDataModule),
("loveda", LoveDADataModule),
2 changes: 2 additions & 0 deletions torchgeo/datamodules/__init__.py
@@ -14,6 +14,7 @@
from .geo import GeoDataModule, NonGeoDataModule
from .gid15 import GID15DataModule
from .inria import InriaAerialImageLabelingDataModule
from .l7irish import L7IrishDataModule
from .l8biome import L8BiomeDataModule
from .landcoverai import LandCoverAIDataModule
from .loveda import LoveDADataModule
@@ -35,6 +36,7 @@
__all__ = (
    # GeoDataset
    "ChesapeakeCVPRDataModule",
    "L7IrishDataModule",
    "L8BiomeDataModule",
    "NAIPChesapeakeDataModule",
    # NonGeoDataset
76 changes: 76 additions & 0 deletions torchgeo/datamodules/l7irish.py
@@ -0,0 +1,76 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

"""L7 Irish datamodule."""

from typing import Any, Tuple, Union

import torch

from ..datasets import L7Irish, random_bbox_assignment
from ..samplers import GridGeoSampler, RandomBatchGeoSampler
from .geo import GeoDataModule


class L7IrishDataModule(GeoDataModule):
"""LightningDataModule implementation for the L7 Irish dataset.

.. versionadded:: 0.5
"""

mean = torch.tensor(0)
std = torch.tensor(10000)

def __init__(
self,
batch_size: int = 1,
patch_size: Union[int, Tuple[int, int]] = 32,
length: int = 5,
num_workers: int = 0,
**kwargs: Any,
) -> None:
"""Initialize a new L7IrishDataModule instance.

Args:
batch_size: Size of each mini-batch.
patch_size: Size of each patch, either ``size`` or ``(height, width)``.
length: Length of each training epoch.
num_workers: Number of workers for parallel data loading.
**kwargs: Additional keyword arguments passed to
:class:`~torchgeo.datasets.L7Irish`.
"""
super().__init__(
L7Irish,
batch_size=batch_size,
patch_size=patch_size,
length=length,
num_workers=num_workers,
**kwargs,
)

def setup(self, stage: str) -> None:
"""Set up datasets.

Args:
stage: Either 'fit', 'validate', 'test', or 'predict'.
"""
dataset = L7Irish(**self.kwargs)
generator = torch.Generator().manual_seed(0)
(
self.train_dataset,
self.val_dataset,
self.test_dataset,
) = random_bbox_assignment(dataset, [0.6, 0.2, 0.2], generator)

if stage in ["fit"]:
self.train_batch_sampler = RandomBatchGeoSampler(
self.train_dataset, self.patch_size, self.batch_size, self.length
)
if stage in ["fit", "validate"]:
self.val_sampler = GridGeoSampler(
self.val_dataset, self.patch_size, self.patch_size
)
if stage in ["test"]:
self.test_sampler = GridGeoSampler(
self.test_dataset, self.patch_size, self.patch_size
)
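
For context, a hedged sketch of driving the new datamodule end to end with Lightning and torchgeo's SemanticSegmentationTask; the hyperparameters echo tests/conf/l7irish.yaml, while the root path, trainer settings, and the exact task keyword names are assumptions, not part of this diff:

from pytorch_lightning import Trainer

from torchgeo.datamodules import L7IrishDataModule
from torchgeo.trainers import SemanticSegmentationTask

# Keyword arguments beyond batch_size/patch_size/length/num_workers are
# forwarded to the underlying L7Irish dataset (placeholder root).
datamodule = L7IrishDataModule(
    root="data/l7irish", download=True, batch_size=1, patch_size=32, length=5
)

# Values mirror the test config; keyword names assumed from the same release.
task = SemanticSegmentationTask(
    model="unet",
    backbone="resnet18",
    weights=None,
    in_channels=9,
    num_classes=5,
    loss="ce",
    ignore_index=None,
    learning_rate=1e-3,
    learning_rate_schedule_patience=6,
)

trainer = Trainer(accelerator="cpu", max_epochs=1)
trainer.fit(model=task, datamodule=datamodule)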
2 changes: 2 additions & 0 deletions torchgeo/datasets/__init__.py
@@ -53,6 +53,7 @@
from .idtrees import IDTReeS
from .inaturalist import INaturalist
from .inria import InriaAerialImageLabeling
from .l7irish import L7Irish
from .l8biome import L8Biome
from .landcoverai import LandCoverAI, LandCoverAIBase, LandCoverAIGeo
from .landsat import (
@@ -138,6 +139,7 @@
"GBIF",
"GlobBiomass",
"INaturalist",
"L7Irish",
"L8Biome",
"LandCoverAIBase",
"LandCoverAIGeo",