SSL4EO-L: add new dataset #1332

Merged
merged 5 commits into from
May 14, 2023
1 change: 1 addition & 0 deletions docs/api/datamodules.rst
@@ -128,6 +128,7 @@ SpaceNet
 SSL4EO
 ^^^^^^

+.. autoclass:: SSL4EOLDataModule
 .. autoclass:: SSL4EOS12DataModule

 SustainBench Crop Yield
2 changes: 2 additions & 0 deletions docs/api/datasets.rst
@@ -327,6 +327,8 @@ SpaceNet
 SSL4EO
 ^^^^^^

+.. autoclass:: SSL4EO
+.. autoclass:: SSL4EOL
 .. autoclass:: SSL4EOS12

 SustainBench Crop Yield
3 changes: 2 additions & 1 deletion docs/api/non_geo_datasets.csv
@@ -30,7 +30,8 @@ Dataset,Task,Source,# Samples,# Classes,Size (px),Resolution (m),Bands
 `SKIPP'D`_,R,"Fish-eye","363,375",-,64x64,-,RGB
 `So2Sat`_,C,Sentinel-1/2,"400,673",17,32x32,10,"SAR, MSI"
 `SpaceNet`_,I,WorldView-2/3 Planet Lab Dove,"1,889--28,728",2,102--900,0.5--4,MSI
-`SSL4EO`_,T,Sentinel-1/2,1M,-,264x264,10,"SAR, MSI"
+`SSL4EO`_-L,T,Landsat,1M,-,264x264,30,MSI
+`SSL4EO`_-S12,T,Sentinel-1/2,1M,-,264x264,10,"SAR, MSI"
 `SustainBench Crop Yield`_,R,MODIS,11k,-,32x32,-,MSI
 `Tropical Cyclone`_,R,GOES 8--16,"108,110",-,256x256,4K--8K,MSI
 `UC Merced`_,C,USGS National Map,"2,100",21,256x256,0.3,RGB
15 changes: 15 additions & 0 deletions tests/conf/ssl4eo_l_byol_1.yaml
@@ -0,0 +1,15 @@
module:
  _target_: torchgeo.trainers.BYOLTask
  in_channels: 7
  backbone: "resnet18"
  learning_rate: 1e-3
  learning_rate_schedule_patience: 6
  weights: null

datamodule:
  _target_: torchgeo.datamodules.SSL4EOLDataModule
  root: "tests/data/ssl4eo/l/tm_toa"
  split: "tm_toa"
  seasons: 1
  batch_size: 2
  num_workers: 0
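The `_target_` keys in these configs appear to follow the Hydra-style convention in which a dotted path names a class and the remaining keys are passed as keyword arguments. The sketch below imitates that idea with the standard library only. `instantiate_sketch` is a hypothetical helper written for illustration, not Hydra's or torchgeo's implementation, and `datetime.timedelta` is used as a stand-in target so the example runs without torchgeo installed.

```python
import importlib
from typing import Any


def instantiate_sketch(cfg: dict[str, Any]) -> Any:
    """Build the object named by cfg["_target_"], passing other keys as kwargs.

    Hypothetical helper for illustration only.
    """
    # Split "package.module.ClassName" into module path and attribute name.
    module_path, _, attr = cfg["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_path), attr)
    kwargs = {k: v for k, v in cfg.items() if k != "_target_"}
    return target(**kwargs)


# Stand-in config with the same shape as the "datamodule" section above.
example = {"_target_": "datetime.timedelta", "days": 1, "hours": 12}
delta = instantiate_sketch(example)
print(delta.total_seconds())
```

With torchgeo installed, the same pattern would presumably resolve `torchgeo.datamodules.SSL4EOLDataModule` and pass `root`, `split`, `seasons`, `batch_size`, and `num_workers` to it.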
15 changes: 15 additions & 0 deletions tests/conf/ssl4eo_l_byol_2.yaml
@@ -0,0 +1,15 @@
module:
  _target_: torchgeo.trainers.BYOLTask
  in_channels: 6
  backbone: "resnet18"
  learning_rate: 1e-3
  learning_rate_schedule_patience: 6
  weights: null

datamodule:
  _target_: torchgeo.datamodules.SSL4EOLDataModule
  root: "tests/data/ssl4eo/l/tm_sr"
  split: "tm_sr"
  seasons: 2
  batch_size: 2
  num_workers: 0
17 changes: 17 additions & 0 deletions tests/conf/ssl4eo_l_moco_1.yaml
@@ -0,0 +1,17 @@
module:
  _target_: torchgeo.trainers.MoCoTask
  model: "resnet18"
  in_channels: 9
  version: 1
  weight_decay: 1e-4
  temperature: 0.07
  memory_bank_size: 10
  moco_momentum: 0.999

datamodule:
  _target_: torchgeo.datamodules.SSL4EOLDataModule
  root: "tests/data/ssl4eo/l/etm_toa"
  split: "etm_toa"
  seasons: 1
  batch_size: 2
  num_workers: 0
20 changes: 20 additions & 0 deletions tests/conf/ssl4eo_l_moco_2.yaml
@@ -0,0 +1,20 @@
module:
  _target_: torchgeo.trainers.MoCoTask
  model: "resnet18"
  in_channels: 11
  version: 2
  layers: 2
  hidden_dim: 10
  output_dim: 5
  weight_decay: 1e-4
  temperature: 0.07
  memory_bank_size: 10
  moco_momentum: 0.999

datamodule:
  _target_: torchgeo.datamodules.SSL4EOLDataModule
  root: "tests/data/ssl4eo/l/oli_tirs_toa"
  split: "oli_tirs_toa"
  seasons: 2
  batch_size: 2
  num_workers: 0
18 changes: 18 additions & 0 deletions tests/conf/ssl4eo_l_simclr_1.yaml
@@ -0,0 +1,18 @@
module:
  _target_: torchgeo.trainers.SimCLRTask
  model: "resnet18"
  in_channels: 7
  version: 1
  layers: 2
  hidden_dim: 8
  output_dim: 8
  weight_decay: 1e-6
  memory_bank_size: 0

datamodule:
  _target_: torchgeo.datamodules.SSL4EOLDataModule
  root: "tests/data/ssl4eo/l/oli_sr"
  split: "oli_sr"
  seasons: 1
  batch_size: 2
  num_workers: 0
18 changes: 18 additions & 0 deletions tests/conf/ssl4eo_l_simclr_2.yaml
@@ -0,0 +1,18 @@
module:
  _target_: torchgeo.trainers.SimCLRTask
  model: "resnet18"
  in_channels: 7
  version: 2
  layers: 3
  hidden_dim: 8
  output_dim: 8
  weight_decay: 1e-4
  memory_bank_size: 10

datamodule:
  _target_: torchgeo.datamodules.SSL4EOLDataModule
  root: "tests/data/ssl4eo/l/tm_toa"
  split: "tm_toa"
  seasons: 2
  batch_size: 2
  num_workers: 0
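Each SSL4EO-L config pairs a `split` with an `in_channels` value, and the band count per split is also recorded as `num_bands` in `tests/data/ssl4eo/l/data.py`. A quick consistency check might look like the sketch below. The values are copied by hand from this diff, and `find_mismatches` is a hypothetical helper, not part of the PR.

```python
# Band counts copied from `num_bands` in tests/data/ssl4eo/l/data.py.
NUM_BANDS = {"tm_toa": 7, "tm_sr": 6, "etm_toa": 9, "oli_tirs_toa": 11, "oli_sr": 7}

# (config file, split, in_channels) as they appear in the six configs above.
CONFIGS = [
    ("ssl4eo_l_byol_1.yaml", "tm_toa", 7),
    ("ssl4eo_l_byol_2.yaml", "tm_sr", 6),
    ("ssl4eo_l_moco_1.yaml", "etm_toa", 9),
    ("ssl4eo_l_moco_2.yaml", "oli_tirs_toa", 11),
    ("ssl4eo_l_simclr_1.yaml", "oli_sr", 7),
    ("ssl4eo_l_simclr_2.yaml", "tm_toa", 7),
]


def find_mismatches(configs, num_bands):
    """Return the configs whose in_channels differs from the split's band count."""
    return [
        (name, split, channels)
        for name, split, channels in configs
        if num_bands[split] != channels
    ]


mismatches = find_mismatches(CONFIGS, NUM_BANDS)
print(mismatches)  # an empty list means every config is consistent
```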
3 changes: 2 additions & 1 deletion tests/conf/ssl4eo_s12_byol_1.yaml
@@ -1,6 +1,6 @@
 module:
   _target_: torchgeo.trainers.BYOLTask
-  in_channels: 13
+  in_channels: 2
   backbone: "resnet18"
   learning_rate: 1e-3
   learning_rate_schedule_patience: 6
@@ -9,6 +9,7 @@ module:
 datamodule:
   _target_: torchgeo.datamodules.SSL4EOS12DataModule
   root: "tests/data/ssl4eo/s12"
+  split: "s1"
   seasons: 1
   batch_size: 2
   num_workers: 0
1 change: 1 addition & 0 deletions tests/conf/ssl4eo_s12_byol_2.yaml
@@ -9,6 +9,7 @@ module:
 datamodule:
   _target_: torchgeo.datamodules.SSL4EOS12DataModule
   root: "tests/data/ssl4eo/s12"
+  split: "s2c"
   seasons: 2
   batch_size: 2
   num_workers: 0
3 changes: 2 additions & 1 deletion tests/conf/ssl4eo_s12_moco_1.yaml
@@ -1,7 +1,7 @@
 module:
   _target_: torchgeo.trainers.MoCoTask
   model: "resnet18"
-  in_channels: 13
+  in_channels: 12
   version: 1
   weight_decay: 1e-4
   temperature: 0.07
@@ -11,6 +11,7 @@ module:
 datamodule:
   _target_: torchgeo.datamodules.SSL4EOS12DataModule
   root: "tests/data/ssl4eo/s12"
+  split: "s2a"
   seasons: 1
   batch_size: 2
   num_workers: 0
3 changes: 2 additions & 1 deletion tests/conf/ssl4eo_s12_moco_2.yaml
@@ -1,7 +1,7 @@
 module:
   _target_: torchgeo.trainers.MoCoTask
   model: "resnet18"
-  in_channels: 13
+  in_channels: 2
   version: 2
   layers: 2
   hidden_dim: 10
@@ -14,6 +14,7 @@ module:
 datamodule:
   _target_: torchgeo.datamodules.SSL4EOS12DataModule
   root: "tests/data/ssl4eo/s12"
+  split: "s1"
   seasons: 2
   batch_size: 2
   num_workers: 0
1 change: 1 addition & 0 deletions tests/conf/ssl4eo_s12_simclr_1.yaml
@@ -12,6 +12,7 @@ module:
 datamodule:
   _target_: torchgeo.datamodules.SSL4EOS12DataModule
   root: "tests/data/ssl4eo/s12"
+  split: "s2c"
   seasons: 1
   batch_size: 2
   num_workers: 0
3 changes: 2 additions & 1 deletion tests/conf/ssl4eo_s12_simclr_2.yaml
@@ -1,7 +1,7 @@
 module:
   _target_: torchgeo.trainers.SimCLRTask
   model: "resnet18"
-  in_channels: 13
+  in_channels: 12
   version: 2
   layers: 3
   hidden_dim: 8
@@ -12,6 +12,7 @@ module:
 datamodule:
   _target_: torchgeo.datamodules.SSL4EOS12DataModule
   root: "tests/data/ssl4eo/s12"
+  split: "s2a"
   seasons: 2
   batch_size: 2
   num_workers: 0
150 changes: 150 additions & 0 deletions tests/data/ssl4eo/l/data.py
@@ -0,0 +1,150 @@
#!/usr/bin/env python3

# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.

import hashlib
import os
import shutil
from typing import Union

import numpy as np
import rasterio
from rasterio import Affine
from rasterio.crs import CRS

SIZE = 36

np.random.seed(0)

FILENAME_HIERARCHY = Union[dict[str, "FILENAME_HIERARCHY"], list[str]]

filenames: FILENAME_HIERARCHY = {
    "tm_toa": {
        "0000002": {
            "LE07_172034_20010526": ["all_bands.tif"],
            "LE07_172034_20020310": ["all_bands.tif"],
            "LE07_172034_20020902": ["all_bands.tif"],
            "LE07_172034_20021121": ["all_bands.tif"],
        },
        "0000005": {
            "LE07_223084_20010413": ["all_bands.tif"],
            "LE07_223084_20011225": ["all_bands.tif"],
            "LE07_223084_20020619": ["all_bands.tif"],
            "LE07_223084_20020923": ["all_bands.tif"],
        },
    },
    "tm_sr": {
        "0000002": {
            "LE07_172034_20010526": ["all_bands.tif"],
            "LE07_172034_20020310": ["all_bands.tif"],
            "LE07_172034_20020902": ["all_bands.tif"],
            "LE07_172034_20021121": ["all_bands.tif"],
        },
        "0000005": {
            "LE07_223084_20010413": ["all_bands.tif"],
            "LE07_223084_20011225": ["all_bands.tif"],
            "LE07_223084_20020619": ["all_bands.tif"],
            "LE07_223084_20020923": ["all_bands.tif"],
        },
    },
    "etm_toa": {
        "0000002": {
            "LE07_172034_20010526": ["all_bands.tif"],
            "LE07_172034_20020310": ["all_bands.tif"],
            "LE07_172034_20020902": ["all_bands.tif"],
            "LE07_172034_20021121": ["all_bands.tif"],
        },
        "0000005": {
            "LE07_223084_20010413": ["all_bands.tif"],
            "LE07_223084_20011225": ["all_bands.tif"],
            "LE07_223084_20020619": ["all_bands.tif"],
            "LE07_223084_20020923": ["all_bands.tif"],
        },
    },
    "oli_tirs_toa": {
        "0000002": {
            "LC08_172034_20210306": ["all_bands.tif"],
            "LC08_172034_20210829": ["all_bands.tif"],
            "LC08_172034_20211203": ["all_bands.tif"],
            "LC08_172034_20220715": ["all_bands.tif"],
        },
        "0000005": {
            "LC08_223084_20210412": ["all_bands.tif"],
            "LC08_223084_20211005": ["all_bands.tif"],
            "LC08_223084_20220618": ["all_bands.tif"],
            "LC08_223084_20221211": ["all_bands.tif"],
        },
    },
    "oli_sr": {
        "0000002": {
            "LC08_172034_20210306": ["all_bands.tif"],
            "LC08_172034_20210829": ["all_bands.tif"],
            "LC08_172034_20211203": ["all_bands.tif"],
            "LC08_172034_20220715": ["all_bands.tif"],
        },
        "0000005": {
            "LC08_223084_20210412": ["all_bands.tif"],
            "LC08_223084_20211005": ["all_bands.tif"],
            "LC08_223084_20220618": ["all_bands.tif"],
            "LC08_223084_20221211": ["all_bands.tif"],
        },
    },
}

num_bands = {"tm_toa": 7, "tm_sr": 6, "etm_toa": 9, "oli_tirs_toa": 11, "oli_sr": 7}


def create_file(path: str) -> None:
    """Write a small fake GeoTIFF with random data to ``path``."""
    profile = {
        "driver": "GTiff",
        "dtype": "uint8",
        "width": SIZE,
        "height": SIZE,
        # Paths look like ./<split>/<id>/<scene>/all_bands.tif,
        # so element 1 is the split name used to look up the band count.
        "count": num_bands[path.split(os.sep)[1]],
        "crs": CRS.from_epsg(4326),
        "transform": Affine(
            0.00033331040066238285,
            0.0,
            40.31409193350423,
            0.0,
            -0.0002658855613264443,
            37.60408425220701,
        ),
        "compress": "lzw",
        "predictor": 2,
    }

    Z = np.random.randn(SIZE, SIZE).astype(profile["dtype"])

    # Write the same random array to every band.
    with rasterio.open(path, "w", **profile) as src:
        for i in src.indexes:
            src.write(Z, i)


def create_directory(directory: str, hierarchy: FILENAME_HIERARCHY) -> None:
    """Recursively create the directories and files described by ``hierarchy``."""
    if isinstance(hierarchy, dict):
        # Recursive case
        for key, value in hierarchy.items():
            path = os.path.join(directory, key)
            os.makedirs(path, exist_ok=True)
            create_directory(path, value)
    else:
        # Base case
        for value in hierarchy:
            path = os.path.join(directory, value)
            create_file(path)


if __name__ == "__main__":
    create_directory(".", filenames)

    directories = filenames.keys()
    for directory in directories:
        # Create tarballs
        shutil.make_archive(directory, "gztar", ".", directory)

        # Compute checksums
        with open(f"{directory}.tar.gz", "rb") as f:
            md5 = hashlib.md5(f.read()).hexdigest()
            print(directory, md5)
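To see the directory layout that `create_directory` produces without installing rasterio, the recursion can be exercised with empty placeholder files. This is only a sketch under that assumption: `create_tree`, `Hierarchy`, and `excerpt` are hypothetical names, and the excerpt covers two scenes from `filenames` rather than the full dictionary.

```python
import os
import tempfile
from typing import Union

Hierarchy = Union[dict[str, "Hierarchy"], list[str]]


def create_tree(directory: str, hierarchy: Hierarchy) -> None:
    """Same recursion as create_directory, but writing empty placeholder files."""
    if isinstance(hierarchy, dict):
        # Recursive case: each key is a subdirectory.
        for key, value in hierarchy.items():
            path = os.path.join(directory, key)
            os.makedirs(path, exist_ok=True)
            create_tree(path, value)
    else:
        # Base case: each entry is a file name.
        for name in hierarchy:
            # Placeholder instead of a GeoTIFF written with rasterio.
            open(os.path.join(directory, name), "wb").close()


# A two-scene excerpt of the `filenames` dictionary.
excerpt: Hierarchy = {
    "tm_toa": {
        "0000002": {
            "LE07_172034_20010526": ["all_bands.tif"],
            "LE07_172034_20020310": ["all_bands.tif"],
        }
    }
}

with tempfile.TemporaryDirectory() as root:
    create_tree(root, excerpt)
    # Collect every created file as a path relative to the temporary root.
    created = sorted(
        os.path.relpath(os.path.join(dirpath, name), root)
        for dirpath, _, names in os.walk(root)
        for name in names
    )

print(created)
```

Each resulting path has the form `<split>/<location id>/<scene id>/all_bands.tif`, which matches the `root` and `split` values used in the test configs.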
Binary file added tests/data/ssl4eo/l/etm_toa.tar.gz
Binary file not shown.
Binary file added tests/data/ssl4eo/l/oli_sr.tar.gz
Binary file not shown.
Binary file added tests/data/ssl4eo/l/oli_tirs_toa.tar.gz
Binary file not shown.
Binary file added tests/data/ssl4eo/l/tm_sr.tar.gz
Binary file not shown.
Binary file added tests/data/ssl4eo/l/tm_toa.tar.gz
Binary file not shown.