torchchronos is an experimental PyTorch- and Lightning-compatible library that provides easy and flexible access to various time-series datasets for classification and regression tasks. It also provides a simple, extensible transform API for preprocessing data. It is inspired by the much more complicated torchtime.
You can install torchchronos via pip:
pip install torchchronos
torchchronos currently provides access to several popular time-series datasets, including:
- UCR/UEA Time Series Classification Repository:
torchchronos.datasets.UCRUEADataset
- Time series as preprocessed in the TFC paper:
torchchronos.datasets.TFCPretrainDataset
(datasets Gesture and EMG)
To use a dataset, you can simply import the corresponding dataset class and create an instance:
from pathlib import Path
from torchchronos.datasets import UCRUEADataset
from torchchronos.transforms import PadFront
from torchchronos.download import download_uea_ucr
# Download ECG5000 into the local cache, then load it with front padding
download_uea_ucr("ECG5000", Path(".cache/data"))
dataset = UCRUEADataset("ECG5000", path=Path(".cache") / "data", transforms=PadFront(10))
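As a rough illustration of what front padding does to a single series, the sketch below prepends ten zero time steps to a tensor shaped like one ECG5000 sample (140 time steps). It uses plain PyTorch rather than torchchronos, and it assumes that PadFront(10) means "add ten zeros at the start of the time axis"; the library's actual semantics may differ.

```python
import torch
import torch.nn.functional as F

# Stand-in for one univariate series, shaped (channels, time steps).
# ECG5000 series have 140 time steps.
series = torch.randn(1, 140)

# Assumption: front padding prepends zeros along the time axis.
# F.pad takes (left, right) padding sizes for the last dimension.
padded = F.pad(series, (10, 0), mode="constant", value=0.0)

print(padded.shape)  # torch.Size([1, 150])
```

The original values are untouched and simply shifted ten positions to the right.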
torchchronos also provides Lightning-compatible DataModules to make it easy to load and preprocess data. They support common use cases such as (multi-)GPU training and train/val/test splitting out of the box. For example:
from torchchronos.lightning import UCRUEADataModule
# Compose is assumed to be exported by torchchronos.transforms
from torchchronos.transforms import Compose, PadFront, PadBack
module = UCRUEADataModule("ECG5000", split_ratio=(0.75, 0.15), batch_size=32,
                          transforms=Compose([PadFront(10), PadBack(10)]))
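The README does not spell out how split_ratio is interpreted. One plausible reading, sketched below in plain Python, is that the two values are the train and validation fractions and the remainder becomes the test set. The helper split_sizes is hypothetical and is not part of torchchronos.

```python
def split_sizes(n_samples, split_ratio):
    """Return (train, val, test) sizes; the test set takes whatever is left.

    Assumption: split_ratio holds the (train, val) fractions.
    """
    train_frac, val_frac = split_ratio
    if train_frac < 0 or val_frac < 0 or train_frac + val_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    n_test = n_samples - n_train - n_val
    return n_train, n_val, n_test


print(split_sizes(5000, (0.75, 0.15)))  # (3750, 750, 500)
```

Computing the test size as the remainder guarantees that the three parts always add up to the full dataset, even when the fractions do not divide it evenly.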
Analogous to the datasets above, the following DataModules are currently supported, each wrapping the respective dataset:
torchchronos.lightning.UCRUEADataModule
torchchronos.lightning.TFCPretrainDataModule
torchchronos provides a flexible transform API to preprocess time-series data. For example, to normalize a dataset, you can define a custom Transform like this:
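from typing import Self  # Python 3.11+; use typing_extensions.Self on older versions
from torchchronos.transforms import Transform
class Normalize(Transform):
    def __init__(self, mean=None, std=None):
        self.mean = mean
        self.std = std
    def fit(self, data) -> Self:
        # Estimate the statistics from the data the transform is fitted on
        self.mean = data.mean()
        self.std = data.std()
        return self
    def __call__(self, data):
        # Standardize using the stored statistics
        return (data - self.mean) / self.std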
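A short usage sketch may help show why fit and __call__ are separate: the statistics are estimated once on the training data and then reused on unseen data. To keep the example runnable without torchchronos installed, it defines a hypothetical minimal stand-in for the Transform base class; the real base class may offer more than this.

```python
import torch


class Transform:
    """Hypothetical minimal stand-in for torchchronos.transforms.Transform."""

    def fit(self, data):
        return self

    def __call__(self, data):
        return data


class Normalize(Transform):
    def __init__(self, mean=None, std=None):
        self.mean = mean
        self.std = std

    def fit(self, data):
        self.mean = data.mean()
        self.std = data.std()
        return self

    def __call__(self, data):
        return (data - self.mean) / self.std


train = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
test = torch.tensor([[7.0, 8.0, 9.0]])

normalize = Normalize().fit(train)  # statistics come from the training data only
train_norm = normalize(train)       # roughly zero mean and unit standard deviation
test_norm = normalize(test)         # reuses the training mean and std
```

Note that a constant series has a standard deviation of zero, so the division would produce NaN or inf values; adding a small epsilon to the denominator is a common guard.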
Known issues:
- The dataset SpokenArabicDigits does not seem to work due to a mismatch between the TRAIN and TEST sizes
- The dataset UrbanSound does not seem to work due to missing .ts files
The following features are planned for future releases of torchchronos:
- Support for additional time-series datasets, including:
  - Energy consumption dataset
  - Traffic dataset
  - PhysioNet Challenge 2012 (in-hospital mortality)
  - PhysioNet Challenge 2019 (sepsis prediction) datasets
- Additional transform classes, including:
  - Resampling
  - Missing value imputation
If you have any feature requests or suggestions, please open an issue on our GitHub page.