This is a collection of 22 time-series features from the hctsa toolbox, coded in C.
Features were selected by their classification performance across a collection of 93 real-world time-series classification problems (according to the op_importance repository).
NOTE: The included features only evaluate dynamical properties of time series and do not respond to basic differences in the location (e.g., mean) or spread (e.g., variance). If you think features of the raw distribution may be important for your application, we suggest you add them (in the simplest case, the mean and standard deviation) to this feature set.
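As a minimal sketch of the suggestion above, the following Python helper appends the mean and standard deviation of the raw series to a feature dictionary with 'names' and 'values' entries (the shape this README describes for the Python catch22_all output). The helper name add_location_spread and the labels "mean" and "std" are hypothetical and not part of the package; the input dictionary is a stand-in with a placeholder value.

```python
import statistics

def add_location_spread(ts_data, features):
    """Return a copy of a {'names': [...], 'values': [...]} feature dict
    with the mean and standard deviation of the raw series appended."""
    names = list(features["names"]) + ["mean", "std"]
    values = list(features["values"]) + [
        statistics.fmean(ts_data),
        statistics.stdev(ts_data),  # sample standard deviation (n - 1)
    ]
    return {"names": names, "values": values}

# Stand-in for the output of catch22_all(ts_data); the value is a placeholder.
example_features = {"names": ["CO_f1ecac"], "values": [0.0]}
augmented = add_location_spread([1.0, 2.0, 3.0, 4.0], example_features)
print(augmented["names"])
```

Whether the sample or population standard deviation is more appropriate depends on the application; the sample version is used here only as an example.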
For information on how this feature set was constructed see our open-access 🔓 paper:
- C.H. Lubba, S.S. Sethi, P. Knaute, S.R. Schultz, B.D. Fulcher, N.S. Jones. catch22: CAnonical Time-series CHaracteristics. Data Mining and Knowledge Discovery 33, 1821 (2019).
For information on the full set of over 7000 features, see the following (open 🔓) publications:
- B.D. Fulcher and N.S. Jones. hctsa: A computational framework for automated time-series phenotyping using massive feature extraction. Cell Systems 5, 527 (2017).
- B.D. Fulcher, M.A. Little, N.S. Jones. Highly comparative time-series analysis: the empirical structure of time series and their methods. J. Roy. Soc. Interface 10, 83 (2013).
The fast C-coded functions in this repository can be used in Python, Matlab, and R following the instructions below. Time series are z-scored internally, which means that, e.g., constant time series will lead to NaN outputs. The wrappers for Matlab and Python can be compiled with either GCC or MSVC. The R wrapper so far only runs using GCC and was only tested on OS X.
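To illustrate why a constant time series yields NaN: z-scoring divides by the standard deviation, which is zero when all values are equal. The Python sketch below mimics that behaviour and is not the C implementation; the function names are hypothetical, and the use of the population standard deviation is an assumption. A check like is_constant can be used to filter such series before computing features.

```python
import math
import statistics

def z_score(ts_data):
    """Z-score a sequence; returns NaNs when the spread is zero,
    mimicking the 0/0 that arises in floating-point C code."""
    mean = statistics.fmean(ts_data)
    std = statistics.pstdev(ts_data)  # assumption: population std; the C code may differ
    if std == 0:
        return [math.nan] * len(ts_data)
    return [(x - mean) / std for x in ts_data]

def is_constant(ts_data):
    """True if every value equals the first one."""
    return all(x == ts_data[0] for x in ts_data)

print(z_score([3.0, 3.0, 3.0]))
print(is_constant([3.0, 3.0, 3.0]))
```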
Installation of the Python wrapper differs slightly between Python 2 and 3.
Manual installation for Python 3 through distutils:
python3 setup_P3.py build
python3 setup_P3.py install
Or using pip
pip install catch22
Go to the directory wrap_Python and run the following:
python setup.py build
python setup.py install
Alternatively, using pip, go to the main directory and run:
pip install -e wrap_Python
To test that the catch22 wrapper was installed successfully and works, run (NB: replace python with python3 for Python 3):
$ python testing.py
The module is now available under the name catch22.
Each feature function can be accessed individually and takes arrays as tuples or lists (not numpy arrays).
E.g., for loaded data, tsData, in Python:
import catch22
catch22.CO_f1ecac(tsData)
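Because the individual feature functions expect tuples or lists, array-like data needs converting first. A minimal sketch, assuming one-dimensional input; the helper name as_plain_list is hypothetical, and the standard-library array type stands in for a numpy array here (both provide a tolist() method):

```python
from array import array

def as_plain_list(ts_data):
    """Convert one-dimensional array-like input to a plain list of floats."""
    if hasattr(ts_data, "tolist"):  # numpy.ndarray and array.array both provide tolist()
        ts_data = ts_data.tolist()
    return [float(x) for x in ts_data]

ts_list = as_plain_list(array("d", [1.0, 2.0, 3.0]))
print(ts_list)
# The result could then be passed on, e.g. catch22.CO_f1ecac(ts_list)
```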
All features are bundled in the method catch22_all, which also accepts numpy arrays and gives back a dictionary containing the entries catch22_all['names'] for feature names and catch22_all['values'] for feature outputs.
from catch22 import catch22_all
catch22_all(tsData)
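For lookups by feature name, the two parallel entries can be zipped into a single mapping. A minimal sketch, assuming the 'names' and 'values' entries have equal length and the same order; the helper name features_to_dict is hypothetical, and the dictionary below is a stand-in built from the example output lines shown later in this README rather than a real call:

```python
def features_to_dict(output):
    """Map each feature name to its value."""
    return dict(zip(output["names"], output["values"]))

# Stand-in for catch22_all(tsData); not produced by an actual run here.
example_output = {
    "names": ["CO_Embed2_Basic_tau.incircle_1", "CO_Embed2_Basic_tau.incircle_2"],
    "values": [0.29910714285714, 0.57589285714286],
}
by_name = features_to_dict(example_output)
print(by_name["CO_Embed2_Basic_tau.incircle_1"])
```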
This assumes you have R installed and the package Rcpp is available.
Clang is required.
Copy all .c and .h files from ./C to ./wrap_R/catch22/src.
Then go to the directory ./wrap_R and run the following two lines, replacing x.y with the current version number:
R CMD build catch22
R CMD INSTALL catch22_x.y.tar.gz
To test if the installation was successful, navigate to ./wrap_R in the console and run:
$ Rscript testing.R
The module is now available in R as catch22. Single functions can be accessed by their name; all functions are bundled as catch22_all, which can be called with a data vector tsData as an argument and gives back a data frame with the variables name for feature names and values for feature outputs:
library(catch22)
catch22_out = catch22_all(tsData);
print(catch22_out)
Go to the wrap_Matlab directory and call mexAll from within Matlab.
Include the folder in your Matlab path to use the package.
To test, navigate to the wrap_Matlab directory from within Matlab and run:
testing
All features can be called individually, e.g., catch22_CO_f1ecac.
Alternatively, all features are bundled in a function catch22_all, which returns an array of feature outputs and, as a second output, a cell array of feature names.
With loaded data tsData:
[vals, names] = catch22_all(tsData);
On OS X, compile the C sources with:
gcc -o run_features main.c CO_AutoCorr.c DN_HistogramMode_10.c DN_HistogramMode_5.c DN_OutlierInclude.c FC_LocalSimple.c IN_AutoMutualInfoStats.c MD_hrv.c PD_PeriodicityWang.c SB_BinaryStats.c SB_CoarseGrain.c SB_MotifThree.c SB_TransitionMatrix.c SC_FluctAnal.c SP_Summaries.c butterworth.c fft.c helper_functions.c histcounts.c splinefit.c stats.c
As for OS X but with the -lm switch in front of every source-file name.
The compiled run_features program only takes one time series at a time. Usage is ./run_features <infile> <outfile> in the terminal, where specifying <outfile> is optional; output is printed to stdout by default.
For multiple time series, put them (one file each) into a folder timeSeries and call ./runAllTS.sh.
The output will be written into a folder featureOutput.
If runAllTS.sh is not yet executable, change its permissions by calling chmod 755 runAllTS.sh.
Each line of the output corresponds to one feature; the three comma-separated entries per line correspond to feature value, feature name, and feature execution time in milliseconds. For example:
0.29910714285714, CO_Embed2_Basic_tau.incircle_1, 0.341000
0.57589285714286, CO_Embed2_Basic_tau.incircle_2, 0.296000
...
Sample outputs for the time series test.txt and test2.txt are provided as test_output.txt and test2_output.txt.
The first two entries per line should always be the same.
The third one (execution time) will be different.
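The comparison described above can be sketched in Python as follows: parse each output line into value, name, and execution time, then compare only the first two entries. This assumes feature names contain no commas; the function names are hypothetical, the small tolerance on values is a design choice rather than something the repository specifies, and the timings in the second list are made up for illustration.

```python
import math

def parse_feature_output(lines):
    """Parse 'value, name, time_ms' lines into (float, str, float) tuples."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        value, name, time_ms = (part.strip() for part in line.split(","))
        rows.append((float(value), name, float(time_ms)))
    return rows

def same_features(rows_a, rows_b, rel_tol=1e-9):
    """Compare feature names and values, ignoring execution times."""
    if len(rows_a) != len(rows_b):
        return False
    for (val_a, name_a, _), (val_b, name_b, _) in zip(rows_a, rows_b):
        if name_a != name_b:
            return False
        both_nan = math.isnan(val_a) and math.isnan(val_b)
        if not both_nan and not math.isclose(val_a, val_b, rel_tol=rel_tol):
            return False
    return True

reference = parse_feature_output([
    "0.29910714285714, CO_Embed2_Basic_tau.incircle_1, 0.341000",
    "0.57589285714286, CO_Embed2_Basic_tau.incircle_2, 0.296000",
])
# Same values and names, different (illustrative) timings.
mine = parse_feature_output([
    "0.29910714285714, CO_Embed2_Basic_tau.incircle_1, 0.512000",
    "0.57589285714286, CO_Embed2_Basic_tau.incircle_2, 0.401000",
])
print(same_features(reference, mine))
```

To compare files, pass an open file object instead of a list, e.g. parse_feature_output(open("test_output.txt")).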