Add the possibility to use cross validation when training PyAF models #105
Classical PyAF modeling is a special case of this cross validation with a single split (nfolds = 5, split = [1 2 3 4] [5]), so the implementation should be made by adapting the existing code. Training each one of the splits is equivalent to training an old model.
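To make the special case concrete, here is a minimal sketch (a hypothetical helper, not PyAF code) in which the classical scheme is simply the last split of a time-series cross validation:

```python
def time_series_splits(n_points, n_folds=5):
    """Yield (train_indices, validation_indices) pairs: split k trains on
    folds 1..k and validates on fold k+1 (equal-sized folds assumed)."""
    fold_size = n_points // n_folds
    for k in range(1, n_folds):
        train_end = k * fold_size
        yield (list(range(train_end)),
               list(range(train_end, train_end + fold_size)))

# With nfolds = 5, the final split reproduces the classical scheme:
# train on folds [1 2 3 4], validate on fold [5].
splits = list(time_series_splits(100, 5))
last_train, last_valid = splits[-1]
```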
Hi, I've been watching your project for a while (mostly because I have been working on a similar project, which comes at this from a different perspective 😛).
Thanks a lot for your interest in PyAF. Comments like these are always welcome. Hope you enjoyed it. Models with state/hidden components are not yet supported, but if you look closely, PyAF is always evolving. Cross validation work started a year ago; its first implementation will be available in the coming weeks. Can you please elaborate a little bit more on the second case (a Python example in a gist)? Any docs/references?
I don't quite have the time to make a full example; I hope a block sketch will work. :) [Block layout: Full Set; Train (same for all); Validation.] If you only use stateless models, this is the same as validating on sets [N+1, ... 2N]. However, for stateful models, this means you will always be using [N*num_per_set] steps to "warm up" your model, and thus get consistent behavior (you'd do this in production as well). As an alternative, you could use the following scheme for stateless models as well: train on [1 ... N], predict [N+1]. This always gives a "window", and again stays consistent. However, the end use of these methods is different. 😃
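A rough sketch of the two schemes described above (hypothetical helper names and parameters, illustrative only, not PyAF code):

```python
def block_validation(n_sets, set_size):
    """Stateful scheme: always warm up on the first N sets, then validate on
    the next N sets, so every evaluation sees the same warm-up length."""
    n = n_sets * set_size
    train = list(range(n))            # sets 1..N, used to warm up the state
    valid = list(range(n, 2 * n))     # sets N+1..2N, used for validation
    return train, valid

def rolling_one_step(n_points):
    """Stateless scheme: train on [1..N], predict [N+1], then slide forward,
    so every prediction uses a consistent expanding window."""
    for n in range(1, n_points):
        yield list(range(n)), n       # (training indices, index to predict)
```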
The block scheme is clear and very interesting ;). Will keep this aside for implementing support for stateful models. Do you have any book reference for this kind of stuff? Putting time series models in production, etc.
I'm going mainly by experience; sorry that I can't give any written reference. Cheers!
Cheers!
What about summarizing your experience in a GitHub repository (markdown)? I am also not aware of a written reference for this kind of stuff. Please think of this when you have some time. Thanks a lot.
…105 Option. Cross validation control.
…105 The test dataset is optional.
…105 The Test dataset is optional
…105 Added separate cSignalDecompositionTrainer and cSignalDecompositionTrainer_CrossValidation
…105 Added two tests for cross validation.
This is how to adapt the training process to activate cross validation in PyAF (with 7 folds):

import pyaf.ForecastEngine as autof

lEngine = autof.cForecastEngine()
lEngine.mOptions.mCrossValidationOptions.mMethod = "TSCV"
lEngine.mOptions.mCrossValidationOptions.mNbFolds = 7
lEngine.train(ozone_dataframe, 'Month', 'Ozone', 12)
lEngine.getModelInfo()
…105 Added a jupyter notebook with ozone case
…105 Added a jupyter notebook with air passengers case
FIXED!!!!
Following the investigation performed in #53, implement a form of cross validation for PyAF models.
Specifications:
Cut the dataset into folds according to a scikit-learn time series split:
http://scikit-learn.org/stable/modules/cross_validation.html#cross-validation
Number of folds => user option (default = 10)
To have enough data, use only the last n/2 folds for estimating the models (thanks to the forecast R package ;). The default splits look like this:
[5 ] [6]
[5 6 ] [7]
[5 6 7] [8]
[5 6 7 8] [9]
[5 6 7 8 9] [10]
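These default splits can be generated mechanically. A minimal sketch, assuming 10 equal-sized, 1-based folds as listed above (a hypothetical helper, not the actual PyAF implementation):

```python
def pyaf_default_splits(n_points, n_folds=10):
    """Yield (train_indices, validation_indices) pairs using only the last
    n/2 folds: split k trains on folds 5..k and validates on fold k+1,
    reproducing [5][6], [5 6][7], ..., [5 6 7 8 9][10]."""
    fold = n_points // n_folds
    start = n_folds // 2                      # first training fold (fold 5)
    for k in range(start, n_folds):
        train = list(range((start - 1) * fold, k * fold))   # folds 5..k
        valid = list(range(k * fold, (k + 1) * fold))       # fold k+1
        yield train, valid
```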
Use the model decomposition type or formula as a hyperparameter and optimize it: select the decomposition(s) with the lowest mean MAPE on the validation datasets of all the possible splits.
Among all the chosen decompositions, select the model with the lowest complexity (~ number of inputs).
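A minimal sketch of this two-step selection rule (the candidate structure is a hypothetical illustration; PyAF's internals will differ):

```python
def select_model(candidates):
    """candidates: list of dicts with 'name', 'mapes' (one MAPE value per
    validation split) and 'complexity' (~ number of inputs).
    Keep the decompositions with the lowest mean MAPE across all splits,
    then pick the least complex model among them."""
    def mean_mape(c):
        return sum(c['mapes']) / len(c['mapes'])
    best = min(mean_mape(c) for c in candidates)
    shortlist = [c for c in candidates if mean_mape(c) <= best + 1e-12]
    return min(shortlist, key=lambda c: c['complexity'])
```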
Execute the procedure on the ozone and air passengers datasets and compare with the non-cross-validation models (=> 2 Jupyter notebooks).