-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AutoMLBenchmark TimeSeries Prototype. #6
AutoMLBenchmark TimeSeries Prototype. #6
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome contribution! Added some comments, once they are addressed I can test it out and see if it works properly, and if so we can merge.
frameworks/AutoGluonTS/exec.py
Outdated
@@ -61,16 +61,18 @@ def run(dataset, config): | |||
) | |||
|
|||
with Timer() as predict: | |||
predictions = predictor.predict(train_data) | |||
test_data_past = test_data.copy().slice_by_timestep(slice(None, -prediction_length)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do this prior to entering with Timer()
so it doesn't add to the recorded inference time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
frameworks/AutoGluonTS/exec.py
Outdated
@@ -88,28 +90,43 @@ def run(dataset, config): | |||
target_is_encoded=False, | |||
models_count=num_models_trained, | |||
training_duration=training.duration, | |||
predict_duration=predict.duration) | |||
predict_duration=predict.duration, | |||
quantiles=predictions.iloc[:, 1:]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can make more explicit by dropping the 'mean' column by name rather than by assumed position. This makes it easier to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
timestamp_column = dataset.timestamp_column | ||
id_column = dataset.id_column | ||
prediction_length = dataset.prediction_length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can remove the TODOs that are resolved due to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
frameworks/AutoGluonTS/__init__.py
Outdated
timestamp_column=dataset.timestamp_column if dataset.timestamp_column is not None else None, | ||
id_column=dataset.id_column if dataset.id_column is not None else None, | ||
prediction_length=dataset.prediction_length if dataset.prediction_length is not None else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we instead check if dataset.timestamp_column
exists rather than if it is not None? ditto for the others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently these if/else don't actually do anything
amlb/results.py
Outdated
@@ -308,6 +313,16 @@ def save_predictions(dataset: Dataset, output_file: str, | |||
|
|||
df = df.assign(predictions=preds) | |||
df = df.assign(truth=truth) | |||
if quantiles is not None: | |||
quantiles.reset_index(drop=True, inplace=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is bold to do an inplace operation that alters outer context, let's be safe and avoid inplace operations. (I know that it is probably ok here, but trust me when I say that the nastiest bugs are those involving outer context manipulation caused by inplace operations)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
amlb/results.py
Outdated
quantiles.reset_index(drop=True, inplace=True) | ||
df = pd.concat([df, quantiles], axis=1) | ||
if dataset.type == DatasetType.timeseries: | ||
period_length = 1 # this period length could be adapted to the Dataset, but then we need to pass this information as well. As of now this should be fine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is a TODO style comment, then mark it as TODO so it isn't forgotten
@@ -39,11 +39,15 @@ def read_csv(path, nrows=None, header=True, index=False, as_data_frame=True, dty | |||
:param dtype: data type for columns. | |||
:return: a DataFrame | |||
""" | |||
if dtype is not None and timestamp_column is not None and timestamp_column in dtype: | |||
del dtype[timestamp_column] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid outer context manipulation, instead copy dtype to a new object and then delete timestamp_column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added copy()
@@ -255,7 +259,8 @@ def save_predictions(dataset: Dataset, output_file: str, | |||
predictions: Union[A, DF, S] = None, truth: Union[A, DF, S] = None, | |||
probabilities: Union[A, DF] = None, probabilities_labels: Union[list, A] = None, | |||
target_is_encoded: bool = False, | |||
preview: bool = True): | |||
preview: bool = True, | |||
quantiles: Union[A, DF] = None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add quantiles to docstring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
amlb/results.py
Outdated
item_ids, inverse_item_ids = np.unique(dataset.test.X[dataset.id_column].squeeze().to_numpy(), return_index=False, return_inverse=True) | ||
y_past = [dataset.test.y.squeeze().to_numpy()[inverse_item_ids == i][:-dataset.prediction_length] for i in range(len(item_ids))] | ||
y_past_period_error = [np.abs(y_past_item[period_length:] - y_past_item[:-period_length]).mean() for y_past_item in y_past] | ||
y_past_period_error_rep = np.repeat(y_past_period_error, dataset.prediction_length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add inline comments to explain this, it is a lot to take in when unfamiliar.
@@ -16,6 +16,7 @@ venv/ | |||
.idea/ | |||
*.iml | |||
*.swp | |||
launch.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
visualcode creates it for debugging
if hasattr(dataset, 'timestamp_column') is False: | ||
dataset.timestamp_column = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This edits outer context of dataset. No need to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this update!!
* Add AutoGluon TimeSeries Prototype * AutoMLBenchmark TimeSeries Prototype. (#6) * fixed loading test & train, changed pred.-l. 5->30 * ignore launch.json of vscode * ensuring timestamp parsing * pass config, save pred, add results * remove unused code * add readability, remove slice from timer * ensure autogluonts has required info * add comments for readability * setting defaults for timeseries task * remove outer context manipulation * corrected spelling error for quantiles * adding mape, correct available metrics * beautify config options * fixed config for public access * Update readme * Autogluon timeseries, addressed comments by sebhrusen (#7) * fixed loading test & train, changed pred.-l. 5->30 * ignore launch.json of vscode * ensuring timestamp parsing * pass config, save pred, add results * remove unused code * add readability, remove slice from timer * ensure autogluonts has required info * add comments for readability * setting defaults for timeseries task * remove outer context manipulation * corrected spelling error for quantiles * adding mape, correct available metrics * beautify config options * fixed config for public access * no outer context manipulation, add dataset subdir * add more datasets * include error raising for too large pred. length. * mergin AutoGluonTS framework folder into AutoGluon * renaming ts.yaml to timeseries.yaml, plus ext. * removing presets, correct latest config for AGTS * move dataset timeseries ext to datasets/file.py * dont bypass test mode * move quantiles and y_past_period_error to opt_cols * remove whitespaces * deleting merge artifacts * delete merge artifacts * renaming prediction_length to forecast_range_in_steps * use public dataset, reduced range to maximum * fix format string works * fix key error bug, remove magic time limit * Addressed minor comments, and fixed version call for tabular and timeseries modularities (#8) * fixed loading test & train, changed pred.-l. 5->30 * ignore launch.json of vscode * ensuring timestamp parsing * pass config, save pred, add results * remove unused code * add readability, remove slice from timer * ensure autogluonts has required info * add comments for readability * setting defaults for timeseries task * remove outer context manipulation * corrected spelling error for quantiles * adding mape, correct available metrics * beautify config options * fixed config for public access * no outer context manipulation, add dataset subdir * add more datasets * include error raising for too large pred. length. * mergin AutoGluonTS framework folder into AutoGluon * renaming ts.yaml to timeseries.yaml, plus ext. * removing presets, correct latest config for AGTS * move dataset timeseries ext to datasets/file.py * dont bypass test mode * move quantiles and y_past_period_error to opt_cols * remove whitespaces * deleting merge artifacts * delete merge artifacts * renaming prediction_length to forecast_range_in_steps * use public dataset, reduced range to maximum * fix format string works * fix key error bug, remove magic time limit * swapped timeseries and tabular to set version * make warning message more explicit * remove outer context manipulation * split timeseries / tabular into functions Co-authored-by: Leo <LeonhardSommer96@gmail.com>
Done
FIXME: Why does leaderboard claim a different test score than AutoMLBenchmark for RMSE?
-> ensured correct data loading, with timestamp parsing and correct split into train and test dataset
FIXME: Currently ignoring test_path, just using train data for evaluation
-> done
TODO: How to evaluate more complex metrics like MAPE?
-> added metrics MASE, NCRPS, SMAPE, ND, NRMSE, for that created new TimeSeriesResult class with quantiles and y_past_period_error
How to pass timestamp_column?
How to pass id_column?
How to pass prediction_length?
-> added configuration and passed them to the Dataset object
TODO:
doublecheck metrics implementation
cumbersome testing to ensure not to break other things