Add a basic Holt-Winters algorithm to the query engine #6621

Merged: 1 commit merged on May 19, 2016

nathanielc
Contributor

@nathanielc nathanielc commented May 12, 2016

This PR adds a basic implementation of the Holt-Winters forecasting method, specifically the damped additive trend, multiplicative seasonal variant.

Holt-Winters expects data at regular time intervals, so an aggregate function and a group by time interval are always required.

The basic usage is as follows, assuming you are receiving values every 1m:

SELECT holt_winters(first(value), 10, 4) FROM mymeasurement WHERE time > now() - 1h GROUP BY time(1m)

The holt_winters method will return 10 forecasted points past the end of the selected data, for a total of 10m of data. The final argument (4 here) is the seasonal period. In this example we are saying that the data is expected to have a seasonal pattern that repeats every 4 data points (i.e. every 4m). A value of 1 or less disables seasonal evaluation, since a pattern that repeats every 1 or fewer points has no meaning.
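To make the roles of the two integer arguments concrete, here is a minimal sketch of the smoothing recurrences for a damped additive trend with multiplicative seasonality. It is not the code in this PR: the names `forecast` and `params` are hypothetical, the smoothing parameters are fixed by hand rather than fitted with Nelder-Mead, and the initialization (mean of the first period) is just one simple choice. It also assumes strictly positive data.

```go
package main

import "fmt"

// params holds the smoothing parameters. They are fixed by hand here
// (an assumption); the PR fits its parameters with Nelder-Mead instead.
type params struct {
	alpha, beta, gamma, phi float64
}

// forecast smooths y with a damped additive trend and multiplicative
// seasonality of period m, then returns h points past the end of y.
// m <= 1 (or fewer than m points) disables the seasonal component.
func forecast(y []float64, h, m int, p params) []float64 {
	if len(y) == 0 {
		return nil
	}
	seasonal := m > 1 && len(y) >= m
	if !seasonal {
		m = 1
	}
	level, trend, start := y[0], 0.0, 1
	season := make([]float64, m)
	for i := range season {
		season[i] = 1 // neutral multiplier when seasonality is off
	}
	if seasonal {
		// Initialize the level as the mean of the first period and the
		// seasonal indices as each point's ratio to that mean.
		level = 0
		for _, v := range y[:m] {
			level += v / float64(m)
		}
		for i := range season {
			season[i] = y[i] / level
		}
		start = m
	}
	for t := start; t < len(y); t++ {
		s := season[t%m]
		prev := level + p.phi*trend // one-step-ahead deseasonalized estimate
		next := p.alpha*(y[t]/s) + (1-p.alpha)*prev
		trend = p.beta*(next-level) + (1-p.beta)*p.phi*trend
		if seasonal {
			season[t%m] = p.gamma*(y[t]/prev) + (1-p.gamma)*s
		}
		level = next
	}
	out := make([]float64, h)
	damp, pow := 0.0, 1.0
	for k := 1; k <= h; k++ {
		pow *= p.phi
		damp += pow // phi + phi^2 + ... + phi^k
		out[k-1] = (level + damp*trend) * season[(len(y)+k-1)%m]
	}
	return out
}

func main() {
	y := []float64{1, 3, 1, 3, 1, 3}
	fmt.Println(forecast(y, 4, 2, params{0.5, 0.5, 0.5, 0.5}))
}
```

Passing a seasonal period of 1 or less to this sketch leaves every seasonal multiplier at 1, which mirrors the "disable seasonal evaluation" behaviour described above.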

If you want the full fit data returned in addition to the forecasted data, use holt_winters_with_fit:

SELECT holt_winters_with_fit(first(value), 10, 4) FROM mymeasurement WHERE time > now() - 1h GROUP BY time(1m)

This will return a total of 70 points spaced 1m apart. The first 60 points are the fitted data from the holt_winters method and the last 10 points are the forecast, the same 10 points that holt_winters returns.

Holt-Winters will benefit greatly from the ability to phase-shift group by boundaries, since in many cases data points and seasonal patterns do not line up exactly with the default group by boundaries.

Questions:

  • Where should the neldermead package live? Inside influxql seems odd but works for now. Nelder-Mead is a numerical optimization (i.e. minimization) algorithm that the holt_winters method uses.
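For readers unfamiliar with Nelder-Mead, here is a simplified textbook sketch of the method. It does not reflect the API or the details of the neldermead package in this PR: `minimize`, `vertex`, and `along` are hypothetical names, the coefficients are the common defaults (reflection 1, expansion 2, contraction and shrink 0.5), only inside contraction is used, and it stops after a fixed number of iterations instead of testing for convergence.

```go
package main

import (
	"fmt"
	"sort"
)

// vertex is one corner of the simplex together with its objective value.
type vertex struct {
	x []float64
	f float64
}

// along returns c + t*(p-c), a point on the line through c and p.
func along(c, p []float64, t float64) []float64 {
	out := make([]float64, len(c))
	for i := range c {
		out[i] = c[i] + t*(p[i]-c[i])
	}
	return out
}

// minimize runs a fixed number of Nelder-Mead iterations on f, starting
// from a simplex built by stepping along each axis from start.
func minimize(f func([]float64) float64, start []float64, step float64, iters int) ([]float64, float64) {
	n := len(start)
	s := make([]vertex, n+1)
	for i := range s {
		x := append([]float64(nil), start...)
		if i > 0 {
			x[i-1] += step
		}
		s[i] = vertex{x, f(x)}
	}
	byValue := func(a, b int) bool { return s[a].f < s[b].f }
	for it := 0; it < iters; it++ {
		sort.Slice(s, byValue)
		// Centroid of every vertex except the worst one.
		c := make([]float64, n)
		for _, v := range s[:n] {
			for i := range c {
				c[i] += v.x[i] / float64(n)
			}
		}
		worst := s[n]
		xr := along(c, worst.x, -1) // reflect the worst vertex through c
		fr := f(xr)
		switch {
		case fr < s[0].f:
			// New best point: try expanding further in the same direction.
			xe := along(c, worst.x, -2)
			if fe := f(xe); fe < fr {
				s[n] = vertex{xe, fe}
			} else {
				s[n] = vertex{xr, fr}
			}
		case fr < s[n-1].f:
			s[n] = vertex{xr, fr} // better than the second worst: accept
		default:
			xc := along(c, worst.x, 0.5) // contract toward the centroid
			if fc := f(xc); fc < worst.f {
				s[n] = vertex{xc, fc}
			} else {
				// Shrink every vertex halfway toward the best one.
				for i := 1; i <= n; i++ {
					x := along(s[0].x, s[i].x, 0.5)
					s[i] = vertex{x, f(x)}
				}
			}
		}
	}
	sort.Slice(s, byValue)
	return s[0].x, s[0].f
}

func main() {
	bowl := func(x []float64) float64 {
		return (x[0]-1)*(x[0]-1) + (x[1]+2)*(x[1]+2)
	}
	x, fx := minimize(bowl, []float64{0, 0}, 1, 200)
	fmt.Println(x, fx)
}
```

In the holt_winters setting the objective would be some measure of fit error as a function of the smoothing parameters, which is why a derivative-free minimizer is convenient.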

TODO:

  • Rebased/mergeable
  • Tests pass
  • CHANGELOG.md updated
  • Implement for other types beyond float64
  • Update functions_test.go to expect actual values.

@nathanielc nathanielc force-pushed the nc-holt-winters branch 2 times, most recently from 23adff7 to c02e53a Compare May 16, 2016 21:17
@nathanielc nathanielc changed the title [WIP] Add a basic Holt-Winters algorithm to the query engine Add a basic Holt-Winters algorithm to the query engine May 16, 2016
}
for i, arg := range expr.Args[1:3] {
	if _, ok := arg.(*IntegerLiteral); !ok {
		return fmt.Errorf("expected integer argument as %dth arg in %s", i+1, expr.Name)
The error message will read "1th arg" or "2th arg". https://play.golang.org/p/GpCOR6nn_O
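One way to address this is an ordinal helper, sketched below. `ordinal` is a hypothetical function, not necessarily what was merged. Note also that because the loop ranges over expr.Args[1:3], `i+2` rather than `i+1` would be the argument's 1-based position in the call.

```go
package main

import "fmt"

// ordinal formats n with its English ordinal suffix: 1st, 2nd, 3rd, 4th.
// It is written for positive n, which is all an argument position needs.
func ordinal(n int) string {
	suffix := "th"
	// 11, 12 and 13 keep "th"; otherwise the last digit decides.
	if n%100 < 11 || n%100 > 13 {
		switch n % 10 {
		case 1:
			suffix = "st"
		case 2:
			suffix = "nd"
		case 3:
			suffix = "rd"
		}
	}
	return fmt.Sprintf("%d%s", n, suffix)
}

func main() {
	for i := 0; i < 2; i++ {
		// i+2 is the 1-based position of Args[1:3][i] in the call.
		fmt.Printf("expected integer argument as %s arg in %s\n", ordinal(i+2), "holt_winters")
	}
}
```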

@e-dard
Contributor

e-dard commented May 17, 2016

Aside from a couple of bits to address, the code looks good overall. To be honest, I'm not sure where the neldermead package should live.

I have a more general question about the methodology for non-seasonal forecasting. Holt Winters is not good for non-seasonal forecasting but I see it's dynamically supported in this PR when the number of seasonal periods is fewer than 2. What approach are you using when not using the seasonal component?

@nathanielc
Contributor Author

In answer to question 2: removing the seasonal component reduces the method to a Holt-like algorithm, specifically the Damped Additive Trend method from the family of exponential smoothing methods.
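As a rough illustration of that reduced form, here is a minimal damped additive trend sketch with no seasonal term. `dampedHolt` is a hypothetical name, the parameters are passed in by hand rather than fitted, and the initialization (first value as the level, zero trend) is a simplifying assumption.

```go
package main

import "fmt"

// dampedHolt smooths y with a damped additive trend and returns h
// forecasted points. phi in (0, 1] damps the trend; phi = 1 gives the
// undamped Holt linear method.
func dampedHolt(y []float64, h int, alpha, beta, phi float64) []float64 {
	if len(y) == 0 {
		return nil
	}
	level, trend := y[0], 0.0
	for _, v := range y[1:] {
		prev := level + phi*trend // one-step-ahead estimate
		next := alpha*v + (1-alpha)*prev
		trend = beta*(next-level) + (1-beta)*phi*trend
		level = next
	}
	out := make([]float64, h)
	damp, pow := 0.0, 1.0
	for k := range out {
		pow *= phi
		damp += pow // phi + phi^2 + ... + phi^(k+1)
		out[k] = level + damp*trend
	}
	return out
}

func main() {
	y := []float64{1, 2, 3, 4, 5}
	fmt.Println(dampedHolt(y, 3, 1, 1, 0.5))
}
```

With phi below 1 the forecast increments shrink geometrically, so the forecast flattens out instead of extrapolating the last trend indefinitely.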

As for

Holt Winters is not good for non-seasonal forecasting

This is true if you try to apply the seasonal component to non-seasonal data, but the method works well if you turn the seasonal component off.

Specifically, the included tests use two data sets: one seasonal and one non-seasonal.

Here is a screen grab of a non-seasonal fit.

image

@nathanielc
Contributor Author

Notice how the fit in this graph is always slightly higher than the actual data?

image

Query: SELECT holt_winters_with_fit(first("pop"), 4,0) FROM "uspop" WHERE ... GROUP BY time(90d)

Also notice that the fit line (yellow) starts before the data line (green)? This is because the fit line has to land on 90d boundaries, and the data is slightly offset from those points. If you were to slide the yellow line right by the difference you would see a much better fit.

This is what I meant when I said that the holt_winters method will benefit from being able to shift group by boundaries in time.

@nathanielc
Contributor Author

@e-dard @dgnorton Thanks for the review. How does it look now?

@e-dard
Contributor

e-dard commented May 18, 2016

LGTM 👍

@nathanielc
Contributor Author

FWIW, the ability to slide group by boundaries was added in master already ;)

Here is a screenshot with a better fit using the group by offset.

image

Query: SELECT holt_winters_with_fit(first("pop"), 4,0) FROM "uspop" WHERE ... GROUP BY time(90d, 71d)
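The effect of the offset on bucket boundaries can be sketched as below. `bucketStart` is a hypothetical helper that works on plain integers (days, in this example) and only illustrates the idea; it is not InfluxDB's implementation of group by offsets.

```go
package main

import "fmt"

// bucketStart returns the start of the bucket containing t when buckets
// are interval wide and their boundaries are shifted by offset. All three
// values share one unit (days here).
func bucketStart(t, interval, offset int64) int64 {
	r := (t - offset) % interval
	if r < 0 {
		r += interval // Go's % keeps the dividend's sign; fold into [0, interval)
	}
	return t - r
}

func main() {
	// The same day lands in different buckets with and without the offset.
	for _, day := range []int64{71, 100, 161} {
		fmt.Println(day, bucketStart(day, 90, 0), bucketStart(day, 90, 71))
	}
}
```

With an offset of 71 the boundaries fall on days 71, 161, 251 and so on, so data that arrives on those days sits exactly on a boundary instead of partway into a default bucket.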

4 participants