Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

groupBy node doesn't have expected behaviour #423

Closed
jonseymour opened this issue Apr 7, 2016 · 6 comments
Closed

groupBy node doesn't have expected behaviour #423

jonseymour opened this issue Apr 7, 2016 · 6 comments

Comments

@jonseymour
Copy link
Contributor

I defined this tick script:

var aggregate = stream
    |from()
        .database('sampledb')
        .retentionPolicy('default')
        .measurement('sample')
    |window()
        .align()
        .period(5m)
        .every(5m)
    |groupBy('node')
    |mean('metric1').as('metric1')
    |influxDBOut()
        .database('sampledb')
        .measurement('actual')

My expectation is that when this script ran, it would produce the equivalent of:

select mean(metric1) as metric1 
into actual 
from sample 
where time >= '2016-03-08 05:00:00' and time < '2016-03-08 08:00:00' 
group by time(5m), *

However, what actually happened is that only one of the "node" groups was written into "actual" - the other groups were discarded.

Is my expectation of what the groupBy node is meant to do incorrect or is this a bug?

I have a Vagrantfile project that includes test data and reproduces the problem available here: https://github.com/wildducktheories/influxdata-testcase

I was using kapacitor 0.12 and influxdb 0.11

@jonseymour
Copy link
Contributor Author

Ok, so I tweaked the test to log after the window() node and sure enough that's what the issue is - the data in the recording is sorted by node first, then time and so the window node skips all the records from subsequent nodes.

@nathanielc - is there a way to ensure that queries are stored in timestamp sorted order?

@nathanielc
Copy link
Contributor

@jonseymour Hmm, is seems odd that a recording is not sorted by time. I thought that was impossible. I'll have to go check.

Any way you probably want to groupBy before the window anyway since then the windows will be windowed per group. It shouldn't make a difference but under the hood it will be more efficient.

var aggregate = stream
    |from()
        .database('sampledb')
        .retentionPolicy('default')
        .measurement('sample')
        .groupBy('node')
    |window()
        .align()
        .period(5m)
        .every(5m)
    |mean('metric1').as('metric1')
    |influxDBOut()
        .database('sampledb')
        .measurement('actual')

@jonseymour
Copy link
Contributor Author

Ok, so I was deceived by the documentation here.

The distinction between groupBy() the property of a stream node and groupBy() the chaining method wasn't clear from reading the documentation.

(The groupBy() as property isn't really documented, although the example for the chaining method does, somewhat confusingly, include an example of the use of the property syntax).

@jonseymour
Copy link
Contributor Author

@nathanielc I've created a PR against the documentation influxdata/docs.influxdata.com-ARCHIVE#379 to fix the documentation issue.

Even if this solution works, it isn't obvious to me why the other solution didn't work.

Do you want to leave this issue open until you have had a chance to check why the window() first technique didn't work?

@nathanielc
Copy link
Contributor

Yes, let's leave it open.Thanks
On Apr 7, 2016 6:00 PM, "Jon Seymour" notifications@github.com wrote:

@nathanielc https://github.com/nathanielc I've created a PR against the
documentation influxdata/docs.influxdata.com-ARCHIVE#379
influxdata/docs.influxdata.com-ARCHIVE#379 to fix the
documentation issue.

Even if this solution works, it isn't obvious to me why the other solution
didn't work.

Do you want to leave this issue open until you have had a chance to check
why the window() first technique didn't work?


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#423 (comment)

@rolele
Copy link

rolele commented Oct 25, 2016

@jonseymour very interesting test-case thanks.
Can you tell me how you got the query that get produced by the tick script?
I try to debug my scripts but I find it very difficult.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants