Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need to be able to aggregate over GROUP BY time within each series before aggregating across series #6907

Closed
MatMeredith opened this issue Jun 24, 2016 · 2 comments

Comments

@MatMeredith
Copy link

I've hit an issue that I think is going to prevent us from deploying InfluxDB. The following is an example of this issue, but in reality it manifests in different ways in many of our use cases.

I have a set of nodes each reporting the rate at which differnet SIP messages are being received (in terms of messages per s)

timestamp | tag_node | tag_message_type | value

      | node1    | INVITE             | 1728
      | node1    | BYE                | 1051
      | node2    | INVITE             | 1521
      | node2    | BYE                | 1241

Each node reports writes these stats to InfluxDB roughly every 5s.

I want to draw charts showing the total rate of different types of SIP messages being processed across my deployment, split by message_type.

Ideally I would do this like so:

SELECT sum("value") FROM "sipMessages" WHERE WHERE time > now() - 7d GROUP BY time(1h), "tag_message_type"

The problem here is that this sums across different values of node, but also sums multiple points within each series. So the number I get returned is ~720 times higher than it should be. If I knew that there were always going to be exactly 720 measurements in each time interval I could divide by 720, but this 720 is only an approximation. [it is also the case that using Grafana to draw the graphs this "1h" period changes automatically based on the period that I'm graphing over]

What I really need to do is to run an average aggregation within each series over each GROUP BY time period (so that for each node / message_type combination I have one data point per time period), and then sum the results of those aggregations across the different series.

I don't believe this is possibe though, and I can't see any way to work around it?

What I don't understand is why this isn't a critical issue for lots of users as this seems like a very standard use case. Is there a reason that other users are able to avoid this issue?

Would adding this capability (to do per series aggregation within "GROUP BY time" intervals) be technically difficult?

@srikara
Copy link

srikara commented Jun 24, 2016

+1

@jwilder
Copy link
Contributor

jwilder commented Nov 17, 2016

I believe this would be addressed by #4619

@jwilder jwilder closed this as completed Nov 17, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants