Need to be able to aggregate over GROUP BY time within each series before aggregating across series #6907

MatMeredith · 2016-06-24T12:10:54Z

I've hit an issue that I think is going to prevent us from deploying InfluxDB. The following is an example of this issue, but in reality it manifests in different ways in many of our use cases.

I have a set of nodes each reporting the rate at which differnet SIP messages are being received (in terms of messages per s)

timestamp | tag_node | tag_message_type | value

      | node1    | INVITE             | 1728
      | node1    | BYE                | 1051
      | node2    | INVITE             | 1521
      | node2    | BYE                | 1241

Each node reports writes these stats to InfluxDB roughly every 5s.

I want to draw charts showing the total rate of different types of SIP messages being processed across my deployment, split by message_type.

Ideally I would do this like so:

SELECT sum("value") FROM "sipMessages" WHERE WHERE time > now() - 7d GROUP BY time(1h), "tag_message_type"

The problem here is that this sums across different values of node, but also sums multiple points within each series. So the number I get returned is ~720 times higher than it should be. If I knew that there were always going to be exactly 720 measurements in each time interval I could divide by 720, but this 720 is only an approximation. [it is also the case that using Grafana to draw the graphs this "1h" period changes automatically based on the period that I'm graphing over]

What I really need to do is to run an average aggregation within each series over each GROUP BY time period (so that for each node / message_type combination I have one data point per time period), and then sum the results of those aggregations across the different series.

I don't believe this is possibe though, and I can't see any way to work around it?

What I don't understand is why this isn't a critical issue for lots of users as this seems like a very standard use case. Is there a reason that other users are able to avoid this issue?

Would adding this capability (to do per series aggregation within "GROUP BY time" intervals) be technically difficult?

The text was updated successfully, but these errors were encountered:

srikara · 2016-06-24T17:05:02Z

+1

jwilder · 2016-11-17T06:55:01Z

I believe this would be addressed by #4619

jwilder closed this as completed Nov 17, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Need to be able to aggregate over GROUP BY time within each series before aggregating across series #6907

Need to be able to aggregate over GROUP BY time within each series before aggregating across series #6907

MatMeredith commented Jun 24, 2016

srikara commented Jun 24, 2016

jwilder commented Nov 17, 2016

Need to be able to aggregate over GROUP BY time within each series before aggregating across series #6907

Need to be able to aggregate over GROUP BY time within each series before aggregating across series #6907

Comments

MatMeredith commented Jun 24, 2016

timestamp | tag_node | tag_message_type | value

srikara commented Jun 24, 2016

jwilder commented Nov 17, 2016