Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add stats to tasks #323

Merged
merged 1 commit into from
Mar 18, 2016
Merged

Conversation

yosiat
Copy link
Contributor

@yosiat yosiat commented Mar 12, 2016

See issue - #321

Currently I implemented the input nodes, I am waiting for feedback about this and a suggestion of how to implement the metrics fetching for the outputs.

@yosiat yosiat changed the title Add stats to tasks [WIP] Add stats to tasks Mar 12, 2016
@yosiat yosiat mentioned this pull request Mar 12, 2016
@nathanielc
Copy link
Contributor

@yosiat This is great work thanks!

A few administrative things: Can you sing the CLA? and add an entry to the CHANGELOG about this change? Thanks

I like the structure that you have added but I think it would be easy and valuable to expose more stats.

First what stats are you after?

I see two paths we could go down:

  1. Provide just summary stats about the task. Similar to the throughput stat that already exists/
  2. Expose all stats from all nodes so that the DOT data does not need to be parsed.

We can even do both.. And I think we should. I'll comment directly in the code where I think we could update what you have already.

// Input nodes and their emitted points
// where input nodes - are batch/stream node
// for example: stream1 -> 12
Inputs map[string]int64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should update the struct to have both summary stats for the task and a map of all stats for the all nodes in the task.

maybe a structure like

type ExecutionStats struct {
    TaskStats map[string]interface{} // stats name -> value i.e throughput -> 4.3 points /sec
    NodeStats map[string]map[string]interface{} // node name -> stat name -> value i.e influxdb_out -> point_written -> 1000
}

Then use the statsMap that each node has to populate the node stats maps.

Thoughts?

@yosiat
Copy link
Contributor Author

yosiat commented Mar 14, 2016

@nathanielc Thanks!

I already signed the CLA (https://influxdb.com/community/cla.html) and I will add an entry to the change log once I will finish the output nodes as well.

About the structure of stats I will change it but before, I want to clarify the reason for this API - I want to expose general stats - "stats for glance" on all tasks, I don't want to extract all the data from the dot (for example: how much each node processed and how much time it took).

I think it should contain:

  • Task stats
    • Throughput of the task - currently the throughput is all Kapacitor throughput I want to know how point s the whole pipeline gets (only those whom filtered by the measurement, db & rp)
    • Filtered points count - how much points we filtered
    • Total points we processed
  • Input and Output stats - this the "glance" data of the nodes - I want to know how much points I processed and how much I outputed (alerts/written to influx) (and this why I did direct mapping from node name to it's output value)

But in the overall sense I agree this structure is more acceptable:

type ExecutionStats struct {
    TaskStats map[string]interface{}
    NodeStats struct {
     Input map[string]map[string]interface{}
     Output map[string]map[string]interface{}
    }
}

I changed NodeStats to be struct that will contain only the Input and Output to note that "for all nodes stats - there is a dot graph, for general data (like input & output), this is the place.

About the values - why this is interface{} and not floats, I think those metrics should be raw value like "789" and not "789 points/s" - the resolution should be noted on the stat name for example "PointsPerSecond" => 789.

Aside the structure, I don't know how to implement the output nodes stats (as I wrote in the issue), currently I think to fetch all output nodes by predicate: node that have parents and doesn't have child nodes. but I don't know how to expose the output stats -

  1. should it be a function that will expose the "statsMap" or map[string]float ?
  2. and where this function will be? on "Node" or add another interface for "OutputNodes" ?

@yosiat
Copy link
Contributor Author

yosiat commented Mar 16, 2016

@nathanielc any comment? do you want me to clarify my answer better? currently I am waiting to your response before proceeding on.

@nathanielc
Copy link
Contributor

@yosiat Sorry for the delay.

I like the plan, stats at a glance is a good approach. I am a little confused about the Input, Output stats. Are input stats just counts for how many points a node has received? If so why a map its just a single count per node? Same question for output stats.

About the values - why this is interface{} and not floats, I think those metrics should be raw value like "789" and not "789 points/s" - the resolution should be noted on the stat name for example "PointsPerSecond" => 789.

Agreed, the interface is so you can use int64 or float64 but should always be one or the other.

@yosiat
Copy link
Contributor Author

yosiat commented Mar 16, 2016

@nathanielc it's ok :)
The "inputs" and "outputs" are plural because I want to handle edge cases where user might write something like this:

var a = stream.from('a')
var b= stream.from('b')

a.join(b)
  // do some alerting logic

And output might be creating an alert and writing to influx

What do you think about the output nodes implementation? what do you prefer - specific function or exposing the statsMap?

@yosiat
Copy link
Contributor Author

yosiat commented Mar 18, 2016

Changed the ExecutionStats structure, waiting to @nathanielc to answer on how to implement the output nodes.

@nathanielc
Copy link
Contributor

@yosiat I think simpler is better here, since we are doing stats at a glance.

// Stats about the execution of a task
type ExecutionStats struct {
    // Stats global to the task
    // throughput etc...
    TaskStats map[string]interface{}
    // Specific node stats,
    // Emitted, Collected counts plus anything in statsMap.
    NodeStats map[string]interface{}
}

NodeStats will be populated by whatever is in statsMap using statsMap.Do and the add in collected and emitted stats using the node.collectedCount and node.emittedCount stats.

Populating TaskStats will be a bit more work since it doesn't already have a framework in place. We should probably add a statsMap to ExecutingTask too if it make sense. Otherwise we can populate TaskStats by aggregating other stats from nodes.

This should get all the stats we were looking for.

@yosiat
Copy link
Contributor Author

yosiat commented Mar 18, 2016

@nathanielc So you want ExecutionStats to contain all nodes? If so, what is the difference between this and a Dot?

@nathanielc
Copy link
Contributor

@yosiat Its exposed via HTTP in JSON so you do not have to parse DOT since its not a good method for conveying numerical data.

@yosiat
Copy link
Contributor Author

yosiat commented Mar 18, 2016

@nathanielc ok, I am doing this right now.
Just to make sure, you wrote:

NodeStats map[string]interface{}

instead of:

NodeStats map[string]map[string]interface{}

Was it by mistake?

@nathanielc
Copy link
Contributor

@yosiat Yes is should be NodeStats map[string][string]interface{}

@yosiat
Copy link
Contributor Author

yosiat commented Mar 18, 2016

@nathanielc I finished.

Here is the final result:

{
    Name: "cpu_alert",
    Type: 0,
    DBRPs: [{
        db: "kapacitor_example",
        rp: "default"
    }],
    Enabled: true,
    Executing: true,
    ExecutionStats: {
        TaskStats: {
            throuput: 0
        },
        NodeStats: {
            alert2: {
                alerts_triggered: "0",
                avg_exec_time_ns: "0",
                collected: 0,
                emitted: 0
            },
            srcstream0: {
                avg_exec_time_ns: "0",
                collected: 0,
                emitted: 0
            },
            stream1: {
                avg_exec_time_ns: "0",
                collected: 0,
                emitted: 0
            }
        }
    }
}

Do you know how change statsMap representation to be numbers and not strings?

@nathanielc
Copy link
Contributor

Do you know how change statsMap representation to be numbers and not strings?

Yes , have a look at https://github.com/influxdata/kapacitor/blob/master/global_stats.go#L132
Kapacitor has created a bunch of special expvar types just for that purpose.

Also quick typo fix throuput -> throughput. Looks great.

@yosiat
Copy link
Contributor Author

yosiat commented Mar 18, 2016

@nathanielc done.
Do you want me to squash the commits?

@nathanielc
Copy link
Contributor

@yosiat yes, please.

@yosiat
Copy link
Contributor Author

yosiat commented Mar 18, 2016

@nathanielc done 👍

@@ -246,6 +250,30 @@ func (n *node) collectedCount() (count int64) {
return
}

func (n *node) emittedCount() (count int64) {
for _, out := range n.outs {
count += out.emittedCount()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to read count += out.collectedCount() see comment in node.collectedCount

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain? I don't see any comment on node.collectedCount

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the comment is gone, what it said was:

node collected count is the sum of emitted counts of parent edges

The opposite is true for node emitted count:

node emitted count is the sum of collected counts of children edges

Edges can buffer so we care about the actual points this node has collected or emitted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed 👍

@yosiat yosiat force-pushed the tasks-metrics branch 2 times, most recently from ee88d5b to 19d0ae1 Compare March 18, 2016 17:48
@@ -246,6 +250,30 @@ func (n *node) collectedCount() (count int64) {
return
}

func (n *node) emittedCount() (count int64) {
for _, out := range n.outs {
count += out.collectedCount()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add those comments in so its not confusing for the next guy? Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nathanielc Just to make sure:
node collected count is the sum of emitted counts of parent edges above collectedCount
node emitted count is the sum of collected counts of children edges above emittedCount

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep
On Mar 18, 2016 11:55 AM, "Yosi Attias" notifications@github.com wrote:

In node.go
#323 (comment):

@@ -246,6 +250,30 @@ func (n *node) collectedCount() (count int64) {
return
}

+func (n *node) emittedCount() (count int64) {

  • for _, out := range n.outs {
  •   count += out.collectedCount()
    

Just to make sure:
node collected count is the sum of emitted counts of parent edges above
collectedCount
node emitted count is the sum of collected counts of children edges above
emittedCount

?


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/influxdata/kapacitor/pull/323/files/19d0ae1ab9e484ab4420b8f8901013506cb50800#r56696878

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done :)

@nathanielc
Copy link
Contributor

@yosiat Thanks! merging...

nathanielc pushed a commit that referenced this pull request Mar 18, 2016
Add stats to executing tasks
@nathanielc nathanielc merged commit 83906b4 into influxdata:master Mar 18, 2016
@yosiat yosiat deleted the tasks-metrics branch March 18, 2016 19:57
@yosiat
Copy link
Contributor Author

yosiat commented Mar 18, 2016

@nathanielc I forgot to add a change log entry.

@nathanielc
Copy link
Contributor

I added no worries

On Fri, Mar 18, 2016 at 1:58 PM, Yosi Attias notifications@github.com
wrote:

@nathanielc https://github.com/nathanielc I forgot to add a change log
entry.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#323 (comment)

Nathaniel Cook
Kapacitor Lead
▼▴

InfluxData.com http://influxdata.com/

Twitter http://www.twitter.com/nathanielvcook

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants