Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.0.0~beta1] .Time returns different values #627

Closed
ljagiello opened this issue Jun 9, 2016 · 9 comments
Closed

[1.0.0~beta1] .Time returns different values #627

ljagiello opened this issue Jun 9, 2016 · 9 comments

Comments

@ljagiello
Copy link
Contributor

Kapacitor: 1.0.0~beta1-1 (official deb package)
OS: Ubuntu 14.04 LTS

I was playing with holtWinters and I've noticed weird behaviour of .Time. It returns different values depends on how it was triggered.

TICK file (almost like example):

var growth_interval = 1h
var forecast_count = 10
var history = 2d

var threshold = 90.0

batch
    |query('''
    SELECT max(used_percent) as used_percent
    FROM "telegraf"."default"."disk"
    GROUP BY "host", "dc", "path"
''')
        .period(history)
        .every(growth_interval)
        .align()
        .groupBy(time(growth_interval), *)
    |holtWinters('used_percent', forecast_count, 0, growth_interval)
        .as('used_percent')
    |max('used_percent')
        .as('used_percent')
    |alert()
        .message('PREDICTION({{ .Time }}) - {{ .ID }} disk usage will be {{ .Level }} ({{ index .Fields "used_percent" | printf "%.2f"}}%) Take action now!')
        .crit(lambda: "used_percent" > threshold)
        .stateChangesOnly()
        .slack()

When I run:

kapacitor define disk_usage_forecast -tick disk_usage_forecast.tick -type batch -dbrp telegraf.default
kapacitor replay-live batch -task disk_usage_forecast -past 49h

Result is:

PREDICTION(2016-06-11 06:32:01.803806719 +0000 UTC) - disk:dc=XYZ,env=dev,fstype=ext4,host=somehostname,path=/data, disk usage will be CRITICAL (94.14%) Take action now! 

and that is correct (it's future time).

When kapacitor triggers task result is:

PREDICTION(2016-06-09 07:00:00 +0000 UTC) - disk:dc=XYZ,env=dev,fstype=ext4,host=somehostname,path=/data, disk usage will be CRITICAL (94.14%) Take action now! 

This is current time when task was triggered.

Question is how to keep .Time always the same?

@nathanielc
Copy link
Contributor

@ljagiello When using holt-winters it may return less data then asked if the prediction algorithm becomes unstable. This might be what you are seeing.

Can you add a |log() after the holtWinters call to see what data its returning?
Then we can confirm if the times themselves are changing or if less than expected data is being returned.

@ljagiello
Copy link
Contributor Author

@nathanielc quick question where I should see result of that |log()?

@ljagiello
Copy link
Contributor Author

That's DOT when it's triggered by kapacitord and Time is wrong

DOT:
digraph ops_disk_usage_forecast {
graph [throughput="0.00 batches/s"];

query1 [avg_exec_time_ns="0" batches_queried="179" connect_errors="0" points_queried="8537" query_errors="0" ];
query1 -> holt_winters2 [processed="179"];

holt_winters2 [avg_exec_time_ns="0" ];
holt_winters2 -> log3 [processed="179"];

log3 [avg_exec_time_ns="0" ];
log3 -> max4 [processed="179"];

max4 [avg_exec_time_ns="0" ];
max4 -> alert5 [processed="179"];

alert5 [alerts_triggered="4" avg_exec_time_ns="22.252µs" crits_triggered="4" infos_triggered="0" oks_triggered="0" warns_triggered="0" ];
}

@nathanielc
Copy link
Contributor

@ljagiello The output of |log() is with the Kapacitor daemon logs typically STDERR or /var/log/kapacitor/kapacitor.log.

@ljagiello
Copy link
Contributor Author

ljagiello commented Jun 10, 2016

Bad .Time:

[ops_disk_usage_forecast:log3] 2016/06/10 22:00:02 I!  {disk dc=XYZ,env=dev,fstype=ext4,host=test-host,path=/, 2016-06-10 22:00:00 +0000 UTC map[dc:XYZ env:dev fstype:ext4 host:test-host path:/] [{2016-06-10 22:00:00 +0000 UTC map[used_percent:90.44329273661894] map[]} {2016-06-10 22:00:00 +0000 UTC map[used_percent:90.44329273661894] map[]} {2016-06-10 22:00:00 +0000 UTC map[used_percent:90.44329273661894] map[]} {2016-06-10 22:00:00 +0000 UTC map[used_percent:90.44329273661894] map[]} {2016-06-10 22:00:00 +0000 UTC map[used_percent:90.44329273661894] map[]} {2016-06-10 22:00:00 +0000 UTC map[used_percent:90.44329273661894] map[]} {2016-06-10 22:00:00 +0000 UTC map[used_percent:90.44329273661894] map[]} {2016-06-10 22:00:00 +0000 UTC map[used_percent:90.44329273661894] map[]} {2016-06-10 22:00:00 +0000 UTC map[used_percent:90.44329273661894] map[]} {2016-06-10 22:00:00 +0000 UTC map[used_percent:90.44329273661894] map[]}]}

Good .Time:

[ops_disk_usage_forecast:log3] 2016/06/10 22:19:32 I!  {disk dc=XYZ,env=dev,fstype=ext4,host=test-host,path=/, 2016-06-12 22:19:30.965954558 +0000 UTC map[dc:XYZ env:dev fstype:ext4 host:test-host path:/] [{2016-06-12 22:19:30.965954558 +0000 UTC map[used_percent:90.44329273661894] map[]} {2016-06-12 22:19:30.965954558 +0000 UTC map[used_percent:90.44329273661894] map[]} {2016-06-12 22:19:30.965954558 +0000 UTC map[used_percent:90.44329273661894] map[]} {2016-06-12 22:19:30.965954558 +0000 UTC map[used_percent:90.44329273661894] map[]} {2016-06-12 22:19:30.965954558 +0000 UTC map[used_percent:90.44329273661894] map[]} {2016-06-12 22:19:30.965954558 +0000 UTC map[used_percent:90.44329273661894] map[]} {2016-06-12 22:19:30.965954558 +0000 UTC map[used_percent:90.44329273661894] map[]} {2016-06-12 22:19:30.965954558 +0000 UTC map[used_percent:90.44329273661894] map[]} {2016-06-12 22:19:30.965954558 +0000 UTC map[used_percent:90.44329273661894] map[]} {2016-06-12 22:19:30.965954558 +0000 UTC map[used_percent:90.44329273661894] map[]}]}

@nathanielc
Copy link
Contributor

nathanielc commented Jun 13, 2016

@ljagiello Yeah, something is definitely odd about that. Could you create a recording of your test data and share it with me so I can play around with it locally?

kapacitor record batch -task ops_disk_usage_forecast -past 5d

A file with the recording ID will be placed in /var/lib/kapacitor/replay/ID.brpl or wherever you have configured it. If you can send me that file I can reproduce the results locally and get to the bottom of the issue.

@ljagiello
Copy link
Contributor Author

@nathanielc
Copy link
Contributor

@ljagiello Thanks for the replay file it was useful in fixing the issue. I have merged a fix. Can you confirm that its working as expected? The fix will be in the next nightly build and part of the 1.0.0-beta3 release.

@ljagiello
Copy link
Contributor Author

@nathanielc I build the latest master and behaviour is still the same.

|log() is a bit more verbose now:
Correct .Time value triggered with kapacitor replay-live batch -task ops_disk_usage_forecast -past 49h

[ops_disk_usage_forecast:log3] 2016/06/27 21:38:49 I!  {disk dc=XYZ,env=dev,fstype=ext4,host=devhost,path=/data, 2016-06-29 21:38:46.377652791 +0000 UTC map[dc:XYZ env:dev fstype:ext4 host:devhost path:/data] [{2016-06-29 22:38:46.377652791 +0000 UTC map[used_percent:94.81936286042523] map[dc:XYZ env:dev fstype:ext4 host:devhost path:/data]} {2016-06-29 23:38:46.377652791 +0000 UTC map[used_percent:94.81936286042523] map[dc:XYZ env:dev fstype:ext4 host:devhost path:/data]} {2016-06-30 00:38:46.377652791 +0000 UTC map[used_percent:94.81936286042523] map[env:dev fstype:ext4 host:devhost path:/data dc:XYZ]} {2016-06-30 01:38:46.377652791 +0000 UTC map[used_percent:94.81936286042523] map[path:/data dc:XYZ env:dev fstype:ext4 host:devhost]} {2016-06-30 02:38:46.377652791 +0000 UTC map[used_percent:94.81936286042523] map[env:dev fstype:ext4 host:devhost path:/data dc:XYZ]} {2016-06-30 03:38:46.377652791 +0000 UTC map[used_percent:94.81936286042523] map[fstype:ext4 host:devhost path:/data dc:XYZ env:dev]} {2016-06-30 04:38:46.377652791 +0000 UTC map[used_percent:94.81936286042523] map[dc:XYZ env:dev fstype:ext4 host:devhost path:/data]} {2016-06-30 05:38:46.377652791 +0000 UTC map[used_percent:94.81936286042523] map[path:/data dc:XYZ env:dev fstype:ext4 host:devhost]} {2016-06-30 06:38:46.377652791 +0000 UTC map[used_percent:94.81936286042523] map[dc:XYZ env:dev fstype:ext4 host:devhost path:/data]} {2016-06-30 07:38:46.377652791 +0000 UTC map[used_percent:94.81936286042523] map[dc:XYZ env:dev fstype:ext4 host:devhost path:/data]}]}

That's the result of the same test triggered by kapacitord:

[ops_disk_usage_forecast:log3] 2016/06/27 21:50:02 I!  {disk dc=XYZ,env=dev,fstype=ext4,host=devhost,path=/data, 2016-06-27 21:50:00 +0000 UTC map[dc:XYZ env:dev fstype:ext4 host:devhost path:/data] [{2016-06-27 22:00:00 +0000 UTC map[used_percent:94.81936286042523] map[host:devhost path:/data dc:XYZ env:dev fstype:ext4]} {2016-06-27 23:00:00 +0000 UTC map[used_percent:94.81936286042523] map[host:devhost path:/data dc:XYZ env:dev fstype:ext4]} {2016-06-28 00:00:00 +0000 UTC map[used_percent:94.81936286042523] map[path:/data dc:XYZ env:dev fstype:ext4 host:devhost]} {2016-06-28 01:00:00 +0000 UTC map[used_percent:94.81936286042523] map[dc:XYZ env:dev fstype:ext4 host:devhost path:/data]} {2016-06-28 02:00:00 +0000 UTC map[used_percent:94.81936286042523] map[dc:XYZ env:dev fstype:ext4 host:devhost path:/data]} {2016-06-28 03:00:00 +0000 UTC map[used_percent:94.81936286042523] map[dc:XYZ env:dev fstype:ext4 host:devhost path:/data]} {2016-06-28 04:00:00 +0000 UTC map[used_percent:94.81936286042523] map[fstype:ext4 host:devhost path:/data dc:XYZ env:dev]} {2016-06-28 05:00:00 +0000 UTC map[used_percent:94.81936286042523] map[env:dev fstype:ext4 host:devhost path:/data dc:XYZ]} {2016-06-28 06:00:00 +0000 UTC map[used_percent:94.81936286042523] map[dc:XYZ env:dev fstype:ext4 host:devhost path:/data]} {2016-06-28 07:00:00 +0000 UTC map[used_percent:94.81936286042523] map[host:devhost path:/data dc:XYZ env:dev fstype:ext4]}]}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants