Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Group By fields #624

Closed
nathanielc opened this issue Jun 8, 2016 · 6 comments · Fixed by #731
Closed

Support Group By fields #624

nathanielc opened this issue Jun 8, 2016 · 6 comments · Fixed by #731

Comments

@nathanielc
Copy link
Contributor

Kapacitor only allows you to group by tag values. It seems useful to enable Kapacitor to be able to group by fields. This will introduce the potential for high cardinality issues but after #610 we will have a solution to help relieve that pressure.

Also this fix should probably incorporate a fix for #466

@nathanielc
Copy link
Contributor Author

nathanielc commented Jul 20, 2016

I have started working on this and I think I want to approach it slightly differently.

Instead of allowing fields to be specified as a groupBy dimension, rather allow fields to be converted into tags. The end result is the same but since the conversion from field to tag can be complicated or even invalid I would rather that conversion be explicit.

For example:

stream
     |from()
         .measurement('usage')
     |eval(lambda: "version")
        .as('version')
        .tag('version')
     |groupBy('version')

But what if the field wasn't a string to begin with. You could for example group ints or floats into buckets.

stream
     |from()
         .measurement('cpu')
     |eval(lambda: string(floor("idle" /  10.0) * 10.0))
        .as('idle_bucket')
        .asTags('idle_bucket')
     |groupBy('idle_bucket')

Or if the value is an int with low cardinality like an enum:

stream
     |from()
         .measurement('service')
     |eval(lambda: string("status"))
        .as('status')
        .asTags('status')
     |groupBy('status')

Or multiple fields into tags at once.

stream
     |from()
         .measurement('service')
     |eval(lambda: string("status"), lambda: string("isalive"))
        .as('status', 'isalive')
        .asTags('status', 'isalive')
     |groupBy('status', 'isalive')

Or doing something invalid will log an error. For example given that "idle" is a float, if its not explicitly converted it will raise an error.

stream
     |from()
         .measurement('cpu')
     |eval(lambda: "idle") // field idle is a float, will cause error since tags must be strings.
        .as('idle')
        .asTags('idle')
     |groupBy('idle')

Using this approach provides greater flexibility and control over how a field becomes a tag. Its also useful beyond simply grouping by fields.

Does the asTags method read well?

@nathanielc nathanielc mentioned this issue Jul 20, 2016
3 tasks
@nathanielc
Copy link
Contributor Author

@gunnaraasen @rkuchan @rossmcdonald Do you have any thoughts on this? How to make it more clear that the result of an eval is a tag?

@gunnaraasen
Copy link
Member

I like the flexibility of this approach, although I can't think of any other alternatives off the top of my head.

What do you think about implicitly casting everything passed through asTags as a string? Seems like the only benefit to throwing an error would be (maybe?) helping someone catch a potential high cardinality situation.

I'm not a huge fan of the name asTags. The as prefix doesn't really seem to follow the naming convention of other properties in Kapacitor. How about using tags or newTags or addTags?

Does grouping fields into tags drop the field?

@nathanielc
Copy link
Contributor Author

nathanielc commented Jul 21, 2016

@gunnaraasen Thanks for the input.

I like .tags as it less to read and makes sense in this context, that the result is saved as a tag.

Does grouping fields into tags drop the field?

Ah, good question. In most cases I would assume yes, because it is redundant information at that point. But in some cases like bucket example above it could make sense to keep the field. To make it simple to reason about I say we let that be decided via the keep property.

Examples:

stream
     |from()
         .measurement('cpu')
     |eval(lambda: string(floor("idle" /  10.0) * 10.0))
        .as('idle_bucket')
        .tags('idle_bucket')
        .keep() // keep all fields, resulting point has field `idle` and tag `idle_bucket`
     |groupBy('idle_bucket')
stream
     |from()
         .measurement('cpu')
     // point has fields `idle`, and `user`
     |eval(lambda: string(floor("idle" /  10.0) * 10.0))
        .as('idle_bucket')
        .tags('idle_bucket')
        .keep('idle') // keep only the field `idle`, resulting point has field `idle` and tag `idle_bucket`, the field `user` was dropped.  NOTE tags are always kept.
     |groupBy('idle_bucket')

What do you think about implicitly casting everything passed through asTags as a string?

I like it because it will allow us to type check the expression and provide useful errors to the user, when they define the task as opposed to when they enable the task for the first time. There is change that needs to happen before that will work, but long term it will work, so better set the correct expectation now.

@gunnaraasen
Copy link
Member

+1 to .tags

I like the idea of choosing whether to keep the field or not, but the keep name feels a little ambiguous. What's the canonical way to drop tags and fields right now?

I agree that type checking the expression to see if it returns a string is useful. Would it be possible to return an error when the task is defined instead of during runtime?

@nathanielc
Copy link
Contributor Author

nathanielc commented Jul 25, 2016

Would it be possible to return an error when the task is defined instead of during runtime?

Not yet, but once we implement typed functions in TICKscript then it will be possible.

What's the canonical way to drop tags and fields right now?

There isn't, hence I have added a DeleteNode as part of this change.

|delete()
    .field('value')
    .tag('host')

it mirrors well with the Default node

|default()
    .field('value', 42.0)
    .tag('host', 'unknown')

The eval node has always had the keep option. Since it can create new field (and now tags), we needed an explicit way to mark which fields you want to keep, and which you want to drop.

beckettsean added a commit that referenced this issue Aug 30, 2016
nathanielc pushed a commit that referenced this issue Aug 31, 2016
* moving docs #624 to kap

* Update window.go

* Update window.go
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants