-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more math functions to influxql #9620
Add more math functions to influxql #9620
Conversation
I'm going to review this, but before I do, I just want to point out this: 🎉 🎉 🎉 I'm very happy. I hope this refactor made it much easier to implement these for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks good and thank you so much for helping us to add these functions.
Would it be possible for you to remove exp(x)
though? We are still discussing internally how we should handle special constants like π and I think e fits into that same discussion. I think it would be good for us to make an issue for supporting π and e (especially after this is merged) so we can have a public discussion about what's best, but I don't want to block the other functions.
Another one is log()
. @rbetts @timhallinflux do you have any specific opinion about log()
? I was thinking we should have ln(x)
and log(x, y)
instead, but I'm not sure what is more commonly expected by users and the change of base formula is pretty easy. I'm just used to those being the functions on calculators.
query/compile.go
Outdated
} | ||
|
||
// Compile all the argument expressions that are not just literals | ||
for _, arg := range expr.Args { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just do this instead?
if _, ok := arg.(influxql.Literal); ok {
continue
}
// stuff
I don't think the switch is necessary and I think it hurts the readability.
query/compile.go
Outdated
return fmt.Errorf("invalid number of arguments for %s, expected %d, got %d", expr.Name, exp, got) | ||
|
||
// How many arguments are we expecting? | ||
var nargs int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nargs := 1
switch expr.Name {
case "atan2", "pow":
nargs = 2
}
It makes it so the default clause isn't necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be neater, but I saw this: https://github.com/influxdata/influxdb/blob/master/CODING_GUIDELINES.md#always-include-a-default-case-in-a-switch-statement
query/compile.go
Outdated
} | ||
return c.validateCondition(expr.Args[0]) | ||
return nil | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra newline here.
query/math.go
Outdated
switch name { | ||
case "abs": | ||
fn = math.Abs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to be more careful with this. Most of these functions operate the same as the trigonometry functions. For example, log()
and its derivatives should always return a float. I don't think it makes sense for them not to. But some of these methods it doesn't necessarily make sense. For example, if I have abs(value::integer)
I don't think it makes sense for it to return a float. I also don't think pow(value::integer, 2)
makes sense to return a float either. If I were a user, I would expect using two integers with pow()
to return an integer. On the other hand, I would expect sqrt(x)
to always return a float since I don't think there's any scenario where it makes sense for it not to return a float.
So not all of these can reuse that code. On first glance, I see abs()
and pow()
as the only two I would apply that integer rule to. The others should always return floats in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, last time I implemented these I did preserve int datatypes where possible, but given that the latest code for sin
happily accepted strings and all sorts I figured datatypes weren't that important now :-)
Happy to try and preserve ints where possible, I'll have a look in the next few days if I have time. pow
is quite subtle because pow(x, y)
only (usually) returns an int if x
is and integer AND y
is a positive integer...
query/math.go
Outdated
if fn != nil { | ||
result := fn(arg0, arg1) | ||
if math.IsNaN(result) { | ||
return nil, false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably return nil, true
and should probably be more like this:
result := fn(arg0, arg1)
if math.IsNaN(result) {
result = nil
}
return result, true
The ok
is supposed to signify "was this call handled?" It was handled, it just returned a result that we do not support and cannot serialize back to the user to begin with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, I guessed that might be what I should return but wasn't sure.
Also, can you add a test for each of the new functions? Just having one query that is valid in |
query/math.go
Outdated
case "sin": | ||
return v.callTrigFunction(math.Sin, args[0]) | ||
fn = math.Sin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about avoiding the double indirection (switch+indirect call) to give the inliner a chance to do its job? E.g. instead of
fn = math.FnName
call the function directly
return handleNaN(math.FnName(arg0))
where handleNaN is
func handleNaN(result float64) (float64, bool) {
if math.IsNaN(result) {
return nil, false
}
return result, true
}
If there are readability concerns then the something like the following should take care of them:
var res float64
switch name {
case "abs":
res = math.Abs(arg0)
...
default:
return nil, false
}
return handleNaN(res)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds fine to me. You would likely need to do something like this:
switch name {
case "sin", "cos", ... all of the other functions that return float ...:
arg0, ok := asFloat(args[0])
if !ok {
return nil, true
}
var res float64
// Need to do another switch to allow inlining.
switch name {
case "sin":
res = math.Sin(arg0))
default:
panic("need to panic here")
}
if math.IsNaN(res) {
return nil, true
}
return res, true
case "abs":
// Handled differently because it has different code depending on whether the input
// is a float or integer.
}
It would be a tiny bit more repetitive with some extra typing (plus, you have to do a name switch twice). I don't think that's a big deal though.
I was thinking about it though and you probably don't want handleNaN
. The main issue is that it would have to return an interface so returning a value type as an interface{}
usually results in an allocation. That would likely undo any extra optimizations from inlining.
I think we need to get the original PR merged though before talking about optimizations.
As an aside, I just realized the current code is wrong for string and boolean inputs. When it encounters something like sin(x::string)
, it should return nil, true
, but it currently returns nil, false
.
(edited, I really need to read the names of users before responding)
@Tomcat-Engineering for function names, we should just match the functions Prometheus has. So |
@Tomcat-Engineering whenever you have the time, this PR is relevant to the NaN question: #9637. This came about when a regression test caught That likely means you don't have to check for NaN. It'll be handled as a null value which is the equivalent of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more comment and then we can probably merge this.
query/math.go
Outdated
if arg0, ok := v.getIntegerValue(args[0]); ok { | ||
if arg1, ok := v.getIntegerValue(args[1]); ok && arg1 >= 0 { | ||
// Do the calc as a float then convert back to int if possible | ||
result := math.Pow(float64(arg0), float64(arg1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer if we did the integer power using only integers if possible. It should likely be implemented like this:
switch arg1 {
case 2:
return arg0 << 1, true
case 4:
return arg0 << 2, true
case 8:
return arg0 << 3, true
case 16:
return arg0 << 4, true
default:
val := arg0
for i := 1; i < arg1; i++ {
val *= arg0
}
return val, true
}
This would also include an optimization for bitshift. I'm not sure if the go compiler would get that automatically, but I would guess it would not in this scenario. It might be worth checking the optimizer though.
One advantage of the method used here is that we would be able to detect overflow, but no other integer operation really checks for that so I don't think that's as important as accuracy for the integer type.
As a reminder, absolute value and power need to also be implemented for the unsigned type. For unsigned, if one of the two is unsigned and one is integer, it needs to treat it as unsigned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, bitshifting two places to the left would multiply by 4, not raise to the power of 4. And I suspect that the naive "iterate arg1 times" approach will usually be way slower than converting to float, using the heavily optimised math library algorithm and converting back to int.
Maybe we should handle powers of 2 and 3 by using basic multiplication, that should optimise the most likely scenarios I guess. I'll push a change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we try to return a consistent datatype across the calculation? At the moment we are returning int for pow(3, my_int_series)
where my_int_series is >= 0 and a float otherwise... depending on the values of the series we could have a mixed return type. Should we actually only return int if the second argument is a literal int >= 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The number of questions that are coming up because I'm being dumb is becoming problematic so I'll say something different that will hopefully be less dumb.
We should go back to your original implementation and pow()
should only return float values and then it should be put in the documentation. For anything with x^2
you can always just do x * x
if you need to guarantee accuracy as an int. For something like 2^x
, we support bitshifts and I hope that's good enough. We wouldn't have 3^x
, but I think the additional complication caused by this request is creating too big of an issue.
Then we just put it in the documentation. I hope this isn't a bad idea as I cannot predict what people are going to need, but we could add a new power function for influxql if it comes up and we can fix the behavior in ifql.
Sorry for throwing you around so much on this. I didn't realize it would be so complicated to do what I thought.
It would be nice to have some unit tests which actually check the values and datatypes returned from these functions... if you guys provide some for the trig functions that you implemented, I'm happy to extend the coverage to these new functions. The scaffolding required (as seen in select_test.go) looks somewhat complicated so I'm going to let you do that bit! |
I'll add a unit test for that. I've been meaning to do that anyway. I'm thinking of changing the interface for call functions though. I thought the original interface would work well, but I've been finding the interface from the type mapper is much better and easier for construction. So here's the interface I'm thinking of changing it to: type CallValuer interface {
Call(name string, nargs int, args func() []interface{}) (interface{}, bool)
} So similar to the current interface, but it passes the name of the function, the number of arguments, and a function that can be used to obtain the arguments. The purpose is to make it so that the call arguments aren't evaluated until the function knows it can handle them. Thoughts? |
I've now updated the current code to use the new interface: #9666 |
Hi @Tomcat-Engineering, do you have time to complete this? We are planning to do a release candidate probably early next week and I would like to get this included. I can help finish this up if you don't have the time. |
- abs - asin, acos, atan, atan2 - exp, ln, log, log2, log10 - pow, sqrt
Thanks @Tomcat-Engineering for your help in starting this. I did the work of adapting my comments and then squashing it. I'm going to merge it now. Once again, thank you so much for your help in getting these additional functions in. We're going to get them into 1.6. |
Basic trig functions were recently added by @jsternberg. This PR adds a load more, using the same mechanisms.
New functions
Required for all non-trivial PRs
Required only if applicable
You can erase any checkboxes below this note if they are not applicable to your Pull Request.