[0.9.0-rc28] Issue with number values being inserted as strings. Select 'mean(value)' crashes the node. #2346
Update: Sometimes it returns this error (instead of Bad Gateway):

```json
{
    "results": [
        {
            "error": "Post http://dbserver02:8086/data/run_mapper: EOF"
        }
    ]
}
```

If you are wondering what the values for the 'network_in' measurement look like, the same query with the 'mean' keyword removed gives:

```json
{
    "results": [
        {
            "series": [
                {
                    "name": "network_in",
                    "columns": [
                        "time",
                        "value"
                    ],
                    "values": [
                        ["2015-04-20T14:27:40Z", "26517893"],
                        ["2015-04-20T14:27:40Z", "10174"],
                        ["2015-04-20T14:27:40Z", "76"],
                        ["2015-04-20T14:27:40Z", "10312"],
                        ["2015-04-20T14:32:27Z", "265208164"],
                        ["2015-04-20T14:32:27Z", "10378"],
                        ["2015-04-20T14:32:27Z", "228"],
                        ["2015-04-20T14:32:27Z", "10228"]
                    ]
                }
            ]
        }
    ]
}
```

Does it have something to do with the number type? (The assumption sounds wrong for that kind of "storage", but who knows.)
Update 2: I tried to store some DISK values, and selecting those behaves the same way. Given that these are also big numbers, I think it is related.
Update 3: The issue is still there on 0.9.0-RC26.
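Regarding the big-numbers hunch above: 64-bit signed integer sums can silently wrap around. Below is an editor's sketch simulating Go-style wrapping int64 addition in Python (an illustration only, not InfluxDB's actual code):

```python
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def add_int64(a: int, b: int) -> int:
    # Simulate Go's wrapping int64 addition. Python ints never overflow,
    # so we reduce modulo 2**64 back into the signed range by hand.
    return (a + b + 2**63) % 2**64 - 2**63

# Summing several samples near the top of the int64 range wraps around,
# so a sum-then-divide mean would be computed from a corrupted total.
samples = [INT64_MAX // 2] * 5
total = 0
for s in samples:
    total = add_int64(total, s)

print(total)         # wrapped value, not the true sum
print(sum(samples))  # the true sum exceeds INT64_MAX
```

A mean derived from `total` here would be wildly wrong, which is the kind of failure the large disk/network counters could plausibly trigger.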
i'm guessing that calculating the mean is causing an overflow when summing large numbers. the sum can be avoided as long as we have the current average and a count for each of the 2 means we are trying to merge. it's probably simple enough to update the mean calculations to use this method and avoid the overflow.
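The sum-free merge described above can be sketched like this (an editor's illustration in Python; `merge_means` is a hypothetical helper, not InfluxDB's implementation):

```python
def merge_means(mean_a: float, count_a: int, mean_b: float, count_b: int):
    # Combine two partial means without ever materializing the full sum:
    #   combined = mean_a + (mean_b - mean_a) * count_b / (count_a + count_b)
    # The intermediates stay near the magnitude of the inputs, avoiding
    # the overflow a naive sum-then-divide can hit.
    total = count_a + count_b
    if total == 0:
        return 0.0, 0
    return mean_a + (mean_b - mean_a) * count_b / total, total

# Two partitions of the network_in values from the thread (as numbers):
part1 = [26517893, 10174, 76, 10312]
part2 = [265208164, 10378, 228, 10228]

merged, n = merge_means(sum(part1) / 4, 4, sum(part2) / 4, 4)
# merged equals the mean over all eight values
```

The same pairwise merge works for any number of partitions, which is why keeping (mean, count) per partial result is enough.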
@svscorp i can't reproduce your issue. do you have some curl commands to reproduce this from a clean db? meanwhile, i may still go ahead and change the way mean is calculated but it doesn't seem to be the cause of your problem from what i can tell.
I'll provide it when I reach my laptop. But it happens when I add a graph in Grafana. You can change it by editing the query, but that's the default query if you use the UI to add filters for a graph.
I was trying to reproduce it from scratch and couldn't either. It is only reproducible when the data is being inserted by a scheduled script. I'm now busy logging and building a valid case. Will get back once done. Thanks for the reaction, though!
@neonstalwart I got it again. To not overload this thread, I've put it here: CURL sequence with queries
ok, i should have seen this sooner but it's a very subtle issue. your values are strings - e.g. once i had your sample data, i could see the error for myself.
this made me realize that your values were strings. the fix is easy - make them numbers. https://gist.github.com/neonstalwart/00d106d8de7c9a960696/5e0a181eb83a649bbb34965dbf6e49347e2a7e29 is a reduced example that demonstrates the problem and https://gist.github.com/neonstalwart/00d106d8de7c9a960696 is the same example with the values changed to numbers, and it works.
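The quoted-versus-unquoted difference is easy to miss in a JSON body; here is a minimal sketch of the failure mode (the field layout is illustrative, not copied from the gists):

```python
import json

# Identical points except for the type of "value": quoted vs. unquoted.
as_string = json.loads('{"name": "network_in", "value": "26517893"}')
as_number = json.loads('{"name": "network_in", "value": 26517893}')

print(type(as_string["value"]).__name__)  # str
print(type(as_number["value"]).__name__)  # int

# Aggregations like mean() need numbers; mixing in strings fails:
try:
    as_string["value"] + as_number["value"]
except TypeError:
    print("cannot aggregate string values")
```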
This is a known issue, which we will address before the 0.9.0 release. |
It works when sending values as floats.
I see InfluxDB 0.9 crash on a simple select query as well.

```shell
curl -G 'http://localhost:8086/query' \
  --data-urlencode "db=graphite" \
  --data-urlencode 'q=select * from "xxx-com.cloud.gauges.raw_storage_usage"' | python -m json.tool
```

The data in InfluxDB looks like this:

Here are the InfluxDB logs:

```
[http] 2015/07/29 20:59:10 127.0.0.1 - - [29/Jul/2015:20:59:10 +0000] GET /query?q=select+value+from+%22xxx-com.cloud.gauges.num_sessions%22+where+time+%3E+1435611550s+and+time+%3C+1438203551s&p=root&u=root&db=graphite HTTP/1.1 200 3198 - python-requests/2.2.1 CPython/2.7.6 Linux/3.16.0-38-generic a930fa47-3634-11e5-801c-000000000000 7.753755ms
goroutine 3633 [running]:
```
Hi,
I am facing a weird issue on a 3-server cluster setup (replication factor = 2), which might be related to #2272.
I have two measurements: memory and network_in (I have more, but let's just pick those two).
Query to the cluster:
answer:
Then, I am making the same query
And result is (with crashing a node):
Is there something I am missing?