Continuous Queries causing 500 Timeouts #3368

Closed
jhedlund opened this issue Jul 17, 2015 · 9 comments

Comments

@jhedlund

I have run into an issue where, after a minute or two of writes, I start getting 500 Timeout errors and the database becomes unresponsive. This is similar to issue #3199, though I am not using collectd, and also to issue #3362 (I am not crashing, but I may not be leaving my service up long enough to hit the out-of-memory crash).

I am writing over HTTP POST, in batches of 50 (though the problem occurred with smaller batches as well). I write about 1,000 data points per minute: I run the POSTs at the top of every minute, pushing the ~1,000 points in batches of 50.
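For anyone trying to reproduce this, the write pattern looks roughly like the sketch below. This is not my actual code: it assumes the 0.9 line-protocol /write endpoint, the "symbol" tag and generated values are placeholders, and only the database (prices), measurement (ticks), and field names (bid, ask, last) mirror the continuous query further down.

import requests

WRITE_URL = "http://localhost:8086/write"   # 0.9 HTTP write endpoint
DB = "prices"                               # database named in the CQ below
BATCH_SIZE = 50

def write_batches(lines):
    # POST line-protocol points in batches of BATCH_SIZE. A healthy write
    # returns 204; the failure mode described here is a 500 timeout.
    for i in range(0, len(lines), BATCH_SIZE):
        body = "\n".join(lines[i:i + BATCH_SIZE])
        resp = requests.post(WRITE_URL, params={"db": DB}, data=body)
        resp.raise_for_status()

# ~1,000 points per run, pushed at the top of every minute.
# (The CQ reads from the "hour" retention policy; I'm glossing over the
# retention-policy parameter here.)
points = [
    "ticks,symbol=SYM%d bid=%f,ask=%f,last=%f" % (n, 1.0 + n, 1.1 + n, 1.05 + n)
    for n in range(1000)
]
write_batches(points)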

It would fail in the second minute almost every time, with no bump in CPU and plenty of free memory.

In issue #3346 (similar again, but it doesn't mention any 500 errors), there were some questions about continuous queries.

I tried disabling my one continuous query and the problem has so far gone away (it has been running for about 20 minutes without any 500 errors; every write has returned 204).

Can I provide more data to help diagnose the problem?

This is the continuous query:
CREATE CONTINUOUS QUERY ohlc_1m ON prices BEGIN
SELECT first(last) as open, max(bid) as high, min(ask) as low, last(last) as close INTO ohlc FROM hour.ticks GROUP BY time(1m), *
END

Thanks,
Jeff

@beckettsean
Contributor

There appear to be performance issues with large writes running concurrently with continuous queries. I think you've linked to the relevant issues, and it's informative to know that you are seeing this behavior without the Graphite or collectd plugin.

@jhedlund
Author

Thanks Sean. Any idea whether there is a workaround that still gives me the downsampling a continuous query provides?

@beckettsean
Contributor

Continuous queries are the only facility for downsampling. They are due for significant work in the 0.9.3 release, so my best advice is to limp along until August 13th, if you can.

There are CQ tuning parameters that are still poorly documented, but the names make them fairly intuitive. Have you experimented with these settings in the config file?

[continuous_queries]
  enabled = true
  recompute-previous-n = 2
  recompute-no-older-than = "10m"
  compute-runs-per-interval = 10
  compute-no-more-than = "2m"

@jhedlund
Author

Is there any extra logging I can turn on to see what the continuous query might be doing to cause the problem?

Maybe some of that logging would point me toward how to modify those settings...

@beckettsean
Contributor

@jhedlund I'm unaware of any different log levels right now. The internals have changed enough that the self-diagnostics and monitoring are being redone from scratch, so we might not know what's happening for another point release or two.

@jhedlund
Author

Ok, thanks.

I changed recompute-previous-n to 0 to see if that made a difference (after reading a bit about the settings here: https://github.com/influxdb/influxdb/blob/bf219cad358637b7771eced94a9ad0a7b5fa4b80/services/continuous_querier/config.go).
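Concretely, that change amounts to just this in the config (assuming the other settings in the section stay as in the block Sean posted above):

[continuous_queries]
  recompute-previous-n = 0   # was 2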

It did not make any difference. I have also tried similar CQs using time(2m) and time(5m); both fail with 500 timeouts after about a minute.

I'm trying time(1h) right now and it is holding up so far... I'll let it run longer to see if it gets into the same situation, but so far so good.

I still need the 1m, 2m, 5m, etc. rollups, but I can probably just query them from the raw measurement for a while. I'm going to see how low I can get the rollup interval before it fails.
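The stopgap would be running the same aggregation as an ad-hoc query against the raw series instead of through a CQ, something along these lines (just a sketch; the time bound is an arbitrary example):

SELECT first(last) as open, max(bid) as high, min(ask) as low, last(last) as close
FROM hour.ticks
WHERE time > now() - 1h
GROUP BY time(1m), *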

Maybe some of that provides some clues for where the issue is...

Thanks
Jeff

@jhedlund
Author

Update: the GROUP BY time(1h) version started to fail with 500 errors at about the two-hour mark, eventually locking up queries on the database as well.

@dim
Contributor

dim commented Jul 30, 2015

I managed to reproduce the problem in #3517.

@beckettsean beckettsean modified the milestones: 0.9.4, 0.9.3 Aug 6, 2015
@otoolep
Contributor

otoolep commented Sep 9, 2015

We have improved CQ performance and believe this issue has been addressed.
