Add buffering and fault tolerance to the InfluxDBOut Node. #145

Closed
nathanielc opened this issue Jan 14, 2016 · 6 comments
@nathanielc
Contributor

The current implementation writes points to InfluxDB as it receives them. In high-volume scenarios this will not be sufficient. The node should buffer data points before sending them on to InfluxDB. This is appropriate because any near-real-time analytics on the buffered data can already be done within Kapacitor itself; the small delay before points are written to InfluxDB will be inconsequential.

By fault tolerance, I mean that failed point writes should be retried, etc., before erroring out.
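
A minimal sketch of the kind of buffering and retry behavior described above, in Go. The Point type, the write callback, and the batch size / retry / backoff values are illustrative only, not Kapacitor's actual implementation:

package main

import (
	"fmt"
	"time"
)

// Point stands in for a single data point; the real node carries a
// measurement name, tags, fields, and a timestamp.
type Point struct {
	Line string // e.g. an InfluxDB line-protocol entry
}

// BufferedWriter accumulates points and flushes them in batches,
// retrying a failed batch a few times before erroring out.
type BufferedWriter struct {
	buf        []Point
	batchSize  int                 // flush once this many points are buffered
	maxRetries int                 // attempts per batch before giving up
	write      func([]Point) error // hypothetical underlying write to InfluxDB
}

// Add buffers one point and flushes when the batch is full.
func (w *BufferedWriter) Add(p Point) error {
	w.buf = append(w.buf, p)
	if len(w.buf) >= w.batchSize {
		return w.Flush()
	}
	return nil
}

// Flush writes the buffered batch, retrying with a short delay on failure.
func (w *BufferedWriter) Flush() error {
	if len(w.buf) == 0 {
		return nil
	}
	var err error
	for attempt := 1; attempt <= w.maxRetries; attempt++ {
		if err = w.write(w.buf); err == nil {
			w.buf = w.buf[:0]
			return nil
		}
		time.Sleep(time.Duration(attempt) * 100 * time.Millisecond)
	}
	return fmt.Errorf("dropping batch of %d points after %d attempts: %v",
		len(w.buf), w.maxRetries, err)
}

func main() {
	w := &BufferedWriter{
		batchSize:  2,
		maxRetries: 3,
		write: func(batch []Point) error {
			fmt.Printf("writing %d points\n", len(batch))
			return nil // a real implementation would POST the batch to InfluxDB
		},
	}
	for i := 0; i < 5; i++ {
		if err := w.Add(Point{Line: fmt.Sprintf("cpu value=%d", i)}); err != nil {
			fmt.Println(err)
		}
	}
	w.Flush() // flush any remaining points
}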

nathanielc added this to the v0.11 milestone on Feb 5, 2016
@RonRothman

Greetings,

Would you mind elaborating on how this problem would manifest itself? We're seeing behaviour which might be related, but we're not sure how to tell.

We have a simple [stream] TICKscript that copies a subset of measurements from one db into another. (Both dbs are on the same InfluxDB server.) In general it runs well, but a few times an hour (on no discernible schedule), some measurements don't get copied into the dest db.

Could this be a capacity issue? FWIW, the machine has plenty of spare CPU; Kapacitor itself seems to stay below 6 or 7% CPU usage. We're running Kapacitor 0.10.0.

Here's an excerpt from our tickscript:

stream
  .from()
    .measurement('count')
    .where(lambda: "service" == 'service-1')
    .influxDBOut()
      .database('destdb')
      .measurement('count')

Note in that snippet that we're reading only those measurements that apply to service-1. We have 5 such services. It seems that the services with fewer data points (measurements) in the source db do not exhibit the problem of dropped measurements in the dest db. But those services that are more active and have more measurements in the source db are in fact the ones where we observe missing measurements in the dest db.

So,

  1. Could this be related to the volume issue you raised in the initial comment? How could we diagnose that?
  2. Is there a recommended way to debug missing measurements from an InfluxDBOut node?
  3. Do you want me to file this as its own issue rather than as a response to your initial comment?

Thanks!

@nathanielc
Contributor Author

@RonRothman Do the measurements ever show up later? Meaning does Kapacitor just get behind or is data really missing?

I would expect Kapacitor to only get behind but not drop any data. How many points per second are you writing to Kapacitor?

Answers:

  1. It's possible. You should see the logs on the InfluxDB host for each write; Kapacitor specifies the name Kapacitor as its user agent, so you should easily be able to determine which writes are coming from Kapacitor. You will also be able to see how long each write takes.
  2. The output of kapacitor show task_name should print the number of points the InfluxDBOut node has processed. You can compare that number to the number of points written to the db to make sure they all made it (a sketch of that comparison follows this list). Also, Kapacitor will log an error if it fails to write data to InfluxDB.
  3. This issue is fine. Thanks for checking.
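
One way to do the comparison in item 2 is to query point counts over the same window from both databases and diff them. A minimal sketch against InfluxDB's HTTP /query endpoint; the source database name and the field key "value" are placeholders for this setup, not values from the thread:

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// countPoints asks InfluxDB how many field values a measurement holds
// over the last hour, using the HTTP /query endpoint.
func countPoints(addr, db, measurement string) (json.Number, error) {
	q := fmt.Sprintf(`SELECT COUNT("value") FROM %q WHERE time > now() - 1h`, measurement)
	resp, err := http.Get(addr + "/query?db=" + url.QueryEscape(db) + "&q=" + url.QueryEscape(q))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	// Decode just enough of the /query JSON response to pull out the count.
	var out struct {
		Results []struct {
			Series []struct {
				Values [][]interface{} `json:"values"`
			} `json:"series"`
		} `json:"results"`
	}
	dec := json.NewDecoder(resp.Body)
	dec.UseNumber()
	if err := dec.Decode(&out); err != nil {
		return "", err
	}
	if len(out.Results) == 0 || len(out.Results[0].Series) == 0 {
		return "0", nil // no points in the window
	}
	n, _ := out.Results[0].Series[0].Values[0][1].(json.Number)
	return n, nil
}

func main() {
	// "sourcedb" is a placeholder; "destdb" and the measurement come from the script above.
	src, _ := countPoints("http://localhost:8086", "sourcedb", "count")
	dst, _ := countPoints("http://localhost:8086", "destdb", "count")
	fmt.Printf("source=%v dest=%v\n", src, dst)
}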

@RonRothman

Thanks for the responses.

Nope, the data is really missing--it never appears in the dest db.

We're writing a few hundred measurements per minute to the source db.

We have not observed any error messages in kapacitord.err or kapacitor.log.

We've solved this for the moment by using continuous queries (in lieu of Kapacitor) to aggregate/copy the data from one db to the other. Both dbs are on the same server node, so we can get away with this for now.

We'll make some time to dig into the logs when we move back to Kapacitor (if not for data copying, then for alerts).

Thanks again.

@nathanielc
Contributor Author

@RonRothman Thanks for the detailed report. Glad you have a working solution. I'll work on reproducing the missing-data issue and report back. This is obviously a serious bug. I may need more information about your setup, etc., as I dig in. Thanks again.

@nathanielc
Contributor Author

@RonRothman I have been able to write many millions of points without dropping any. For now I am going to close this issue, but if you are still having issues please open a new one with the relevant details.

@RonRothman

@nathanielc Thanks--sorry for going MIA. We ended up using continuous queries instead of Kapacitor, so this issue became less urgent on our end. I'll reopen when we revisit Kapacitor. Thanks again!
