Add buffering and fault tolerance to the InfluxDBOut Node. #145

Closed
nathanielc opened this issue Jan 14, 2016 · 6 comments
@nathanielc
Contributor

The current implementation writes points to InfluxDB as it receives them. In high-volume scenarios this will not be sufficient. The node should buffer data points before sending them on to InfluxDB. This is appropriate because any near-real-time analytics on the buffered data can already be done within Kapacitor itself; the small delay before points are written to InfluxDB will be inconsequential.

By fault tolerance, I mean that failed point writes should be retried, etc., before erroring out.
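
A minimal sketch of the kind of buffering and retry behavior described above, in Go. The Point type, the write callback, and the batch size / retry / backoff values are illustrative only, not Kapacitor's actual implementation:

package main

import (
	"fmt"
	"time"
)

// Point stands in for a single data point; the real node carries a
// measurement name, tags, fields, and a timestamp.
type Point struct {
	Line string // e.g. an InfluxDB line-protocol entry
}

// BufferedWriter accumulates points and flushes them in batches,
// retrying a failed batch a few times before erroring out.
type BufferedWriter struct {
	buf        []Point
	batchSize  int                 // flush once this many points are buffered
	maxRetries int                 // attempts per batch before giving up
	write      func([]Point) error // hypothetical underlying write to InfluxDB
}

// Add buffers one point and flushes when the batch is full.
func (w *BufferedWriter) Add(p Point) error {
	w.buf = append(w.buf, p)
	if len(w.buf) >= w.batchSize {
		return w.Flush()
	}
	return nil
}

// Flush writes the buffered batch, retrying with a short delay on failure.
func (w *BufferedWriter) Flush() error {
	if len(w.buf) == 0 {
		return nil
	}
	var err error
	for attempt := 1; attempt <= w.maxRetries; attempt++ {
		if err = w.write(w.buf); err == nil {
			w.buf = w.buf[:0]
			return nil
		}
		time.Sleep(time.Duration(attempt) * 100 * time.Millisecond)
	}
	return fmt.Errorf("dropping batch of %d points after %d attempts: %v",
		len(w.buf), w.maxRetries, err)
}

func main() {
	w := &BufferedWriter{
		batchSize:  2,
		maxRetries: 3,
		write: func(batch []Point) error {
			fmt.Printf("writing %d points\n", len(batch))
			return nil // a real implementation would POST the batch to InfluxDB
		},
	}
	for i := 0; i < 5; i++ {
		if err := w.Add(Point{Line: fmt.Sprintf("cpu value=%d", i)}); err != nil {
			fmt.Println(err)
		}
	}
	w.Flush() // flush any remaining points
}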

nathanielc added this to the v0.11 milestone on Feb 5, 2016
@RonRothman

Greetings,

Would you mind elaborating on how this problem would manifest itself? We're seeing behaviour which might be related, but we're not sure how to tell.

We have a simple [stream] TICKscript that copies a subset of measurements from one db into another. (Both dbs are on the same InfluxDB server.) In general it runs well, but a few times an hour (on no discernible schedule), some measurements don't get copied into the dest db.

Could this be a capacity issue? FWIW, the machine has plenty of spare CPU; Kapacitor itself seems to stay below 6 or 7% CPU usage. We're running Kapacitor 0.10.0.

Here's an excerpt from our tickscript:

stream
  .from()
    .measurement('count')
    .where(lambda: "service" == 'service-1')
    .influxDBOut()
      .database('destdb')
      .measurement('count')

Note in that snippet that we're reading only those measurements that apply to service-1. We have 5 such services. It seems that the services with fewer data points (measurements) in the source db do not exhibit the problem of dropped measurements in the dest db. But those services that are more active and have more measurements in the source db are in fact the ones where we observe missing measurements in the dest db.

So,

  1. Could this be related to the volume issue you raised in the initial comment? How could we diagnose that?
  2. Is there a recommended way to debug missing measurements from an InfluxDBOut node?
  3. Do you want me to file this as its own issue rather than as a response to your initial comment?

Thanks!

@nathanielc
Contributor Author

@RonRothman Do the measurements ever show up later? Meaning does Kapacitor just get behind or is data really missing?

I would expect Kapacitor to only get behind but not drop any data. How many points per second are you writing to Kapacitor?

Answers:

  1. It's possible. You should see the logs on the InfluxDB host for each write; Kapacitor specifies the name Kapacitor as its user agent, so you should easily be able to determine which writes are coming from Kapacitor. You will also be able to see how long each write takes.
  2. The output of kapacitor show task_name should print the number of points the InfluxDBOut node has processed. You can compare that number to the number of points written to the db to make sure they all made it (a sketch of that comparison follows this list). Also, Kapacitor will log an error if it fails to write data to InfluxDB.
  3. This issue is fine. Thanks for checking.
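
One way to do the comparison in item 2 is to query point counts over the same window from both databases and diff them. A minimal sketch against InfluxDB's HTTP /query endpoint; the source database name and the field key "value" are placeholders for this setup, not values from the thread:

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// countPoints asks InfluxDB how many field values a measurement holds
// over the last hour, using the HTTP /query endpoint.
func countPoints(addr, db, measurement string) (json.Number, error) {
	q := fmt.Sprintf(`SELECT COUNT("value") FROM %q WHERE time > now() - 1h`, measurement)
	resp, err := http.Get(addr + "/query?db=" + url.QueryEscape(db) + "&q=" + url.QueryEscape(q))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	// Decode just enough of the /query JSON response to pull out the count.
	var out struct {
		Results []struct {
			Series []struct {
				Values [][]interface{} `json:"values"`
			} `json:"series"`
		} `json:"results"`
	}
	dec := json.NewDecoder(resp.Body)
	dec.UseNumber()
	if err := dec.Decode(&out); err != nil {
		return "", err
	}
	if len(out.Results) == 0 || len(out.Results[0].Series) == 0 {
		return "0", nil // no points in the window
	}
	n, _ := out.Results[0].Series[0].Values[0][1].(json.Number)
	return n, nil
}

func main() {
	// "sourcedb" is a placeholder; "destdb" and the measurement come from the script above.
	src, _ := countPoints("http://localhost:8086", "sourcedb", "count")
	dst, _ := countPoints("http://localhost:8086", "destdb", "count")
	fmt.Printf("source=%v dest=%v\n", src, dst)
}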

@RonRothman

Thanks for the responses.

Nope, the data is really missing--it never appears in the dest db.

We're writing a few hundred measurements per minute to the source db.

We have not observed any error messages in kapacitord.err or kapacitor.log.

We've solved this for the moment by using continuous queries (in lieu of Kapacitor) to aggregate/copy the data from one db to the other. Both dbs are on the same server node, so we can get away with this for now.

We'll make some time to dig into the logs when we move back to Kapacitor (if not for data copying, then for alerts).

Thanks again.

@nathanielc
Contributor Author

@RonRothman Thanks for the detailed report. Glad you have a working solution. I'll work on reproducing the missing-data issue and report back. This is obviously a serious bug. I may need more information about your setup, etc., as I dig in. Thanks again.

@nathanielc
Contributor Author

@RonRothman I have been able to write many millions of points without dropping any. For now I am going to close this issue, but if you are still having issues please open a new one with the relevant details.

@RonRothman

@nathanielc Thanks--sorry for going MIA. We ended up using continuous queries instead of Kapacitor, so this issue became less urgent on our end. I'll reopen when we revisit Kapacitor. Thanks again!
