Add deterministic upload mode #602

Merged

Conversation

@glasser glasser commented Mar 1, 2019

This is to avoid #600.

In this mode, decisions about whether to upload files are only based on
properties of the input messages themselves: timestamps and input message
payload size. We don't care about real-world time, disk file timestamps, or log
file size; we don't support upload on shutdown; and we check for uploads after
every message.
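To make the rule above concrete, here is a minimal sketch of a per-file decision that looks only at message timestamps and input payload sizes. The class and method names are hypothetical and are not Secor's actual classes. Treating the "timestamp range" as newest minus oldest message timestamp, and treating a non-positive limit as "not configured", are assumptions made for illustration.

```java
// Hypothetical sketch: illustrative names, not Secor's real implementation.
final class DeterministicUploadSketch {
    private final long maxTimestampRangeMillis; // <= 0 means "not configured" (assumption)
    private final long maxInputPayloadBytes;    // <= 0 means "not configured" (assumption)

    private boolean hasMessages = false;
    private long minTimestampMillis;
    private long maxTimestampMillis;
    private long inputPayloadBytes;

    DeterministicUploadSketch(long maxTimestampRangeMillis, long maxInputPayloadBytes) {
        this.maxTimestampRangeMillis = maxTimestampRangeMillis;
        this.maxInputPayloadBytes = maxInputPayloadBytes;
    }

    /** Records one message and reports whether the current file should be uploaded now. */
    boolean onMessage(long timestampMillis, int payloadSizeBytes) {
        if (!hasMessages) {
            minTimestampMillis = timestampMillis;
            maxTimestampMillis = timestampMillis;
            hasMessages = true;
        } else {
            minTimestampMillis = Math.min(minTimestampMillis, timestampMillis);
            maxTimestampMillis = Math.max(maxTimestampMillis, timestampMillis);
        }
        inputPayloadBytes += payloadSizeBytes;

        // Only properties of the input messages are consulted: no wall clock,
        // no file modification times, no on-disk file size.
        boolean rangeExceeded = maxTimestampRangeMillis > 0
                && (maxTimestampMillis - minTimestampMillis) >= maxTimestampRangeMillis;
        boolean sizeExceeded = maxInputPayloadBytes > 0
                && inputPayloadBytes >= maxInputPayloadBytes;
        return rangeExceeded || sizeExceeded;
    }

    /** Starts tracking a fresh file after an upload. */
    void reset() {
        hasMessages = false;
        inputPayloadBytes = 0;
    }
}
```

Because the check runs after every message and depends only on the message stream, replaying the same messages should produce the same upload boundaries.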

Configuration:

  • Set secor.upload.deterministic=true
  • Configure at least one of secor.max.file.timestamp.range.millis and
    secor.max.input.payload.size.bytes.
  • If you've configured secor.max.file.timestamp.range.millis, you must
    set kafka.useTimestamp=true and ensure that your FileReader/FileWriter
    supports timestamps.
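The three configuration rules can be read as a small consistency check. The sketch below is a hypothetical validator, not code from this PR. It uses the property names listed above and assumes that a limit counts as "configured" when it is set to a positive number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical validator for the rules listed above; not part of Secor.
final class DeterministicConfigCheck {
    /** Returns human-readable problems; an empty list means the settings are consistent. */
    static List<String> validate(Properties props) {
        List<String> errors = new ArrayList<>();
        boolean deterministic =
                Boolean.parseBoolean(props.getProperty("secor.upload.deterministic", "false"));
        if (!deterministic) {
            return errors; // The rules only apply in deterministic mode.
        }
        // Assumption: a value <= 0 (or an absent key) means the limit is not configured.
        long rangeMillis =
                Long.parseLong(props.getProperty("secor.max.file.timestamp.range.millis", "0"));
        long payloadBytes =
                Long.parseLong(props.getProperty("secor.max.input.payload.size.bytes", "0"));
        boolean useTimestamp =
                Boolean.parseBoolean(props.getProperty("kafka.useTimestamp", "false"));

        if (rangeMillis <= 0 && payloadBytes <= 0) {
            errors.add("Configure secor.max.file.timestamp.range.millis "
                    + "or secor.max.input.payload.size.bytes");
        }
        if (rangeMillis > 0 && !useTimestamp) {
            errors.add("secor.max.file.timestamp.range.millis requires kafka.useTimestamp=true");
        }
        return errors;
    }
}
```

The check cannot tell whether your FileReader/FileWriter supports timestamps, so that requirement still has to be verified by hand.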

glasser commented Mar 1, 2019

I don't know if this fix is good to be upstreamed or if it's what the project wants, but it's what we're going to try for our installation.

@glasser glasser marked this pull request as ready for review March 1, 2019 06:15
@glasser glasser force-pushed the glasser/deterministic-upload branch 5 times, most recently from 0e78c64 to 1143908 Compare March 1, 2019 06:26
@glasser glasser force-pushed the glasser/deterministic-upload branch from 1143908 to 19dfa6f Compare March 1, 2019 09:11
@HenryCaiHaiying

Looks good to me. I will merge in this PR, it's a good fix for secor.

@HenryCaiHaiying HenryCaiHaiying merged commit 7f1c4b2 into pinterest:master Mar 2, 2019
@jeremyplichtafc

@glasser - I was reading your code and trying to understand something. If deterministic mode only looks at the time difference between messages for a partition and the size of the input for that partition, won't you have the case where a partition that only gets a handful of messages (never triggering the time or size criteria) is never uploaded? Especially if you are splitting them up into smaller partitions using SplitByFieldMessageParser.

glasser commented May 10, 2019

That's probably correct, and should be mentioned in the docs for the option — open a PR?
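
For anyone writing that doc note, the caveat can be reproduced with a small simulation. This is a hypothetical sketch, not Secor code: it assumes both limits are configured and replays the per-message check for a single partition, counting how many uploads would fire.

```java
// Hypothetical simulation of the caveat discussed above; not Secor code.
final class StarvedPartitionSketch {
    /** Counts the uploads the two limits would trigger for one partition's messages. */
    static int countUploads(long[] timestampsMillis, int[] payloadSizes,
                            long maxRangeMillis, long maxBytes) {
        int uploads = 0;
        boolean fileOpen = false;
        long minTs = 0, maxTs = 0, bytes = 0;
        for (int i = 0; i < timestampsMillis.length; i++) {
            long ts = timestampsMillis[i];
            if (!fileOpen) {
                // First message of a new file starts a fresh range and byte count.
                minTs = ts;
                maxTs = ts;
                bytes = 0;
                fileOpen = true;
            } else {
                minTs = Math.min(minTs, ts);
                maxTs = Math.max(maxTs, ts);
            }
            bytes += payloadSizes[i];
            if (maxTs - minTs >= maxRangeMillis || bytes >= maxBytes) {
                uploads++;
                fileOpen = false;
            }
        }
        // Messages still buffered here are never uploaded unless more arrive,
        // since there is no wall-clock trigger and no upload on shutdown.
        return uploads;
    }
}
```

With a 60000 ms range limit and a 1000000 byte size limit, a partition that receives three 100-byte messages within 200 ms triggers no upload at all, and nothing changes until later messages push it over one of the limits.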
