Make backfill resilient #45
Where is this 10k limit coming from? It's just that this approach seems reasonable if we have a vague limit of about 10k, but if this is a vague limit then what is the limitation, really?
I made it up! See description. I want to experiment to confirm the actual limit; the docs haven't been helpful.
What errors are we seeing?
e99a5df to d40d0ff
OK, I've simplified this on the basis of https://cloud.google.com/bigquery/quotas#streaming_inserts. New routine: break everything up into 500-row chunks, as suggested by Google.
if payload_byte_size > BQ_BATCH_MAX_BYTES
  events.each_slice((events.size / 2.0).round).to_a.each do |half_batch|
    Rails.logger.info "Halving batch of size #{payload_byte_size} for #{model_class.name}"
    DfE::Analytics::SendEvents.perform_now(half_batch.as_json)
What if instead of sending these half-batches immediately, we re-queued them? This would give us a little bit of extra insurance in case some entities are truly HHUUUGGGEEE!
- DfE::Analytics::SendEvents.perform_now(half_batch.as_json)
+ DfE::Analytics::LoadEntityBatch.perform_later(model_class.to_s, half_batch.map(&:id))
This means we don't take up queue space when we're processing backfills
BQ accepts a maximum of 10MB per request and recommends batches of 500 rows, which allows about 20KB per event. Batch everything in 500s. If a given batch payload exceeds 10MB, split it before sending.
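A minimal sketch of the routine described above, not the code in this PR. It assumes events are plain hashes, measures payload size as the JSON byte size of the batch, and uses an assumed 10MB value for `BQ_BATCH_MAX_BYTES` (the real limit is still to be confirmed). `BQ_BATCH_ROWS`, `split_to_fit` and `batches_for` are hypothetical names used only for illustration.

```ruby
require 'json'

BQ_BATCH_ROWS = 500                     # batch size recommended on the quotas page
BQ_BATCH_MAX_BYTES = 10 * 1024 * 1024   # assumed 10MB; actual limit not yet confirmed

# Recursively halve a batch until each piece serialises to at most max_bytes.
# A single oversized event is returned on its own rather than recursing forever.
def split_to_fit(events, max_bytes = BQ_BATCH_MAX_BYTES)
  return [events] if events.size <= 1 || events.to_json.bytesize <= max_bytes

  events.each_slice((events.size / 2.0).ceil).flat_map do |half|
    split_to_fit(half, max_bytes)
  end
end

# Chunk into 500-row batches, then split any batch that is still too large.
def batches_for(events, rows: BQ_BATCH_ROWS, max_bytes: BQ_BATCH_MAX_BYTES)
  events.each_slice(rows).flat_map { |batch| split_to_fit(batch, max_bytes) }
end
```

Unlike the single halving step in the diff, this version keeps splitting until every piece fits, which may or may not be wanted once the real byte limit is known.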
When the block threw an exception (as in a spec for SendEvents) this method didn't have a chance to clean up and we got stuck in webmock mode
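The usual way to guarantee that kind of cleanup in Ruby is an `ensure` clause. The sketch below shows the pattern only; `FakeStubMode` and `with_stubbing` are hypothetical stand-ins, not the method changed in this commit.

```ruby
# Hypothetical stand-in for the real switch (for example, turning WebMock on and off).
module FakeStubMode
  @enabled = false

  class << self
    attr_reader :enabled

    def enable!
      @enabled = true
    end

    def disable!
      @enabled = false
    end
  end
end

# Runs the block with stubbing enabled and always disables it afterwards,
# even when the block raises.
def with_stubbing
  FakeStubMode.enable!
  yield
ensure
  FakeStubMode.disable!
end
```

The exception still propagates to the caller; `ensure` only makes sure the mode is switched off first.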
546dd2f to e7e55e8
Objectives:
Hopefully this isn't too complicated.
Still to do: find out the actual byte limit and set it