Depends on #20.
The old way 🪨
When bulk-importing existing records into BQ, we used to:

1. Find `$batch_size` records from the table using `find_in_batches`
2. Enqueue a `SendEvents` job with all the events in it
3. Sleep for `$sleep_time` so we don't enqueue too many (??)

This wasn't great because we ran this via rake over ssh, and the loop would often outlive the ssh session timeout. Or the loop would crash for some reason with only half the set enqueued. As a result we acquired some other "features", like being able to restart the process from a given ID (which would be helpfully logged when the process crashed).
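As a rough illustration of the shape of that old loop, here is a plain-Ruby sketch. It is not the real code: `RECORDS`, `QUEUE`, `build_event` and `old_import` are hypothetical stand-ins, `each_slice` stands in for `find_in_batches`, and pushing onto an array stands in for enqueuing a `SendEvents` job.

```ruby
# Hypothetical stand-ins: RECORDS for the table, QUEUE for the job backend.
RECORDS = (1..10).map { |id| { id: id } }
QUEUE = []

# Hypothetical event builder; the real events carry far more data.
def build_event(record)
  { entity_id: record[:id] }
end

# Old approach: find, build and enqueue all in one long-running loop.
def old_import(batch_size:, sleep_time:, start_id: 1)
  pending = RECORDS.select { |record| record[:id] >= start_id }
  pending.each_slice(batch_size) do |batch|
    begin
      # Events are built inline, which is what makes the loop slow.
      events = batch.map { |record| build_event(record) }
      QUEUE << [:send_events, events]
      sleep(sleep_time) # throttle so we don't enqueue too many at once
    rescue StandardError
      # Log where we got to so the import can be restarted from this ID.
      warn "Import crashed; restart from ID #{batch.first[:id]}"
      raise
    end
  end
end
```

Calling `old_import(batch_size: 4, sleep_time: 0, start_id: 5)` would resume from record 5, which is the restart "feature" described above.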
The new way ✨
Separate finding the records from building the events:

1. Find `$batch_size` records from the table using `find_in_batches`
2. Enqueue a `LoadEntityBatch` job

And in `LoadEntityBatch`:

1. … `$batch_size` events in …

Also, cache the `analytics` data from the YAML so we don't read and parse it once per record.

Advantages 📈
Next 🔮
The outer loop should now be fast enough to run in-process from a web request, so we could try to build a little UI on top of it!
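
The split described under "The new way" could look something like the plain-Ruby sketch below. Everything in it is an assumption made for illustration: `TABLE`, `JOBS`, `ANALYTICS_YAML`, `AnalyticsConfig`, `enqueue_import` and `load_entity_batch` are hypothetical names, the YAML content is invented, `each_slice` stands in for `find_in_batches`, and passing record IDs to the job is a guess at how `LoadEntityBatch` receives its batch.

```ruby
require "yaml"

# Hypothetical stand-ins: TABLE for the database table, JOBS for the job
# queue, ANALYTICS_YAML for the analytics config file (content invented).
TABLE = (1..10).map { |id| { id: id, name: "record-#{id}" } }
JOBS = []
ANALYTICS_YAML = <<~YAML
  record:
    - id
    - name
YAML

module AnalyticsConfig
  @parse_count = 0

  class << self
    attr_reader :parse_count

    # Memoised, so the YAML is parsed once per process, not once per record.
    def fields
      @fields ||= begin
        @parse_count += 1
        YAML.safe_load(ANALYTICS_YAML).fetch("record")
      end
    end
  end
end

# Outer loop: only slices the table and enqueues, so it finishes quickly.
def enqueue_import(batch_size:)
  TABLE.each_slice(batch_size) do |batch|
    JOBS << [:load_entity_batch, batch.map { |record| record[:id] }]
  end
end

# Inner job: loads its own records and builds the events for them.
def load_entity_batch(ids)
  records = TABLE.select { |record| ids.include?(record[:id]) }
  wanted = AnalyticsConfig.fields.map(&:to_sym)
  records.map { |record| record.slice(*wanted) }
end
```

Because the outer loop does no event building, a crash or timeout there costs little, and each batch job can fail and retry on its own.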