(3/3) Create one-time trigger all historical landing page - fetch all stories #25
DynamoDB Modeling
Primary table: just UUID.
Landing page table:
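The split above could look like the following item shapes. This is a minimal sketch: the primary table's UUID-only key is from the note above, but every other field name is a hypothetical placeholder, since the landing page table's attributes are not spelled out in this issue.

```go
package main

import "fmt"

// PrimaryItem is keyed by UUID alone, as described above.
type PrimaryItem struct {
	UUID string // partition key
}

// LandingPageItem is a hypothetical shape for the landing page table;
// the real attribute list is not given in this issue.
type LandingPageItem struct {
	UUID          string // partition key
	NewssiteAlias string // e.g. the "<alias>/" S3 prefix owner
	S3Key         string // location of the fetched landing.html
}

func main() {
	item := LandingPageItem{UUID: "u-123", NewssiteAlias: "example", S3Key: "example/landing.html"}
	fmt.Println(item.UUID, item.S3Key)
}
```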
Action items
- Test the entire pipeline
- Draining mechanism draft - identify all TODOs #25 (comment)
- Draft for put landing page; identified TODOs (#34) Issue: #25
- Completed tf surgery; Identify all TODOs in golang (#35) For #25
- fix compile error; progress in metadata cronjob add query
- Ready to test (#36)
- Fix db field first char not lowercase Tracked by #25 (comment)
- Fix permission of db index, S3 pull Tracked by #25 (comment)
- All tests complete Tracked by #25 (comment)
One-time batch processing
Better to build a tool that would be useful later on. Basically: turn S3 object(s) into a brand-new DDB item.
To kick start,
Simplest way to do it?
Avoid writing unnecessary code. This one-time job will be used very rarely after the first trigger. Leverage the S3 trigger plus the "move/copy" feature of the S3 bucket. The flow could be like:
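One way to realize that flow without new code paths is to copy each historical object back into the bucket so the existing S3 notification fires again. A minimal sketch of the key-routing decision is below; the `<alias>/landing.html` layout is an assumption inferred from the "Fix prefix to include newssite alias" commit, not something this issue states outright.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldRetrigger reports whether copying this key back into the bucket
// would fire the landing-page S3 notification. The "<alias>/landing.html"
// layout is assumed from the commit messages in this issue.
func shouldRetrigger(key string) bool {
	parts := strings.SplitN(key, "/", 2)
	return len(parts) == 2 && parts[1] == "landing.html"
}

func main() {
	for _, k := range []string{"bbc/landing.html", "bbc/metadata.json", "landing.html"} {
		fmt.Println(k, shouldRetrigger(k))
	}
}
```

A batch move tool would then only CopyObject the keys this predicate accepts, leaving metadata and other objects untouched.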
There are quite big cost implications, though we don't know the exact amount of $$ we will need to pay yet. Moving forward, it's time to think about the fast-track and cost-saving issues. We should open another issue to address these, since they are out of scope and no longer about achieving one-time batch processing. For now, we will disable the cronjob and pause the pipeline. Next time, we may copy the stories over to prod for reuse. Once we have the fast-track feature #41, those will be skipped and we won't lose the computation outcome of these days.
- temp store all
- remove go_poc
- upgrade so project runs on M1
- Try S3 notification
- Fix prefix to include newssite alias
- Fix aws lambda PathError issue
- Save to metadata.json complete
- add untitled stories in metadata.json
- rename stories function to landing_metadata
- rename batch stories fetch tf to metadata
- Improved metadata access s3 event
- Metadata.json trigger computing env
- read parse metadata.json
- fetch a story POC #24
- Sfn map parallelism POC #24
- randomize requests
- Refactor to allow individual tf modules address #25 (comment)
- scaffold table
- draft table design
- create table
- Draining mechanism draft - identify all TODOs #25 (comment)
- Draft for put landing page; identified TODOs Issue: #25
- Complete tf surgery; Identify all TODOs in golang For #25
- fix compile error; progress in metadata cronjob add query
- Ready to test
- Fix db field first char not lowercase Tracked by #25 (comment)
- Fix permission of db index, S3 pull Tracked by #25 (comment)
- All tests complete Tracked by #25 (comment)
- Move landing PutItem out to s3 trigger lambda; ready for S3 batch move
- create reusable lambda module; optimize package size #25 (comment)
- Fix golang build path
- Refactor to use our custom lambda module
- add landing s3 trigger
- rm golang module stories that are renamed
- Fix env var
- Fix permission for PutItem move from landing to s3 trigger
- Fix metadata s3 trigger not fired
- Fix s3 trigger not working - S3 notification can only have one resource
- Make it easier to test
- prod grade setting enabled
- In Sfn pin lambda version, so rolling deploy works better for lambda
- Display sfn map result / target stories count info in finalizer
- stop landing s3 trigger from sending slack logs Fixes #40
- Let Sfn pin lambda version Fixes #39
- improve log for metadata trigger
- improve cronjob log
- log cronjob event for better understanding of how it gets triggered
- Disable cronjob to better debug Fixes #43
- workaround to scale up our Sfn pipeline Fix #44
- improve log for landing S3 trigger
- re-enable prod config plus cronjob
Better way to run them all
- `landing.html` trigger - just write into DDB (Draft for put landing page; identified TODOs #34). Move the metadata computing part into the new cronjob below.
- Switch `s3Event` to pulling from DDB (do a query; can `limit=1` for our "slow start" purpose). Completed tf surgery; Identify all TODOs in golang #35.
- `isMetadataEverComputed`, `pipelineEvents`. Reference
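The "slow start" pull could be sketched as a plain query-plan builder. The `pipelineEvents` table name comes from the notes above; the plan struct itself is an assumption (a stand-in for the AWS SDK's real Query input), kept as plain Go so the sketch stays runnable.

```go
package main

import "fmt"

// QueryPlan captures the parameters we'd hand to a DynamoDB Query call.
// It is a plain struct, not the AWS SDK type, so the sketch stays runnable.
type QueryPlan struct {
	Table string
	Limit int32 // Limit=1 gives the "slow start" behaviour described above
}

// slowStartPlan builds the query used to pull one pending item from DDB
// instead of reacting to an s3Event.
func slowStartPlan(table string) QueryPlan {
	return QueryPlan{Table: table, Limit: 1}
}

func main() {
	fmt.Printf("%+v\n", slowStartPlan("pipelineEvents"))
}
```

Raising `Limit` later is then a one-line change once the pipeline proves stable.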
Proper Throttling
It would be best to reuse the Sfn but limit the number of concurrent Sfn executions; overall we should aim at 5~100 concurrent lambdas, but nothing more. Ideally we can throttle to <1 request / 2s.
But to truly keep a low profile, it's best to spread the work across hours, if not days.
Moving forward
The daily cronjob should automatically trigger our new S3-driven pipeline. Any other concerns?
Staging drill: switch all parameters to prod; then copy all prod landing pages over to the dev S3 bucket to test.
200, why? Because the landing page didn't have enough stories! Look at the db: it's only 88 or so.
Improvements - only do the low-hanging fruit at this point! Don't do complicated tasks:
- `MaxConcurrency` proactively #44
- `lastEventName` to query (but then it could be similar to scan)? Or just an opposite to `isDocTypeWaitingForMetadata`, like `isDocTypeMetadataDone`.
- Do the same above for prod (ready!)
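The flag pair being debated could look like the following. Only the two predicate names come from the notes above; the item shape and the docType string values are assumptions for illustration.

```go
package main

import "fmt"

// PipelineItem is a hypothetical DDB item shape; only the two docType
// predicates below are named in the issue discussion.
type PipelineItem struct {
	DocType string
}

// isDocTypeWaitingForMetadata is the existing check mentioned above.
func isDocTypeWaitingForMetadata(it PipelineItem) bool {
	return it.DocType == "waitingForMetadata"
}

// isDocTypeMetadataDone is the proposed opposite flag: true once the
// metadata cronjob has finished with the item.
func isDocTypeMetadataDone(it PipelineItem) bool {
	return it.DocType == "metadataDone"
}

func main() {
	it := PipelineItem{DocType: "metadataDone"}
	fmt.Println(isDocTypeMetadataDone(it), isDocTypeWaitingForMetadata(it))
}
```

Querying on a "done" flag avoids the scan-like behaviour that the `lastEventName` approach might fall into.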