-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove node streams-based ingest code to prepare for post-ingest notifications #485
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
All the functionality has been extracted from this module so it can now be removed. We replicate the sequential processing of the previous stream-based implmenetation by adding the `asyncMapInSequence` helper. This commit does not reimplement the `bulk` feature of the database client previously used by the `_writev` mentod in the writable stream subclass.
jwalgran
force-pushed
the
jcw/remove-stream-based-ingest
branch
from
May 4, 2023 21:13
1c6559b
to
3b0ae0b
Compare
philvarner
reviewed
May 5, 2023
philvarner
reviewed
May 5, 2023
This single item convenience menthod is not use anywhere in the codebase.
jwalgran
force-pushed
the
jcw/remove-stream-based-ingest
branch
from
May 8, 2023 17:43
3b0ae0b
to
81b9ded
Compare
Thanks for the review @philvarner. I pushed 0d22f41 with removes I implemented a fix for #486 in separate PR #492 and targeted it at this branch, just to clearly separate the two. Both this PR and #492 are ready for review. |
philvarner
approved these changes
May 8, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Related Issue(s):
Proposed Changes:
ingestItem
and have it callingestItems
with a single-item list.lib/ingest.js
and calling it directly fromingestItems
My initial approach to preparing the code to support post-ingest notifications was to make a minimally invasive change that expanded the existing node streams-based implementation. Theoretically converting the
Writable
subclass to aDuplex
subclass so that items would be written to the DB and then the results would be made available to downstream consumer ( eventually an SNS notification) would be small in scope.I abandoned this approach for a number of reasons.
Duplex
implementation. The first item would be processed but failed to move forward._write
to process a single item or_writev
to process multiple items.INGEST_BATCH_SIZE
variable was misleading. It was only used to sethighWaterMark
of theSearchDatabaseWritableStream
This PR removes the node-stream based processing by extracting functionality from the
lib/databaseStream
module and moving it intolib/ingest
where it can be called directly.Unmerged PR #131 was a valuable reference but I did not adapt it directly because it was opened against a much older version of the codebase and it mixed removing the node streams-based implementation with SNS notifications. Follow up PRs to add this notification functionality will likely borrow more heavily from this previous attempt.
ingestItem
andingestItems
differed only in their logging, so to reduce duplication I have removed all the logic fromingestItem
and it now just callsingestItems
Batch processing is removed from this implementation.
ingestItems
saves items to the database sequentially, one at a time. Allowing items to be created in parallel without using thebulk
API resulted in test failures. If this is a significant performance issue in a high-volume setting it can be restored in a future pull request. Though not a load test, I didtime
the running of./bin/system-tests.sh
3 time on both the main branch and this branch as a minimal test that this did not change performance in small-load situations.main
real 1m4.807s
real 0m58.459s
real 0m59.376s
jcw/remove-stream-based-ingest
real 0m57.250s
real 0m57.398s
real 0m58.861s
PR Checklist: