-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-10976] Bundle finalization: E2E support #17045
[BEAM-10976] Bundle finalization: E2E support #17045
Conversation
Codecov Report
@@ Coverage Diff @@
## master #17045 +/- ##
==========================================
+ Coverage 73.75% 73.86% +0.11%
==========================================
Files 667 672 +5
Lines 87406 87679 +273
==========================================
+ Hits 64468 64766 +298
+ Misses 21831 21792 -39
- Partials 1107 1121 +14
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Assigning reviewers. If you would like to opt out of this review, comment R: @youngoli for label go. Available commands:
|
Reminder, please take a look at this pr: @youngoli |
@aaltay I think its because out of 306 non-test lines changed, 188 are generated file changes. In particular, |
I agree with you. I guess it would make sense to not count generated files toward code coverage. (Neither a high priority, nor a blocker for this PR.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is delightfully complete, other than the manual change to a generated file.
#16961 is revamping all the proto generation, which will make things a bit easier for you if you're willing to wait, as it's almost ready to merge.
I'll also note that the additional bump to the max "optimized" reflectx.Func types is not strictly required. Any user code that's taking in that many parameters is likely doing a large amount of work per element, dominating the reflection overhead. However, no reason not to keep the change as is presently.
Type_KV Type_Special = 11 | ||
Type_COGBK Type_Special = 13 | ||
Type_WINDOWEDVALUE Type_Special = 14 | ||
Type_BUNDLEFINALIZATION Type_Special = 23 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never manually update Generated Proto files:
You'll need to add the type to here, and regen the file.
WINDOWEDVALUE = 14; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof - not sure how I missed that this was generated.
I regenerated, if it is convenient to wait for that change to go in that's fine though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lostluck thoughts on taking this now? It will cause a merge conflict, but its inevitable that one of our prs will run into that and this one is ready to go (I think, obviously feel free to review/disagree), and that one has been hanging around for a bit. Would be nice to doc/finish this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with adding it now, I was merely hoping the other PR would get in. Merge conflicts happen, and it's not a hard merge, especially since the PR simplifies the generation step anyway.
Assigning reviewers. If you would like to opt out of this review, comment R: @lostluck for label go. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
R: @youngoli for final approval |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, with one small nit that I don't think is worth blocking on. Let me know what you think of that suggestion and I can merge after.
Edit: I'll merge after Robert's comment is addressed too. I didn't notice it in my review but it does need to be addressed.
Daniel, Please do keep in mind We Should Not submit this with a manually
changed generate pb.go file. The bot tagged you in after my review for some
reason.
…On Fri, Mar 18, 2022, 7:09 PM Daniel Oliveira ***@***.***> wrote:
***@***.**** approved this pull request.
This looks good to me, with one small nit that I don't think is worth
blocking on. Let me know what you think of that suggestion and I can merge
after.
------------------------------
In sdks/go/pkg/beam/core/funcx/fn.go
<#17045 (comment)>:
> @@ -415,7 +432,7 @@ func SubReturns(list []ReturnParam, indices ...int) []ReturnParam {
}
// The order of present parameters and return values must be as follows:
-// func(FnContext?, FnPane?, FnWindow?, FnEventTime?, FnType?, FnRTracker?, (FnValue, SideInput*)?, FnEmit*) (RetEventTime?, RetOutput?, RetError?)
+// func(FnContext?, FnPane?, FnWindow?, FnEventTime?, FnType?, FnRTracker?, FnBundleFinalization?, (FnValue, SideInput*)?, FnEmit*) (RetEventTime?, RetOutput?, RetError?)
Nit: Personally I think BundleFinalization should come before the
RTracker, because the restriction is basically part of the main input so I
feel it should come right before the main input. Not something I'd block
on, just preference.
—
Reply to this email directly, view it on GitHub
<#17045 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADKDOFKXTQTJ7YXFCGYE4LDVAUZNNANCNFSM5QHNRCRA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Feel free to push back on the sort request. We should add it before the 2.39.0 cut at least, and adding it in this PR avoids a potential gap.
Let me know and then I'll merge.
Type_KV Type_Special = 11 | ||
Type_COGBK Type_Special = 13 | ||
Type_WINDOWEDVALUE Type_Special = 14 | ||
Type_BUNDLEFINALIZATION Type_Special = 23 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with adding it now, I was merely hoping the other PR would get in. Merge conflicts happen, and it's not a hard merge, especially since the PR simplifies the generation step anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Merging! Thanks!
This is a continuation of the effort to add Bundle Finalization started in #16980
Summary of Overall Changes
Bundle finalization enables a DoFn to perform side effects after a runner has acknowledged that it has durably persisted the output. Right now, Java and Python support bundle finalization by allowing a user to register a callback function which is invoked when the runner acknowledges that it has persisted all output, but Go does not have any such support. This is part of a larger change to add support to the Go Sdk as outlined in this design doc.
Summary of Changes in this PR
I completed most of the execution changes in the last pr, leaving this PR to handle the user experience and plumbing the user's bundle finalization parameter through to the execution layer.
Additional testing done
On top of the units added, I also was able to run an E2E example on Dataflow (FWIW, only Dataflow currently has bundle finalization support). In that example, I hijacked the wordcount example and added a bundleFinalizer to write a file to persistent storage for each line that had at least 3 words (chosen pretty randomly to minimize the chances of collisions). I'll omit the whole sample since its long, but it produced a bunch of files like this:
This indeed ran after the other data was persisted.
I decided not to add an integration test because of the complexity involved. Because only the dataflow runner supports finalization, any integration test would require finding some way to (a) modify dataflow state and query that state (probably not a great option since it requires modifying a devs personal GCP account, and its not obvious what the best thing to do actually would be without knowing more about their config), (b) creating some sort of local endpoint for dataflow to talk back to (technically feasible, but definitely non-trivial - would also add significant complexity outside of what we're actually testing), or (c) using some 3rd party to communicate between the 2 (non-ideal since it adds an extra dependency that isn't part of what's being tested just for this scenario). I'm definitely open to doing this, but at the moment with the information I have it doesn't feel worth it.
Next Steps
After this, I'll update the documentation here to include an example https://github.com/apache/beam/blob/6438626c059c19ff9ca32cd834d0aa62253e531b/website/www/site/content/en/documentation/programming-guide.md#127-bundle-finalization-bundle-finalization
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.