agones-{extensions,allocator}: Pause after cancelling context #3843
This reverts commit 68b04ee.
After googleforgames#3839 went in, we noticed the flakes in `TestAllocatorAfterDeleteReplica` disappear, but `TestGameServerCreationRightAfterDeletingOneExtensionsPod` remained. I looked more closely at this and developed a theory: my guess is that `kube-apiserver` connections to webhooks are actually "sticky" because of HTTP(S) keep-alives. If the TCP connection never closes, we'd see the behavior we do, which is an `EOF` when kube-apiserver goes to write on a dead socket.

I looked at how to close sockets on the server side to "finish" the drain, but realized that we actually do, by cancelling the context. The thing is that we cancel the context and immediately exit, which leaves no time for anything to shut down. I tested adding a pause after cancelling, and it seems to drive away the flake without adding in the extra delay.

Reverts googleforgames#3839
Build Succeeded 👏 Build Id: 8679ca3f-4c75-46fe-b0bd-9a1408b22e87 The following development artifacts have been built, and will exist for the next 30 days:
I'm down to give it a shot 😁
Build Succeeded 👏 Build Id: 06c30836-c078-47ed-9094-84555411c0f7 The following development artifacts have been built, and will exist for the next 30 days:
I ran the HA tests on googleforgames#3843 overnight in a loop and still noticed a very low-grade flake in the allocator, so I decided to go ahead and clean up the issues I noticed while working on the previous PR:

* adds an `httpserver` utility package to handle the `Run` function that controller/extensions use, and makes it context-aware using the same method as `https.Run`: https://github.com/googleforgames/agones/blob/dfa414e5e4da37798833bbf8c33919acb5f3c2ea/pkg/util/https/server.go#L127-L130 (I think this ^ is why the extensions flake disappeared after I added a pause).
* also plumbs context-awareness through the allocator `run{Mux,REST,GRPC}` functions; I suspect the lack of it there is why we're still seeing a low-grade flake in the allocator. (I think the delay I added previously was sufficient to drive it off, too, but this PR should be better.)
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [agones](https://agones.dev) ([source](https://togithub.com/googleforgames/agones)) | minor | `1.40.0` -> `1.41.0` |

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency Dashboard for more information.

---

### Release Notes

<details>
<summary>googleforgames/agones (agones)</summary>

### [`v1.41.0`](https://togithub.com/googleforgames/agones/blob/HEAD/CHANGELOG.md#v1410-2024-06-04)

[Compare Source](https://togithub.com/googleforgames/agones/compare/v1.40.0...v1.41.0)

[Full Changelog](https://togithub.com/googleforgames/agones/compare/v1.40.0...v1.41.0)

**Implemented enhancements:**

- Configure Allocator Status Code by [@Kalaiselvi84](https://togithub.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3782](https://togithub.com/googleforgames/agones/pull/3782)
- Graduate Counters and Lists to Beta by [@Kalaiselvi84](https://togithub.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3801](https://togithub.com/googleforgames/agones/pull/3801)
- Passthrough autopilot - Adds an AutopilotPassthroughPort Feature Gate and new pod label by [@vicentefb](https://togithub.com/vicentefb) in [https://github.com/googleforgames/agones/pull/3809](https://togithub.com/googleforgames/agones/pull/3809)
- CountsAndLists: Move to Beta Protobuf by [@Kalaiselvi84](https://togithub.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3806](https://togithub.com/googleforgames/agones/pull/3806)
- feat: support multiple port ranges by [@nrwiersma](https://togithub.com/nrwiersma) in [https://github.com/googleforgames/agones/pull/3747](https://togithub.com/googleforgames/agones/pull/3747)
- Changes `sdk-server` to Patch instead of Update by [@igooch](https://togithub.com/igooch) in [https://github.com/googleforgames/agones/pull/3803](https://togithub.com/googleforgames/agones/pull/3803)
- Generate grpc for nodejs from alpha to beta by [@lacroixthomas](https://togithub.com/lacroixthomas) in [https://github.com/googleforgames/agones/pull/3825](https://togithub.com/googleforgames/agones/pull/3825)
- Update CountsAndLists from Alpha to Beta by [@Kalaiselvi84](https://togithub.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3824](https://togithub.com/googleforgames/agones/pull/3824)
- feat(gameserver): New DirectToGameServer PortPolicy allows direct traffic to a GameServer by [@daniellee](https://togithub.com/daniellee) in [https://github.com/googleforgames/agones/pull/3807](https://togithub.com/googleforgames/agones/pull/3807)
- Passthrough autopilot - Adds mutating webhook by [@vicentefb](https://togithub.com/vicentefb) in [https://github.com/googleforgames/agones/pull/3833](https://togithub.com/googleforgames/agones/pull/3833)
- Passthrough autopilot - added ports array case and updated unit tests by [@vicentefb](https://togithub.com/vicentefb) in [https://github.com/googleforgames/agones/pull/3842](https://togithub.com/googleforgames/agones/pull/3842)
- Nodejs counters and lists by [@steven-supersolid](https://togithub.com/steven-supersolid) in [https://github.com/googleforgames/agones/pull/3726](https://togithub.com/googleforgames/agones/pull/3726)
- Promote AutopilotPassthroughPort feature gate to Alpha by [@vicentefb](https://togithub.com/vicentefb) in [https://github.com/googleforgames/agones/pull/3849](https://togithub.com/googleforgames/agones/pull/3849)

**Fixed bugs:**

- Helm Param Update: Default to agones.controller if agones.extensions is Missing by [@Kalaiselvi84](https://togithub.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3773](https://togithub.com/googleforgames/agones/pull/3773)
- fix: rollout strategy issues by [@nrwiersma](https://togithub.com/nrwiersma) in [https://github.com/googleforgames/agones/pull/3762](https://togithub.com/googleforgames/agones/pull/3762)
- Set Minimum Buffer Size to 1 by [@Kalaiselvi84](https://togithub.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3749](https://togithub.com/googleforgames/agones/pull/3749)
- Pin ltsc2019 to older SHA by [@zmerlynn](https://togithub.com/zmerlynn) in [https://github.com/googleforgames/agones/pull/3829](https://togithub.com/googleforgames/agones/pull/3829)
- TestGameServerAllocationDuringMultipleAllocationClients: Readdress flake by [@zmerlynn](https://togithub.com/zmerlynn) in [https://github.com/googleforgames/agones/pull/3831](https://togithub.com/googleforgames/agones/pull/3831)
- Refactor finalizer name to include valid domain name and path by [@indexjoseph](https://togithub.com/indexjoseph) in [https://github.com/googleforgames/agones/pull/3840](https://togithub.com/googleforgames/agones/pull/3840)
- agones-{extensions,allocator}: Be more defensive about draining by [@zmerlynn](https://togithub.com/zmerlynn) in [https://github.com/googleforgames/agones/pull/3839](https://togithub.com/googleforgames/agones/pull/3839)
- agones-{extensions,allocator}: Pause after cancelling context by [@zmerlynn](https://togithub.com/zmerlynn) in [https://github.com/googleforgames/agones/pull/3843](https://togithub.com/googleforgames/agones/pull/3843)
- Change the line to modify in Quickstart: Edit a Game Server by [@peterzhongyi](https://togithub.com/peterzhongyi) in [https://github.com/googleforgames/agones/pull/3844](https://togithub.com/googleforgames/agones/pull/3844)

**Other:**

- Prep for Release v1.41.0 by [@Kalaiselvi84](https://togithub.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3800](https://togithub.com/googleforgames/agones/pull/3800)
- Update site documentation to reflect firewall prefix and default to Autopilot cluster creation for Agones by [@vicentefb](https://togithub.com/vicentefb) in [https://github.com/googleforgames/agones/pull/3769](https://togithub.com/googleforgames/agones/pull/3769)
- Add a System Diagram and overview page by [@zmerlynn](https://togithub.com/zmerlynn) in [https://github.com/googleforgames/agones/pull/3792](https://togithub.com/googleforgames/agones/pull/3792)
- Update Side Menu: Preserve and Restore Scroll Position by [@Kalaiselvi84](https://togithub.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3805](https://togithub.com/googleforgames/agones/pull/3805)
- fix: typo by [@skmpf](https://togithub.com/skmpf) in [https://github.com/googleforgames/agones/pull/3808](https://togithub.com/googleforgames/agones/pull/3808)
- Helm Config: Add httpUnallocatedStatusCode in Allocator Service by [@Kalaiselvi84](https://togithub.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3802](https://togithub.com/googleforgames/agones/pull/3802)
- Update Docs: CountersAndLists to Beta by [@Kalaiselvi84](https://togithub.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3810](https://togithub.com/googleforgames/agones/pull/3810)
- Disable Dev feature FeatureAutopilotPassthroughPort by [@vicentefb](https://togithub.com/vicentefb) in [https://github.com/googleforgames/agones/pull/3815](https://togithub.com/googleforgames/agones/pull/3815)
- Disable FeatureAutopilotPassthroughPort in features.go by [@vicentefb](https://togithub.com/vicentefb) in [https://github.com/googleforgames/agones/pull/3816](https://togithub.com/googleforgames/agones/pull/3816)
- SDK proto compatibility guarantees and deprecation policies documentation by [@igooch](https://togithub.com/igooch) in [https://github.com/googleforgames/agones/pull/3774](https://togithub.com/googleforgames/agones/pull/3774)
- Fix dangling "as of" by [@zmerlynn](https://togithub.com/zmerlynn) in [https://github.com/googleforgames/agones/pull/3827](https://togithub.com/googleforgames/agones/pull/3827)
- Steps to Promote SDK Features from Alpha to Beta by [@Kalaiselvi84](https://togithub.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3814](https://togithub.com/googleforgames/agones/pull/3814)
- Adds comment for help troubleshooting issues with terraform tfstate by [@igooch](https://togithub.com/igooch) in [https://github.com/googleforgames/agones/pull/3822](https://togithub.com/googleforgames/agones/pull/3822)
- docs: improve counter and list example comments by [@yonbh](https://togithub.com/yonbh) in [https://github.com/googleforgames/agones/pull/3818](https://togithub.com/googleforgames/agones/pull/3818)
- Skip /tmp/ on yamllint by [@zmerlynn](https://togithub.com/zmerlynn) in [https://github.com/googleforgames/agones/pull/3838](https://togithub.com/googleforgames/agones/pull/3838)
- TestAllocatorAfterDeleteReplica: More logging by [@zmerlynn](https://togithub.com/zmerlynn) in [https://github.com/googleforgames/agones/pull/3837](https://togithub.com/googleforgames/agones/pull/3837)
- Instructions for upgrading golang version by [@gongmax](https://togithub.com/gongmax) in [https://github.com/googleforgames/agones/pull/3819](https://togithub.com/googleforgames/agones/pull/3819)
- Remove unused function FindGameServerContainer by [@zmerlynn](https://togithub.com/zmerlynn) in [https://github.com/googleforgames/agones/pull/3841](https://togithub.com/googleforgames/agones/pull/3841)
- Adds Unreal to the List of URL Links to Not Check by [@igooch](https://togithub.com/igooch) in [https://github.com/googleforgames/agones/pull/3847](https://togithub.com/googleforgames/agones/pull/3847)
- docs: clarify virtualization setup for Windows versions by [@andresromerodev](https://togithub.com/andresromerodev) in [https://github.com/googleforgames/agones/pull/3850](https://togithub.com/googleforgames/agones/pull/3850)

**New Contributors:**

- [@skmpf](https://togithub.com/skmpf) made their first contribution in [https://github.com/googleforgames/agones/pull/3808](https://togithub.com/googleforgames/agones/pull/3808)
- [@yonbh](https://togithub.com/yonbh) made their first contribution in [https://github.com/googleforgames/agones/pull/3818](https://togithub.com/googleforgames/agones/pull/3818)
- [@peterzhongyi](https://togithub.com/peterzhongyi) made their first contribution in [https://github.com/googleforgames/agones/pull/3844](https://togithub.com/googleforgames/agones/pull/3844)
- [@andresromerodev](https://togithub.com/andresromerodev) made their first contribution in [https://github.com/googleforgames/agones/pull/3850](https://togithub.com/googleforgames/agones/pull/3850)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate).
Of course, then I looked more into this and... I'm not sure why pausing after cancelling the context is helping. But I was able to reproduce the e2e flake locally, and with this small change I'm able to see it go away, so 🤷.