fix: Updated the unicorn_tcp fork behavior #409
Conversation
Updated the unicorn_tcp before_fork and after_fork behaviors to be consistent with mongodb documentation.
Thanks for the pull request, @Ardiea! Please note that it may take us up to several weeks or months to complete a review and merge your PR. Feel free to add as much of the following information to the ticket as you can:
All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here. Please let us know once your PR is ready for our review and all tests are green.
Thanks @Ardiea! We'll run tests on this soon.
@e0d When you get a minute could you please kick off a build for this PR?
Codecov Report: Patch and project coverage have no change.
Additional details and impacted files:
@@           Coverage Diff           @@
##           master     #409   +/-   ##
=======================================
  Coverage   96.12%   96.12%
=======================================
  Files          58       58
  Lines        4568     4568
=======================================
  Hits         4391     4391
  Misses        177      177
View full report in Codecov by Sentry.
@Ardiea Looks like there are some failing checks; can you have a look?
I'm not sure what I can do about these, or what they have to do with my change.
@Ardiea You could try rebasing your changes on latest master. If the failures aren't related to your changes, that should fix the build.
My fork is in sync with this repo and my branch is already against the latest commit from this repo. https://github.com/Ardiea/cs_comments_service/tree/master
It looks like the tests are failing because of changes to codecov. Can it be removed?
It looks like we just needed to rerun the tests. It's passing now.
Thanks @pdpinch. @Ardiea We can line this up for engineering review now. @asadazam93 This is ready for engineering review by Infinity.
Hey @asadazam93, just checking in to see if you had any updates on when this PR could potentially be reviewed by Infinity?
@Ardiea Have you tested this in a production environment?
Yes, this is running in four different MIT production environments right now, plus an additional seven non-production environments. It resolved the issue mentioned in the top post immediately, which at the time was affecting us two to three times a week across various environments.
@Ardiea 🎉 Your pull request was merged! Please take a moment to answer a two-question survey so we can improve your experience in the future.
See closed PR: #406
Problem
At the MIT Office of Digital Learning we use managed MongoDB Atlas as our backend datastore. Because of that, we don't always have visibility into when maintenance events take place in the MongoDB cluster. We recently migrated cs_comments_service (we call it `forum`) to Docker containers, and we began noticing that when a MongoDB maintenance event takes place, the forum application becomes non-responsive until the Docker container is restarted, which generally requires manual intervention.

We are able to recreate this by manually triggering a primary failover in Atlas and then observing the forum's behavior while the cluster makes this change. Once the primary has swapped, all further requests to the forum service result in the following error message:
Solution
Per the MongoDB documentation found here, and specifically this subsection, I have updated `config/unicorn_tcp.rb` to follow the suggested pattern.

Relevant paragraph (emphasis added):
Which is almost exactly the behavior we observe in our installations, except that it simply times out rather than raising a `NoServerAvailable` exception.

Disclaimer
I am not a Ruby developer. I'm more of an operations person who doesn't want to be woken up by things like this. I am not sure whether it is appropriate to make similar changes in `config/unicorn.rb` or `config/unicorn.heroku.rb`, nor am I in a position to test/validate such changes.

This change, however, has been validated, and we will be moving it to production shortly using the mitodl fork that this PR originates from.
Please let me know if there are any questions I can answer and I will try my best!