
fix: Updated the unicorn_tcp fork behavior #409

Merged: 1 commit merged into openedx:master on May 21, 2023

Conversation

@Ardiea (Contributor) commented Mar 21, 2023

Updated the unicorn_tcp before_fork and after_fork behaviors to be consistent with the MongoDB documentation.

See closed PR: #406

Problem

At the MIT Office of Digital Learning we use managed MongoDB Atlas as our backend datastore. Because of that, we don't always have visibility into when maintenance events take place in the MongoDB cluster. We recently migrated cs_comments_service (we call it forum) to Docker containers, and we began noticing that when a MongoDB maintenance event takes place, the forum application becomes non-responsive until the Docker container is restarted, which generally requires manual intervention.

We are able to recreate this by manually triggering a primary failover in Atlas and then observing the forum's behavior while the cluster makes the change. Once the primary has swapped, all further requests to the forum service result in the following error:

forum  | I, [2023-02-23T20:06:51.883931 #53]  INFO -- : source=rack-timeout id=81a12329-cc66-431d-8e9e-ff2a4f6354c5 timeout=15000ms service=15002ms state=completed
forum  | 172.22.2.20 - - [23/Feb/2023:20:06:51 +0000] "GET /api/v1/users/10?complete=True&request_id=26a53562-a8a1-4690-9c3a-5dd4e0eef131 HTTP/1.1" 500 30 15.0013
forum  | E, [2023-02-23T20:07:00.594542 #52] ERROR -- : source=rack-timeout id=5623233a-bb6f-4c89-8266-f0a9ac383e38 timeout=15000ms service=15000ms state=timed_out
forum  | 2023-02-23 20:07:00 - Rack::Timeout::RequestTimeoutException - Request ran for longer than 15000ms :
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/semaphore.rb:33:in `sleep'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/semaphore.rb:33:in `wait'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/semaphore.rb:33:in `block in wait'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/semaphore.rb:32:in `synchronize'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/semaphore.rb:32:in `wait'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/server_selector/base.rb:609:in `wait_for_server_selection'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/server_selector/base.rb:266:in `block in select_server'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/server_selector/base.rb:247:in `loop'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/server_selector/base.rb:247:in `select_server'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/retryable.rb:521:in `select_server'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/retryable.rb:368:in `modern_read_with_retry'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/retryable.rb:127:in `read_with_retry'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/retryable.rb:63:in `read_with_retry_cursor'
forum  | 	/usr/local/lib/ruby/3.0.0/forwardable.rb:238:in `read_with_retry_cursor'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/collection/view/iterable.rb:126:in `select_cursor'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/collection/view/iterable.rb:60:in `each'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongoid-7.5.1/lib/mongoid/contextual/mongo.rb:287:in `first'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongoid-7.5.1/lib/mongoid/contextual/mongo.rb:287:in `find_first'
forum  | 	/usr/local/lib/ruby/3.0.0/forwardable.rb:238:in `find_first'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongoid-7.5.1/lib/mongoid/findable.rb:153:in `find_by'
forum  | 	/app/cs_comments_service/lib/helpers.rb:13:in `user'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/newrelic_rpm-8.12.0/lib/new_relic/agent/method_tracer.rb:339:in `block (3 levels) in _nr_define_traced_method'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/newrelic_rpm-8.12.0/lib/new_relic/agent/method_tracer.rb:74:in `block in trace_execution_scoped'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/newrelic_rpm-8.12.0/lib/new_relic/agent/method_tracer_helpers.rb:41:in `block in trace_execution_scoped'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/newrelic_rpm-8.12.0/lib/new_relic/agent/tracer.rb:355:in `capture_segment_error'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/newrelic_rpm-8.12.0/lib/new_relic/agent/method_tracer_helpers.rb:41:in `trace_execution_scoped'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/newrelic_rpm-8.12.0/lib/new_relic/agent/method_tracer.rb:72:in `trace_execution_scoped'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/newrelic_rpm-8.12.0/lib/new_relic/agent/method_tracer.rb:334:in `block (2 levels) in _nr_define_traced_method'
forum  | 	/app/cs_comments_service/api/users.rb:114:in `block in <top (required)>'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1685:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1685:in `block in compile!'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1002:in `block (3 levels) in route!'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1020:in `route_eval'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1002:in `block (2 levels) in route!'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1051:in `block in process_route'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1049:in `catch'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1049:in `process_route'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1000:in `block in route!'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:997:in `each'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:997:in `route!'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1121:in `block in dispatch!'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1092:in `catch'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1092:in `invoke'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1116:in `dispatch!'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:932:in `block in call!'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1092:in `catch'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1092:in `invoke'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:932:in `call!'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:921:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/bundler/gems/rack-contrib-6ff3ca2b2d98/lib/rack/contrib/locale.rb:15:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-protection-3.0.2/lib/rack/protection/xss_header.rb:20:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-protection-3.0.2/lib/rack/protection/path_traversal.rb:18:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-protection-3.0.2/lib/rack/protection/json_csrf.rb:28:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-protection-3.0.2/lib/rack/protection/base.rb:53:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-protection-3.0.2/lib/rack/protection/base.rb:53:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-protection-3.0.2/lib/rack/protection/frame_options.rb:33:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-2.2.4/lib/rack/logger.rb:17:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-2.2.4/lib/rack/common_logger.rb:38:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:261:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:254:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-2.2.4/lib/rack/head.rb:12:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-2.2.4/lib/rack/method_override.rb:24:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:219:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1995:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1554:in `block in call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1770:in `synchronize'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1554:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/query_cache.rb:278:in `block in call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/query_cache.rb:52:in `cache'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/query_cache.rb:277:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-timeout-0.6.3/lib/rack/timeout/core.rb:148:in `block in call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-timeout-0.6.3/lib/rack/timeout/support/timeout.rb:19:in `timeout'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-timeout-0.6.3/lib/rack/timeout/core.rb:147:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/unicorn-6.1.0/lib/unicorn/http_server.rb:634:in `process_client'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/unicorn-6.1.0/lib/unicorn/http_server.rb:739:in `worker_loop'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/unicorn-6.1.0/lib/unicorn/http_server.rb:547:in `spawn_missing_workers'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/unicorn-6.1.0/lib/unicorn/http_server.rb:143:in `start'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/unicorn-6.1.0/bin/unicorn:128:in `<top (required)>'
forum  | 	./bin/unicorn:16:in `load'
forum  | 	./bin/unicorn:16:in `<main>'
forum  | I, [2023-02-23T20:07:00.595752 #52]  INFO -- : source=rack-timeout id=5623233a-bb6f-4c89-8266-f0a9ac383e38 timeout=15000ms service=15002ms state=completed

Solution

Per the MongoDB documentation found here, and specifically this subsection, I have updated config/unicorn_tcp.rb to follow the suggested pattern.

Relevant paragraph (emphasis added):

When a process forks, Ruby threads are not transferred to the child processes and the Ruby driver Client objects lose their background monitoring. The application will typically seem to work just fine until the deployment state changes (for example due to network errors, a maintenance event) at which point the application is likely to start getting NoServerAvailable exception when performing MongoDB operations.

This is almost exactly the behavior we observe in our installations, except that the request simply times out rather than raising a NoServerAvailable exception.
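
For reference, the pattern the MongoDB driver/Mongoid documentation recommends for Unicorn looks roughly like the sketch below. This is an illustrative sketch of the documented pattern, assuming Mongoid manages the service's Mongo::Client objects; it is not necessarily the exact diff in this PR.

before_fork do |_server, _worker|
  # Close every Mongo client in the Unicorn master so that no sockets or
  # driver monitoring threads are shared with the forked workers.
  Mongoid.disconnect_clients
end

after_fork do |_server, _worker|
  # Reconnect inside each worker so it gets fresh Mongo::Client objects,
  # each with its own background monitoring threads.
  Mongoid::Clients.clients.each do |_name, client|
    client.close
    client.reconnect
  end
end

The key point is that clients are torn down before the fork and rebuilt after it, so each worker owns live monitoring threads instead of inheriting dead ones from the master.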

Disclaimer

I am not a Ruby developer; I'm more of an operations person who doesn't want to be woken up by things like this. I am not sure whether it is appropriate to make similar changes in config/unicorn.rb or config/unicorn.heroku.rb, nor am I in a position to test or validate such changes.

This change, however, has been validated, and we will be moving it to production shortly via the mitodl fork that this PR originates from.

Please let me know if there are any questions I can answer and I will try my best!

@openedx-webhooks added the "open-source-contribution" label (PR author is not from Axim or 2U) on Mar 21, 2023
@openedx-webhooks commented

Thanks for the pull request, @Ardiea! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

@itsjeyd added the "needs test run" label (Author's first PR to this repository, awaiting test authorization from Axim) on Mar 28, 2023
@itsjeyd commented Mar 28, 2023

Thanks @Ardiea! We'll run tests on this soon.

@itsjeyd commented Apr 5, 2023

@e0d When you get a minute could you please kick off a build for this PR?

@codecov bot commented Apr 6, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (249e461) 96.12% compared to head (a812ad1) 96.12%.

❗ Current head a812ad1 differs from pull request most recent head 08e898b. Consider uploading reports for the commit 08e898b to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #409   +/-   ##
=======================================
  Coverage   96.12%   96.12%           
=======================================
  Files          58       58           
  Lines        4568     4568           
=======================================
  Hits         4391     4391           
  Misses        177      177           

@e0d (Contributor) commented Apr 7, 2023

@Ardiea Looks like there are some failing checks; can you have a look?

@e0d removed the "needs test run" label on Apr 7, 2023
@Ardiea (Contributor, Author) commented Apr 10, 2023

I'm not sure what I can do about these failures, or what they have to do with my change.

Failures:

  1) app users GET /api/v1/users/:course_id/stats returns user's stats with flagged sort
     Failure/Error: expect(res["user_stats"]).to eq expected_result

       expected: [{"active_flags"=>36, "inactive_flags"=>24, "replies"=>34, "responses"=>9, "threads"=>2, "username"=>...23, "inactive_flags"=>23, "replies"=>26, "responses"=>12, "threads"=>2, "username"=>"userauthor-6"}]
            got: [{"active_flags"=>74, "inactive_flags"=>55, "replies"=>77, "responses"=>22, "threads"=>4, "username"=...49, "inactive_flags"=>46, "replies"=>59, "responses"=>26, "threads"=>4, "username"=>"userauthor-6"}]

       (compared using ==)

       Diff:

@itsjeyd commented Apr 13, 2023

@Ardiea You could try rebasing your changes on latest master. If the failures aren't related to your changes, that should fix the build.

@itsjeyd added the "waiting on author" label (PR author needs to resolve review requests, answer questions, fix tests, etc.) on Apr 13, 2023
@Ardiea (Contributor, Author) commented Apr 13, 2023

My fork is in sync with this repo, and my branch is already based on the latest commit from this repo.

https://github.com/Ardiea/cs_comments_service/tree/master

 ~/code/cs_comments_service$  (md/fork_behavior)| git checkout master
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
 ~/code/cs_comments_service$  (master)| git pull
Already up to date.
 ~/code/cs_comments_service$  (master)| git checkout md/fork_behavior
Switched to branch 'md/fork_behavior'
Your branch is up to date with 'origin/md/fork_behavior'.
 ~/code/cs_comments_service$  (md/fork_behavior)| git pull
Already up to date.
 ~/code/cs_comments_service$  (md/fork_behavior)| git rebase master
Current branch md/fork_behavior is up to date.

@pdpinch commented Apr 18, 2023

It looks like the tests are failing because of changes to codecov. Can it be removed?

@pdpinch commented Apr 18, 2023

It looks like we just needed to rerun the tests. It's passing now.

@itsjeyd commented Apr 20, 2023

Thanks @pdpinch.

@Ardiea We can line this up for engineering review now.

@asadazam93 This is ready for engineering review by Infinity.

@itsjeyd removed the "waiting on author" label on Apr 20, 2023
@itsjeyd commented Apr 26, 2023

Hey @asadazam93, just checking in to see if you had any updates on when this PR could potentially be reviewed by Infinity?

@asadazam93 (Contributor) commented

@Ardiea Have you tested this in a production environment?

@Ardiea (Contributor, Author) commented May 16, 2023

Yes, this is running in four different MIT production environments right now, plus an additional seven non-prod environments. It resolved the issue mentioned in the top post immediately; at the time, the issue was affecting us 2-3 times a week across various environments.

@asadazam93 merged commit 4ca760a into openedx:master on May 21, 2023
@openedx-webhooks commented

@Ardiea 🎉 Your pull request was merged! Please take a moment to answer a two-question survey so we can improve your experience in the future.
