
fix: Updated the unicorn_tcp fork behavior #409

Merged: 1 commit merged into openedx:master on May 21, 2023

Conversation

@Ardiea (Contributor) commented Mar 21, 2023

Updated the unicorn_tcp before_fork and after_fork behaviors to be consistent with the MongoDB documentation.

See closed PR: #406

Problem

At the MIT Office of Digital Learning we use managed MongoDB Atlas as our backend datastore. Because of that, we don't always have visibility into when maintenance events take place in the MongoDB cluster. We recently migrated cs_comments_service (we call it forum) to Docker containers, and we began noticing that when a MongoDB maintenance event takes place, the forum application becomes non-responsive until the Docker container is restarted, which generally requires manual intervention.

We are able to recreate this by manually triggering a primary failover in Atlas and then observing the forum's behavior while the cluster makes the change. Once the primary has swapped, all further requests to the forum service result in the following error:

forum  | I, [2023-02-23T20:06:51.883931 #53]  INFO -- : source=rack-timeout id=81a12329-cc66-431d-8e9e-ff2a4f6354c5 timeout=15000ms service=15002ms state=completed
forum  | 172.22.2.20 - - [23/Feb/2023:20:06:51 +0000] "GET /api/v1/users/10?complete=True&request_id=26a53562-a8a1-4690-9c3a-5dd4e0eef131 HTTP/1.1" 500 30 15.0013
forum  | E, [2023-02-23T20:07:00.594542 #52] ERROR -- : source=rack-timeout id=5623233a-bb6f-4c89-8266-f0a9ac383e38 timeout=15000ms service=15000ms state=timed_out
forum  | 2023-02-23 20:07:00 - Rack::Timeout::RequestTimeoutException - Request ran for longer than 15000ms :
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/semaphore.rb:33:in `sleep'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/semaphore.rb:33:in `wait'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/semaphore.rb:33:in `block in wait'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/semaphore.rb:32:in `synchronize'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/semaphore.rb:32:in `wait'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/server_selector/base.rb:609:in `wait_for_server_selection'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/server_selector/base.rb:266:in `block in select_server'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/server_selector/base.rb:247:in `loop'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/server_selector/base.rb:247:in `select_server'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/retryable.rb:521:in `select_server'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/retryable.rb:368:in `modern_read_with_retry'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/retryable.rb:127:in `read_with_retry'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/retryable.rb:63:in `read_with_retry_cursor'
forum  | 	/usr/local/lib/ruby/3.0.0/forwardable.rb:238:in `read_with_retry_cursor'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/collection/view/iterable.rb:126:in `select_cursor'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/collection/view/iterable.rb:60:in `each'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongoid-7.5.1/lib/mongoid/contextual/mongo.rb:287:in `first'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongoid-7.5.1/lib/mongoid/contextual/mongo.rb:287:in `find_first'
forum  | 	/usr/local/lib/ruby/3.0.0/forwardable.rb:238:in `find_first'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongoid-7.5.1/lib/mongoid/findable.rb:153:in `find_by'
forum  | 	/app/cs_comments_service/lib/helpers.rb:13:in `user'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/newrelic_rpm-8.12.0/lib/new_relic/agent/method_tracer.rb:339:in `block (3 levels) in _nr_define_traced_method'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/newrelic_rpm-8.12.0/lib/new_relic/agent/method_tracer.rb:74:in `block in trace_execution_scoped'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/newrelic_rpm-8.12.0/lib/new_relic/agent/method_tracer_helpers.rb:41:in `block in trace_execution_scoped'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/newrelic_rpm-8.12.0/lib/new_relic/agent/tracer.rb:355:in `capture_segment_error'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/newrelic_rpm-8.12.0/lib/new_relic/agent/method_tracer_helpers.rb:41:in `trace_execution_scoped'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/newrelic_rpm-8.12.0/lib/new_relic/agent/method_tracer.rb:72:in `trace_execution_scoped'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/newrelic_rpm-8.12.0/lib/new_relic/agent/method_tracer.rb:334:in `block (2 levels) in _nr_define_traced_method'
forum  | 	/app/cs_comments_service/api/users.rb:114:in `block in <top (required)>'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1685:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1685:in `block in compile!'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1002:in `block (3 levels) in route!'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1020:in `route_eval'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1002:in `block (2 levels) in route!'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1051:in `block in process_route'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1049:in `catch'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1049:in `process_route'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1000:in `block in route!'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:997:in `each'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:997:in `route!'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1121:in `block in dispatch!'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1092:in `catch'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1092:in `invoke'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1116:in `dispatch!'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:932:in `block in call!'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1092:in `catch'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1092:in `invoke'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:932:in `call!'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:921:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/bundler/gems/rack-contrib-6ff3ca2b2d98/lib/rack/contrib/locale.rb:15:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-protection-3.0.2/lib/rack/protection/xss_header.rb:20:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-protection-3.0.2/lib/rack/protection/path_traversal.rb:18:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-protection-3.0.2/lib/rack/protection/json_csrf.rb:28:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-protection-3.0.2/lib/rack/protection/base.rb:53:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-protection-3.0.2/lib/rack/protection/base.rb:53:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-protection-3.0.2/lib/rack/protection/frame_options.rb:33:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-2.2.4/lib/rack/logger.rb:17:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-2.2.4/lib/rack/common_logger.rb:38:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:261:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:254:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-2.2.4/lib/rack/head.rb:12:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-2.2.4/lib/rack/method_override.rb:24:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:219:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1995:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1554:in `block in call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1770:in `synchronize'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/sinatra-3.0.2/lib/sinatra/base.rb:1554:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/query_cache.rb:278:in `block in call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/query_cache.rb:52:in `cache'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/mongo-2.18.1/lib/mongo/query_cache.rb:277:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-timeout-0.6.3/lib/rack/timeout/core.rb:148:in `block in call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-timeout-0.6.3/lib/rack/timeout/support/timeout.rb:19:in `timeout'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/rack-timeout-0.6.3/lib/rack/timeout/core.rb:147:in `call'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/unicorn-6.1.0/lib/unicorn/http_server.rb:634:in `process_client'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/unicorn-6.1.0/lib/unicorn/http_server.rb:739:in `worker_loop'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/unicorn-6.1.0/lib/unicorn/http_server.rb:547:in `spawn_missing_workers'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/unicorn-6.1.0/lib/unicorn/http_server.rb:143:in `start'
forum  | 	/app/cs_comments_service/vendor/bundle/ruby/3.0.0/gems/unicorn-6.1.0/bin/unicorn:128:in `<top (required)>'
forum  | 	./bin/unicorn:16:in `load'
forum  | 	./bin/unicorn:16:in `<main>'
forum  | I, [2023-02-23T20:07:00.595752 #52]  INFO -- : source=rack-timeout id=5623233a-bb6f-4c89-8266-f0a9ac383e38 timeout=15000ms service=15002ms state=completed

Solution

Per the MongoDB documentation found here, and specifically this subsection, I have updated config/unicorn_tcp.rb to follow the suggested pattern.

Relevant paragraph (emphasis added):

When a process forks, Ruby threads are not transferred to the child processes and the Ruby driver Client objects lose their background monitoring. The application will typically seem to work just fine until the deployment state changes (for example due to network errors, a maintenance event) at which point the application is likely to start getting NoServerAvailable exception when performing MongoDB operations.

This is almost exactly the behavior we observe in our installations, except that the request simply times out rather than raising a NoServerAvailable exception.
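
For reference, the pattern the MongoDB driver/Mongoid documentation recommends for Unicorn looks roughly like the sketch below. This is an illustrative sketch of the documented pattern, assuming Mongoid manages the service's Mongo::Client objects; it is not necessarily the exact diff in this PR.

before_fork do |_server, _worker|
  # Close every Mongo client in the Unicorn master so that no sockets or
  # driver monitoring threads are shared with the forked workers.
  Mongoid.disconnect_clients
end

after_fork do |_server, _worker|
  # Reconnect inside each worker so it gets fresh Mongo::Client objects,
  # each with its own background monitoring threads.
  Mongoid::Clients.clients.each do |_name, client|
    client.close
    client.reconnect
  end
end

The key point is that clients are torn down before the fork and rebuilt after it, so each worker owns live monitoring threads instead of inheriting dead ones from the master.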

Disclaimer

I am not a Ruby developer; I'm more of an operations person who doesn't want to be woken up by things like this. I am not sure whether it is appropriate to make similar changes in config/unicorn.rb or config/unicorn.heroku.rb, nor am I in a position to test or validate such changes.

This change, however, has been validated, and we will be moving it to production shortly via the mitodl fork that this PR originates from.

Please let me know if there are any questions I can answer and I will try my best!

@openedx-webhooks added the "open-source-contribution" label (PR author is not from Axim or 2U) on Mar 21, 2023
@openedx-webhooks commented

Thanks for the pull request, @Ardiea! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

@itsjeyd added the "needs test run" label (Author's first PR to this repository, awaiting test authorization from Axim) on Mar 28, 2023
@itsjeyd commented Mar 28, 2023

Thanks @Ardiea! We'll run tests on this soon.

@itsjeyd commented Apr 5, 2023

@e0d When you get a minute could you please kick off a build for this PR?

@codecov bot commented Apr 6, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (249e461) 96.12% compared to head (a812ad1) 96.12%.

❗ Current head a812ad1 differs from pull request most recent head 08e898b. Consider uploading reports for the commit 08e898b to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #409   +/-   ##
=======================================
  Coverage   96.12%   96.12%           
=======================================
  Files          58       58           
  Lines        4568     4568           
=======================================
  Hits         4391     4391           
  Misses        177      177           

@e0d (Contributor) commented Apr 7, 2023

@Ardiea Looks like there are some failing checks; can you have a look?

@e0d removed the "needs test run" label on Apr 7, 2023
@Ardiea (Contributor, Author) commented Apr 10, 2023

I'm not sure what I can do about these failures, or what they have to do with my change.

Failures:

  1) app users GET /api/v1/users/:course_id/stats returns user's stats with flagged sort
     Failure/Error: expect(res["user_stats"]).to eq expected_result

       expected: [{"active_flags"=>36, "inactive_flags"=>24, "replies"=>34, "responses"=>9, "threads"=>2, "username"=>...23, "inactive_flags"=>23, "replies"=>26, "responses"=>12, "threads"=>2, "username"=>"userauthor-6"}]
            got: [{"active_flags"=>74, "inactive_flags"=>55, "replies"=>77, "responses"=>22, "threads"=>4, "username"=...49, "inactive_flags"=>46, "replies"=>59, "responses"=>26, "threads"=>4, "username"=>"userauthor-6"}]

       (compared using ==)

       Diff:

@itsjeyd commented Apr 13, 2023

@Ardiea You could try rebasing your changes on latest master. If the failures aren't related to your changes, that should fix the build.

@itsjeyd added the "waiting on author" label (PR author needs to resolve review requests, answer questions, fix tests, etc.) on Apr 13, 2023
@Ardiea (Contributor, Author) commented Apr 13, 2023

My fork is in sync with this repo, and my branch is already based on the latest commit from this repo.

https://github.com/Ardiea/cs_comments_service/tree/master

 ~/code/cs_comments_service$  (md/fork_behavior)| git checkout master
Switched to branch 'master'
Your branch is up to date with 'origin/master'.
 ~/code/cs_comments_service$  (master)| git pull
Already up to date.
 ~/code/cs_comments_service$  (master)| git checkout md/fork_behavior
Switched to branch 'md/fork_behavior'
Your branch is up to date with 'origin/md/fork_behavior'.
 ~/code/cs_comments_service$  (md/fork_behavior)| git pull
Already up to date.
 ~/code/cs_comments_service$  (md/fork_behavior)| git rebase master
Current branch md/fork_behavior is up to date.

@pdpinch commented Apr 18, 2023

It looks like the tests are failing because of changes to codecov. Can it be removed?

@pdpinch commented Apr 18, 2023

It looks like we just needed to rerun the tests. It's passing now.

@itsjeyd commented Apr 20, 2023

Thanks @pdpinch.

@Ardiea We can line this up for engineering review now.

@asadazam93 This is ready for engineering review by Infinity.

@itsjeyd removed the "waiting on author" label on Apr 20, 2023
@itsjeyd commented Apr 26, 2023

Hey @asadazam93, just checking in to see if you had any updates on when this PR could potentially be reviewed by Infinity?

@asadazam93 (Contributor) commented

@Ardiea Have you tested this in a production environment?

@Ardiea (Contributor, Author) commented May 16, 2023

Yes, this is running in four different MIT production environments right now, plus an additional seven non-prod environments. It resolved the issue mentioned in the top post immediately; at the time, the issue was affecting us 2-3 times a week across various environments.

@asadazam93 merged commit 4ca760a into openedx:master on May 21, 2023
@openedx-webhooks commented

@Ardiea 🎉 Your pull request was merged! Please take a moment to answer a two-question survey so we can improve your experience in the future.
