Bug: Various sentry bugs #388
Comments
Two core issues were identified, both related to sidekiq / redis queue / background jobs.

Slack API Root Cause

The Slack API problems were caused by disabling a Slack application on the workspace. This happened during a well-intentioned (but a little hasty) effort of mine to clean up all the random Slack applications on the workspace. One of the applications appeared to be non-functional, but its key was being used in an OC repo. This repo is hosted somewhere and is used as a way to communicate with Slack to invite users. It was found that BE was sending this repo an HTTP request to invite users and this repo was processing them.

By removing the code that added items to this queue we thought this would be resolved. When the issues still occurred, we realized that the jobs were being retried up to the default maximum number of retries. By keeping the ActiveJob in place for this event, but removing the call to the external API and logging the queue, we will slowly drain these requests. Currently @hollomancer is manually inviting users and seeing a better response. We think this is due to customized messages.

Database Issues Root Cause

Again this was a sidekiq issue: the primary error was due to not finding the database. We stopped adding new sidekiq jobs in the pathways that caused the issue, and saw the errors continue. As above, we realized that Rails was retrying the failed jobs, and after some time we noticed that no new jobs were being created. sidekiq will retry 25 times over 21 days, so eventually these errors would go away. Investigation showed that in Infra we use the same environment variables for
A couple of red herrings came up:
But ultimately it was found that we were setting a variable
@robbkidd noted that: database config expects a different format and that we were putting the host URL in as a database URL. With the line that sets this value removed, sidekiq would use the default Rails database connections in
This was seen by:
Since robb had no users in his local db he received this error:
this error is better than
As of today our sentry errors are way down:
Recommended actions:
Personal comments: I think this issue took so long due to my lack of domain and language knowledge. I really thank people like @nellshamrell and @robbkidd for filling in the gaps. @ohaiwalt was also a boss in regards to understanding Infra. In addition, I was personally hampered by the lack of logs and by the wait for sentry access. I think we should find ways for open source contributors to view these items without compromising OC security. Perhaps this is another use case for a staging environment.
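For anyone wondering where "25 times over 21 days" comes from, here is a rough sketch of the retry window. It assumes the backoff formula given in the Sidekiq error-handling docs, `(retry_count ** 4) + 15` seconds plus a random jitter term, and it leaves the jitter out so the result is deterministic. The method names are made up for illustration and are not part of Sidekiq.

```ruby
# Sketch only: approximates Sidekiq's default retry backoff.
# Assumption: delay before retry n is (n ** 4) + 15 seconds; the random
# jitter Sidekiq adds on top is deliberately ignored here.
def retry_delay_seconds(retry_count)
  (retry_count**4) + 15
end

# Sum the delays for every retry attempt and convert to days.
def total_retry_window_days(max_retries = 25)
  total_seconds = (0...max_retries).sum { |n| retry_delay_seconds(n) }
  total_seconds / 86_400.0
end

puts format("%.1f days", total_retry_window_days)
```

Without jitter this comes out a little over 20 days, which is why failed jobs from a pathway we had already shut off kept showing up in sentry for weeks.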
👍 |
Using this as a placeholder for all the new bugs until we can write out full issues:
1. ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pool...
2. Slack::Client::InviteFailed: {"ok"=>false, "error"=>"token_revoked"}
3. ActiveJob::DeserializationError: Error while trying to deserialize arguments: FATAL: database "opcode-postgres" does not exist
4. ActiveRecord::NoDatabaseError: FATAL: database "opcode-postgres" does not exist
5. ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pool...

It appears when new users sign up we get a group of (2, 4, 3) at the same time.
I think 2 can be corrected by adding a legacy token to environment vars.
The other ones will take more troubleshooting and I'd like to get the full stack traces.
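Errors 3 and 4 name `opcode-postgres` as the database, which is consistent with a host name ending up where a full database URL was expected. Below is a minimal sketch of a sanity check for that kind of value. The `database_url?` helper, the credentials, and the `operationcode` database name are all hypothetical and not taken from the codebase or Infra config.

```ruby
require "uri"

# Hypothetical helper: true when the value looks like a full database URL
# (a scheme, a host, and a database name in the path), false for a bare host.
def database_url?(value)
  uri = URI.parse(value.to_s)
  !uri.scheme.nil? && !uri.host.nil? && uri.path.to_s.length > 1
rescue URI::InvalidURIError
  false
end

# Example values are assumptions for illustration only.
puts database_url?("postgres://user:secret@opcode-postgres:5432/operationcode")
puts database_url?("opcode-postgres")
```

A check like this could run at boot or in a deploy script to flag a bare host before any jobs are enqueued.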