
fix(backend): Fix .env file read contention on pyro connection setup #8736

Merged: 7 commits, merged Nov 25, 2024


@majdyz (Contributor) commented Nov 21, 2024

When many Pyro connections are initialized concurrently, the .env file comes under heavy read contention, which causes connection establishment to fail.

Calling Secrets(), Config(), or Settings() triggers a read of the .env file every time. When these are constructed many times at once, the concurrent reads contend for the file and can exceed the per-process file descriptor limit set by the OS. The failed connections are then retried, which adds delay and causes thundering-herd issues. On Unix this surfaces as:

  • Too Many Open Files or
  • There are only ... file descriptors (hard limit) available
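A minimal, stdlib-only sketch of the failure mode, assuming a settings class that parses `.env` in its constructor. The `EnvBackedConfig` class and the `PYRO_HOST` key are hypothetical stand-ins for `Secrets()`, `Config()`, and `Settings()`, not the project's code. Because every construction opens the file again, each in-flight connection attempt holds its own descriptor on the same file.

```python
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


class EnvBackedConfig:
    """Hypothetical stand-in for a settings class that parses .env in its constructor."""

    open_count = 0  # total number of times any instance opened the .env file
    _lock = threading.Lock()

    def __init__(self, env_path: Path) -> None:
        with EnvBackedConfig._lock:
            EnvBackedConfig.open_count += 1
        self.values: dict[str, str] = {}
        # Each construction opens the file again and holds a descriptor while parsing.
        with open(env_path, encoding="utf-8") as fh:
            for raw in fh:
                line = raw.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    self.values[key.strip()] = value.strip()


def connect(env_path: Path) -> str:
    # Anti-pattern: build a fresh config object for every connection attempt.
    return EnvBackedConfig(env_path).values["PYRO_HOST"]


with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text("PYRO_HOST=localhost\n", encoding="utf-8")
    with ThreadPoolExecutor(max_workers=32) as pool:
        hosts = list(pool.map(connect, [env_file] * 200))

print(EnvBackedConfig.open_count)  # 200: one open() per connection attempt
```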

To reproduce the error, run the platform locally and send it hundreds of concurrent requests.
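To see the OS error in isolation (this is not the procedure described above, which runs the full platform under load), a Unix-only sketch can temporarily lower the process's soft descriptor limit with `resource.setrlimit` and keep opening a stand-in `.env` file until `open()` fails with `EMFILE`. The limit of 64 and the helper name `open_until_exhausted` are arbitrary choices for the demo, and the sketch assumes the hard limit is at least 64.

```python
import errno
import os
import resource  # Unix-only standard-library module
import tempfile


def open_until_exhausted(path: str, soft_limit: int = 64) -> tuple[int, int]:
    """Lower this process's soft RLIMIT_NOFILE, then open `path` until the OS refuses.

    Returns (errno of the failing open, number of handles opened before it).
    Assumes the hard limit is at least `soft_limit`.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    handles = []
    try:
        while True:
            handles.append(open(path, encoding="utf-8"))
    except OSError as exc:
        return exc.errno, len(handles)
    finally:
        for handle in handles:
            handle.close()
        # Restore the original soft limit so the rest of the process is unaffected.
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))


# A throwaway file standing in for .env.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as tmp:
    tmp.write("PYRO_HOST=localhost\n")
try:
    err, opened = open_until_exhausted(tmp.name)
    print(err == errno.EMFILE, os.strerror(err))  # True Too many open files
finally:
    os.unlink(tmp.name)
```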

Changes 🏗️

Moved every use of Config to a globally initialized variable, so it is constructed only once per process.
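The change amounts to constructing the settings object once per process and reusing it. A minimal sketch of two common ways to do that in Python, assuming a hypothetical `Config` class whose constructor stands in for the expensive `.env` read (the PR's actual classes may differ):

```python
import functools
from concurrent.futures import ThreadPoolExecutor


class Config:
    """Hypothetical settings class; imagine __init__ parsing the .env file."""

    instances = 0  # how many times the (expensive) constructor has run

    def __init__(self) -> None:
        Config.instances += 1
        self.pyro_host = "localhost"  # placeholder value for the sketch


# Option 1: module-level instance, created once when the module is first imported.
config = Config()


def pyro_host_from_global() -> str:
    # Every caller reuses the same object; no further file reads.
    return config.pyro_host


# Option 2: lazy accessor, created on first call and cached afterwards.
@functools.cache
def get_config() -> Config:
    return Config()


with ThreadPoolExecutor(max_workers=32) as pool:
    hosts = list(pool.map(lambda _: pyro_host_from_global(), range(200)))

assert get_config() is get_config()  # the cached accessor returns one shared object
print(Config.instances)  # 2: the module-level instance plus the one cached instance
```

Module-level code runs once per process, on first import, so Option 1 gives strict once-only construction. `functools.cache` defers the cost to first use, but the `functools.lru_cache` documentation notes the wrapped function may run more than once if several threads make the first call concurrently; guard it with a lock if that matters.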

Checklist 📋

For code changes:

  • I have clearly listed my changes in the PR description
  • I have made a test plan
  • I have tested my changes according to the test plan:
    • ...
Example test plan
  • Create from scratch and execute an agent with at least 3 blocks
  • Import an agent from file upload, and confirm it executes correctly
  • Upload agent to marketplace
  • Import an agent from marketplace and confirm it executes correctly
  • Edit an agent from monitor, and confirm it executes correctly

For configuration changes:

  • .env.example is updated or already compatible with my changes
  • docker-compose.yml is updated or already compatible with my changes
  • I have included a list of my configuration changes in the PR description (under Changes)
Examples of configuration changes
  • Changing ports
  • Adding new services that need to communicate with each other
  • Secrets or environment variable changes
  • New or infrastructure changes such as databases

@majdyz requested a review from aarushik93 November 21, 2024 17:09
@majdyz requested a review from a team as a code owner November 21, 2024 17:09
@majdyz requested review from Torantulino and removed request for a team November 21, 2024 17:09

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

API Changes
Several API endpoints were changed from async to sync functions. Verify that this won't hurt request-handling performance or concurrency.
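For context on this point, assuming the endpoints are FastAPI path operations: FastAPI runs endpoints declared with plain `def` in an external threadpool, while `async def` endpoints run directly on the event loop, so a sync endpoint is not automatically slower. What hurts concurrency is blocking work inside `async def`. A stdlib-only sketch (no FastAPI required; the handler names and the 0.1 s delay are hypothetical) illustrates the difference:

```python
import asyncio
import time
from typing import Awaitable, Callable


def blocking_call(delay: float) -> float:
    # Stand-in for a synchronous call (e.g. a blocking RPC); purely illustrative.
    time.sleep(delay)
    return delay


async def handler_blocking(delay: float) -> float:
    # Blocking code inside `async def` stalls the event loop for every other request.
    return blocking_call(delay)


async def handler_offloaded(delay: float) -> float:
    # Running the blocking call in a worker thread keeps the event loop responsive,
    # comparable to how FastAPI runs plain `def` endpoints in a threadpool.
    return await asyncio.to_thread(blocking_call, delay)


async def elapsed(
    handler: Callable[[float], Awaitable[float]], requests: int = 5, delay: float = 0.1
) -> float:
    # Fire `requests` concurrent calls and measure total wall-clock time.
    start = time.perf_counter()
    await asyncio.gather(*(handler(delay) for _ in range(requests)))
    return time.perf_counter() - start


serial = asyncio.run(elapsed(handler_blocking))
threaded = asyncio.run(elapsed(handler_offloaded))
print(f"blocking: {serial:.2f}s, offloaded: {threaded:.2f}s")
```

The blocking variant serializes all five calls (roughly 0.5 s), while the offloaded variant overlaps them (roughly 0.1 s plus thread overhead).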

Configuration Issue
The pyro_host variable is used before it is defined when constructing the client URI, which could lead to connection errors.
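A minimal sketch of why the ordering matters, using hypothetical helper names (`build_pyro_uri`, `get_client_uri`), a hypothetical `PYRO_HOST` setting, and an arbitrary port. In Python, a name assigned anywhere in a function is local to the whole function, so reading it before the assignment raises `UnboundLocalError` rather than silently using some other value. The URI format `PYRO:<object id>@<host>:<port>` follows Pyro5's documented convention.

```python
def build_pyro_uri(object_id: str, host: str, port: int) -> str:
    # Pyro5 URIs have the form "PYRO:<object id>@<host>:<port>".
    return f"PYRO:{object_id}@{host}:{port}"


def get_client_uri_broken(object_id: str, port: int) -> str:
    # Bug: `pyro_host` is assigned later in this function, so Python treats it
    # as a local variable and this reference raises UnboundLocalError.
    uri = build_pyro_uri(object_id, pyro_host, port)
    pyro_host = "localhost"
    return uri


def get_client_uri(object_id: str, port: int, settings: dict[str, str]) -> str:
    # Fix: resolve the host first, then build the URI from it.
    pyro_host = settings.get("PYRO_HOST", "localhost")
    return build_pyro_uri(object_id, pyro_host, port)


try:
    get_client_uri_broken("ExampleService", 8005)
except UnboundLocalError as exc:
    print("broken:", type(exc).__name__)  # broken: UnboundLocalError

print(get_client_uri("ExampleService", 8005, {"PYRO_HOST": "db.internal"}))
# PYRO:ExampleService@db.internal:8005
```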

Logging Level Change
Retry-failure logging was raised from info to error level. Verify that this aligns with the logging strategy and won't flood the error logs.
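One common compromise, sketched here under assumptions (the `call_with_retry` helper, logger name, and delay values are hypothetical, not the PR's code), is to log each intermediate retry at WARNING and reserve ERROR for the final failure. Randomizing the backoff ("full jitter") also addresses the thundering-herd effect described in the PR, since clients that failed together stop retrying in lockstep.

```python
import errno
import logging
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("pyro_client")  # hypothetical logger name


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.05,
    max_delay: float = 1.0,
) -> T:
    """Call `fn`, retrying OSError with exponential backoff and full jitter.

    Intermediate failures are logged at WARNING; only the final, exhausted
    failure is logged at ERROR, so transient blips don't flood the error log.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except OSError as exc:
            if attempt == attempts:
                logger.error("Giving up after %d attempts: %s", attempts, exc)
                raise
            # Full jitter: sleep a random time up to the capped exponential delay.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** (attempt - 1)))
            logger.warning(
                "Attempt %d/%d failed (%s); retrying in %.3fs", attempt, attempts, exc, delay
            )
            time.sleep(delay)
    raise RuntimeError("unreachable")


calls = {"n": 0}


def flaky_connect() -> str:
    # Simulated connection: fails twice with EMFILE, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError(errno.EMFILE, "Too many open files")
    return "connected"


logging.basicConfig(level=logging.INFO)
print(call_with_retry(flaky_connect))  # connected
```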

github-actions bot added the platform/backend (AutoGPT Platform - Back end) and size/m labels Nov 21, 2024

netlify bot commented Nov 21, 2024

Deploy Preview for auto-gpt-docs canceled.

🔨 Latest commit: a73c63a
🔍 Latest deploy log: https://app.netlify.com/sites/auto-gpt-docs/deploys/67440e159b5e7800086aa5d6

@aarushik93 (Contributor)

@majdyz, is this related to today's outage? We don't read from .env.

@majdyz (Contributor Author) commented Nov 21, 2024

@aarushik93 Unrelated, just something I found during local testing.

@aarushik93 (Contributor)

Can you explain a bit more? The PR description doesn't really explain to me what the issue is or what it's fixing.

@majdyz (Contributor Author) commented Nov 25, 2024

@aarushik93 Described in the PR like this:

When many Pyro connections are initialized concurrently, the .env file comes under heavy read contention, which causes connection establishment to fail.

Calling `Secrets()`, `Config()`, or `Settings()` triggers a read of the `.env` file every time. When these are constructed many times at once, the concurrent reads contend for the file and can exceed the per-process file descriptor limit set by the OS. The failed connections are then retried, which adds delay and causes thundering-herd issues. On Unix this surfaces as:
* `Too Many Open Files` or
* `There are only ... file descriptors (hard limit) available`

Let me know if there is any part that needs to be clarified.

@aarushik93 (Contributor)

Gotcha, thanks that clears it up for me!

@majdyz merged commit f00654c into dev Nov 25, 2024
15 checks passed
@majdyz deleted the zamilmajdy/fix-env-file-contention branch November 25, 2024 09:55