Retrieval Service efficiency optimization #13543

Merged
merged 14 commits into from
Feb 17, 2025

Contributor

@charli117 charli117 commented Feb 11, 2025

…taset fetching

Summary

Fix #13553

There are two main points:

  1. Multithreading. Use a thread pool (ThreadPoolExecutor) instead of starting a new Thread each time, so that threads are reused and the cost of thread creation and destruction is reduced.
  2. Database query optimization. Document and paragraph information is now queried in batches, with load_only limiting the columns fetched, which reduces both the amount of data transferred and the number of individual queries.
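
As a rough illustration of the first point, the sketch below shows how a shared ThreadPoolExecutor reuses a fixed set of worker threads across several retrieval calls instead of creating a thread per task. It is not the code from this PR: `fake_search`, `retrieve`, the method names, and the pool size of 2 are hypothetical stand-ins chosen for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def fake_search(method, query):
    """Hypothetical stand-in for one retrieval method; reports which thread ran it."""
    return method, threading.current_thread().name


def retrieve(executor, query, methods):
    """Submit one task per retrieval method and collect results in submission order."""
    futures = [executor.submit(fake_search, method, query) for method in methods]
    return [future.result() for future in futures]


# One pool is created up front and shared by every retrieval call (assumed size: 2).
executor = ThreadPoolExecutor(max_workers=2, thread_name_prefix="retrieval")
methods = ["keyword", "semantic", "full_text"]

first = retrieve(executor, "first query", methods)
second = retrieve(executor, "second query", methods)

# Six tasks ran, but at most two worker threads were ever created.
worker_names = {name for _, name in first + second}
executor.shutdown()
```

Because the pool caps the number of workers, the six tasks above are served by at most two threads, whereas starting a `threading.Thread` per task would have created six.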

Tip

Close issue syntax: Fixes #<issue number> or Resolves #<issue number>, see documentation for more details.

Screenshots

After the optimization, hit testing is significantly faster:
image
image

Checklist

Important

Please review the checklist below before submitting your pull request.

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran dev/reformat(backend) and cd web && npx lint-staged(frontend) to appease the lint gods

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. 💪 enhancement New feature or request labels Feb 11, 2025
@JohnJyong JohnJyong self-assigned this Feb 11, 2025
@JohnJyong JohnJyong self-requested a review February 11, 2025 15:03
@JohnJyong
Collaborator

pls fix the mypy check, thanks @charli117

@charli117
Contributor Author

fix this problem #13553

Contributor

@bowenliang123 bowenliang123 left a comment

LGTM with minor comments.

@charli117
Contributor Author

LGTM with minor comments.

@bowenliang123 Very good optimization suggestions, they have already been addressed.

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Feb 17, 2025
@JohnJyong JohnJyong merged commit 222df44 into langgenius:main Feb 17, 2025
8 checks passed
chinnsenn pushed a commit to chinnsenn/dify that referenced this pull request Mar 3, 2025
Development

Successfully merging this pull request may close these issues.

Knowledge base retrieval efficiency problem
3 participants