FEAT: support milvus to full text search #11430

Merged
merged 8 commits into from
Jan 8, 2025

Conversation

kgpp34
Contributor

@kgpp34 kgpp34 commented Dec 6, 2024

Summary

This pull request adds full-text search support to the Milvus data source.

Resolves #11370

Motivation and Context

The motivation behind this change is to improve the search capabilities of the Dify project by integrating full-text search support in the Milvus data source. This enhancement allows for more efficient and accurate retrieval of relevant data, which is crucial for improving the overall user experience.

Dependencies

  • This change requires the following dependencies:
    • pymilvus (version update to 2.5.0)
    • python-dotenv (version update to 1.0.1)
    • milvus (version 2.5.0-beta)

Detailed Changes

  • Modified: core/rag/datasource/vdb/field.py

    • Updated to support new sparse_vector field types and configurations required for full-text search in Milvus.
  • Modified: core/rag/datasource/vdb/milvus/milvus_vector.py

    • Implemented full-text search capabilities within the Milvus vector data source, ensuring efficient and accurate retrieval of relevant data.
  • Modified: pyproject.toml

    • Updated project configuration to include new dependencies and settings necessary for full-text search support in Milvus.
  • Modified: tests/integration_tests/vdb/milvus/test_milvus.py

    • Enhanced integration tests to cover the new full-text search functionality, ensuring that the Milvus integration behaves as expected under various scenarios.
  • Modified: docker/docker-compose.yaml

    • Updated the docker-compose configuration to use Milvus 2.5.0-beta.
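
As a rough illustration of the field.py change listed above, the following is a minimal sketch of a field-name registry that gains a sparse-vector entry. It assumes field names are kept in a string enum; the class name `Field`, its members and values, and the helper `collection_fields` are all hypothetical and are not taken from the PR diff.

```python
from enum import Enum


class Field(str, Enum):
    """Hypothetical field-name registry; names and values are assumptions."""

    PRIMARY_KEY = "id"
    CONTENT_KEY = "page_content"
    METADATA_KEY = "metadata"
    VECTOR = "vector"                # dense embedding field
    SPARSE_VECTOR = "sparse_vector"  # assumed field for sparse full-text vectors


def collection_fields(enable_hybrid_search: bool) -> list[str]:
    """Return the field names a collection would need (illustrative only).

    The sparse field is included only when hybrid search is enabled, so a
    collection created without it keeps its existing layout.
    """
    fields = [Field.PRIMARY_KEY, Field.CONTENT_KEY, Field.METADATA_KEY, Field.VECTOR]
    if enable_hybrid_search:
        fields.append(Field.SPARSE_VECTOR)
    return [f.value for f in fields]
```

Keeping the sparse field optional in a sketch like this mirrors the concern raised later in the thread about not affecting existing data.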

Checklist

Important

Please review the checklist below before submitting your pull request.

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran dev/reformat(backend) and cd web && npx lint-staged(frontend) to appease the lint gods

@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. 👻 feat:rag Embedding related issue, like qdrant, weaviate, milvus, vector database. labels Dec 6, 2024
@JohnJyong
Collaborator

May I know whether this will affect the old data? @kgpp34

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Dec 7, 2024
@kgpp34
Contributor Author

kgpp34 commented Dec 7, 2024

May I know whether this will affect the old data? @kgpp34

I have fixed my code so that the change does not affect CRUD operations on the old data.

@crazywoola
Member

Please fix the lint errors

@kgpp34
Contributor Author

kgpp34 commented Dec 9, 2024

Please fix the lint errors

I have fixed the poetry lock lint error.

@JohnJyong
Collaborator

pls resolve conflicts thanks @kgpp34

@JohnJyong JohnJyong self-requested a review December 27, 2024 08:28
@kgpp34
Contributor Author

kgpp34 commented Dec 28, 2024

pls resolve conflicts thanks @kgpp34

I have resolved these conflicts, thanks

@crazywoola
Member

@kgpp34 Hello, there are some conflicts when rebasing onto the main branch, please resolve them again. :) The rest LGTM.

@crazywoola crazywoola self-assigned this Jan 7, 2025
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Jan 7, 2025
@kgpp34 kgpp34 force-pushed the feat/milvus_full_text_search branch from 7e3a615 to 629ef76 Compare January 8, 2025 01:02
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Jan 8, 2025
kgpp34 added 6 commits January 8, 2025 11:52
- Modified core/rag/datasource/vdb/field.py to support new sparse_vector field types and configurations required for full-text search.
- Implemented full-text search capabilities in core/rag/datasource/vdb/milvus/milvus_vector.py.
- Updated pyproject.toml to include new dependencies (pymilvus==2.5.0, python-dotenv==1.0.1).
- Enhanced integration tests in tests/integration_tests/vdb/milvus/test_milvus.py to cover the new full-text search functionality.

This commit improves the search functionality within the Milvus data source, ensuring efficient and accurate retrieval of relevant data.

Signed-off-by: YoungLH <974840768@qq.com>
…lt docker-compose.yaml

Signed-off-by: YoungLH <974840768@qq.com>
Signed-off-by: YoungLH <974840768@qq.com>
…of existing data

Signed-off-by: YoungLH <974840768@qq.com>
Signed-off-by: YoungLH <974840768@qq.com>
Signed-off-by: YoungLH <974840768@qq.com>
@kgpp34 kgpp34 force-pushed the feat/milvus_full_text_search branch from 629ef76 to 01cfbec Compare January 8, 2025 03:55
Signed-off-by: YoungLH <974840768@qq.com>
@kgpp34
Contributor Author

kgpp34 commented Jan 8, 2025

@crazywoola I have resolved the conflicts in poetry.lock by regenerating it. Please review again. :)

@JohnJyong
Collaborator

This feature introduces a breaking change: if the user's Milvus version is below 2.5, it may cause errors. I suggest adding an environment variable and checking the user's Milvus version. @kgpp34

- Added an environment variable MILVUS_ENABLE_HYBRID_SEARCH to allow users to explicitly enable or disable hybrid search.
- Updated the _check_hybrid_search_support function to respect the environment variable and check the Milvus server version.
- If the Milvus version is below 2.5.0 or hybrid search is disabled via the environment variable, hybrid search functionality will be disabled to prevent errors.
- Improved error handling and logging for version checks to ensure graceful fallback when hybrid search is not supported.

This change ensures that users with Milvus versions below 2.5.0 do not encounter breaking changes and can continue using the system without hybrid search functionality.

Signed-off-by: YoungLH <974840768@qq.com>
@kgpp34
Contributor Author

kgpp34 commented Jan 8, 2025

This feature introduces a breaking change: if the user's Milvus version is below 2.5, it may cause errors. I suggest adding an environment variable and checking the user's Milvus version. @kgpp34

Hi @JohnJyong ,

Thank you for your valuable feedback! I’ve addressed the issue by adding an environment variable MILVUS_ENABLE_HYBRID_SEARCH to allow users to explicitly enable or disable hybrid search. Additionally, I’ve updated the _check_hybrid_search_support function to check the Milvus server version and ensure backward compatibility for users with versions below 2.5.0.

If the Milvus version is below 2.5.0 or hybrid search is disabled via the environment variable, the system will gracefully disable hybrid search functionality to prevent errors. This ensures that users with older Milvus versions can continue using the system without issues.

Please let me know if there’s anything else that needs to be addressed!
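
The fallback described above can be sketched roughly as follows. This is a minimal, standalone illustration, not the code from the PR: only the variable name MILVUS_ENABLE_HYBRID_SEARCH and the 2.5.0 threshold come from this thread. The function names `parse_version` and `hybrid_search_supported`, the accepted flag values, and the choice to default to disabled when the variable is unset are assumptions made for the example.

```python
import os
import re
from typing import Mapping, Optional

# Threshold stated in the thread: hybrid search needs Milvus 2.5.0 or later.
MIN_HYBRID_VERSION = (2, 5, 0)


def parse_version(raw: str) -> tuple[int, int, int]:
    """Parse strings such as 'v2.4.17' or '2.5.0-beta' into an integer tuple.

    Any pre-release suffix is ignored, which is a simplification.
    """
    match = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", raw.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {raw!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def hybrid_search_supported(
    server_version: str, env: Optional[Mapping[str, str]] = None
) -> bool:
    """Return True only if the flag is on and the server is new enough."""
    env = os.environ if env is None else env
    # Assumption: an unset variable means "disabled".
    flag = env.get("MILVUS_ENABLE_HYBRID_SEARCH", "false").strip().lower()
    if flag not in {"1", "true", "yes"}:
        return False
    try:
        return parse_version(server_version) >= MIN_HYBRID_VERSION
    except ValueError:
        # Unknown version format: fall back to dense-vector search only.
        return False
```

In a sketch like this, the server version string would come from the Milvus client, and an unparseable version falls back to disabling hybrid search instead of raising an error.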

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jan 8, 2025
@JohnJyong
Collaborator

LGTM

@JohnJyong JohnJyong merged commit 040a3b7 into langgenius:main Jan 8, 2025
7 checks passed
JohnJyong added a commit that referenced this pull request Jan 8, 2025
@JohnJyong JohnJyong mentioned this pull request Jan 8, 2025
5 tasks
JohnJyong added a commit that referenced this pull request Jan 8, 2025
@shengxiagit

If the VDB type is Milvus, in v0.15.0 the retrieval method setting only shows semantic search.

api/controllers/console/datasets/datasets.py
DatasetRetrievalSettingApi

@kgpp34 kgpp34 deleted the feat/milvus_full_text_search branch January 9, 2025 02:32
Labels
👻 feat:rag Embedding related issue, like qdrant, weaviate, milvus, vector database. lgtm This PR has been approved by a maintainer size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for Milvus Full Text Search in Dify
4 participants