-
Notifications
You must be signed in to change notification settings - Fork 24
Elasticsearch 'fields' parameter not usable in similarity_search* methods #64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…lds parameter in similarity_search* methods.
filter: Optional[List[dict]] = None, | ||
*, | ||
fields: Optional[List[str]] = None, | ||
*, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's still a random indent here.
page_content=hit["_source"].get(content_field, ""), | ||
metadata=hit["_source"].get("metadata", {}), | ||
) | ||
for field_key in fields: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines (84-88) (now 87-91) do exactly that, wouldn't those lines (80-88) get removed then? Maybe there are other instances in which people create more sophisticated document builders and the fields should be added anyway as they are being added in those lines.
In my current code I just made the changes from above and everything works as expected without any changes in this file.
Maybe the maintainers can have a look at this.
I'm not sure this is needed. I find it confusing that this change is mixing up independent fields (which are not supported by LangChain) with metadata fields, which are the proper way to store additional fields when working with LangChain. How do you plan on addressing collisions if you are just adding standalone fields to the search results as metadata? Maybe I need some clarification on how you ended up with an Elasticsearch index that has extra fields, given that the LangChain integration does not add them, so that I better understand the use case that your PR is trying to address. Thanks! |
I think the changes to the |
@sbartlett97 I understand that this solves a problem that and a few others have, but you are not addressing the concerns I raised. LangChain has its own solution for extra fields attached to a document, and this integration tries to adapt to that solution. Please correct me if I'm wrong, but my guess is that you are trying to use an Elasticsearch index that was not created with the LangChain integration. My thinking is that there are two options available to you:
My concerns with this change:
If I'm missing anything with my analysis please clarify. |
@miguelgrinberg I understand your concerns, and indeed, my use case is integrating data from an existing elastic index that wasn't setup using this package.
However, if the
Also the changes I proposed was to leave the metadata handling in |
I need you to clarify why you think this is an inconsistency. The I guess it is possible that there are inconsistencies in this LangChain integration (which has been worked on by several people and not just by myself), but at least the one example you pointed out does not seem inconsistent. As I said above, if the support we provide directly in our official client works better for your needs, then there is really no need for you to be using the LangChain integration. Edit: sorry, actually I do see the inconsistency now, which exists in the But as I mentioned several times above, why do you need this integration package if you can just work with the ES client directly and not be constrained to the LangChain document structure which you obviously do not care about? |
See Issue #62 for the description for this PR.
@giuliabaldini