Add Instructor Model to Embeddings #771
Closed
Commits (8):
af7af69 Added Instructor Model to Embeddings
477069b Updated embeddings and tests
f381568 Updated embeddings and tests
14b6cd7 Update structure
d81591e Update structure
f63cebd Update huggingface.py (enoreyes)
b9799a6 Update tests/integration_tests/embeddings/test_huggingface.py (enoreyes)
196c9b7 Update langchain/embeddings/huggingface.py (enoreyes)
Reading the Instructor paper: they support asymmetric instructions (i.e., the instruction used to embed the documents and the instruction used to embed the retrieval query are different). From their repo, "Represent the Wikipedia document for retrieval: " is used for the original document embedding and "Represent the Wikipedia question for retrieving supporting documents: " is used when constructing the query embedding. It would be good to support this.
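To make the asymmetry concrete, here is a minimal sketch of how the two instructions quoted above would be paired with texts. It assumes the encoder consumes `[instruction, text]` pairs, which is my reading of the Instructor repo; `build_pairs` and the sample sentences are hypothetical names and data for illustration, not part of this PR.

```python
from typing import List

# Instruction strings quoted from the Instructor repo in the comment above.
DOCUMENT_INSTRUCTION = "Represent the Wikipedia document for retrieval: "
QUERY_INSTRUCTION = (
    "Represent the Wikipedia question for retrieving supporting documents: "
)


def build_pairs(instruction: str, texts: List[str]) -> List[List[str]]:
    """Pair every text with the instruction it should be embedded under.

    Hypothetical helper: assumes the encoder takes [instruction, text] pairs.
    """
    return [[instruction, text] for text in texts]


# Documents and queries get different instructions (the asymmetric case).
doc_pairs = build_pairs(DOCUMENT_INSTRUCTION, ["Paris is the capital of France."])
query_pairs = build_pairs(QUERY_INSTRUCTION, ["What is the capital of France?"])
```

With a single `instruction` parameter, both lists would carry the same prefix, which is the limitation being pointed out here.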
I agree; however, adding the ability to do this makes the code a little wonky. I would love to get this merged first as a V1 and then add the asymmetric instructions component later. Alternatively, if you can figure out a good way to do this, feel free to add it to this PR. Thoughts?
I think the idea of asymmetric prompts for an embed/query class like this feels somewhat generic, so my instinct would be to just have two parameters to the class, `embed_instruction` and `query_instruction`, instead of just `instruction`. They can both default to `DEFAULT_INSTRUCTION`, but then the embed and query methods can each reference the correct one. I am not a good judge of whether this is 'wonky', though.
I agree that this would be OK to merge as is for v1, though, so maybe it is time for @hwchase17 to have a look again.