Add 'truncate' parameter for CohereEmbeddings #798

ephe-meral · 2023-01-29T18:26:03Z

Currently, the 'truncate' parameter of the Cohere API is not supported.

This means that by default, if you try to generate an embedding for a text that is too long, the call simply fails with an error. That is frustrating when using this embedding source with e.g. GPT-Index, because the error is hard to handle properly when generating a lot of embeddings.
With the parameter, one can choose to truncate either the START or the END of the text so that it fits the max token length, and still get an embedding without an error being thrown.

In this PR, I added this parameter to the class.
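
To make the START/END semantics concrete, here is a minimal local sketch. It does not call Cohere: `truncate_tokens` is a hypothetical helper, and it assumes the input is already a plain list of tokens, which is not how Cohere's real tokenizer works. It only illustrates which side of the text is discarded for each value, and that leaving the parameter unset keeps the "fail with an error" behavior.

```python
from typing import List, Optional


def truncate_tokens(
    tokens: List[str], max_tokens: int, truncate: Optional[str] = None
) -> List[str]:
    """Illustrative only: mimic the START / END truncation choices locally."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if len(tokens) <= max_tokens:
        # Short enough already, nothing to discard.
        return list(tokens)
    if truncate == "START":
        # Discard the beginning, keep the last max_tokens tokens.
        return tokens[-max_tokens:]
    if truncate == "END":
        # Discard the end, keep the first max_tokens tokens.
        return tokens[:max_tokens]
    # No truncation requested: the over-long input is an error.
    raise ValueError(f"input has {len(tokens)} tokens, limit is {max_tokens}")
```

For example, with a limit of 2, `truncate_tokens(["a", "b", "c", "d"], 2, "START")` keeps the last two tokens and `"END"` keeps the first two.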

Arguably, there should be a better way to handle this error, e.g. an optional callback that is triggered when the token limit is reached and can split the document. Especially in the GPT-Index use case, it's often hard to estimate the token count of each document, and I'd rather sort out the troublemakers or simply split them than interrupt the whole execution.
Thoughts?
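
One possible shape for that callback idea, as a rough sketch only. Nothing here exists in the PR or in any library: `InputTooLongError`, `embed_with_fallback`, `fake_embed` and `split_half` are all hypothetical names, and the "embedding" is a stand-in that fails on texts longer than three words.

```python
from typing import Callable, List, Tuple


class InputTooLongError(Exception):
    """Hypothetical stand-in for the error an embedding API raises on over-long input."""


def embed_with_fallback(
    texts: List[str],
    embed_one: Callable[[str], List[float]],
    split: Callable[[str], List[str]],
) -> List[Tuple[str, List[float]]]:
    """Embed each text; when one is too long, split it and embed the pieces instead."""
    results: List[Tuple[str, List[float]]] = []
    for text in texts:
        try:
            results.append((text, embed_one(text)))
        except InputTooLongError:
            # Pieces are embedded once; a piece that is still too long
            # re-raises, so the caller sees the remaining troublemakers.
            for piece in split(text):
                results.append((piece, embed_one(piece)))
    return results


def fake_embed(text: str) -> List[float]:
    """Toy embedder (assumption): rejects texts longer than three words."""
    words = text.split()
    if len(words) > 3:
        raise InputTooLongError(text)
    return [float(len(words))]


def split_half(text: str) -> List[str]:
    """Toy splitter (assumption): cut the text into two halves by word count."""
    words = text.split()
    mid = len(words) // 2
    return [" ".join(words[:mid]), " ".join(words[mid:])]
```

Calling `embed_with_fallback(["a b", "a b c d"], fake_embed, split_half)` embeds the first text directly and the second as two pieces, so one over-long document no longer interrupts the whole run.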

hwchase17

thanks for this! really curious to hear more about your last comment - do you want to dm me to discuss more?

As in #798, this commit adds the option to truncate user inputs that are larger than the model can handle. The Cohere SDK defaults this to None rather than passing "NONE" directly, so I kept the same default value (see: https://github.com/cohere-ai/cohere-python/blob/7f30bfd40c98b88f7d08f8c05db5b91ddb1310d1/cohere/client.py#L127). PS: I think that both this and #798 should default to the same value.

add 'truncate' parameter for cohere embeddings

2f6c42e

hwchase17 approved these changes Jan 29, 2023

hwchase17 and others added 3 commits January 30, 2023 23:50

Merge branch 'master' into patch-2

7afd692

fix self.truncate reference issues

d3ea935

reformat cohere embed code

d2689ce

hwchase17 merged commit ebea40c into langchain-ai:master Feb 1, 2023

stepp1 mentioned this pull request Feb 23, 2023

Add truncate argument for Cohere's LLM #1256

Merged
