[Question]: Question related to Output of retriever.retrieve() #16608
Comments
Hey @Kai-Chen00! How can I assist you today? I'm here to help you with any bugs, questions, or contributions you have in mind. Let's dive into the issue and see what we can uncover! I found a similar unsolved discussion titled Embedding is None and a closed issue titled [Question]: Retriever documents, embeddings = None. The suggested solution from the closed issue is that the problem is likely related to how the vector store is populated: ensure that the vector store is populated with the necessary fields, such as text, which LlamaIndex expects, and double-check that the text field is correctly populated in the vector store [3].
@dosu, the database I'm using is ChromaDB. Any advice on how to check the embeddings of the chunks there?
To check the embeddings of chunks in ChromaDB when using LlamaIndex, you can verify whether the embeddings are correctly stored with a test function like this:

```python
from typing import List

import pytest

from llama_index.core.schema import TextNode
from llama_index.core.vector_stores.types import VectorStoreQuery
from llama_index.vector_stores.chroma import ChromaVectorStore


# chromadb_not_available is a module-level flag in the test file
@pytest.mark.skipif(chromadb_not_available, reason="chromadb is not available")
@pytest.mark.asyncio()
@pytest.mark.parametrize("use_async", [True, False])
async def test_add_to_chromadb_and_query(
    vector_store: ChromaVectorStore,
    node_embeddings: List[TextNode],
    use_async: bool,
) -> None:
    if use_async:
        await vector_store.async_add(node_embeddings)
        res = await vector_store.aquery(
            VectorStoreQuery(query_embedding=[1.0, 0.0, 0.0], similarity_top_k=1)
        )
    else:
        vector_store.add(node_embeddings)
        res = vector_store.query(
            VectorStoreQuery(query_embedding=[1.0, 0.0, 0.0], similarity_top_k=1)
        )
    assert res.nodes
    assert res.nodes[0].get_content() == "lorem ipsum"
```

This test checks that the embeddings are correctly added and can be queried, by asserting that the result nodes are not empty and that the content of the first node matches the expected text [1].
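Separately from the integration test, a quicker sanity check is to scan whatever records you pull back from the store and flag any with a missing embedding. This is a generic, store-agnostic sketch; the record shape (dicts with `id` and `embedding` keys) is an assumption, so adapt the key names to whatever your vector store actually returns:

```python
def find_missing_embeddings(records):
    """Return the ids of records whose embedding is None or empty.

    Each record is assumed to be a dict with "id" and "embedding" keys;
    adjust the keys to match your store's actual return format.
    """
    missing = []
    for rec in records:
        emb = rec.get("embedding")
        if emb is None or len(emb) == 0:
            missing.append(rec["id"])
    return missing


# Example: two chunks were stored without their vectors.
records = [
    {"id": "chunk-1", "embedding": [0.1, 0.2, 0.3]},
    {"id": "chunk-2", "embedding": None},
    {"id": "chunk-3", "embedding": []},
]
print(find_missing_embeddings(records))  # ['chunk-2', 'chunk-3']
```

If this reports missing ids for chunks you believe were indexed, the problem is on the ingestion side rather than in the retriever.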
@dosu, what about the meaning of the key "score" for a NodeWithScore object?
The "score" key in a |
@dosu, how do I set the "score"? I've set cosine distance as the metric to retrieve the top-k relevant chunks. So would the score here be the cosine distance?
To set the "score" for a NodeWithScore when your store returns a cosine distance, convert the distance into a similarity before constructing the object:

```python
NodeWithScore(
    node=TextNode(text=ret["text"], extra_info=ret["metadata"]),
    # Transform cosine distance into a similarity score (higher is more similar)
    score=1 - ret["dist"],
)
```

This transformation ensures that a higher score indicates a closer match to the query [1].
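To see why the `1 - dist` transformation works, here is a minimal, self-contained sketch in plain Python (not LlamaIndex code) that computes cosine similarity, derives the cosine distance a store might report, and converts it back to a "higher is better" score:

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


query = [1.0, 0.0]
doc = [1.0, 0.0]

# Cosine distance, as a vector store might report it (0 = identical direction)
dist = 1.0 - cosine_similarity(query, doc)
# Back to a similarity score, where higher means more relevant
score = 1.0 - dist

print(round(score, 6))  # 1.0 for vectors pointing the same way
```

Since cosine distance is defined as `1 - cosine_similarity`, applying `1 - dist` simply recovers the similarity, so a score near 1 means a near-identical direction and a score near 0 means orthogonal vectors.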
@dosu, no, I mean: how do I find out the meaning of the "score" in my context? I'm not sure the metric I'm using is cosine similarity.
The "score" in a |
Given this code, what would be the meaning of the key "score" for nodes[0]?
In your code, the "score" key for nodes[0] is the similarity value the vector store derives from the raw distance it computed for that node, so its meaning follows directly from whichever distance metric your collection is configured with. Once the distance has been converted to a similarity, a higher score indicates a closer match to the query.
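To make the metric-dependence concrete, the comparison below (a self-contained illustration, not ChromaDB's internals) shows that the same pair of vectors produces raw values on entirely different scales under two common metrics, which is why you must know the configured metric before interpreting a score:

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; ignores vector magnitude entirely."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)


q = [1.0, 0.0]
d = [10.0, 0.0]  # same direction as q, but 10x the magnitude

print(l2_distance(q, d))      # 9.0 -> "far apart" under Euclidean distance
print(cosine_distance(q, d))  # 0.0 -> identical under cosine distance
```

The same document can rank as a perfect match under cosine and a poor match under Euclidean distance, so a numeric score is meaningless without knowing which metric produced it.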
@dosu, what if I want to use Euclidean distance as the metric for retrieval? How can I modify the code?
To use Euclidean distance as the metric for retrieval, you need to set the distance strategy on the vector store. For example, with OraLlamaVS:

```python
vector_store_euclidean = OraLlamaVS.from_documents(
    text_nodes,
    table_name="Documents_EUCLIDEAN",
    client=connection,
    distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,
)
```

This code snippet sets the distance metric to Euclidean distance for the vector store. If you are using a different vector store, replace the distance strategy with that store's equivalent configuration option.
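Whichever store you use, the retrieval step under Euclidean distance amounts to ranking stored vectors by their L2 distance to the query. This brute-force top-k in plain Python (a generic illustration, not LlamaIndex or ChromaDB code) shows what the store does internally:

```python
import math


def euclidean_top_k(query, chunks, k=2):
    """Rank stored (id, vector) pairs by Euclidean distance to the query.

    Returns the k closest ids with their distances, smallest first.
    """
    scored = []
    for chunk_id, vec in chunks:
        dist = math.sqrt(sum((q - v) ** 2 for q, v in zip(query, vec)))
        scored.append((chunk_id, dist))
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


chunks = [
    ("a", [0.0, 0.0]),
    ("b", [3.0, 4.0]),
    ("c", [1.0, 1.0]),
]
print(euclidean_top_k([0.0, 0.0], chunks, k=2))
# [('a', 0.0), ('c', 1.4142135623730951)]
```

Note that under Euclidean distance smaller is better, so if you want a NodeWithScore where higher means more relevant, you would still need a distance-to-similarity conversion rather than using the raw distance as the score.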
Question Validation
Question
When I print out nodes[0], I find that the key "embedding" is None. Why? Shouldn't all chunks in the database already be vectorized? I expected the value of "embedding" to be a vector.