Skip to content

Latest commit

 

History

History
17 lines (9 loc) · 1.51 KB

File metadata and controls

17 lines (9 loc) · 1.51 KB

HyDE - Hypothetical Document Embeddings

This is advanced RAG approach to dense retrieval that promises to make searching for information even more efficient and accurate

we provided Colab walkthrough for HyDE implementation Open In Colab

Learn deeper in Our Blog

The HyDE approach recognizes the difficulty of zero-shot learning and encoding relevance without labeled data. Instead, it leverages the power of language models and hypothetical documents. Here’s how it works:

  1. Generating Hypothetical Documents: When a user enters a query, HyDE instructs a language model, like GPT-3, to generate a hypothetical document. This document is designed to capture relevance patterns but may contain inaccuracies.
  2. Unsupervised Encoding: The generated hypothetical document is then encoded into an embedding vector using an unsupervised contrastive encoder. This vector identifies a neighborhood in the corpus embedding space, where similar real documents are retrieved based on vector similarity.
  3. Retrieval Process: HyDE searches for real documents in the corpus that are most similar to the encoded hypothetical document. The retrieved documents are then presented as search results.

Read the Blog Post