From 998fa0d3f98cafa84aa0edf26ea9e0838dcdb42a Mon Sep 17 00:00:00 2001
From: Matvey Arye
Date: Wed, 1 Nov 2023 21:48:23 -0400
Subject: [PATCH] Fixes to the docs for timescale vector template (#12756)

---
 templates/docs/INDEX.md | 2 +-
 .../README.md | 77 +++++++++++++++----
 2 files changed, 62 insertions(+), 17 deletions(-)

diff --git a/templates/docs/INDEX.md b/templates/docs/INDEX.md
index 6bc0f584e9087..d96f5517d5022 100644
--- a/templates/docs/INDEX.md
+++ b/templates/docs/INDEX.md
@@ -21,7 +21,7 @@ These templates cover advanced retrieval techniques.

- [Anthropic Iterative Search](../anthropic-iterative-search): This retrieval technique uses iterative prompting to determine what to retrieve and whether the retriever documents are good enough.
- [Neo4j Parent Document Retrieval](../neo4j-parent): This retrieval technique stores embeddings for smaller chunks, but then returns larger chunks to pass to the model for generation.
- [Semi-Structured RAG](../rag-semi-structured): The template shows how to do retrieval over semi-structured data (e.g. data that involves both text and tables).
-- [Temporal RAG](../rag-timescale-hybrid-search-time): The template shows how to do retrieval over data that has a strong time-based component.
+- [Temporal RAG](../rag-timescale-hybrid-search-time): The template shows how to do hybrid search over data with a time-based component using [Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral).

## 🔍Advanced Retrieval - Query Transformation

diff --git a/templates/rag-timescale-hybrid-search-time/README.md b/templates/rag-timescale-hybrid-search-time/README.md
index d7868fd97936c..3bd1e2162b2b0 100644
--- a/templates/rag-timescale-hybrid-search-time/README.md
+++ b/templates/rag-timescale-hybrid-search-time/README.md
@@ -10,8 +10,7 @@ This is useful any time your data has a strong time-based component. Some exampl

Such items are often searched by both similarity and time.
For example: Show me all news about Toyota trucks from 2022.

-[Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) provides superior performance when searching for embeddings within a particular
-timeframe by leveraging automatic table partitioning to isolate data for particular time-ranges.
+[Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) provides superior performance when searching for embeddings within a particular timeframe by leveraging automatic table partitioning to isolate data for particular time-ranges.

Langchain's self-query retriever allows deducing time-ranges (as well as other search criteria) from the text of user queries.

@@ -35,29 +34,75 @@ Timescale Vector is available on [Timescale](https://www.timescale.com/products?

- To get started, [signup](https://console.cloud.timescale.com/signup?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) to Timescale, create a new database and follow this notebook!
- See the [installation instructions](https://github.com/timescale/python-vector) for more details on using Timescale Vector in python.

-### Using Timescale Vector with this template
+## Environment Setup

-This template uses TimescaleVector as a vectorstore and requires that `TIMESCALES_SERVICE_URL` is set.
+This template uses Timescale Vector as a vectorstore and requires that `TIMESCALES_SERVICE_URL` is set. Sign up for a 90-day trial [here](https://console.cloud.timescale.com/signup?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) if you don't yet have an account.

-## LLM
+To load the sample dataset, set `LOAD_SAMPLE_DATA=1`. To load your own dataset, see the section below.

-Be sure that `OPENAI_API_KEY` is set in order to the OpenAI models.
+Set the `OPENAI_API_KEY` environment variable to access the OpenAI models.
-## Loading sample data
+## Usage

-We have provided a sample dataset you can use for demoing this template. It consists of the git history of the timescale project.
+To use this package, you should first have the LangChain CLI installed:

-To load this dataset, set the `LOAD_SAMPLE_DATA` environmental variable.
+```shell
+pip install -U "langchain-cli[serve]"
+```

-## Loading your own dataset.
+To create a new LangChain project and install this as the only package, you can do:

-To load your own dataset you will have to modify the code in the `DATASET SPECIFIC CODE` section of `chain.py`.
-This code defines the name of the collection, how to load the data, and the human-language description of both the
-contents of the collection and all of the metadata. The human-language descriptions are used by the self-query retriever
-to help the LLM convert the question into filters on the metadata when searching the data in Timescale-vector.
+```shell
+langchain app new my-app --package rag-timescale-hybrid-search-time
+```
+
+If you want to add this to an existing project, you can just run:
+
+```shell
+langchain app add rag-timescale-hybrid-search-time
+```
+
+And add the following code to your `server.py` file:
+```python
+from rag_timescale_hybrid_search_time import chain as rag_timescale_hybrid_search_chain
+
+add_routes(app, rag_timescale_hybrid_search_chain, path="/rag-timescale-hybrid-search")
+```
+
+(Optional) Let's now configure LangSmith.
+LangSmith will help us trace, monitor, and debug LangChain applications.
+LangSmith is currently in private beta; you can sign up [here](https://smith.langchain.com/).
+If you don't have access, you can skip this section.
+
+```shell
+export LANGCHAIN_TRACING_V2=true
+export LANGCHAIN_API_KEY=
+export LANGCHAIN_PROJECT=  # if not specified, defaults to "default"
+```

-## Using in your own applications
+If you are inside this directory, then you can spin up a LangServe instance directly by running:

-This is a standard LangServe template.
-Instructions on how to use it with your LangServe applications are [here](https://github.com/langchain-ai/langchain/blob/master/templates/README.md).
+```shell
+langchain serve
+```
+
+This will start the FastAPI app with a server running locally at
+[http://localhost:8000](http://localhost:8000)
+
+We can see all templates at [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs)
+We can access the playground at [http://127.0.0.1:8000/rag-timescale-hybrid-search/playground](http://127.0.0.1:8000/rag-timescale-hybrid-search/playground)
+
+We can access the template from code with:
+
+```python
+from langserve.client import RemoteRunnable
+
+runnable = RemoteRunnable("http://localhost:8000/rag-timescale-hybrid-search")
+```
+
+## Loading your own dataset
+
+To load your own dataset, you will have to modify the code in the `DATASET SPECIFIC CODE` section of `chain.py`.
+This code defines the name of the collection, how to load the data, and the human-language description of both the
+contents of the collection and all of the metadata. The human-language descriptions are used by the self-query retriever
+to help the LLM convert the question into filters on the metadata when searching the data in Timescale-vector.
\ No newline at end of file
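The README's claim about time partitioning can be illustrated with a small, self-contained sketch. This is only an illustration of the idea, not Timescale Vector's implementation: records are bucketed into monthly partitions, so a time-bounded search touches only the partitions that overlap the query's range. All names and data below are made up for the example.

```python
from datetime import datetime

def month_key(ts: datetime) -> str:
    # One partition per calendar month, keyed "YYYY-MM".
    return ts.strftime("%Y-%m")

# Toy records: (text, publication time). In the real system each partition
# would hold embeddings with their own vector index.
records = [
    ("Toyota truck news", datetime(2022, 3, 5)),
    ("Toyota truck news (older)", datetime(2019, 7, 1)),
    ("Unrelated EV launch", datetime(2023, 1, 10)),
]

partitions: dict = {}
for text, ts in records:
    partitions.setdefault(month_key(ts), []).append(text)

def time_bounded_search(start: datetime, end: datetime) -> list:
    # Partitions outside [start, end] are skipped entirely; only the
    # overlapping ones would need a similarity scan.
    lo, hi = month_key(start), month_key(end)
    return [
        text
        for key in sorted(partitions)
        if lo <= key <= hi
        for text in partitions[key]
    ]

print(time_bounded_search(datetime(2022, 1, 1), datetime(2022, 12, 31)))
# → ['Toyota truck news']
```

A query like "news about Toyota trucks from 2022" first has its time range deduced by the self-query retriever; the range then prunes the partitions before any embedding comparison happens.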
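Pulling together the environment variables the patched README mentions, a minimal shell setup might look like the following. The variable names come from the README; every value is a placeholder.

```shell
# Timescale Vector connection string from your Timescale console (placeholder value)
export TIMESCALES_SERVICE_URL="postgres://<user>:<password>@<host>:<port>/<dbname>"
# OpenAI key for the embedding and chat models (placeholder value)
export OPENAI_API_KEY="<your-openai-api-key>"
# Optional: ingest the bundled sample dataset (the timescale project's git history)
export LOAD_SAMPLE_DATA=1
```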