Fixes to the docs for timescale vector template (langchain-ai#12756)
cevian authored and xieqihui committed Nov 21, 2023
1 parent 8efe0fa commit 998fa0d
Showing 2 changed files with 62 additions and 17 deletions.
2 changes: 1 addition & 1 deletion templates/docs/INDEX.md
@@ -21,7 +21,7 @@ These templates cover advanced retrieval techniques.
- [Anthropic Iterative Search](../anthropic-iterative-search): This retrieval technique uses iterative prompting to determine what to retrieve and whether the retrieved documents are good enough.
- [Neo4j Parent Document Retrieval](../neo4j-parent): This retrieval technique stores embeddings for smaller chunks, but then returns larger chunks to pass to the model for generation.
- [Semi-Structured RAG](../rag-semi-structured): The template shows how to do retrieval over semi-structured data (e.g. data that involves both text and tables).
-- [Temporal RAG](../rag-timescale-hybrid-search-time): The template shows how to do retrieval over data that has a strong time-based component.
+- [Temporal RAG](../rag-timescale-hybrid-search-time): The template shows how to do hybrid search over data with a time-based component using [Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral).

## 🔍Advanced Retrieval - Query Transformation

77 changes: 61 additions & 16 deletions templates/rag-timescale-hybrid-search-time/README.md
@@ -10,8 +10,7 @@ This is useful any time your data has a strong time-based component. Some exampl

Such items are often searched by both similarity and time. For example: Show me all news about Toyota trucks from 2022.

-[Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) provides superior performance when searching for embeddings within a particular
-timeframe by leveraging automatic table partitioning to isolate data for particular time-ranges.
+[Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) provides superior performance when searching for embeddings within a particular timeframe by leveraging automatic table partitioning to isolate data for particular time-ranges.

Langchain's self-query retriever allows deducing time-ranges (as well as other search criteria) from the text of user queries.
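As a rough, hypothetical illustration (the function, regex, and names below are ours, not the template's code), the self-query step can be thought of as mapping free-text like "Show me all news about Toyota trucks from 2022" to an explicit time window that the vectorstore can filter on:

```python
# Hypothetical sketch: derive a (start, end) time window from query text.
# The real self-query retriever uses an LLM, not a regex; this only
# illustrates the shape of the deduced filter.
import re
from datetime import datetime

def deduce_time_range(query: str):
    """Return (start, end) datetimes if the query names a year, else None."""
    match = re.search(r"\bfrom (\d{4})\b", query)
    if match is None:
        return None
    year = int(match.group(1))
    # Cover the whole named year: [Jan 1 of year, Jan 1 of next year)
    return datetime(year, 1, 1), datetime(year + 1, 1, 1)

window = deduce_time_range("Show me all news about Toyota trucks from 2022")
```

A window like this is what lets the partitioned store skip data outside the requested range.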

@@ -35,29 +34,75 @@ Timescale Vector is available on [Timescale](https://www.timescale.com/products?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral)
- To get started, [signup](https://console.cloud.timescale.com/signup?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) to Timescale, create a new database and follow this notebook!
- See the [installation instructions](https://github.com/timescale/python-vector) for more details on using Timescale Vector in python.

-### Using Timescale Vector with this template
+## Environment Setup

-This template uses TimescaleVector as a vectorstore and requires that `TIMESCALES_SERVICE_URL` is set.
+This template uses Timescale Vector as a vectorstore and requires that `TIMESCALES_SERVICE_URL` is set. Sign up for a 90-day trial [here](https://console.cloud.timescale.com/signup?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) if you don't yet have an account.

-## LLM
+To load the sample dataset, set `LOAD_SAMPLE_DATA=1`. To load your own dataset, see the section below.

-Be sure that `OPENAI_API_KEY` is set in order to the OpenAI models.
+Set the `OPENAI_API_KEY` environment variable to access the OpenAI models.

-## Loading sample data
+## Usage

-We have provided a sample dataset you can use for demoing this template. It consists of the git history of the timescale project.
+To use this package, you should first have the LangChain CLI installed:

-To load this dataset, set the `LOAD_SAMPLE_DATA` environmental variable.
+```shell
+pip install -U "langchain-cli[serve]"
+```

-## Loading your own dataset.
+To create a new LangChain project and install this as the only package, you can do:

-To load your own dataset you will have to modify the code in the `DATASET SPECIFIC CODE` section of `chain.py`.
-This code defines the name of the collection, how to load the data, and the human-language description of both the
-contents of the collection and all of the metadata. The human-language descriptions are used by the self-query retriever
-to help the LLM convert the question into filters on the metadata when searching the data in Timescale-vector.
+```shell
+langchain app new my-app --package rag-timescale-hybrid-search-time
+```

+If you want to add this to an existing project, you can just run:

+```shell
+langchain app add rag-timescale-hybrid-search-time
+```

+And add the following code to your `server.py` file:
+```python
+from rag_timescale_hybrid_search import chain as rag_timescale_hybrid_search_chain

+add_routes(app, rag_timescale_hybrid_search_chain, path="/rag-timescale-hybrid-search")
+```

+(Optional) Let's now configure LangSmith.
+LangSmith will help us trace, monitor, and debug LangChain applications.
+LangSmith is currently in private beta; you can sign up [here](https://smith.langchain.com/).
+If you don't have access, you can skip this section.

+```shell
+export LANGCHAIN_TRACING_V2=true
+export LANGCHAIN_API_KEY=<your-api-key>
+export LANGCHAIN_PROJECT=<your-project> # if not specified, defaults to "default"
+```

-## Using in your own applications
+If you are inside this directory, then you can spin up a LangServe instance directly by:

-This is a standard LangServe template. Instructions on how to use it with your LangServe applications are [here](https://github.com/langchain-ai/langchain/blob/master/templates/README.md).
+```shell
+langchain serve
+```

+This will start the FastAPI app with a server running locally at
+[http://localhost:8000](http://localhost:8000)

+We can see all templates at [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs)
+We can access the playground at [http://127.0.0.1:8000/rag-timescale-hybrid-search/playground](http://127.0.0.1:8000/rag-timescale-hybrid-search/playground)

+We can access the template from code with:

+```python
+from langserve.client import RemoteRunnable

+runnable = RemoteRunnable("http://localhost:8000/rag-timescale-hybrid-search")
+```

+## Loading your own dataset

+To load your own dataset, you will have to modify the code in the `DATASET SPECIFIC CODE` section of `chain.py`.
+This code defines the name of the collection, how to load the data, and the human-language description of both the
+contents of the collection and all of the metadata. The human-language descriptions are used by the self-query retriever
+to help the LLM convert the question into filters on the metadata when searching the data in Timescale Vector.
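As a rough sketch of the three dataset-specific pieces that paragraph describes (a collection name, a loader, and human-language descriptions of the contents and metadata), one might structure them like this. All names below are illustrative; the template's actual `chain.py` differs:

```python
# Hypothetical sketch of the DATASET SPECIFIC CODE pieces described above.
from dataclasses import dataclass

@dataclass
class AttributeDescription:
    name: str         # metadata key stored alongside each embedding
    description: str  # human-language meaning, used by the LLM to build filters
    type: str         # e.g. "string", "integer", "timestamp"

COLLECTION_NAME = "git_history"  # illustrative collection name
DOCUMENT_CONTENTS = "Commit messages from a project's git history"
METADATA_FIELDS = [
    AttributeDescription("author", "The commit author's name", "string"),
    AttributeDescription("date", "When the commit was made", "timestamp"),
]

def load_documents():
    """Illustrative loader: yield (text, metadata) pairs to embed."""
    yield ("Fix off-by-one in partition lookup",
           {"author": "jane", "date": "2023-01-15T10:00:00Z"})
```

The human-language strings (`DOCUMENT_CONTENTS` and each field's `description`) are what the self-query retriever hands to the LLM so it can translate a question into metadata filters.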
