Fixes to the docs for timescale vector template #12756

Merged Nov 2, 2023 (1 commit)
2 changes: 1 addition & 1 deletion templates/docs/INDEX.md
@@ -21,7 +21,7 @@ These templates cover advanced retrieval techniques.
- [Anthropic Iterative Search](../anthropic-iterative-search): This retrieval technique uses iterative prompting to determine what to retrieve and whether the retriever documents are good enough.
- [Neo4j Parent Document Retrieval](../neo4j-parent): This retrieval technique stores embeddings for smaller chunks, but then returns larger chunks to pass to the model for generation.
- [Semi-Structured RAG](../rag-semi-structured): The template shows how to do retrieval over semi-structured data (e.g. data that involves both text and tables).
- [Temporal RAG](../rag-timescale-hybrid-search-time): The template shows how to do retrieval over data that has a strong time-based component.
- [Temporal RAG](../rag-timescale-hybrid-search-time): The template shows how to do hybrid search over data with a time-based component using [Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral).

## 🔍Advanced Retrieval - Query Transformation

77 changes: 61 additions & 16 deletions templates/rag-timescale-hybrid-search-time/README.md
@@ -10,8 +10,7 @@ This is useful any time your data has a strong time-based component. Some examples include:

Such items are often searched by both similarity and time. For example: "Show me all news about Toyota trucks from 2022."

[Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) provides superior performance when searching for embeddings within a particular
timeframe by leveraging automatic table partitioning to isolate data for particular time-ranges.
[Timescale Vector](https://www.timescale.com/ai?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) provides superior performance when searching for embeddings within a particular timeframe by leveraging automatic table partitioning to isolate data for particular time-ranges.

LangChain's self-query retriever can deduce time ranges (as well as other search criteria) from the text of user queries.
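The idea can be illustrated with a toy sketch in plain Python. This is not the actual self-query retriever (which uses an LLM to build structured filters); it just shows the shape of the transformation: split a query into a semantic part and a time-range filter.

```python
import re
from datetime import datetime

def deduce_time_range(query: str):
    """Toy illustration only: extract a 4-digit year from the query and
    turn it into a (start, end) time window for metadata filtering.
    The real self-query retriever uses an LLM, not a regex."""
    match = re.search(r"\b(19|20)\d{2}\b", query)
    if not match:
        return query, None
    year = int(match.group(0))
    window = (datetime(year, 1, 1), datetime(year + 1, 1, 1))
    # Remove the year so only the semantic part goes to vector search
    cleaned = re.sub(r"\b(19|20)\d{2}\b", "", query).strip()
    return cleaned, window

cleaned, window = deduce_time_range("Show me all news about Toyota trucks from 2022")
```

In the template, the equivalent filter is applied by Timescale Vector, which can skip entire time partitions outside the deduced window.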

@@ -35,29 +34,75 @@ Timescale Vector is available on [Timescale](https://www.timescale.com/products?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral)
- To get started, [sign up](https://console.cloud.timescale.com/signup?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) for Timescale, create a new database, and follow this notebook!
- See the [installation instructions](https://github.com/timescale/python-vector) for more details on using Timescale Vector in Python.

### Using Timescale Vector with this template
## Environment Setup

This template uses TimescaleVector as a vectorstore and requires that `TIMESCALES_SERVICE_URL` is set.
This template uses Timescale Vector as a vectorstore and requires that `TIMESCALES_SERVICE_URL` be set. Sign up for a 90-day trial [here](https://console.cloud.timescale.com/signup?utm_campaign=vectorlaunch&utm_source=langchain&utm_medium=referral) if you don't yet have an account.

## LLM
To load the sample dataset, set `LOAD_SAMPLE_DATA=1`. To load your own dataset, see the section below.

Be sure that `OPENAI_API_KEY` is set in order to the OpenAI models.
Set the `OPENAI_API_KEY` environment variable to access the OpenAI models.
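Putting the environment setup together, it might look like the following (the values are placeholders — substitute your own connection string and key):

```shell
export TIMESCALES_SERVICE_URL="postgres://user:password@host:port/dbname"  # from your Timescale console
export LOAD_SAMPLE_DATA=1          # optional: load the bundled sample dataset
export OPENAI_API_KEY="sk-..."     # needed for the OpenAI models
```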

## Loading sample data
## Usage

We have provided a sample dataset you can use for demoing this template. It consists of the git history of the timescale project.
To use this package, you should first have the LangChain CLI installed:

To load this dataset, set the `LOAD_SAMPLE_DATA` environmental variable.
```shell
pip install -U "langchain-cli[serve]"
```

## Loading your own dataset.
To create a new LangChain project and install this as the only package, you can do:

To load your own dataset you will have to modify the code in the `DATASET SPECIFIC CODE` section of `chain.py`.
This code defines the name of the collection, how to load the data, and the human-language description of both the
contents of the collection and all of the metadata. The human-language descriptions are used by the self-query retriever
to help the LLM convert the question into filters on the metadata when searching the data in Timescale-vector.
```shell
langchain app new my-app --package rag-timescale-hybrid-search-time
```

If you want to add this to an existing project, you can just run:

```shell
langchain app add rag-timescale-hybrid-search-time
```

And add the following code to your `server.py` file:
```python
from rag_timescale_hybrid_search import chain as rag_timescale_hybrid_search_chain

add_routes(app, rag_timescale_hybrid_search_chain, path="/rag-timescale-hybrid-search")
```

(Optional) Let's now configure LangSmith.
LangSmith will help us trace, monitor, and debug LangChain applications.
LangSmith is currently in private beta; you can sign up [here](https://smith.langchain.com/).
If you don't have access, you can skip this section.

```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=<your-api-key>
export LANGCHAIN_PROJECT=<your-project> # if not specified, defaults to "default"
```

## Using in your own applications
If you are inside this directory, then you can spin up a LangServe instance directly by running:

This is a standard LangServe template. Instructions on how to use it with your LangServe applications are [here](https://github.com/langchain-ai/langchain/blob/master/templates/README.md).
```shell
langchain serve
```

This will start the FastAPI app with a server running locally at
[http://localhost:8000](http://localhost:8000).

We can see all templates at [http://127.0.0.1:8000/docs](http://127.0.0.1:8000/docs)
We can access the playground at [http://127.0.0.1:8000/rag-timescale-hybrid-search/playground](http://127.0.0.1:8000/rag-timescale-hybrid-search/playground)

We can access the template from code with:

```python
from langserve.client import RemoteRunnable

runnable = RemoteRunnable("http://localhost:8000/rag-timescale-hybrid-search")
```

## Loading your own dataset

To load your own dataset you will have to modify the code in the `DATASET SPECIFIC CODE` section of `chain.py`.
This code defines the name of the collection, how to load the data, and the human-language description of both the
contents of the collection and all of the metadata. The human-language descriptions are used by the self-query retriever
to help the LLM convert the question into filters on the metadata when searching the data in Timescale Vector.
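As a hedged sketch, the kind of information that section has to supply might look like the following. The names and structure here are illustrative assumptions, not the template's actual API; `chain.py` defines the real configuration points.

```python
# Hypothetical shape of the dataset-specific configuration: a collection
# name, a human-language description of the documents, and a description
# of each metadata field the self-query retriever is allowed to filter on.
COLLECTION_NAME = "my_docs"

DOCUMENT_DESCRIPTION = "News articles about the automotive industry"

METADATA_FIELD_DESCRIPTIONS = [
    {"name": "author", "description": "The author of the article", "type": "string"},
    {"name": "published_at", "description": "Publication timestamp of the article", "type": "datetime"},
]
```

The field descriptions matter because the self-query retriever passes them to the LLM, which uses them to decide which metadata filters a given question implies.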