[Docs] HCD and DSE with RAGStack (#582)
* initial-content

* add-langchain-hub

* dse-69-example

* typo
mendonk authored Jul 11, 2024
1 parent b1d2468 commit 4b940f0
Showing 3 changed files with 149 additions and 0 deletions.
36 changes: 36 additions & 0 deletions docs/modules/examples/pages/dse-69.adoc
@@ -0,0 +1,36 @@
= RAGStack and DataStax Enterprise (DSE) 6.9 example

. Pull the dse-server 6.9 Docker image, start a container, and confirm that it is running.
+
[source,bash]
----
docker pull datastax/dse-server:6.9.0-rc.2
docker run -e DS_LICENSE=accept -p 9042:9042 -d datastax/dse-server:6.9.0-rc.2
docker ps
----
+
. Install dependencies.
+
[source,bash]
----
pip install ragstack-ai-langchain python-dotenv langchainhub
----
+
. Create a `.env` file in the root directory of your project and add your OpenAI API key.
+
[source,bash]
----
OPENAI_API_KEY="sk-..."
----
+
. Create a Python script that embeds a web page, stores the vectors in DSE, and answers a query against them.
+
include::examples:partial$hcd-quickstart.adoc[]
+
You should see output like this:
+
[source,plain]
----
Task decomposition involves breaking down a complex task into smaller and simpler steps to make it more manageable. Techniques like Chain of Thought and Tree of Thoughts help models decompose hard tasks and enhance performance by thinking step by step. This process allows for a better interpretation of the model's thinking process and can involve various methods such as simple prompting, task-specific instructions, or human inputs.
----
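The `docker run` command in step 1 publishes CQL port 9042 on the host, but the database may not accept connections immediately after the container starts. The following is a minimal sketch, using only the Python standard library, of a reachability check you could run before the script. The helper name `port_is_open` and the two-second timeout are illustrative assumptions, and a successful TCP connection only shows that something is listening on the port, not that DSE has finished starting.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # 9042 is the CQL port published by the docker run command in step 1.
    print("CQL port reachable:", port_is_open("localhost", 9042))
```

If the check prints `False`, waiting a little and retrying is usually a reasonable first step before investigating the container logs.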


44 changes: 44 additions & 0 deletions docs/modules/examples/pages/hcd.adoc
@@ -0,0 +1,44 @@
= RAGStack and Hyper Converged Database (HCD) example

. Clone the HCD example repository.
+
[source,bash]
----
git clone https://github.com/datastax/astra-db-java.git
cd astra-db-java
----
+
. Start the containers and confirm that they are running.
+
[source,bash]
----
docker compose up -d
docker compose ps
----
+
. Install dependencies.
+
[source,bash]
----
pip install ragstack-ai-langchain python-dotenv langchainhub
----
+
. Create a `.env` file in the root directory of your project and add your OpenAI API key.
+
[source,bash]
----
OPENAI_API_KEY="sk-..."
----
+
. Create a Python script that embeds a web page, stores the vectors in HCD, and answers a query against them.
+
include::examples:partial$hcd-quickstart.adoc[]
+
You should see output like this:
+
[source,plain]
----
Task decomposition involves breaking down a complex task into smaller and simpler steps to make it more manageable. Techniques like Chain of Thought and Tree of Thoughts help models decompose hard tasks and enhance performance by thinking step by step. This process allows for a better interpretation of the model's thinking process and can involve various methods such as simple prompting, task-specific instructions, or human inputs.
----
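The script reads `OPENAI_API_KEY` from `.env` through `load_dotenv()` but never checks that the value is present, so a missing key would otherwise surface only partway through the run. The following is a small sketch of an early check. The helper `require_env` is hypothetical and is not part of RAGStack or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Calling `require_env("OPENAI_API_KEY")` right after `load_dotenv()` would stop the script before any documents are loaded or embedded.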


69 changes: 69 additions & 0 deletions docs/modules/examples/partials/hcd-quickstart.adoc
@@ -0,0 +1,69 @@
.Python
[%collapsible%open]
====
[source,python]
----
import os
from dotenv import load_dotenv
import bs4
from langchain import hub
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_text_splitters import RecursiveCharacterTextSplitter
import cassio
from cassio.table import MetadataVectorCassandraTable
from langchain_community.vectorstores import Cassandra

# Load environment variables; the OpenAI clients read OPENAI_API_KEY from the environment
load_dotenv()
openai_api_key = os.getenv("OPENAI_API_KEY")

# Initialize Cassandra
cassio.init(contact_points=['localhost'], username='cassandra', password='cassandra')
cassio.config.resolve_session().execute(
    "create keyspace if not exists my_vector_keyspace with replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};"
)

# Create the metadata vector table.
# 1536 is the dimension of text-embedding-ada-002, the OpenAIEmbeddings default model.
mvct = MetadataVectorCassandraTable(table='my_vector_table', vector_dimension=1536, keyspace='my_vector_keyspace')

# Web loader configuration: keep only the post title, header, and body
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

# Document splitting
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

# Vector store setup; the dimension must match the table created above
vectorstore = Cassandra.from_documents(
    documents=splits,
    embedding=OpenAIEmbeddings(),
    table_name='my_vector_table',
    keyspace='my_vector_keyspace',
    vector_dimension=1536,
)
retriever = vectorstore.as_retriever()

# Language model setup
llm = ChatOpenAI()


# Chain components
def format_docs(docs):
    # Join the retrieved chunks into a single context string
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | hub.pull("rlm/rag-prompt")
    | llm
    | StrOutputParser()
)

# Invocation
result = rag_chain.invoke("What is Task Decomposition?")
print(result)
----
====
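The splitter in the script is configured with `chunk_size=1000` and `chunk_overlap=200`, so neighboring chunks can share up to 200 characters of context. The following is a simplified, fixed-width sketch of that idea. It is not the algorithm that `RecursiveCharacterTextSplitter` uses, which first tries to split on separators such as paragraph and line breaks, and the function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-width windows that overlap by chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # distance between consecutive window starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    # Prints ['abcd', 'cdef', 'efgh', 'ghij']
    print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
```

With the values from the script, each window in this simplified form would start 800 characters after the previous one.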
