The goal of this session is to ingest our knowledge base into a Cloud SQL database. The data come from the Gen AI Dauphine Tunis Google Slides.
We will skip the following steps:
- Creating a Cloud SQL database
- Inserting data into the Cloud SQL database
- Creating the pgvector extension in the Cloud SQL database
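For reference, the skipped pgvector step amounts to a single SQL statement run against the instance. Here is a minimal sketch using the `cloud-sql-python-connector` package with the `pg8000` driver; the connection parameters are placeholders, not values from this exercise:

```python
# Sketch only: enable the pgvector extension on an existing Cloud SQL instance.
from google.cloud.sql.connector import Connector

connector = Connector()
conn = connector.connect(
    "my-project:europe-west1:my-instance",  # <project>:<region>:<instance>
    "pg8000",
    user="postgres",
    password="my-password",  # placeholder
    db="postgres",
)
conn.cursor().execute("CREATE EXTENSION IF NOT EXISTS vector;")
conn.commit()
conn.close()
connector.close()
```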
This tutorial will help you complete this exercise.
- Open `tp_4/gcs_to_cloudsql.ipynb` and fill in the TODOs of part I.
- Open `tp_4/gcs_to_cloudsql.ipynb` and fill in the TODOs of part II: add the merged document to a table in the Cloud SQL database.
Goal: The created table should look like this:
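As a rough stand-in for the expected result, here is a plausible shape for that table; the table name, column names, and embedding dimension are assumptions, not the official solution:

```python
# Assumed target table -- one row per slide page; names are illustrative only.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS slides (
    id SERIAL PRIMARY KEY,
    file_name TEXT,          -- source file in the GCS bucket
    page_number INT,         -- slide/page the content was merged from
    content TEXT,            -- merged text of the page
    embedding vector(768)    -- dimension depends on the embedding model
);
"""
```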
Create a Python file to automate the process of ingesting the data into Cloud SQL.

- Open the file `exercices/tp_4/ingest.py` and create the following functions (a sketch follows the list):
  - List all the files in the bucket
  - Download a file locally
  - Load the content of the file with `unstructured`
  - Merge the content of the file by page
  - Create the table if it doesn't exist
  - Get the embeddings
  - Ingest the data into the table
  - Create a `store` instance
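A minimal sketch of what `ingest.py` could look like, assuming the `google-cloud-storage`, `cloud-sql-python-connector[pg8000]`, `unstructured`, and `langchain-google-vertexai` packages. Every name, schema, and model choice below is an assumption, not the official solution; the `store` instance (e.g. a LangChain PGVector store) is left out because it depends on the course scaffolding:

```python
# Sketch of ingest.py -- all names, schemas, and models are assumptions.
import os

from google.cloud import storage
from google.cloud.sql.connector import Connector
from langchain_google_vertexai import VertexAIEmbeddings
from unstructured.partition.auto import partition

BUCKET_NAME = "my-bucket"  # placeholder
connector = Connector()


def create_connection():
    """Open a connection to the Cloud SQL instance (placeholders throughout)."""
    return connector.connect(
        "my-project:europe-west1:my-instance",  # <project>:<region>:<instance>
        "pg8000",
        user="postgres",
        password=os.environ["DB_PASSWORD"],
        db="postgres",
    )


def list_files(bucket_name: str) -> list[str]:
    """List all the files in the bucket."""
    return [blob.name for blob in storage.Client().list_blobs(bucket_name)]


def download_file(bucket_name: str, blob_name: str, local_dir: str = "/tmp") -> str:
    """Download a file locally and return its path."""
    local_path = os.path.join(local_dir, os.path.basename(blob_name))
    storage.Client().bucket(bucket_name).blob(blob_name).download_to_filename(local_path)
    return local_path


def load_file(local_path: str):
    """Load the content of the file with unstructured."""
    return partition(filename=local_path)


def merge_by_page(elements) -> dict[int, str]:
    """Merge the content of the file by page."""
    pages: dict[int, str] = {}
    for element in elements:
        page = element.metadata.page_number or 1
        pages[page] = (pages.get(page, "") + "\n" + (element.text or "")).strip()
    return pages


def create_table(conn) -> None:
    """Create the table if it doesn't exist (assumed schema, see above)."""
    conn.cursor().execute(
        "CREATE TABLE IF NOT EXISTS slides ("
        "id SERIAL PRIMARY KEY, file_name TEXT, page_number INT, "
        "content TEXT, embedding vector(768))"
    )
    conn.commit()


def get_embeddings() -> VertexAIEmbeddings:
    """Return an embeddings client (the model name is an assumption)."""
    return VertexAIEmbeddings(model_name="text-embedding-004")


def ingest_pages(conn, file_name: str, pages: dict[int, str]) -> None:
    """Embed each page and insert it into the table."""
    vectors = get_embeddings().embed_documents(list(pages.values()))
    cur = conn.cursor()
    for (page, content), vector in zip(pages.items(), vectors):
        cur.execute(
            "INSERT INTO slides (file_name, page_number, content, embedding) "
            "VALUES (%s, %s, %s, %s::vector)",
            (file_name, page, content, str(vector)),
        )
    conn.commit()
```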
- Open the file `exercices/tp_4/retrieve.py` and create the following function (sketched below):
  - Perform a similarity search
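A minimal sketch for `retrieve.py`, reusing `create_connection` and `get_embeddings` from the ingest sketch above; `<=>` is pgvector's cosine-distance operator, and the `slides` table is the assumed one:

```python
# Sketch of retrieve.py -- assumes the `slides` table from the ingest sketch.
from ingest import create_connection, get_embeddings


def similarity_search(query: str, k: int = 4) -> list[tuple]:
    """Return the k rows whose embeddings are closest to the query."""
    query_vector = get_embeddings().embed_query(query)
    conn = create_connection()
    cur = conn.cursor()
    cur.execute(
        "SELECT file_name, page_number, content FROM slides "
        "ORDER BY embedding <=> %s::vector LIMIT %s",
        (str(query_vector), k),
    )
    return cur.fetchall()


if __name__ == "__main__":
    for row in similarity_search("What is RAG?"):
        print(row)
```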
To verify the correctness of your code, you can run the following commands:

```bash
python ingest.py
python retrieve.py
```
- Open `tp_4/api.py` and fill in the TODOs. Hint: you can use the tests in `ingest.py` and `retrieve.py` when editing `api.py`.
You need to (a sketch follows this list):
- Create a Cloud SQL connection
- Perform a similarity search from a user query
- Edit the root `get_sources` API route function to return the relevant documents
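A minimal sketch of what `api.py` could look like, built on the sketches above. The route path, request model, and response shape are assumptions; the instructions only say the route function is named `get_sources`:

```python
# Sketch of api.py -- request/response shapes are assumptions.
from fastapi import FastAPI
from pydantic import BaseModel

from retrieve import similarity_search

app = FastAPI()


class UserQuery(BaseModel):
    question: str


@app.post("/get_sources")
def get_sources(query: UserQuery) -> list[dict]:
    """Return the documents most relevant to the user query."""
    rows = similarity_search(query.question)
    return [
        {"file_name": f, "page_number": p, "content": c}
        for f, p, c in rows
    ]
```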
- Open `tp_4/app.py` and fill in the TODOs.
- Launch the API:

```bash
uvicorn api:app --host 0.0.0.0 --port 8181
```

- Set the HOST in `app.py`.
- Launch the app:

```bash
streamlit run app.py
```
Here we just display the relevant documents from a user query. We don't ask the LLM to answer from these documents.
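A minimal sketch of such an `app.py`, assuming the API sketch above (a JSON body with a `question` field); HOST is a placeholder to be swapped for the deployed URL later:

```python
# Sketch of app.py -- displays the sources returned by the API.
import requests
import streamlit as st

HOST = "http://localhost:8181/get_sources"  # placeholder; see the deployment step

st.title("Knowledge base search")
question = st.text_input("Ask a question about the slides")

if question:
    response = requests.post(HOST, json={"question": question}, timeout=30)
    for doc in response.json():
        st.write(f"**{doc['file_name']}** (page {doc['page_number']})")
        st.write(doc["content"])
```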
- Deploy the FastAPI app:

```bash
# May change depending on your platform.
# Replace <my-docker-name> and <my-app-name> with your initials + _api
# Example: Florian Bastin -> fb_api
# Replace `docker buildx build --platform linux/amd64` with `docker build -t` if it does not work.
docker buildx build --platform linux/amd64 --push -t europe-west1-docker.pkg.dev/dauphine-437611/dauphine-ar/<my-docker-name>:latest -f Dockerfile_api .

# Be careful: the default port for Cloud Run is 8080.
# If you encounter an error, edit the default Cloud Run port in the console or via the command line.
gcloud run deploy <my-app-name> \
    --image=<my-region>-docker.pkg.dev/<my-project-id>/<my-registry-name>/<my-docker-name>:latest \
    --platform=managed \
    --region=<my-region> \
    --allow-unauthenticated \
    --set-env-vars GOOGLE_API_KEY=[INSERT_GOOGLE_API_KEY],DB_PASSWORD=[INSERT_DB_PASSWORD] \
    --port 8181

# Note: a secret key like this should be provided via Google Secret Manager for better security.
# For simplicity, we use the environment variable here.
```
- Change the HOST in your Streamlit `app.py` to the URL of the deployed FastAPI. Example:

```python
HOST = "https://fb-1021317796643.europe-west1.run.app/answer"
```
- Deploy the Streamlit app:

```bash
# May change depending on your platform.
# Replace <initials> with your initials; the image name becomes <initials>-streamlit
# Example: Florian Bastin -> fb-streamlit
# Replace `docker buildx build --platform linux/amd64` with `docker build -t` if it does not work.
docker buildx build --platform linux/amd64 --push -t europe-west1-docker.pkg.dev/dauphine-437611/dauphine-ar/<initials>-streamlit:latest -f Dockerfile .

gcloud run deploy <initials>-streamlit \
    --image=europe-west1-docker.pkg.dev/dauphine-437611/dauphine-ar/<initials>-streamlit:latest \
    --platform=managed \
    --region=europe-west1 \
    --allow-unauthenticated \
    --port 8080
```