GenAI-GCP/exercices/tp_4 at main · BastinFlorian/GenAI-GCP


4. Ingest Your Data into a Cloud SQL Database

The goal of this session is to ingest our knowledge base into a Cloud SQL database. The data are the Gen AI Dauphine Tunis Google Slides.

We will skip the following steps:

- Creating a Cloud SQL database
- Inserting data into the Cloud SQL database
- Creating the pgvector extension in the Cloud SQL database

This tutorial will help you complete this exercise.

I. Read and Download Files from Google Cloud Storage

Open tp_4/1_gcs_to_cloudsql.ipynb and fill in the TODOs of section I.
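A minimal sketch of the two GCS helpers (listing the bucket and downloading one file), assuming the google-cloud-storage client library and default application credentials. The function names, the `downloaded_files` default folder and the flattening of object paths into a single folder are illustrative choices, not necessarily what the notebook expects. The GCP import happens inside the functions, so the path helper can be tried without credentials.

```python
from pathlib import Path
from typing import Optional

# Hypothetical default folder for the downloaded slides.
DEFAULT_DEST_DIR = "downloaded_files"


def local_path_for(blob_name: str, dest_dir: str = DEFAULT_DEST_DIR) -> Path:
    """Map a GCS object name such as "slides/course_1.pdf" to a local path.

    Only the last path component is kept, so objects stored under
    "folders" are written directly into dest_dir.
    """
    return Path(dest_dir) / Path(blob_name).name


def list_files(bucket_name: str, prefix: Optional[str] = None) -> list[str]:
    """Return the names of the objects in the bucket (optionally under a prefix)."""
    from google.cloud import storage  # lazy import: needs google-cloud-storage

    client = storage.Client()
    return [
        blob.name
        for blob in client.list_blobs(bucket_name, prefix=prefix)
        if not blob.name.endswith("/")  # skip "folder" placeholder objects
    ]


def download_file(bucket_name: str, blob_name: str, dest_dir: str = DEFAULT_DEST_DIR) -> Path:
    """Download one object into dest_dir and return the local file path."""
    from google.cloud import storage  # lazy import: needs google-cloud-storage

    path = local_path_for(blob_name, dest_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    blob.download_to_filename(str(path))
    return path
```

Usage, with my-bucket as a placeholder: `for name in list_files("my-bucket"): download_file("my-bucket", name)`.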

II. Test Locally the Streamlit App and the API

Open tp_4/1_gcs_to_cloudsql.ipynb and fill in the TODOs of section II:
- Add the merged document to a table in the Cloud SQL database
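Merging by page means joining all the text extracted from the same page into one document. A minimal sketch, assuming the loader output has been turned into (page_number, text) pairs; the output record layout (`content` plus a `metadata` dict) is a hypothetical choice, not a format required by the notebook.

```python
from collections import defaultdict
from typing import Iterable, Optional


def merge_by_page(
    items: Iterable[tuple[Optional[int], str]], source: str
) -> list[dict]:
    """Join the texts of each page into one record, ordered by page number.

    Empty texts are dropped; items without a page number (None) are
    grouped together and placed last.
    """
    pages: dict = defaultdict(list)
    for page, text in items:
        text = (text or "").strip()
        if text:
            pages[page].append(text)
    # Numeric page order, with the None group sorted after every real page.
    ordered = sorted(pages, key=lambda p: (p is None, p or 0))
    return [
        {"content": "\n".join(pages[p]), "metadata": {"source": source, "page": p}}
        for p in ordered
    ]
```

With unstructured, the pairs could come from `elements = partition(filename=path)` (from `unstructured.partition.auto`) followed by `merge_by_page(((e.metadata.page_number, e.text) for e in elements), source=path)`; whether page numbers are populated depends on the file type.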

Goal:

The created table should look like this:

III. Create a Python File to Automate the Process

Create a Python file to automate the process of ingesting the data into Cloud SQL.

Open the file exercices/tp_4/ingest.py and create the following functions:
- List all the files in the bucket
- Download a file locally
- Load the content of the file with unstructured
- Merge the content of the file by page
- Create the table if it doesn't exist
- Get the embeddings method
- Ingest the data into the table
- Create a store instance
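The "create the table if it doesn't exist" and "ingest" steps can be sketched as below. To keep the example runnable offline, it swaps Cloud SQL for an in-memory SQLite database and uses a toy character-counting function in place of a real embedding model; both are stand-ins, not what ingest.py is expected to use. On Cloud SQL for PostgreSQL with pgvector, the embedding column would instead be a `vector(n)` column, with n equal to the embedding model's output dimension. Table and column names are hypothetical.

```python
import json
import sqlite3
from typing import Callable, Iterable

Embedder = Callable[[str], list[float]]


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in for an embedding model (not semantic at all):
    counts characters into `dim` buckets so the sketch needs no API key."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    return vec


def create_table_if_not_exists(conn: sqlite3.Connection, table: str) -> None:
    # `table` is interpolated into the SQL, so it must come from trusted config,
    # never from user input. With pgvector, `embedding` would be vector(n).
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id INTEGER PRIMARY KEY, "
        "content TEXT NOT NULL, "
        "metadata TEXT, "
        "embedding TEXT NOT NULL)"
    )


def ingest(
    conn: sqlite3.Connection,
    table: str,
    records: Iterable[dict],
    embed: Embedder = toy_embed,
) -> int:
    """Embed each {"content", "metadata"} record, insert it, return the row count."""
    rows = [
        (r["content"], json.dumps(r["metadata"]), json.dumps(embed(r["content"])))
        for r in records
    ]
    conn.executemany(
        f"INSERT INTO {table} (content, metadata, embedding) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)
```

Calling create_table_if_not_exists twice is harmless, which keeps the script safe to re-run; ingest itself, however, would insert duplicates on a second run unless rows are de-duplicated first.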

Open the file exercices/tp_4/retrieve.py and create the following function:
- Perform a similarity search
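A similarity search ranks the stored embeddings by how close they are to the embedding of the query. The sketch below does this in plain Python with cosine similarity, only to show the ranking logic; the function names and the (content, embedding) row format are assumptions. With pgvector, the same ranking is usually pushed into SQL with the cosine-distance operator, e.g. `ORDER BY embedding <=> :query_embedding LIMIT k`, where cosine distance is 1 minus cosine similarity.

```python
import math
from typing import Iterable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search(
    query_embedding: Sequence[float],
    rows: Iterable[tuple[str, Sequence[float]]],
    k: int = 4,
) -> list[tuple[float, str]]:
    """Return the k (score, content) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_embedding, emb), content) for content, emb in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```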

To verify the correctness of your code, you can run the following commands:

python ingest.py
python retrieve.py

IV. Edit the API to Perform a Similarity Search

Open tp_4/api.py and fill in the TODOs.

Hint: you can reuse the tests in ingest.py and retrieve.py as a guide when editing api.py.

You need to:

- Create a Cloud SQL connection
- Perform a similarity search from a user query
- Edit the root get_sources API route function so that it returns the relevant documents
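A minimal sketch of the route logic, assuming FastAPI and a search callable that returns (content, metadata) pairs, for example a thin wrapper around the similarity search on Cloud SQL. The `/get_sources` path, the query parameters and the response shape are hypothetical. Keeping the payload construction in a plain function makes it testable without a database; FastAPI is imported only inside create_app.

```python
from typing import Any, Callable

# Hypothetical search signature: (query, k) -> list of (content, metadata) pairs.
SearchFn = Callable[[str, int], list[tuple[str, dict[str, Any]]]]


def get_sources_payload(query: str, search: SearchFn, k: int = 4) -> dict[str, Any]:
    """Build the JSON response of a get_sources-style route."""
    query = query.strip()
    if not query:
        return {"query": query, "sources": []}
    return {
        "query": query,
        "sources": [
            {"content": content, "metadata": metadata}
            for content, metadata in search(query, k)
        ],
    }


def create_app(search: SearchFn):
    """Register the route on a FastAPI app (FastAPI is imported lazily)."""
    from fastapi import FastAPI

    app = FastAPI()

    @app.get("/get_sources")  # hypothetical path
    def get_sources(query: str, k: int = 4) -> dict[str, Any]:
        return get_sources_payload(query, search, k)

    return app
```

Since `uvicorn api:app` looks up a module-level `app`, api.py would end with something like `app = create_app(my_search)`, where my_search stands for whatever function wraps your Cloud SQL similarity search.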

V. Test the API

Open tp_4/app.py and fill in the TODOs.
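On the Streamlit side, app.py only has to send the user query to the API and display what comes back. A sketch assuming the API answers a GET request with JSON shaped like {"sources": [{"content": ..., "metadata": {...}}]}; that shape, the HOST value and the function names are assumptions, so adapt them to the route you wrote in api.py. requests and streamlit are imported inside the functions that need them.

```python
from typing import Any

# Hypothetical local URL; replace with the route actually exposed by api.py.
HOST = "http://localhost:8181/get_sources"


def parse_sources(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Keep only the sources that actually carry some text."""
    return [s for s in payload.get("sources", []) if s.get("content")]


def fetch_sources(query: str, host: str = HOST, k: int = 4) -> list[dict[str, Any]]:
    import requests  # lazy import

    response = requests.get(host, params={"query": query, "k": k}, timeout=30)
    response.raise_for_status()  # surface HTTP errors instead of parsing them
    return parse_sources(response.json())


def main() -> None:
    """Streamlit page; call main() at the bottom of app.py."""
    import streamlit as st  # lazy import

    st.title("Relevant sources")
    query = st.text_input("Your question")
    if query:
        for i, source in enumerate(fetch_sources(query), start=1):
            page = source.get("metadata", {}).get("page")
            st.markdown(f"**Source {i}** (page {page})")
            st.write(source["content"])
```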

VI. Test Locally

- Launch the API: uvicorn api:app --host 0.0.0.0 --port 8181
- Set HOST in app.py to the local API URL (port 8181)
- Launch the app: streamlit run app.py

Here we just display the relevant documents from a user query. We don't ask the LLM to answer from these documents.

VII. Deploy the API

Deploy the FastAPI app:

# May change depending on your platform
# Replace <my-docker-image-name> and <my-app-name> with your initials + -api
# (Cloud Run service names only accept lowercase letters, digits and hyphens, not underscores)
# Example: Florian Bastin -> fb-api
# If docker buildx build --platform linux/amd64 does not work, replace it with docker build
docker buildx build --platform linux/amd64 --push -t europe-west1-docker.pkg.dev/dauphine-437611/dauphine-ar/<my-docker-image-name>:latest -f Dockerfile_api .

# Be careful: Cloud Run sends requests to container port 8080 by default.
# The --port 8181 flag below changes it; if you still get a port error, edit the port in the Cloud Run console or via the command line.
gcloud run deploy <my-app-name> \
    --image=<my-region>-docker.pkg.dev/<my-project-id>/<my-registry-name>/<my-docker-image-name>:latest \
    --platform=managed \
    --region=<my-region> \
    --allow-unauthenticated \
    --set-env-vars GOOGLE_API_KEY=[INSERT_GOOGLE_API_KEY],DB_PASSWORD=[INSERT_DB_PASSWORD] \
    --port 8181

# Note that secrets such as these should be stored in Google Secret Manager for better security.
# For simplicity, we use environment variables here.
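Whichever way the secrets are provided, the API can read them from the environment at startup and fail fast when one is missing, instead of failing on the first database call. A sketch, assuming config.py is where such settings live; the function name is hypothetical and the variable names are the ones passed with --set-env-vars in the deploy command. On Cloud Run, Secret Manager secrets can also be exposed as environment variables (gcloud run deploy --set-secrets), in which case this code would not need to change.

```python
import os
from typing import Mapping

# Variable names taken from the --set-env-vars flag of the deploy command.
REQUIRED_VARS = ("GOOGLE_API_KEY", "DB_PASSWORD")


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required settings, raising early if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```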

Change HOST in your Streamlit app.py to the URL of the deployed FastAPI service, for example: HOST = "https://fb-1021317796643.europe-west1.run.app/answer"
Deploy the Streamlit app:

# May change depending on your platform
# Replace <initials> with your initials (example: Florian Bastin -> fb),
# so that both the image and the Cloud Run service are named fb-streamlit
# If docker buildx build --platform linux/amd64 does not work, replace it with docker build
docker buildx build --platform linux/amd64 --push -t europe-west1-docker.pkg.dev/dauphine-437611/dauphine-ar/<initials>-streamlit:latest -f Dockerfile .

gcloud run deploy <initials>-streamlit \
    --image=europe-west1-docker.pkg.dev/dauphine-437611/dauphine-ar/<initials>-streamlit:latest \
    --platform=managed \
    --region=europe-west1 \
    --allow-unauthenticated \
    --port 8080
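The naming conventions used in the two deploy steps can be checked with a small helper. A sketch, assuming Cloud Run service names must start with a lowercase letter, contain only lowercase letters, digits and hyphens, and not end with a hyphen (this is our reading of the Cloud Run naming rules; the length limit is not checked). The registry defaults mirror the europe-west1 / dauphine-437611 / dauphine-ar values used in the commands; the helper names are hypothetical.

```python
import re

# Assumed Cloud Run naming rule: starts with a lowercase letter, then lowercase
# letters, digits or hyphens, and does not end with a hyphen (length not checked).
SERVICE_NAME_RE = re.compile(r"^[a-z]([-a-z0-9]*[a-z0-9])?$")


def initials(full_name: str) -> str:
    # First letter of each word, lowercased: Florian Bastin -> fb
    return "".join(word[0] for word in full_name.split()).lower()


def service_name(full_name: str, suffix: str) -> str:
    """Build a name such as fb-api or fb-streamlit and validate it."""
    name = f"{initials(full_name)}-{suffix}"
    if not SERVICE_NAME_RE.match(name):
        raise ValueError(f"not a valid Cloud Run service name: {name!r}")
    return name


def image_uri(
    name: str,
    region: str = "europe-west1",
    project: str = "dauphine-437611",
    repository: str = "dauphine-ar",
    tag: str = "latest",
) -> str:
    """Artifact Registry image URI in the format used by the deploy commands."""
    return f"{region}-docker.pkg.dev/{project}/{repository}/{name}:{tag}"
```

For example, service_name("Florian Bastin", "streamlit") gives fb-streamlit, usable both as the image name and as the Cloud Run service name.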
