feat: PostgreSQL implementation for Chat History and Vectorization #1512

Merged
merged 164 commits into from
Dec 17, 2024

Conversation

Contributor

@gpickett gpickett commented Nov 22, 2024

Purpose

  • This pull request adds PostgreSQL support to CWYD for chat history, search indexing, and vector-based similarity queries. It makes PostgreSQL the default database and updates the related infrastructure and backend components accordingly.

Does this introduce a breaking change?

  • Yes
  • No

How to Test

  • Get the code

  • Test the code

  1. Set up PostgreSQL as the default database:
    • Configure the necessary environment variables:
      export DATABASE_TYPE=PostgreSQL
      export POSTGRES_URI=your_postgres_uri
      export POSTGRES_USER=your_postgres_user
      export POSTGRES_PASSWORD=your_postgres_password
      
    • Ensure the pgvector extension is installed in your PostgreSQL database:
      CREATE EXTENSION vector;
      
  2. Run the infrastructure deployment script to create the necessary tables.
  3. Test chat history and search functionalities by interacting with the CWYD platform.
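
Before running step 2, it can help to confirm that the variables from step 1 are actually set. The following is a minimal sketch, not code from this PR: the helper name `missing_postgres_settings` is hypothetical, and it assumes `DATABASE_TYPE` is compared as the exact string `PostgreSQL` shown in the export line.

```python
from typing import List, Mapping

# Variable names taken from the export lines in step 1.
REQUIRED_POSTGRES_VARS = ("POSTGRES_URI", "POSTGRES_USER", "POSTGRES_PASSWORD")


def missing_postgres_settings(env: Mapping[str, str]) -> List[str]:
    """Return the required PostgreSQL variables that are unset or blank.

    Hypothetical helper. Returns an empty list when DATABASE_TYPE is not
    "PostgreSQL", since the PostgreSQL settings are then not needed.
    """
    if env.get("DATABASE_TYPE", "") != "PostgreSQL":
        return []
    return [name for name in REQUIRED_POSTGRES_VARS if not env.get(name, "").strip()]
```

Calling `missing_postgres_settings(os.environ)` in a shell session where the exports have been applied should return an empty list; any names it returns still need to be exported.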

What to Check

Verify that the following are valid:

  • Chat history is stored and retrieved successfully from PostgreSQL.
  • Search index creation and vector similarity queries work as expected using PostgreSQL.
  • Advanced image processing and integrated vectorization features are disabled when DATABASE_TYPE=PostgreSQL.
  • Switching back to CosmosDB (DATABASE_TYPE=CosmosDB) retains existing functionality.
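
The checklist above can be illustrated with a short sketch. It is not the implementation in this PR (that lives in files such as `database_factory.py` and `postgres_search_handler.py`, which are not shown here). The names `BackendFeatures`, `features_for`, and `build_similarity_query` are hypothetical, as are the table and column names `vector_store`, `content`, and `content_vector`. The `<=>` operator is pgvector's cosine distance operator, and the `%s` placeholders assume a psycopg2-style driver.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class BackendFeatures:
    """Hypothetical summary of what a given DATABASE_TYPE permits."""
    chat_history_store: str
    advanced_image_processing_allowed: bool
    integrated_vectorization_allowed: bool


def features_for(database_type: str) -> BackendFeatures:
    """Map DATABASE_TYPE to the behaviour listed in the checklist."""
    if database_type == "PostgreSQL":
        # Per the checklist, both features are disabled under PostgreSQL.
        return BackendFeatures("PostgreSQL", False, False)
    if database_type == "CosmosDB":
        # CosmosDB is expected to retain its existing functionality.
        return BackendFeatures("CosmosDB", True, True)
    raise ValueError(f"Unsupported DATABASE_TYPE: {database_type!r}")


def build_similarity_query(
    embedding: Sequence[float], top_k: int = 5
) -> Tuple[str, Tuple[str, int]]:
    """Build a parameterised pgvector query ordered by cosine distance.

    Table and column names are placeholders, not the schema from this PR.
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    # pgvector accepts vectors in the text form '[x1,x2,...]'.
    vector_literal = "[" + ",".join(str(float(x)) for x in embedding) + "]"
    sql = (
        "SELECT id, content FROM vector_store "
        "ORDER BY content_vector <=> %s::vector LIMIT %s"
    )
    return sql, (vector_literal, top_k)
```

The returned pair can be passed to a cursor as `cursor.execute(sql, params)`; keeping the vector as a bound parameter avoids building SQL from user-supplied text.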

Other Information

  • Ensure that PostgreSQL credentials are stored securely, either in the environment or in Azure Key Vault.
  • If issues arise during testing or deployment, check the logs for connection and query errors.
  • This change replaces CosmosDB as the default database, which may affect deployments relying on previous configurations.
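
Since the notes above recommend both protecting credentials and reading the logs, one precaution is to mask any password before a connection string is logged. This is a minimal sketch under an assumption the PR does not confirm: that `POSTGRES_URI` may be a `postgresql://user:password@host:port/db` style URI. The function name `redact_postgres_uri` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_postgres_uri(uri: str) -> str:
    """Return the URI with any embedded password replaced by '***'.

    Values without an embedded password (including plain host names)
    are returned unchanged.
    """
    parts = urlsplit(uri)
    if parts.password is None:
        return uri
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literals need their brackets restored.
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username or ''}:***@{host}"
    return urlunsplit(parts._replace(netloc=netloc))
```

Logging `redact_postgres_uri(uri)` instead of the raw value keeps connection errors diagnosable without writing the password to the log.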

Fr4nc3 and others added 30 commits November 7, 2024 17:40
refactor: Group Azure Blob Storage and Form Recognizer Configurations in JSON Environment Variables
Configure Environment Variables for Database Connection PostgresSQL
fix: Readme update - Refactor: Group Azure Blob Storage and Form Recognizer Configurations in JSON Environment Variables
feat: Load Respective Database Class Dynamically - either CosmosDB or PostgreSQL
fix: close connection and error handling
@Fr4nc3 Fr4nc3 marked this pull request as ready for review December 10, 2024 16:48
infra/core/database/postgresdb.bicep Dismissed

github-actions bot commented Dec 10, 2024

Coverage

Coverage Report

File | Stmts | Miss | Cover | Missing
code/backend/api
   chat_history.py | 247 | 43 | 82% | 26–28, 40, 46–48, 65, 116, 140–141, 144, 173, 211–212, 215, 252, 301, 310, 317, 319, 324, 328, 330, 333, 342–343, 346, 350–352, 370, 374, 379, 399, 408, 414, 428, 467, 505, 507–508, 510
code/backend/batch/utilities/chat_history
   cosmosdb.py | 111 | 93 | 16% | 18–24, 27–29, 31, 33–34, 37–38, 40–41, 44–45, 48, 51, 54, 59–63, 68–71, 73, 76, 86–88, 90, 93–95, 97, 100, 103–104, 107, 109, 113–117, 120–121, 124–127, 129–130, 133, 135, 138, 142–144, 147, 150–151, 153, 156, 167–168, 170–171, 173–178, 180, 183, 186–189, 191, 194, 198–200, 203, 205
   database_client_base.py | 39 | 12 | 69% | 9, 14, 19, 26, 31, 36, 43, 50, 57, 68, 75, 82
   database_factory.py | 23 | 1 | 95% | 59
   postgresdbservice.py | 85 | 65 | 23% | 16–20, 23–25, 28, 36–38, 41–42, 45–47, 50–52, 57, 60, 63, 71, 81, 84–86, 89–91, 94–97, 99, 105–108, 110–112, 115–116, 119–121, 124–127, 132–133, 144–147, 149, 152–154, 157–159
code/backend/batch/utilities/helpers
   azure_postgres_helper.py | 130 | 13 | 90% | 83–86, 103, 109–110, 114–118, 120
   env_helper.py | 184 | 16 | 91% | 118–119, 122–125, 127–128, 130, 334–336, 358, 363–365
code/backend/batch/utilities/helpers/config
   config_helper.py | 176 | 2 | 98% | 87, 90
   database_type.py | 4 | 0 | 100% |
code/backend/batch/utilities/helpers/embedders
   embedder_factory.py | 14 | 1 | 92% | 15
   postgres_embedder.py | 48 | 1 | 97% | 74
code/backend/batch/utilities/orchestrator
   orchestrator_base.py | 50 | 1 | 98% | 33
code/backend/batch/utilities/parser
   output_parser_tool.py | 39 | 0 | 100% |
code/backend/batch/utilities/search
   postgres_search_handler.py | 66 | 5 | 92% | 48, 51, 65, 92, 97
   search.py | 18 | 1 | 94% | 17
code/backend/pages
   02_Explore_Data.py | 33 | 33 | 0% | 1–8, 10–11, 13, 21, 29, 32–34, 38, 41–42, 45–49, 51, 55–56, 58–61, 64–65
   03_Delete_Data.py | 46 | 46 | 0% | 1–9, 11–13, 15, 23–25, 29, 33, 41, 43, 45–46, 48–50, 54–55, 57, 59–61, 65, 69–73, 75, 78–79, 83–88
   04_Configuration.py | 178 | 178 | 0% | 1–11, 13–14, 16, 24–26, 30, 32, 37–46, 49–50, 53–62, 65–74, 77–78, 80–82, 86–87, 99–103, 106–107, 111–112, 115, 119–120, 124–125, 130–131, 135–137, 140–141, 144–145, 148–149, 172, 174–175, 177–181, 183–186, 189–194, 206–209, 223–224, 234–238, 258–263, 270, 272, 277, 285, 293, 300–301, 308, 310–311, 315, 323, 329, 336–337, 357–361, 367–368, 389–390, 392–393, 398, 403, 409–412, 418–419, 422, 441, 482–483, 487, 490–492, 494, 498–499, 502–505, 507–508, 510–512, 514–517, 519–520
scripts/data_scripts
   create_postgres_tables.py | 56 | 56 | 0% | 1–5, 7–13, 16, 27–28, 35–36, 42, 45, 48, 57–58, 61, 64–65, 68–69, 71, 80–81, 84–85, 87, 98–99, 103–104, 106–107, 109, 122–123, 126–127, 129–130, 132–133, 135–136, 138–141, 143–144
TOTAL | 3644 | 945 | 74% |
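
For readers checking the coverage figures above, each Cover value follows from Stmts and Miss as (Stmts − Miss) / Stmts. The sketch below truncates the percentage to a whole number, which matches every row in this report; that is an observation from the numbers themselves, not a documented behavior of the reporting tool. The function name `coverage_percent` is hypothetical.

```python
def coverage_percent(statements: int, missed: int) -> int:
    """Return covered statements as a whole-number percentage, truncated."""
    if statements <= 0:
        raise ValueError("statements must be positive")
    if not 0 <= missed <= statements:
        raise ValueError("missed must be between 0 and statements")
    # Integer floor division truncates, e.g. 98.86 becomes 98.
    return (statements - missed) * 100 // statements
```

For the TOTAL row, `coverage_percent(3644, 945)` gives 74, and for `config_helper.py`, `coverage_percent(176, 2)` gives 98.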

Tests Skipped Failures Errors Time
370 0 💤 0 ❌ 0 🔥 1m 0s ⏱️

Contributor

@Prajwal-Microsoft Prajwal-Microsoft left a comment

Changes approved for PostgreSQL implementation

@Prajwal-Microsoft Prajwal-Microsoft added this pull request to the merge queue Dec 17, 2024
Merged via the queue into Azure-Samples:main with commit 7fb0636 Dec 17, 2024
8 checks passed

🎉 This PR is included in version 1.13.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

5 participants