CocoIndex is an ETL framework that transforms data for AI with real-time incremental processing: it keeps the index up to date with low latency when the source changes. It supports custom logic that snaps together like LEGO bricks, making it easy to plug in the modules that best suit your project.
In this example, we walk through how to build an embedding index from local files, using Google Document AI as the parser.
🥥 🌴 We are constantly improving, with more blogs and examples coming soon. Stay tuned 👀 and drop a star at CocoIndex on GitHub for the latest updates!
- Install Postgres if you don't have one.
- Configure the Project ID and Processor ID for the Document AI API
- Official Google Document AI API
- Sign in to Google Cloud Console, create or open a project, and enable the Document AI API.
- Create a processor in Document AI.
- Update '.env' with GOOGLE_CLOUD_PROJECT_ID and GOOGLE_CLOUD_PROCESSOR_ID.
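Before running the pipeline, it can help to confirm that both Document AI settings are actually visible to Python. The snippet below is a minimal sketch and not part of the example itself. It assumes the values from '.env' have already been loaded into the process environment (for example by exporting them in your shell or with a loader such as python-dotenv), and the helper name `missing_env_vars` is hypothetical.

```python
import os

# Variable names taken from the '.env' step above.
REQUIRED_VARS = ("GOOGLE_CLOUD_PROJECT_ID", "GOOGLE_CLOUD_PROCESSOR_ID")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank.

    `env` is any mapping of names to values; it defaults to os.environ.
    This is a hypothetical helper for illustration, not a CocoIndex API.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` with no arguments checks the current environment. An empty list means both IDs are set, and a non-empty list names the ones that still need to be filled in.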
Install dependencies:
pip install -e .
Setup:
python main.py cocoindex setup
Update index:
python main.py cocoindex update
Run:
python main.py
CocoInsight is now in Early Access (free) 😊 You found us! A quick 3-minute video tutorial about CocoInsight: Watch on YouTube.
Run CocoInsight to understand your RAG data pipeline:
python main.py cocoindex server -c https://cocoindex.io
Then open the CocoInsight UI at https://cocoindex.io/cocoinsight.