
cocoindex-io/cocoindex-etl-with-document-ai


🥥 CocoIndex ETL with Document AI

CocoIndex is an ETL framework to transform data for AI, with real-time incremental processing: it keeps the index up to date with low latency when the source is updated. It supports custom logic like LEGO, making it easy to plug in the modules that best suit your project.
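The idea behind incremental processing can be illustrated with a small, self-contained sketch. This is not CocoIndex's API or its actual implementation. It only assumes that each source file is fingerprinted by a content hash, that only new or changed files are reprocessed, and that derived rows for deleted files are dropped. The class `IncrementalIndex` and its `process` callback are hypothetical names.

```python
import hashlib
from typing import Callable, Dict, List


class IncrementalIndex:
    """Hypothetical in-memory index that reprocesses only changed sources."""

    def __init__(self, process: Callable[[str], List[str]]) -> None:
        self._process = process  # transformation applied to one file's text
        self._fingerprints: Dict[str, str] = {}  # path -> SHA-256 of content
        self.rows: Dict[str, List[str]] = {}  # path -> derived rows

    def update(self, sources: Dict[str, str]) -> List[str]:
        """Sync with `sources` (path -> text); return the paths reprocessed."""
        touched = []
        for path, text in sources.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            # Skip files whose content hash is unchanged since the last update.
            if self._fingerprints.get(path) != digest:
                self.rows[path] = self._process(text)
                self._fingerprints[path] = digest
                touched.append(path)
        # Drop derived rows for sources that no longer exist.
        for path in set(self._fingerprints) - set(sources):
            del self._fingerprints[path]
            del self.rows[path]
        return touched
```

For example, after an initial `update` over two files, a second `update` in which only one file's text changed reprocesses just that file, and omitting a file from `sources` removes its rows. A real framework would also need to persist fingerprints and detect changes without rereading every file; this sketch deliberately ignores both.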

In this example, we walk you through how to build an embedding index from local files, using Google Document AI as the parser.
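The overall shape of such a pipeline (read local files, split them into chunks, embed each chunk, then query by similarity) can be sketched in plain Python. This is a hypothetical illustration, not the CocoIndex API. Plain `.txt` files stand in for the Document AI parsing step, and a sparse term-frequency vector stands in for a real embedding model. The function names `chunk`, `embed`, `cosine`, `build_index`, and `search`, and the chunk sizes, are all assumptions.

```python
import math
import re
from collections import Counter
from pathlib import Path
from typing import List, Tuple

Row = Tuple[str, str, Counter]  # (source path, chunk text, vector)


def chunk(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping character windows (assumes size > overlap)."""
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse term-frequency vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(root: Path) -> List[Row]:
    """Chunk and embed every .txt file under `root`."""
    rows: List[Row] = []
    for path in sorted(root.rglob("*.txt")):
        for piece in chunk(path.read_text(encoding="utf-8")):
            rows.append((str(path), piece, embed(piece)))
    return rows


def search(index: List[Row], query: str, top_k: int = 3) -> List[Tuple[float, str, str]]:
    """Return the top_k (score, path, chunk) rows most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, vec), path, piece) for path, piece, vec in index]
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:top_k]
```

In the real example, the chunk text would come from the PDF parser and the vectors from an embedding model, and the rows would be exported to a vector store rather than kept in a Python list.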

🥥 🌴 We are constantly improving, and more blogs and examples are coming soon. Stay tuned 👀 and drop a star at CocoIndex on GitHub for the latest updates!

Use Document AI to parse PDF files in CocoIndex
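A minimal sketch of the parsing step, assuming the `google-cloud-documentai` Python client library, an existing Document AI processor, and Application Default Credentials. The helper names `make_client` and `parse_pdf` are hypothetical, and how this repository actually wires the parser into its CocoIndex flow may differ. The client is passed in as a parameter, which keeps the library import out of `parse_pdf` and lets it be exercised with a stand-in object.

```python
from typing import Any


def make_client(location: str) -> Any:
    """Create a Document AI client for a regional endpoint, e.g. "us" or "eu".

    Requires the google-cloud-documentai package (imported lazily here).
    """
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    return documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    )


def parse_pdf(client: Any, processor_name: str, pdf_bytes: bytes) -> str:
    """Send raw PDF bytes to a Document AI processor and return the extracted text.

    `processor_name` has the form
    projects/PROJECT_ID/locations/LOCATION/processors/PROCESSOR_ID.
    """
    # The generated client accepts a plain dict in place of a ProcessRequest.
    result = client.process_document(
        request={
            "name": processor_name,
            "raw_document": {"content": pdf_bytes, "mime_type": "application/pdf"},
        }
    )
    return result.document.text
```

With the library installed, a call might look like `parse_pdf(make_client("us"), "projects/PROJECT_ID/locations/us/processors/PROCESSOR_ID", Path("doc.pdf").read_bytes())`, where the uppercase placeholders are your own project and processor IDs.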

Prerequisite

Run

Install dependencies:

pip install -e .

Set up:

python main.py cocoindex setup

Update index:

python main.py cocoindex update

Run:

python main.py

CocoInsight

CocoInsight is in Early Access now (Free) 😊 You found us! A quick 3-minute video tutorial about CocoInsight: Watch on YouTube.

Run CocoInsight to understand your RAG data pipeline:

python main.py cocoindex server -c https://cocoindex.io

Then open the CocoInsight UI at https://cocoindex.io/cocoinsight.
