Argus Panoptes, in ancient Greek mythology, was a giant with a hundred eyes and a servant of the goddess Hera. His many eyes made him an excellent watchman, as some of his eyes would always remain open while the others slept, allowing him to be ever-vigilant.
Classic OCR (Optical Character Recognition) models lack the ability to reason about context when extracting information from documents. In this project we demonstrate how a hybrid approach that combines OCR with a multimodal Large Language Model (LLM) produces better results without any pre-training.
This solution uses Azure Document Intelligence combined with GPT-4 Vision. Each tool has its own strengths, and the hybrid approach is better than either of them alone.
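To make the idea concrete, here is a minimal, hypothetical sketch of the hybrid flow; it is not the project's actual backend code. OCR text from Document Intelligence and an image of the page are sent together to a vision-capable GPT-4 deployment, along with an extraction prompt and a target schema. The endpoints, keys, and deployment name are placeholders.

```python
# Hypothetical sketch of the hybrid OCR + vision-LLM flow (not the actual backend code).
import base64
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient
from openai import AzureOpenAI

def extract(page_png_path: str, prompt: str, schema_json: str) -> str:
    with open(page_png_path, "rb") as f:
        image_bytes = f.read()

    # 1) OCR: get the full text layout from Azure Document Intelligence.
    di = DocumentAnalysisClient("<doc-intelligence-endpoint>", AzureKeyCredential("<di-key>"))
    ocr_text = di.begin_analyze_document("prebuilt-layout", image_bytes).result().content

    # 2) Reasoning: give a vision-capable GPT-4 deployment both the OCR text and the page image.
    llm = AzureOpenAI(azure_endpoint="<openai-endpoint>", api_key="<openai-key>",
                      api_version="2024-02-01")
    response = llm.chat.completions.create(
        model="<vision-capable-deployment>",  # e.g. a GPT-4 Turbo or GPT-4 Omni deployment
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"{prompt}\n\nTarget schema:\n{schema_json}\n\nOCR text:\n{ocr_text}"},
                {"type": "image_url", "image_url": {
                    "url": "data:image/png;base64," + base64.b64encode(image_bytes).decode()}},
            ],
        }],
    )
    return response.choices[0].message.content
```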
Notes:
- The Azure OpenAI model needs to be vision capable, e.g. GPT-4T-0125, GPT-4T-0409, or GPT-4 Omni
- Frontend: A Streamlit Python web-app for user interaction. UNDER CONSTRUCTION
- Backend: An Azure Function for core logic, Cosmos DB for auditing, logging, and storing output schemas, Azure Document Intelligence, GPT-4 Vision and a Logic App for integrating with Outlook Inbox.
- Demo: Sample documents, system prompts, and output schemas.
Before deploying the solution, you need to create an OpenAI resource and deploy a model that is vision capable.
- Create an OpenAI Resource:
  - Follow the instructions here to create an OpenAI resource in Azure.
- Deploy a Vision-Capable Model:
  - Ensure the deployed model supports vision, such as GPT-4T-0125, GPT-4T-0409, or GPT-4-Omni.
Click the button below to deploy directly to Azure:

Deploy to Azure

This offers a one-click deployment without cloning the code. Alternatively, follow the instructions below.
- Prerequisites:
  - Install Azure Developer CLI.
  - Ensure you have access to an Azure subscription.
  - Create an OpenAI resource and deploy a vision-capable model.
- Deployment Steps:
  - Run the following command to deploy all resources:
    `azd up`
- Bicep Template Deployment:
  - Use the provided `main.bicep` file to deploy resources manually:

    ```bash
    az deployment group create --resource-group <your-resource-group> --template-file main.bicep
    ```
Note: After deployment, wait about 10 minutes for the Docker images to be pulled. You can check the progress in your Function App > Deployment Center > Logs.
To run the Streamlit app `app.py` located in the `frontend` folder, follow these steps:
- Install the required dependencies by running the following command in your terminal:
  `pip install -r frontend/requirements.txt`
- Rename the `.env.temp` file to `.env`:
  `mv frontend/.env.temp frontend/.env`
- Populate the `.env` file with the necessary environment variables. Open the `.env` file in a text editor and provide the required values for each variable.
- Assign a Cosmos DB and Blob Storage role to your principal ID:
  Get the principal ID of the currently signed-in user:
  `az ad signed-in-user show --query id -o tsv`
  Then, create the Cosmos and Blob role assignments:

  ```bash
  az cosmosdb sql role assignment create \
    --principal-id "<principal-id>" \
    --resource-group "<resource-group-name>" \
    --account-name "<cosmos-account-name>" \
    --role-definition-name "Cosmos DB Built-in Data Contributor" \
    --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group-name>/providers/Microsoft.DocumentDB/databaseAccounts/<cosmos-account-name>"

  az role assignment create \
    --assignee "<principal-id>" \
    --role "Storage Blob Data Contributor" \
    --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group-name>/providers/Microsoft.Storage/storageAccounts/<storage-account-name>"
  ```
- Start the Streamlit app by running the following command in your terminal:
  `streamlit run frontend/app.py`
You can connect an Outlook inbox to send incoming attachments directly to the blob storage and trigger the extraction process. A Logic App has already been built for this. The only thing you need to do is open the resource "LogicAppName", add a trigger, and connect it to your Outlook inbox. Open this Microsoft Learn page, search for "Add a trigger to check incoming email", follow the described steps, and then activate the Logic App with the "Run" button.
- Upload PDF Files:
  - Navigate to the `sa-uniqueID` storage account and the `datasets` container.
  - Create a new folder called `default-dataset` and upload your PDF files (or upload programmatically, as in the sketch after this list).
- View Results:
  - Processed results will be available in your Cosmos DB database under the `doc-extracts` collection and the `documents` container (see the query sketch after this list).
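If you would rather upload the PDFs programmatically than through the portal, the following is a minimal sketch using the `azure-storage-blob` SDK. The storage account name and the local file name are placeholders, and authentication with `DefaultAzureCredential` assumes your principal holds the Storage Blob Data Contributor role assigned earlier.

```python
# Hedged sketch: upload a local PDF to the datasets container under default-dataset/.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

account_url = "https://<sa-uniqueID>.blob.core.windows.net"  # placeholder storage account
service = BlobServiceClient(account_url, credential=DefaultAzureCredential())
container = service.get_container_client("datasets")

local_pdf = "sample-invoice.pdf"  # hypothetical local file
with open(local_pdf, "rb") as data:
    # Blob "folders" are just name prefixes, so this implicitly creates default-dataset/.
    container.upload_blob(name=f"default-dataset/{local_pdf}", data=data, overwrite=True)
print(f"Uploaded {local_pdf}")
```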
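To inspect the processed results outside the portal, you can query the container with the `azure-cosmos` SDK. This is a hedged sketch: the account name is a placeholder, the database and container names follow the description above, and it assumes the Cosmos DB Built-in Data Contributor role assignment from the setup steps.

```python
# Hedged sketch: read a few processed extraction results from Cosmos DB.
from azure.identity import DefaultAzureCredential
from azure.cosmos import CosmosClient

client = CosmosClient(
    "https://<cosmos-account-name>.documents.azure.com:443/",  # placeholder account
    credential=DefaultAzureCredential(),
)
# Assumed layout: database "doc-extracts", container "documents" (as described above).
container = client.get_database_client("doc-extracts").get_container_client("documents")

items = container.query_items(
    query="SELECT TOP 5 * FROM c",
    enable_cross_partition_query=True,
)
for item in items:
    print(item.get("id"))
```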
The input to the model consists of two main components: a model prompt and a JSON template with the schema of the data to be extracted.
The prompt is a textual instruction explaining what the model should do, including the type of data to extract and how to extract it. Here are a couple of example prompts:
- Default Prompt: Extract all data from the document.
- Example Prompt: Extract all financial data, including transaction amounts, dates, and descriptions, from the document. For date extraction, use American formatting.
The JSON template defines the schema of the data to be extracted. This can be an empty JSON object `{}` if the model is supposed to create its own schema. Alternatively, it can be more specific to guide the model on what data to extract or for further processing in a structured database. Here are some examples:
- Empty JSON Template (default):
{}
- Specific JSON Template Example:
{
"transactionDate": "",
"transactionAmount": "",
"transactionDescription": ""
}
By providing a prompt and a JSON template, users can control the behavior of the model to extract specific data from their documents in a structured manner.
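As a small illustration of how the template can also be used downstream, the snippet below checks that a model response actually contains every field defined in the template. This is just a hedged sketch of one possible post-processing step, not part of the project's pipeline.

```python
import json

# The specific template example from above.
template = {
    "transactionDate": "",
    "transactionAmount": "",
    "transactionDescription": "",
}

def matches_template(model_output: str, template: dict) -> bool:
    """Return True if the model's JSON output contains every key defined in the template."""
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError:
        return False
    return all(key in data for key in template)

example = '{"transactionDate": "2024-02-01", "transactionAmount": "100.00", "transactionDescription": "Coffee"}'
print(matches_template(example, template))  # True
```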
- JSON Schemas created using JSON Schema Builder.
This README file provides an overview and quickstart guide for deploying and using Project ARGUS. For detailed instructions, consult the documentation and code comments in the respective files.