diff --git a/.github/workflows/build_package.yml b/.github/workflows/build_package.yml
index acdcaec..6556395 100644
--- a/.github/workflows/build_package.yml
+++ b/.github/workflows/build_package.yml
@@ -45,4 +45,4 @@ jobs:
- name: Test import
shell: bash
working-directory: ${{ vars.RUNNER_TEMP }}
- run: python -c "import llama_parse"
+ run: python -c "import llama_cloud_services"
diff --git a/.github/workflows/publish_release.yml b/.github/workflows/publish_release.yml
index a7c5bf5..9c5eab3 100644
--- a/.github/workflows/publish_release.yml
+++ b/.github/workflows/publish_release.yml
@@ -23,16 +23,31 @@ jobs:
uses: actions/setup-python@v4
with:
python-version: ${{ env.PYTHON_VERSION }}
+
- name: Install Poetry
uses: snok/install-poetry@v1
with:
version: ${{ env.POETRY_VERSION }}
+
- name: Install deps
shell: bash
run: pip install -e .
- - name: Build and publish to pypi
+
+ - name: Build and publish llama-cloud-services
uses: JRubics/poetry-publish@v2.1
with:
+ poetry_version: ${{ env.POETRY_VERSION }}
+ python_version: ${{ env.PYTHON_VERSION }}
+ working_directory: "llama_cloud_services"
+ pypi_token: ${{ secrets.LLAMA_PARSE_PYPI_TOKEN }}
+ poetry_install_options: "--without dev"
+
+ - name: Build and publish llama-parse
+ uses: JRubics/poetry-publish@v2.1
+ with:
+ poetry_version: ${{ env.POETRY_VERSION }}
+ python_version: ${{ env.PYTHON_VERSION }}
+ working_directory: "llama_parse"
pypi_token: ${{ secrets.LLAMA_PARSE_PYPI_TOKEN }}
poetry_install_options: "--without dev"
@@ -52,6 +67,7 @@ jobs:
export PKG=$(ls dist/ | grep tar)
set -- $PKG
echo "name=$1" >> $GITHUB_ENV
+
- name: Upload Release Asset (sdist) to GitHub
id: upload-release-asset
uses: actions/upload-release-asset@v1
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 226c646..6ee4cc5 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -33,6 +33,7 @@ repos:
rev: v1.0.1
hooks:
- id: mypy
+ exclude: ^tests/
additional_dependencies:
[
"types-requests",
@@ -46,7 +47,7 @@ repos:
[
--disallow-untyped-defs,
--ignore-missing-imports,
- --python-version=3.8,
+ --python-version=3.10,
]
- repo: https://github.com/adamchainz/blacken-docs
rev: 1.16.0
diff --git a/examples/extract/data/resumes/ai_researcher.pdf b/examples/extract/data/resumes/ai_researcher.pdf
new file mode 100644
index 0000000..ff7d02d
Binary files /dev/null and b/examples/extract/data/resumes/ai_researcher.pdf differ
diff --git a/examples/extract/data/resumes/ml_engineer.pdf b/examples/extract/data/resumes/ml_engineer.pdf
new file mode 100644
index 0000000..43586da
Binary files /dev/null and b/examples/extract/data/resumes/ml_engineer.pdf differ
diff --git a/examples/extract/data/resumes/software_architect.pdf b/examples/extract/data/resumes/software_architect.pdf
new file mode 100644
index 0000000..95672ad
Binary files /dev/null and b/examples/extract/data/resumes/software_architect.pdf differ
diff --git a/examples/extract/resume_screening.ipynb b/examples/extract/resume_screening.ipynb
new file mode 100644
index 0000000..479734e
--- /dev/null
+++ b/examples/extract/resume_screening.ipynb
@@ -0,0 +1,882 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Extracting data from resumes\n",
+ "\n",
+ "Let us assume that we are running a hiring process for a company and we have received a list of resumes from candidates. We want to extract structured data from the resumes so that we can run a screening process and shortlist candidates. \n",
+ "\n",
+ "Take a look at one of the resumes in the `data/resumes` directory. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": null,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from IPython.display import IFrame\n",
+ "\n",
+ "IFrame(src=\"./data/resumes/ai_researcher.pdf\", width=600, height=400)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "You will notice that all the resumes have different layouts but contain common information like name, email, experience, education, etc. \n",
+ "\n",
+ "With LlamaExtract, we will show you how to:\n",
+ "- *Define* a data schema to extract the information of interest. \n",
+ "- *Iterate* over the data schema to generalize the schema for multiple resumes.\n",
+ "- *Finalize* the schema and schedule extractions for multiple resumes.\n",
+ "\n",
+ "We will start by defining a `LlamaExtract` client which provides a Python interface to the LlamaExtract API. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from dotenv import load_dotenv\n",
+ "from llama_extract import LlamaExtract\n",
+ "\n",
+ "\n",
+ "# Load environment variables (put LLAMA_CLOUD_API_KEY in your .env file)\n",
+ "load_dotenv(override=True)\n",
+ "\n",
+ "# Optionally, add your project id/organization id\n",
+ "llama_extract = LlamaExtract()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Defining the data schema\n",
+ "\n",
+ "Next, let us try to extract two fields from the resume: `name` and `email`. We can either use a Python dictionary structure to define the `data_schema` as a JSON or use a Pydantic model instead, for brevity and convenience. In either case, our output is guaranteed to validate against this schema."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from pydantic import BaseModel, Field\n",
+ "\n",
+ "\n",
+ "class Resume(BaseModel):\n",
+ " name: str = Field(description=\"The name of the candidate\")\n",
+ " email: str = Field(description=\"The email address of the candidate\")"
+ ]
+ },
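+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# For reference, the same two fields expressed as a JSON-style dictionary schema.\n",
+ "# (A sketch: the variable name is illustrative; we keep using the Pydantic model below.)\n",
+ "resume_json_schema = {\n",
+ "    \"type\": \"object\",\n",
+ "    \"properties\": {\n",
+ "        \"name\": {\"type\": \"string\", \"description\": \"The name of the candidate\"},\n",
+ "        \"email\": {\"type\": \"string\", \"description\": \"The email address of the candidate\"},\n",
+ "    },\n",
+ "}"
+ ]
+ },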
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from llama_cloud.core.api_error import ApiError\n",
+ "\n",
+ "try:\n",
+ " existing_agent = llama_extract.get_agent(name=\"resume-screening\")\n",
+ " if existing_agent:\n",
+ " llama_extract.delete_agent(existing_agent.id)\n",
+ "except ApiError as e:\n",
+ " if e.status_code == 404:\n",
+ " pass\n",
+ " else:\n",
+ " raise\n",
+ "\n",
+ "agent = llama_extract.create_agent(name=\"resume-screening\", data_schema=Resume)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[ExtractionAgent(id=ad801427-d06b-499d-bbe0-6109c5f0646b, name=resume-screening)]"
+ ]
+ },
+ "execution_count": null,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "llama_extract.list_agents()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Uploading files: 100%|██████████| 1/1 [00:00<00:00, 1.19it/s]\n",
+ "Creating extraction jobs: 100%|██████████| 1/1 [00:01<00:00, 1.30s/it]\n",
+ "Extracting files: 100%|██████████| 1/1 [00:03<00:00, 3.18s/it]\n",
+ "Uploading files: 100%|██████████| 1/1 [00:00<00:00, 1.23it/s]\n",
+ "Creating extraction jobs: 100%|██████████| 1/1 [00:03<00:00, 3.09s/it]\n",
+ "Extracting files: 100%|██████████| 1/1 [00:11<00:00, 11.11s/it]\n",
+ "Uploading files: 100%|██████████| 1/1 [00:00<00:00, 1.16it/s]\n",
+ "Creating extraction jobs: 100%|██████████| 1/1 [00:03<00:00, 3.10s/it]\n",
+ "Extracting files: 100%|██████████| 1/1 [00:09<00:00, 9.87s/it]\n",
+ "Uploading files: 100%|██████████| 1/1 [00:00<00:00, 1.12it/s]\n",
+ "Creating extraction jobs: 100%|██████████| 1/1 [00:05<00:00, 5.92s/it]\n",
+ "Extracting files: 100%|██████████| 1/1 [00:12<00:00, 12.05s/it]\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "{'name': 'Dr. Rachel Zhang', 'email': 'rachel.zhang@email.com'}"
+ ]
+ },
+ "execution_count": null,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "resume = agent.extract(\"./data/resumes/ai_researcher.pdf\")\n",
+ "resume.data"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Iterating over the data schema\n",
+ "\n",
+ "Now that we have created a data schema, let us add more fields to the schema. We will add `experience` and `education` fields to the schema. \n",
+ "- We can create a new Pydantic model for each of these fields and represent `experience` and `education` as lists of these models. Doing this will allow us to extract multiple entities from the resume without having to pre-define how many experiences or education the candidate has. \n",
+ "- We have added a `description` parameter to provide more context for extraction. We can use `description` to provide example inputs/outputs for the extraction. \n",
+ "- Note that we have annotated the `start_date` and `end_date` fields with `Optional[str]` to indicate that these fields are optional. This is *important* because the schema will be used to extract data from multiple resumes and not all resumes will have the same format. A field must only be required if it is guaranteed to be present in all the resumes. \n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from typing import List, Optional\n",
+ "\n",
+ "\n",
+ "class Education(BaseModel):\n",
+ " institution: str = Field(description=\"The institution of the candidate\")\n",
+ " degree: str = Field(description=\"The degree of the candidate\")\n",
+ " start_date: Optional[str] = Field(\n",
+ " default=None, description=\"The start date of the candidate's education\"\n",
+ " )\n",
+ " end_date: Optional[str] = Field(\n",
+ " default=None, description=\"The end date of the candidate's education\"\n",
+ " )\n",
+ "\n",
+ "\n",
+ "class Experience(BaseModel):\n",
+ " company: str = Field(description=\"The name of the company\")\n",
+ " title: str = Field(description=\"The title of the candidate\")\n",
+ " description: Optional[str] = Field(\n",
+ " default=None, description=\"The description of the candidate's experience\"\n",
+ " )\n",
+ " start_date: Optional[str] = Field(\n",
+ " default=None, description=\"The start date of the candidate's experience\"\n",
+ " )\n",
+ " end_date: Optional[str] = Field(\n",
+ " default=None, description=\"The end date of the candidate's experience\"\n",
+ " )\n",
+ "\n",
+ "\n",
+ "class Resume(BaseModel):\n",
+ " name: str = Field(description=\"The name of the candidate\")\n",
+ " email: str = Field(description=\"The email address of the candidate\")\n",
+ " links: List[str] = Field(\n",
+ " description=\"The links to the candidate's social media profiles\"\n",
+ " )\n",
+ " experience: List[Experience] = Field(description=\"The candidate's experience\")\n",
+ " education: List[Education] = Field(description=\"The candidate's education\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Next, we will update the `data_schema` for the `resume-screening` agent to use the new `Resume` model. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'name': 'Dr. Rachel Zhang',\n",
+ " 'email': 'rachel.zhang@email.com',\n",
+ " 'links': ['linkedin.com/in/rachelzhang',\n",
+ " 'github.com/rzhang-ai',\n",
+ " 'scholar.google.com/rachelzhang'],\n",
+ " 'experience': [{'company': 'DeepMind',\n",
+ " 'title': 'Senior Research Scientist',\n",
+ " 'description': '- Lead researcher on large-scale multi-task learning systems, developing novel architectures that improve cross-task generalization by 40%\\n- Pioneered new approach to zero-shot learning using contrastive training, published in NeurIPS 2023\\n- Built and led team of 6 researchers working on foundational ML models\\n- Developed novel regularization techniques for large language models, reducing catastrophic forgetting by 35%',\n",
+ " 'start_date': '2019',\n",
+ " 'end_date': 'Present'},\n",
+ " {'company': 'Google Research',\n",
+ " 'title': 'Research Scientist',\n",
+ " 'description': '- Developed probabilistic frameworks for robust ML, published in ICML 2018\\n- Created novel attention mechanisms for computer vision models, improving accuracy by 25%\\n- Led collaboration with Google Brain team on efficient training methods for transformer models\\n- Mentored 4 PhD interns and collaborated with academic institutions',\n",
+ " 'start_date': '2015',\n",
+ " 'end_date': '2019'},\n",
+ " {'company': 'Columbia University',\n",
+ " 'title': 'Research Assistant Professor',\n",
+ " 'description': '- Published seminal work on Bayesian optimization methods (cited 1000+ times)\\n- Taught graduate-level courses in Machine Learning and Statistical Learning Theory\\n- Supervised 5 PhD students and 3 MSc students\\n- Secured $500K in research grants for probabilistic ML research',\n",
+ " 'start_date': '2011',\n",
+ " 'end_date': '2015'}],\n",
+ " 'education': [{'institution': 'Columbia University',\n",
+ " 'degree': 'Ph.D. in Computer Science',\n",
+ " 'start_date': '2007',\n",
+ " 'end_date': '2011'},\n",
+ " {'institution': 'Stanford University',\n",
+ " 'degree': 'M.S. in Computer Science',\n",
+ " 'start_date': '2005',\n",
+ " 'end_date': '2007'}]}"
+ ]
+ },
+ "execution_count": null,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "agent.data_schema = Resume\n",
+ "resume = agent.extract(\"./data/resumes/ai_researcher.pdf\")\n",
+ "resume.data"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This is a good start. Let us add a few more fields to the schema and re-run the extraction. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "class TechnicalSkills(BaseModel):\n",
+ " programming_languages: List[str] = Field(\n",
+ " description=\"The programming languages the candidate is proficient in.\"\n",
+ " )\n",
+ " frameworks: List[str] = Field(\n",
+ " description=\"The tools/frameworks the candidate is proficient in, e.g. React, Django, PyTorch, etc.\"\n",
+ " )\n",
+ " skills: List[str] = Field(\n",
+ " description=\"Other general skills the candidate is proficient in, e.g. Data Engineering, Machine Learning, etc.\"\n",
+ " )\n",
+ "\n",
+ "\n",
+ "class Resume(BaseModel):\n",
+ " name: str = Field(description=\"The name of the candidate\")\n",
+ " email: str = Field(description=\"The email address of the candidate\")\n",
+ " links: List[str] = Field(\n",
+ " description=\"The links to the candidate's social media profiles\"\n",
+ " )\n",
+ " experience: List[Experience] = Field(description=\"The candidate's experience\")\n",
+ " education: List[Education] = Field(description=\"The candidate's education\")\n",
+ " technical_skills: TechnicalSkills = Field(\n",
+ " description=\"The candidate's technical skills\"\n",
+ " )\n",
+ " key_accomplishments: str = Field(\n",
+ " description=\"Summarize the candidates highest achievements.\"\n",
+ " )"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'name': 'Dr. Rachel Zhang',\n",
+ " 'email': 'rachel.zhang@email.com',\n",
+ " 'links': ['linkedin.com/in/rachelzhang',\n",
+ " 'github.com/rzhang-ai',\n",
+ " 'scholar.google.com/rachelzhang'],\n",
+ " 'experience': [{'company': 'DeepMind',\n",
+ " 'title': 'Senior Research Scientist',\n",
+ " 'description': '- Lead researcher on large-scale multi-task learning systems, developing novel architectures that improve cross-task generalization by 40%\\n- Pioneered new approach to zero-shot learning using contrastive training, published in NeurIPS 2023\\n- Built and led team of 6 researchers working on foundational ML models\\n- Developed novel regularization techniques for large language models, reducing catastrophic forgetting by 35%',\n",
+ " 'start_date': '2019',\n",
+ " 'end_date': 'Present'},\n",
+ " {'company': 'Google Research',\n",
+ " 'title': 'Research Scientist',\n",
+ " 'description': '- Developed probabilistic frameworks for robust ML, published in ICML 2018\\n- Created novel attention mechanisms for computer vision models, improving accuracy by 25%\\n- Led collaboration with Google Brain team on efficient training methods for transformer models\\n- Mentored 4 PhD interns and collaborated with academic institutions',\n",
+ " 'start_date': '2015',\n",
+ " 'end_date': '2019'},\n",
+ " {'company': 'Columbia University',\n",
+ " 'title': 'Research Assistant Professor',\n",
+ " 'description': '- Published seminal work on Bayesian optimization methods (cited 1000+ times)\\n- Taught graduate-level courses in Machine Learning and Statistical Learning Theory\\n- Supervised 5 PhD students and 3 MSc students\\n- Secured $500K in research grants for probabilistic ML research',\n",
+ " 'start_date': '2011',\n",
+ " 'end_date': '2015'}],\n",
+ " 'education': [{'institution': 'Columbia University',\n",
+ " 'degree': 'Ph.D. in Computer Science',\n",
+ " 'start_date': '2007',\n",
+ " 'end_date': '2011'},\n",
+ " {'institution': 'Stanford University',\n",
+ " 'degree': 'M.S. in Computer Science',\n",
+ " 'start_date': '2005',\n",
+ " 'end_date': '2007'}],\n",
+ " 'technical_skills': {'programming_languages': ['Python',\n",
+ " 'C++',\n",
+ " 'Julia',\n",
+ " 'CUDA'],\n",
+ " 'frameworks': ['PyTorch', 'TensorFlow', 'JAX', 'Ray'],\n",
+ " 'skills': ['Deep Learning',\n",
+ " 'Reinforcement Learning',\n",
+ " 'Probabilistic Models',\n",
+ " 'Multi-Task Learning',\n",
+ " 'Zero-Shot Learning',\n",
+ " 'Neural Architecture Search']},\n",
+ " 'key_accomplishments': 'AI researcher with 12+ years of experience spanning classical machine learning, deep learning, and probabilistic modeling. Led groundbreaking research in reinforcement learning, generative models, and multi-task learning. Published 25+ papers in top-tier conferences (NeurIPS, ICML, ICLR). Strong track record of transitioning theoretical advances into practical applications in both academic and industrial settings.'}"
+ ]
+ },
+ "execution_count": null,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "agent.data_schema = Resume\n",
+ "resume = agent.extract(\"./data/resumes/ai_researcher.pdf\")\n",
+ "resume.data"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Finalizing the schema\n",
+ "\n",
+ "This is great! We have extracted a lot of key information from the resume that is well-typed and can be used downstream for further processing. Until now, this data is ephemeral and will be lost if we close the session. Let us save the state of our extraction and use it to extract data from multiple resumes. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "agent.save()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'type': 'object',\n",
+ " '$defs': {'Education': {'type': 'object',\n",
+ " 'title': 'Education',\n",
+ " 'required': ['institution', 'degree', 'start_date', 'end_date'],\n",
+ " 'properties': {'degree': {'type': 'string',\n",
+ " 'title': 'Degree',\n",
+ " 'description': 'The degree of the candidate'},\n",
+ " 'end_date': {'anyOf': [{'type': 'string'}, {'type': 'null'}],\n",
+ " 'title': 'End Date',\n",
+ " 'description': \"The end date of the candidate's education\"},\n",
+ " 'start_date': {'anyOf': [{'type': 'string'}, {'type': 'null'}],\n",
+ " 'title': 'Start Date',\n",
+ " 'description': \"The start date of the candidate's education\"},\n",
+ " 'institution': {'type': 'string',\n",
+ " 'title': 'Institution',\n",
+ " 'description': 'The institution of the candidate'}},\n",
+ " 'additionalProperties': False},\n",
+ " 'Experience': {'type': 'object',\n",
+ " 'title': 'Experience',\n",
+ " 'required': ['company', 'title', 'description', 'start_date', 'end_date'],\n",
+ " 'properties': {'title': {'type': 'string',\n",
+ " 'title': 'Title',\n",
+ " 'description': 'The title of the candidate'},\n",
+ " 'company': {'type': 'string',\n",
+ " 'title': 'Company',\n",
+ " 'description': 'The name of the company'},\n",
+ " 'end_date': {'anyOf': [{'type': 'string'}, {'type': 'null'}],\n",
+ " 'title': 'End Date',\n",
+ " 'description': \"The end date of the candidate's experience\"},\n",
+ " 'start_date': {'anyOf': [{'type': 'string'}, {'type': 'null'}],\n",
+ " 'title': 'Start Date',\n",
+ " 'description': \"The start date of the candidate's experience\"},\n",
+ " 'description': {'anyOf': [{'type': 'string'}, {'type': 'null'}],\n",
+ " 'title': 'Description',\n",
+ " 'description': \"The description of the candidate's experience\"}},\n",
+ " 'additionalProperties': False},\n",
+ " 'TechnicalSkills': {'type': 'object',\n",
+ " 'title': 'TechnicalSkills',\n",
+ " 'required': ['programming_languages', 'frameworks', 'skills'],\n",
+ " 'properties': {'skills': {'type': 'array',\n",
+ " 'items': {'type': 'string'},\n",
+ " 'title': 'Skills',\n",
+ " 'description': 'Other general skills the candidate is proficient in, e.g. Data Engineering, Machine Learning, etc.'},\n",
+ " 'frameworks': {'type': 'array',\n",
+ " 'items': {'type': 'string'},\n",
+ " 'title': 'Frameworks',\n",
+ " 'description': 'The tools/frameworks the candidate is proficient in, e.g. React, Django, PyTorch, etc.'},\n",
+ " 'programming_languages': {'type': 'array',\n",
+ " 'items': {'type': 'string'},\n",
+ " 'title': 'Programming Languages',\n",
+ " 'description': 'The programming languages the candidate is proficient in.'}},\n",
+ " 'additionalProperties': False}},\n",
+ " 'title': 'Resume',\n",
+ " 'required': ['name',\n",
+ " 'email',\n",
+ " 'links',\n",
+ " 'experience',\n",
+ " 'education',\n",
+ " 'technical_skills',\n",
+ " 'key_accomplishments'],\n",
+ " 'properties': {'name': {'type': 'string',\n",
+ " 'title': 'Name',\n",
+ " 'description': 'The name of the candidate'},\n",
+ " 'email': {'type': 'string',\n",
+ " 'title': 'Email',\n",
+ " 'description': 'The email address of the candidate'},\n",
+ " 'links': {'type': 'array',\n",
+ " 'items': {'type': 'string'},\n",
+ " 'title': 'Links',\n",
+ " 'description': \"The links to the candidate's social media profiles\"},\n",
+ " 'education': {'type': 'array',\n",
+ " 'items': {'$ref': '#/$defs/Education'},\n",
+ " 'title': 'Education',\n",
+ " 'description': \"The candidate's education\"},\n",
+ " 'experience': {'type': 'array',\n",
+ " 'items': {'$ref': '#/$defs/Experience'},\n",
+ " 'title': 'Experience',\n",
+ " 'description': \"The candidate's experience\"},\n",
+ " 'technical_skills': {'type': 'object',\n",
+ " 'title': 'TechnicalSkills',\n",
+ " 'required': ['programming_languages', 'frameworks', 'skills'],\n",
+ " 'properties': {'skills': {'type': 'array',\n",
+ " 'items': {'type': 'string'},\n",
+ " 'title': 'Skills',\n",
+ " 'description': 'Other general skills the candidate is proficient in, e.g. Data Engineering, Machine Learning, etc.'},\n",
+ " 'frameworks': {'type': 'array',\n",
+ " 'items': {'type': 'string'},\n",
+ " 'title': 'Frameworks',\n",
+ " 'description': 'The tools/frameworks the candidate is proficient in, e.g. React, Django, PyTorch, etc.'},\n",
+ " 'programming_languages': {'type': 'array',\n",
+ " 'items': {'type': 'string'},\n",
+ " 'title': 'Programming Languages',\n",
+ " 'description': 'The programming languages the candidate is proficient in.'}},\n",
+ " 'description': \"The candidate's technical skills\",\n",
+ " 'additionalProperties': False},\n",
+ " 'key_accomplishments': {'type': 'string',\n",
+ " 'title': 'Key Accomplishments',\n",
+ " 'description': 'Summarize the candidates highest achievements.'}},\n",
+ " 'additionalProperties': False}"
+ ]
+ },
+ "execution_count": null,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "agent = llama_extract.get_agent(\"resume-screening\")\n",
+ "agent.data_schema # Latest schema should be returned"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Queueing extractions"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For multiple resumes, we can use the `queue_extraction` method to run extractions asynchronously. This is ideal for processing batch extraction jobs."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Uploading files: 100%|██████████| 3/3 [00:01<00:00, 2.29it/s]\n",
+ "Creating extraction jobs: 100%|██████████| 3/3 [00:04<00:00, 1.61s/it]\n"
+ ]
+ }
+ ],
+ "source": [
+ "import os\n",
+ "\n",
+ "# All resumes in the data/resumes directory\n",
+ "resumes = []\n",
+ "\n",
+ "with os.scandir(\"./data/resumes\") as entries:\n",
+ " for entry in entries:\n",
+ " if entry.is_file():\n",
+ " resumes.append(entry.path)\n",
+ "\n",
+ "jobs = await agent.queue_extraction(resumes)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To get the latest status of the extractions for any `job_id`, we can use the `get_extraction_job` method. \n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[,\n",
+ " ,\n",
+ " ]"
+ ]
+ },
+ "execution_count": null,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "[agent.get_extraction_job(job_id=job.id).status for job in jobs]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We notice that all extraction runs are in a PENDING state. We can check back again to see if the extractions have completed. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[,\n",
+ " ,\n",
+ " ]"
+ ]
+ },
+ "execution_count": null,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "[agent.get_extraction_job(job_id=job.id).status for job in jobs]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Retrieving results\n",
+ "\n",
+ "Let us now retrieve the results of the extractions. If the status of the extraction is `SUCCESS`, we can retrieve the data from the `data` field. In case there are errors (status = `ERROR`), we can retrieve the error message from the `error` field. \n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "results = []\n",
+ "for job in jobs:\n",
+ " extract_run = agent.list_extraction_runs(job_id=job.id)[0]\n",
+ " if extract_run.status == \"SUCCESS\":\n",
+ " results.append(extract_run.data)\n",
+ " else:\n",
+ " print(f\"Extraction status for job {job.id}: {extract_run.status}\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'name': 'Dr. Rachel Zhang, Ph.D.',\n",
+ " 'email': 'rachel.zhang@email.com',\n",
+ " 'links': ['linkedin.com/in/rachelzhang',\n",
+ " 'github.com/rzhang-ai',\n",
+ " 'scholar.google.com/rachelzhang'],\n",
+ " 'experience': [{'company': 'DeepMind',\n",
+ " 'title': 'Senior Research Scientist',\n",
+ " 'description': '- Lead researcher on large-scale multi-task learning systems, developing novel architectures that improve cross-task generalization by 40%\\n- Pioneered new approach to zero-shot learning using contrastive training, published in NeurIPS 2023\\n- Built and led team of 6 researchers working on foundational ML models\\n- Developed novel regularization techniques for large language models, reducing catastrophic forgetting by 35%',\n",
+ " 'start_date': '2019',\n",
+ " 'end_date': 'Present'},\n",
+ " {'company': 'Google Research',\n",
+ " 'title': 'Research Scientist',\n",
+ " 'description': '- Developed probabilistic frameworks for robust ML, published in ICML 2018\\n- Created novel attention mechanisms for computer vision models, improving accuracy by 25%\\n- Led collaboration with Google Brain team on efficient training methods for transformer models\\n- Mentored 4 PhD interns and collaborated with academic institutions',\n",
+ " 'start_date': '2015',\n",
+ " 'end_date': '2019'},\n",
+ " {'company': 'Columbia University',\n",
+ " 'title': 'Research Assistant Professor',\n",
+ " 'description': '- Published seminal work on Bayesian optimization methods (cited 1000+ times)\\n- Taught graduate-level courses in Machine Learning and Statistical Learning Theory\\n- Supervised 5 PhD students and 3 MSc students\\n- Secured $500K in research grants for probabilistic ML research',\n",
+ " 'start_date': '2011',\n",
+ " 'end_date': '2015'}],\n",
+ " 'education': [{'institution': 'Columbia University',\n",
+ " 'degree': 'Ph.D. in Computer Science',\n",
+ " 'start_date': '2007',\n",
+ " 'end_date': '2011'},\n",
+ " {'institution': 'Stanford University',\n",
+ " 'degree': 'M.S. in Computer Science',\n",
+ " 'start_date': '2005',\n",
+ " 'end_date': '2007'}],\n",
+ " 'technical_skills': {'programming_languages': ['Python',\n",
+ " 'C++',\n",
+ " 'Julia',\n",
+ " 'CUDA'],\n",
+ " 'frameworks': ['PyTorch', 'TensorFlow', 'JAX', 'Ray'],\n",
+ " 'skills': ['Deep Learning',\n",
+ " 'Reinforcement Learning',\n",
+ " 'Probabilistic Models',\n",
+ " 'Multi-Task Learning',\n",
+ " 'Zero-Shot Learning',\n",
+ " 'Neural Architecture Search']},\n",
+ " 'key_accomplishments': 'AI researcher with 12+ years of experience spanning classical machine learning, deep learning, and probabilistic modeling. Led groundbreaking research in reinforcement learning, generative models, and multi-task learning. Published 25+ papers in top-tier conferences (NeurIPS, ICML, ICLR). Strong track record of transitioning theoretical advances into practical applications in both academic and industrial settings.'}"
+ ]
+ },
+ "execution_count": null,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "results[0]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'name': 'Alex Park',\n",
+ " 'email': 'alex park@email.com',\n",
+ " 'links': ['linkedin.com/in/alexpark'],\n",
+ " 'experience': [{'company': 'SearchTech AI',\n",
+ " 'title': 'Senior Machine Learning Engineer',\n",
+ " 'description': 'Led development of next-generation learning-to-rank system using BER\\nArchitected and deployed real-time personalization system processing 10\\nIncreasing CTR by 15%\\nImproving search relevance by 24% (NDCG@10)',\n",
+ " 'start_date': None,\n",
+ " 'end_date': None},\n",
+ " {'company': 'Commerce Corp',\n",
+ " 'title': '',\n",
+ " 'description': 'Developed semantic search system using transformer models and approximate nearest neighbors, reducing null search results by 35%',\n",
+ " 'start_date': None,\n",
+ " 'end_date': None},\n",
+ " {'company': 'Tech Solutions Inc',\n",
+ " 'title': 'Machine Learning Engineer',\n",
+ " 'description': 'Implemented query understanding pipeline',\n",
+ " 'start_date': None,\n",
+ " 'end_date': None},\n",
+ " {'company': '',\n",
+ " 'title': 'Software Engineer',\n",
+ " 'description': 'Built data pipelines and Flasticsearch',\n",
+ " 'start_date': None,\n",
+ " 'end_date': None}],\n",
+ " 'education': [{'institution': 'University of California, Berkeley',\n",
+ " 'degree': 'M.S. Computer Science',\n",
+ " 'start_date': None,\n",
+ " 'end_date': None},\n",
+ " {'institution': 'University of California, Berkeley',\n",
+ " 'degree': 'B.S. Computer Science',\n",
+ " 'start_date': None,\n",
+ " 'end_date': None},\n",
+ " {'institution': 'University of Washington',\n",
+ " 'degree': '',\n",
+ " 'start_date': None,\n",
+ " 'end_date': None}],\n",
+ " 'technical_skills': {'programming_languages': ['Python',\n",
+ " 'SQL',\n",
+ " 'Java',\n",
+ " 'Scala',\n",
+ " 'Shell Scripting'],\n",
+ " 'frameworks': ['PyTorch',\n",
+ " 'TensorFlow',\n",
+ " 'Scikit-learn',\n",
+ " 'Elasticsearch',\n",
+ " 'Solr',\n",
+ " 'Lucene',\n",
+ " 'BERT',\n",
+ " 'Word2Vec',\n",
+ " 'FastAI',\n",
+ " 'BM25',\n",
+ " 'FAISS',\n",
+ " 'Docker',\n",
+ " 'Kubernetes'],\n",
+ " 'skills': []},\n",
+ " 'key_accomplishments': 'Machine Learning Engineer with 5 years of experience building and deploying large-scale search and relevance systems: Specialized in developing personalized search algorithms, learning-to-rank models; and recommendation systems. Strong track record of improving search relevance metrics and user engagement through ML-driven solutions:'}"
+ ]
+ },
+ "execution_count": null,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "results[1]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'name': 'Sarah Chen',\n",
+ " 'email': 'sarah.chen@email.com',\n",
+ " 'links': [],\n",
+ " 'experience': [{'company': 'TechCorp Solutions',\n",
+ " 'title': 'Senior Software Architect',\n",
+ " 'description': '- Led architectural design and implementation of a cloud-native platform serving 2M+ users\\n- Established architectural guidelines and best practices adopted across 12 development teams\\n- Reduced system latency by 40% through implementation of event-driven architecture\\n- Mentored 15+ senior developers in cloud-native development practices',\n",
+ " 'start_date': '2020',\n",
+ " 'end_date': 'Present'},\n",
+ " {'company': 'DataFlow Systems',\n",
+ " 'title': 'Lead Software Engineer',\n",
+ " 'description': '- Architected and led development of distributed data processing platform handling 5TB daily\\n- Designed microservices architecture reducing deployment time by 65%\\n- Led migration of legacy monolith to cloud-native architecture\\n- Managed team of 8 engineers across 3 international locations',\n",
+ " 'start_date': '2016',\n",
+ " 'end_date': '2020'},\n",
+ " {'company': 'InnovateTech',\n",
+ " 'title': 'Senior Software Engineer',\n",
+ " 'description': '- Developed high-performance trading platform processing 100K transactions per second\\n- Implemented real-time analytics engine reducing processing latency by 75%\\n- Led adoption of container orchestration reducing deployment costs by 35%',\n",
+ " 'start_date': '2013',\n",
+ " 'end_date': '2016'}],\n",
+ " 'education': [{'institution': 'Stanford University',\n",
+ " 'degree': 'Master of Science in Computer Science',\n",
+ " 'start_date': None,\n",
+ " 'end_date': '2013'},\n",
+ " {'institution': 'University of California, Berkeley',\n",
+ " 'degree': 'Bachelor of Science in Computer Engineering',\n",
+ " 'start_date': None,\n",
+ " 'end_date': '2011'}],\n",
+ " 'technical_skills': {'programming_languages': ['Java',\n",
+ " 'Python',\n",
+ " 'Go',\n",
+ " 'JavaScript/TypeScript'],\n",
+ " 'frameworks': [],\n",
+ " 'skills': ['Architecture & Design',\n",
+ " 'Microservices',\n",
+ " 'Event-Driven Architecture',\n",
+ " 'Domain-Driven Design',\n",
+ " 'REST APIs',\n",
+ " 'Cloud Platforms',\n",
+ " 'AWS (Advanced)',\n",
+ " 'Azure',\n",
+ " 'Google Cloud Platform']},\n",
+ " 'key_accomplishments': '- Co-inventor on three patents for distributed systems architecture\\n- Published paper on \"Scalable Microservices Architecture\" at IEEE Cloud Computing Conference 2022\\n- Keynote Speaker, CloudCon 2023: \"Future of Cloud-Native Architecture\"\\n- Regular presenter at local tech meetups and conferences'}"
+ ]
+ },
+ "execution_count": null,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "results[2]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Congratulations! You now have an agent that can extract structured data from resumes. \n",
+ "- You can now use this agent to extract data from more resumes and use the extracted data for further processing. \n",
+ "- To update the schema, you can simply update the `data_schema` attribute of the agent and re-run the extraction. \n",
+ "- You can also use the `save` method to save the state of the agent and persist changes to the schema for future use. \n",
+ "\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/examples/advanced_rag/dynamic_section_retrieval.ipynb b/examples/parse/advanced_rag/dynamic_section_retrieval.ipynb
similarity index 100%
rename from examples/advanced_rag/dynamic_section_retrieval.ipynb
rename to examples/parse/advanced_rag/dynamic_section_retrieval.ipynb
diff --git a/examples/advanced_rag/dynamic_section_retrieval_img.png b/examples/parse/advanced_rag/dynamic_section_retrieval_img.png
similarity index 100%
rename from examples/advanced_rag/dynamic_section_retrieval_img.png
rename to examples/parse/advanced_rag/dynamic_section_retrieval_img.png
diff --git a/examples/agents/demo_simple_openai_agent.ipynb b/examples/parse/agents/demo_simple_openai_agent.ipynb
similarity index 100%
rename from examples/agents/demo_simple_openai_agent.ipynb
rename to examples/parse/agents/demo_simple_openai_agent.ipynb
diff --git a/examples/caltrain/caltrain_schedule_weekend.pdf b/examples/parse/caltrain/caltrain_schedule_weekend.pdf
similarity index 100%
rename from examples/caltrain/caltrain_schedule_weekend.pdf
rename to examples/parse/caltrain/caltrain_schedule_weekend.pdf
diff --git a/examples/caltrain/caltrain_text_mode.ipynb b/examples/parse/caltrain/caltrain_text_mode.ipynb
similarity index 100%
rename from examples/caltrain/caltrain_text_mode.ipynb
rename to examples/parse/caltrain/caltrain_text_mode.ipynb
diff --git a/examples/data/BP_Excel.xlsx b/examples/parse/data/BP_Excel.xlsx
similarity index 100%
rename from examples/data/BP_Excel.xlsx
rename to examples/parse/data/BP_Excel.xlsx
diff --git a/examples/data/nvidia_quarterly_revenue_trend_by_market.xlsx b/examples/parse/data/nvidia_quarterly_revenue_trend_by_market.xlsx
similarity index 100%
rename from examples/data/nvidia_quarterly_revenue_trend_by_market.xlsx
rename to examples/parse/data/nvidia_quarterly_revenue_trend_by_market.xlsx
diff --git a/examples/demo_advanced.ipynb b/examples/parse/demo_advanced.ipynb
similarity index 100%
rename from examples/demo_advanced.ipynb
rename to examples/parse/demo_advanced.ipynb
diff --git a/examples/demo_advanced_astradb.ipynb b/examples/parse/demo_advanced_astradb.ipynb
similarity index 100%
rename from examples/demo_advanced_astradb.ipynb
rename to examples/parse/demo_advanced_astradb.ipynb
diff --git a/examples/demo_advanced_weaviate.ipynb b/examples/parse/demo_advanced_weaviate.ipynb
similarity index 100%
rename from examples/demo_advanced_weaviate.ipynb
rename to examples/parse/demo_advanced_weaviate.ipynb
diff --git a/examples/demo_api.ipynb b/examples/parse/demo_api.ipynb
similarity index 100%
rename from examples/demo_api.ipynb
rename to examples/parse/demo_api.ipynb
diff --git a/examples/demo_astradb.ipynb b/examples/parse/demo_astradb.ipynb
similarity index 100%
rename from examples/demo_astradb.ipynb
rename to examples/parse/demo_astradb.ipynb
diff --git a/examples/demo_basic.ipynb b/examples/parse/demo_basic.ipynb
similarity index 100%
rename from examples/demo_basic.ipynb
rename to examples/parse/demo_basic.ipynb
diff --git a/examples/demo_elasticsearch_vectordb.ipynb b/examples/parse/demo_elasticsearch_vectordb.ipynb
similarity index 100%
rename from examples/demo_elasticsearch_vectordb.ipynb
rename to examples/parse/demo_elasticsearch_vectordb.ipynb
diff --git a/examples/demo_excel.ipynb b/examples/parse/demo_excel.ipynb
similarity index 100%
rename from examples/demo_excel.ipynb
rename to examples/parse/demo_excel.ipynb
diff --git a/examples/demo_get_charts.ipynb b/examples/parse/demo_get_charts.ipynb
similarity index 100%
rename from examples/demo_get_charts.ipynb
rename to examples/parse/demo_get_charts.ipynb
diff --git a/examples/demo_insurance.ipynb b/examples/parse/demo_insurance.ipynb
similarity index 100%
rename from examples/demo_insurance.ipynb
rename to examples/parse/demo_insurance.ipynb
diff --git a/examples/demo_json.ipynb b/examples/parse/demo_json.ipynb
similarity index 100%
rename from examples/demo_json.ipynb
rename to examples/parse/demo_json.ipynb
diff --git a/examples/demo_json_parsing.ipynb b/examples/parse/demo_json_parsing.ipynb
similarity index 100%
rename from examples/demo_json_parsing.ipynb
rename to examples/parse/demo_json_parsing.ipynb
diff --git a/examples/demo_json_tour.ipynb b/examples/parse/demo_json_tour.ipynb
similarity index 100%
rename from examples/demo_json_tour.ipynb
rename to examples/parse/demo_json_tour.ipynb
diff --git a/examples/demo_languages.ipynb b/examples/parse/demo_languages.ipynb
similarity index 100%
rename from examples/demo_languages.ipynb
rename to examples/parse/demo_languages.ipynb
diff --git a/examples/demo_mongodb.ipynb b/examples/parse/demo_mongodb.ipynb
similarity index 100%
rename from examples/demo_mongodb.ipynb
rename to examples/parse/demo_mongodb.ipynb
diff --git a/examples/demo_parsing_instructions.ipynb b/examples/parse/demo_parsing_instructions.ipynb
similarity index 100%
rename from examples/demo_parsing_instructions.ipynb
rename to examples/parse/demo_parsing_instructions.ipynb
diff --git a/examples/demo_starter_multimodal.ipynb b/examples/parse/demo_starter_multimodal.ipynb
similarity index 100%
rename from examples/demo_starter_multimodal.ipynb
rename to examples/parse/demo_starter_multimodal.ipynb
diff --git a/examples/demo_starter_parse_selected_pages.ipynb b/examples/parse/demo_starter_parse_selected_pages.ipynb
similarity index 100%
rename from examples/demo_starter_parse_selected_pages.ipynb
rename to examples/parse/demo_starter_parse_selected_pages.ipynb
diff --git a/examples/demo_table_comparisons.ipynb b/examples/parse/demo_table_comparisons.ipynb
similarity index 100%
rename from examples/demo_table_comparisons.ipynb
rename to examples/parse/demo_table_comparisons.ipynb
diff --git a/examples/excel/dcf_rag.ipynb b/examples/parse/excel/dcf_rag.ipynb
similarity index 100%
rename from examples/excel/dcf_rag.ipynb
rename to examples/parse/excel/dcf_rag.ipynb
diff --git a/examples/excel/o1_excel_rag.ipynb b/examples/parse/excel/o1_excel_rag.ipynb
similarity index 100%
rename from examples/excel/o1_excel_rag.ipynb
rename to examples/parse/excel/o1_excel_rag.ipynb
diff --git a/examples/excel/references/query1.png b/examples/parse/excel/references/query1.png
similarity index 100%
rename from examples/excel/references/query1.png
rename to examples/parse/excel/references/query1.png
diff --git a/examples/excel/references/query2.png b/examples/parse/excel/references/query2.png
similarity index 100%
rename from examples/excel/references/query2.png
rename to examples/parse/excel/references/query2.png
diff --git a/examples/excel/references/query3.png b/examples/parse/excel/references/query3.png
similarity index 100%
rename from examples/excel/references/query3.png
rename to examples/parse/excel/references/query3.png
diff --git a/examples/excel/references/query4.png b/examples/parse/excel/references/query4.png
similarity index 100%
rename from examples/excel/references/query4.png
rename to examples/parse/excel/references/query4.png
diff --git a/examples/excel/references/query5.png b/examples/parse/excel/references/query5.png
similarity index 100%
rename from examples/excel/references/query5.png
rename to examples/parse/excel/references/query5.png
diff --git a/examples/excel/references/recursive_retrieval.png b/examples/parse/excel/references/recursive_retrieval.png
similarity index 100%
rename from examples/excel/references/recursive_retrieval.png
rename to examples/parse/excel/references/recursive_retrieval.png
diff --git a/examples/json_tour_screenshots/32778fb0-9e83-4b00-aebe-0d7f59ff0b5f-img_p0_1.png b/examples/parse/json_tour_screenshots/32778fb0-9e83-4b00-aebe-0d7f59ff0b5f-img_p0_1.png
similarity index 100%
rename from examples/json_tour_screenshots/32778fb0-9e83-4b00-aebe-0d7f59ff0b5f-img_p0_1.png
rename to examples/parse/json_tour_screenshots/32778fb0-9e83-4b00-aebe-0d7f59ff0b5f-img_p0_1.png
diff --git a/examples/json_tour_screenshots/32778fb0-9e83-4b00-aebe-0d7f59ff0b5f-page_1.jpg b/examples/parse/json_tour_screenshots/32778fb0-9e83-4b00-aebe-0d7f59ff0b5f-page_1.jpg
similarity index 100%
rename from examples/json_tour_screenshots/32778fb0-9e83-4b00-aebe-0d7f59ff0b5f-page_1.jpg
rename to examples/parse/json_tour_screenshots/32778fb0-9e83-4b00-aebe-0d7f59ff0b5f-page_1.jpg
diff --git a/examples/json_tour_screenshots/img_p0_1.png b/examples/parse/json_tour_screenshots/img_p0_1.png
similarity index 100%
rename from examples/json_tour_screenshots/img_p0_1.png
rename to examples/parse/json_tour_screenshots/img_p0_1.png
diff --git a/examples/json_tour_screenshots/links_page.png b/examples/parse/json_tour_screenshots/links_page.png
similarity index 100%
rename from examples/json_tour_screenshots/links_page.png
rename to examples/parse/json_tour_screenshots/links_page.png
diff --git a/examples/json_tour_screenshots/page_1.png b/examples/parse/json_tour_screenshots/page_1.png
similarity index 100%
rename from examples/json_tour_screenshots/page_1.png
rename to examples/parse/json_tour_screenshots/page_1.png
diff --git a/examples/json_tour_screenshots/page_35.png b/examples/parse/json_tour_screenshots/page_35.png
similarity index 100%
rename from examples/json_tour_screenshots/page_35.png
rename to examples/parse/json_tour_screenshots/page_35.png
diff --git a/examples/knowledge_graphs/kg_agent.ipynb b/examples/parse/knowledge_graphs/kg_agent.ipynb
similarity index 100%
rename from examples/knowledge_graphs/kg_agent.ipynb
rename to examples/parse/knowledge_graphs/kg_agent.ipynb
diff --git a/examples/knowledge_graphs/sf2023_budget_kg_screenshot.png b/examples/parse/knowledge_graphs/sf2023_budget_kg_screenshot.png
similarity index 100%
rename from examples/knowledge_graphs/sf2023_budget_kg_screenshot.png
rename to examples/parse/knowledge_graphs/sf2023_budget_kg_screenshot.png
diff --git a/examples/multimodal/claude_parse.ipynb b/examples/parse/multimodal/claude_parse.ipynb
similarity index 100%
rename from examples/multimodal/claude_parse.ipynb
rename to examples/parse/multimodal/claude_parse.ipynb
diff --git a/examples/multimodal/gpt4o_mini.ipynb b/examples/parse/multimodal/gpt4o_mini.ipynb
similarity index 100%
rename from examples/multimodal/gpt4o_mini.ipynb
rename to examples/parse/multimodal/gpt4o_mini.ipynb
diff --git a/examples/multimodal/insurance_rag.ipynb b/examples/parse/multimodal/insurance_rag.ipynb
similarity index 100%
rename from examples/multimodal/insurance_rag.ipynb
rename to examples/parse/multimodal/insurance_rag.ipynb
diff --git a/examples/multimodal/legal_rag.ipynb b/examples/parse/multimodal/legal_rag.ipynb
similarity index 100%
rename from examples/multimodal/legal_rag.ipynb
rename to examples/parse/multimodal/legal_rag.ipynb
diff --git a/examples/multimodal/llama2-p33.png b/examples/parse/multimodal/llama2-p33.png
similarity index 100%
rename from examples/multimodal/llama2-p33.png
rename to examples/parse/multimodal/llama2-p33.png
diff --git a/examples/multimodal/llama3.1-p5.png b/examples/parse/multimodal/llama3.1-p5.png
similarity index 100%
rename from examples/multimodal/llama3.1-p5.png
rename to examples/parse/multimodal/llama3.1-p5.png
diff --git a/examples/multimodal/multimodal_contextual_retrieval_rag.ipynb b/examples/parse/multimodal/multimodal_contextual_retrieval_rag.ipynb
similarity index 100%
rename from examples/multimodal/multimodal_contextual_retrieval_rag.ipynb
rename to examples/parse/multimodal/multimodal_contextual_retrieval_rag.ipynb
diff --git a/examples/multimodal/multimodal_contextual_retrieval_rag_img.png b/examples/parse/multimodal/multimodal_contextual_retrieval_rag_img.png
similarity index 100%
rename from examples/multimodal/multimodal_contextual_retrieval_rag_img.png
rename to examples/parse/multimodal/multimodal_contextual_retrieval_rag_img.png
diff --git a/examples/multimodal/multimodal_rag_slide_deck.ipynb b/examples/parse/multimodal/multimodal_rag_slide_deck.ipynb
similarity index 100%
rename from examples/multimodal/multimodal_rag_slide_deck.ipynb
rename to examples/parse/multimodal/multimodal_rag_slide_deck.ipynb
diff --git a/examples/multimodal/multimodal_rag_slide_deck_img.png b/examples/parse/multimodal/multimodal_rag_slide_deck_img.png
similarity index 100%
rename from examples/multimodal/multimodal_rag_slide_deck_img.png
rename to examples/parse/multimodal/multimodal_rag_slide_deck_img.png
diff --git a/examples/multimodal/multimodal_report_generation.ipynb b/examples/parse/multimodal/multimodal_report_generation.ipynb
similarity index 100%
rename from examples/multimodal/multimodal_report_generation.ipynb
rename to examples/parse/multimodal/multimodal_report_generation.ipynb
diff --git a/examples/multimodal/multimodal_report_generation_agent.ipynb b/examples/parse/multimodal/multimodal_report_generation_agent.ipynb
similarity index 100%
rename from examples/multimodal/multimodal_report_generation_agent.ipynb
rename to examples/parse/multimodal/multimodal_report_generation_agent.ipynb
diff --git a/examples/multimodal/multimodal_report_generation_agent_img.png b/examples/parse/multimodal/multimodal_report_generation_agent_img.png
similarity index 100%
rename from examples/multimodal/multimodal_report_generation_agent_img.png
rename to examples/parse/multimodal/multimodal_report_generation_agent_img.png
diff --git a/examples/multimodal/product_manual_rag.ipynb b/examples/parse/multimodal/product_manual_rag.ipynb
similarity index 100%
rename from examples/multimodal/product_manual_rag.ipynb
rename to examples/parse/multimodal/product_manual_rag.ipynb
diff --git a/examples/other_files/demo_ppt_basic.ipynb b/examples/parse/other_files/demo_ppt_basic.ipynb
similarity index 100%
rename from examples/other_files/demo_ppt_basic.ipynb
rename to examples/parse/other_files/demo_ppt_basic.ipynb
diff --git a/examples/other_files/demo_ppt_financial.ipynb b/examples/parse/other_files/demo_ppt_financial.ipynb
similarity index 100%
rename from examples/other_files/demo_ppt_financial.ipynb
rename to examples/parse/other_files/demo_ppt_financial.ipynb
diff --git a/examples/parsing_instructions/expense_report_document.pdf b/examples/parse/parsing_instructions/expense_report_document.pdf
similarity index 100%
rename from examples/parsing_instructions/expense_report_document.pdf
rename to examples/parse/parsing_instructions/expense_report_document.pdf
diff --git a/examples/parsing_instructions/expense_report_document.png b/examples/parse/parsing_instructions/expense_report_document.png
similarity index 100%
rename from examples/parsing_instructions/expense_report_document.png
rename to examples/parse/parsing_instructions/expense_report_document.png
diff --git a/examples/parsing_instructions/mcdonalds_receipt.png b/examples/parse/parsing_instructions/mcdonalds_receipt.png
similarity index 100%
rename from examples/parsing_instructions/mcdonalds_receipt.png
rename to examples/parse/parsing_instructions/mcdonalds_receipt.png
diff --git a/examples/parsing_instructions/parsing_instructions.ipynb b/examples/parse/parsing_instructions/parsing_instructions.ipynb
similarity index 100%
rename from examples/parsing_instructions/parsing_instructions.ipynb
rename to examples/parse/parsing_instructions/parsing_instructions.ipynb
diff --git a/examples/parsing_instructions/purchase_order_document.pdf b/examples/parse/parsing_instructions/purchase_order_document.pdf
similarity index 100%
rename from examples/parsing_instructions/purchase_order_document.pdf
rename to examples/parse/parsing_instructions/purchase_order_document.pdf
diff --git a/examples/parsing_instructions/purchase_order_document.png b/examples/parse/parsing_instructions/purchase_order_document.png
similarity index 100%
rename from examples/parsing_instructions/purchase_order_document.png
rename to examples/parse/parsing_instructions/purchase_order_document.png
diff --git a/examples/parsing_modes/demo_auto_mode.ipynb b/examples/parse/parsing_modes/demo_auto_mode.ipynb
similarity index 100%
rename from examples/parsing_modes/demo_auto_mode.ipynb
rename to examples/parse/parsing_modes/demo_auto_mode.ipynb
diff --git a/examples/parsing_modes/diagram.jpg b/examples/parse/parsing_modes/diagram.jpg
similarity index 100%
rename from examples/parsing_modes/diagram.jpg
rename to examples/parse/parsing_modes/diagram.jpg
diff --git a/examples/parsing_modes/mermaid_render.png b/examples/parse/parsing_modes/mermaid_render.png
similarity index 100%
rename from examples/parsing_modes/mermaid_render.png
rename to examples/parse/parsing_modes/mermaid_render.png
diff --git a/examples/parsing_modes/page_1.png b/examples/parse/parsing_modes/page_1.png
similarity index 100%
rename from examples/parsing_modes/page_1.png
rename to examples/parse/parsing_modes/page_1.png
diff --git a/examples/parsing_modes/page_11.png b/examples/parse/parsing_modes/page_11.png
similarity index 100%
rename from examples/parsing_modes/page_11.png
rename to examples/parse/parsing_modes/page_11.png
diff --git a/examples/parsing_modes/page_14.png b/examples/parse/parsing_modes/page_14.png
similarity index 100%
rename from examples/parsing_modes/page_14.png
rename to examples/parse/parsing_modes/page_14.png
diff --git a/examples/parsing_modes/page_3.png b/examples/parse/parsing_modes/page_3.png
similarity index 100%
rename from examples/parsing_modes/page_3.png
rename to examples/parse/parsing_modes/page_3.png
diff --git a/examples/report_generation/rfp_response/generate_rfp.ipynb b/examples/parse/report_generation/rfp_response/generate_rfp.ipynb
similarity index 100%
rename from examples/report_generation/rfp_response/generate_rfp.ipynb
rename to examples/parse/report_generation/rfp_response/generate_rfp.ipynb
diff --git a/examples/report_generation/rfp_response/generate_rfp_img.png b/examples/parse/report_generation/rfp_response/generate_rfp_img.png
similarity index 100%
rename from examples/report_generation/rfp_response/generate_rfp_img.png
rename to examples/parse/report_generation/rfp_response/generate_rfp_img.png
diff --git a/examples/test_tesla_impact_report/2019-tesla-impact-report-short.pdf b/examples/parse/test_tesla_impact_report/2019-tesla-impact-report-short.pdf
similarity index 100%
rename from examples/test_tesla_impact_report/2019-tesla-impact-report-short.pdf
rename to examples/parse/test_tesla_impact_report/2019-tesla-impact-report-short.pdf
diff --git a/examples/test_tesla_impact_report/test_gpt4o.ipynb b/examples/parse/test_tesla_impact_report/test_gpt4o.ipynb
similarity index 100%
rename from examples/test_tesla_impact_report/test_gpt4o.ipynb
rename to examples/parse/test_tesla_impact_report/test_gpt4o.ipynb
diff --git a/llama_cloud_services/__init__.py b/llama_cloud_services/__init__.py
new file mode 100644
index 0000000..9549d61
--- /dev/null
+++ b/llama_cloud_services/__init__.py
@@ -0,0 +1,11 @@
+from llama_cloud_services.parse import LlamaParse
+from llama_cloud_services.extract import LlamaExtract, ExtractionAgent
+from llama_cloud_services.report import ReportClient, LlamaReport
+
+__all__ = [
+ "LlamaParse",
+ "LlamaExtract",
+ "ExtractionAgent",
+ "ReportClient",
+ "LlamaReport",
+]
diff --git a/llama_cloud_services/extract/README.md b/llama_cloud_services/extract/README.md
new file mode 100644
index 0000000..1f24b25
--- /dev/null
+++ b/llama_cloud_services/extract/README.md
@@ -0,0 +1,186 @@
+# LlamaExtract
+
+> **⚠️ EXPERIMENTAL**
+> This library is under active development with frequent breaking changes. APIs and functionality may change significantly between versions. If you're interested in being an early adopter, please contact us at [support@llamaindex.ai](mailto:support@llamaindex.ai) or join our [Discord](https://discord.com/invite/eN6D2HQ4aX).
+
+LlamaExtract provides a simple API for extracting structured data from unstructured documents like PDFs, text files, and images (image support upcoming).
+
+## Quick Start
+
+```python
+from llama_extract import LlamaExtract
+from pydantic import BaseModel, Field
+
+# Initialize client
+extractor = LlamaExtract()
+
+
+# Define schema using Pydantic
+class Resume(BaseModel):
+ name: str = Field(description="Full name of candidate")
+ email: str = Field(description="Email address")
+ skills: list[str] = Field(description="Technical skills and technologies")
+
+
+# Create extraction agent
+agent = extractor.create_agent(name="resume-parser", data_schema=Resume)
+
+# Extract data from document
+result = agent.extract("resume.pdf")
+print(result.data)
+```
+
+## Core Concepts
+
+- **Extraction Agents**: Reusable extractors configured with a specific schema and extraction settings.
+- **Data Schema**: Structure definition for the data you want to extract.
+- **Extraction Jobs**: Asynchronous extraction tasks that can be monitored.
+
+## Defining Schemas
+
+Schemas can be defined using either Pydantic models or JSON Schema:
+
+### Using Pydantic (Recommended)
+
+```python
+from pydantic import BaseModel, Field
+from typing import List, Optional
+
+
+class Experience(BaseModel):
+ company: str = Field(description="Company name")
+ title: str = Field(description="Job title")
+ start_date: Optional[str] = Field(description="Start date of employment")
+ end_date: Optional[str] = Field(description="End date of employment")
+
+
+class Resume(BaseModel):
+ name: str = Field(description="Candidate name")
+ experience: List[Experience] = Field(description="Work history")
+```
+
+### Using JSON Schema
+
+```python
+schema = {
+ "type": "object",
+ "properties": {
+ "name": {"type": "string", "description": "Candidate name"},
+ "experience": {
+ "type": "array",
+ "description": "Work history",
+ "items": {
+ "type": "object",
+ "properties": {
+ "company": {
+ "type": "string",
+ "description": "Company name",
+ },
+ "title": {"type": "string", "description": "Job title"},
+ "start_date": {
+ "anyOf": [{"type": "string"}, {"type": "null"}],
+ "description": "Start date of employment",
+ },
+ "end_date": {
+ "anyOf": [{"type": "string"}, {"type": "null"}],
+ "description": "End date of employment",
+ },
+ },
+ },
+ },
+ },
+}
+
+agent = extractor.create_agent(name="resume-parser", data_schema=schema)
+```
+
+### Important restrictions on JSON/Pydantic Schema
+
+_LlamaExtract only supports a subset of the JSON Schema specification._ While limited, it should
+be sufficient for a wide variety of use cases.
+
+- All fields are required by default. Nullable fields must be explicitly marked as such,
+  using `"anyOf"` with a `"null"` type. See the `"start_date"` field above and the sketch below.
+- The root node must be of type `"object"`.
+- Schema nesting is limited to 5 levels.
+- The fields that matter are key names/titles, types and descriptions. Fields for
+  formatting, default values, etc. are not supported.
+- There are additional restrictions on the number of keys, the overall size of the schema, etc.
+  that you may hit for complex extraction use cases. In such cases, it is worth considering how to
+  restructure your extraction workflow to fit within these constraints, e.g. by extracting subsets
+  of fields and merging them later.
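+
+For example, here is how an optional (nullable) field looks in Pydantic and what it produces once converted to JSON Schema (a minimal sketch; the exact output may vary slightly with your Pydantic version):
+
+```python
+from typing import Optional
+
+from pydantic import BaseModel, Field
+
+
+class Experience(BaseModel):
+    company: str = Field(description="Company name")
+    # Optional[...] renders as {"anyOf": [{"type": "string"}, {"type": "null"}]}
+    end_date: Optional[str] = Field(description="End date of employment")
+
+
+# Inspect the generated JSON Schema that will be validated by LlamaExtract
+print(Experience.model_json_schema())
+```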
+
+## Other Extraction APIs
+
+### Batch Processing
+
+Process multiple files asynchronously:
+
+```python
+# Queue multiple files for extraction
+jobs = await agent.queue_extraction(["resume1.pdf", "resume2.pdf"])
+
+# Check job status
+for job in jobs:
+ status = agent.get_extraction_job(job.id).status
+ print(f"Job {job.id}: {status}")
+
+# Get results when complete
+results = [agent.get_extraction_run_for_job(job.id) for job in jobs]
+```
+
+### Updating Schemas
+
+Schemas can be modified and updated after creation:
+
+```python
+# Update schema
+agent.data_schema = new_schema
+
+# Save changes
+agent.save()
+```
+
+### Managing Agents
+
+```python
+# List all agents
+agents = extractor.list_agents()
+
+# Get specific agent
+agent = extractor.get_agent(name="resume-parser")
+
+# Delete agent
+extractor.delete_agent(agent.id)
+```
+
+## Installation
+
+```bash
+pip install llama-cloud-services
+```
+
+## Tips & Best Practices
+
+1. **Schema Design**:
+
+ - Try to limit schema nesting to 3-4 levels.
+ - Make fields optional when data might not always be present. Having required fields may force the model
+ to hallucinate when these fields are not present in the documents.
+ - When you want to extract a variable number of entities, use an `array` type. Note that you cannot use
+ an `array` type for the root node.
+   - Use descriptive field names and detailed descriptions. Use descriptions to pass formatting
+     instructions or few-shot examples (see the sketch after these tips).
+   - Start simple and iteratively build up your schema as requirements become clearer.
+
+2. **Running Extractions**:
+   - Note that updating `agent.data_schema` does not persist the schema to the database
+     until you call `agent.save()`, but the updated schema will still be used for subsequent extractions.
+   - Check the job status prior to accessing results. Any extraction error should be available in
+     the `job.error` or `extraction_run.error` fields for debugging.
+   - Consider async operations (`queue_extraction`) for large-scale extraction once you have finalized your schema.
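+
+As a rough sketch of these tips applied together (the field names here are purely illustrative), descriptions can carry formatting instructions and optional fields keep the model from inventing values that are missing from the document:
+
+```python
+from typing import List, Optional
+
+from pydantic import BaseModel, Field
+
+
+class Certification(BaseModel):
+    name: str = Field(description="Name of the certification, e.g. 'AWS Solutions Architect'")
+    # Optional: not every resume lists an issue date
+    issued: Optional[str] = Field(None, description="Issue date, formatted as YYYY-MM")
+
+
+class Candidate(BaseModel):
+    name: str = Field(description="Full name of the candidate")
+    # Use an array type for a variable number of entities (never at the root)
+    certifications: List[Certification] = Field(
+        description="All certifications mentioned in the resume"
+    )
+```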
+
+## Additional Resources
+
+- [Example Notebook](../../examples/extract/resume_screening.ipynb) - Detailed walkthrough of resume parsing
+- [Discord Community](https://discord.com/invite/eN6D2HQ4aX) - Get help and share feedback
diff --git a/llama_cloud_services/extract/__init__.py b/llama_cloud_services/extract/__init__.py
new file mode 100644
index 0000000..ff4ffca
--- /dev/null
+++ b/llama_cloud_services/extract/__init__.py
@@ -0,0 +1,3 @@
+from llama_cloud_services.extract.extract import LlamaExtract, ExtractionAgent
+
+__all__ = ["LlamaExtract", "ExtractionAgent"]
diff --git a/llama_cloud_services/extract/extract.py b/llama_cloud_services/extract/extract.py
new file mode 100644
index 0000000..3c37d82
--- /dev/null
+++ b/llama_cloud_services/extract/extract.py
@@ -0,0 +1,655 @@
+import asyncio
+import os
+import time
+from io import BufferedIOBase, BufferedReader, BytesIO
+from pathlib import Path
+from typing import List, Optional, Type, Union, Coroutine, Any, TypeVar
+import warnings
+import httpx
+from pydantic import BaseModel
+from llama_cloud import (
+ ExtractAgent as CloudExtractAgent,
+ ExtractConfig,
+ ExtractJob,
+ ExtractJobCreate,
+ ExtractRun,
+ File,
+ ExtractMode,
+ StatusEnum,
+ Project,
+ ExtractTarget,
+ LlamaExtractSettings,
+)
+from llama_cloud.client import AsyncLlamaCloud
+from llama_cloud_services.extract.utils import JSONObjectType, augment_async_errors
+from llama_index.core.schema import BaseComponent
+from llama_index.core.async_utils import run_jobs
+from llama_index.core.bridge.pydantic import Field, PrivateAttr
+from llama_index.core.constants import DEFAULT_BASE_URL
+from concurrent.futures import ThreadPoolExecutor
+
+T = TypeVar("T")
+
+FileInput = Union[str, Path, bytes, BufferedIOBase]
+SchemaInput = Union[JSONObjectType, Type[BaseModel]]
+
+DEFAULT_EXTRACT_CONFIG = ExtractConfig(
+ extraction_target=ExtractTarget.PER_DOC,
+ extraction_mode=ExtractMode.ACCURATE,
+)
+
+
+class ExtractionAgent:
+ """Class representing a single extraction agent with methods for extraction operations."""
+
+ def __init__(
+ self,
+ client: AsyncLlamaCloud,
+ agent: CloudExtractAgent,
+ project_id: Optional[str] = None,
+ organization_id: Optional[str] = None,
+ check_interval: int = 1,
+ max_timeout: int = 2000,
+ num_workers: int = 4,
+ show_progress: bool = True,
+ verbose: bool = False,
+ ):
+ self._client = client
+ self._agent = agent
+ self._project_id = project_id
+ self._organization_id = organization_id
+ self.check_interval = check_interval
+ self.max_timeout = max_timeout
+ self.num_workers = num_workers
+ self.show_progress = show_progress
+ self._verbose = verbose
+ self._data_schema: Union[JSONObjectType, None] = None
+ self._config: Union[ExtractConfig, None] = None
+ self._thread_pool = ThreadPoolExecutor(
+ max_workers=min(10, (os.cpu_count() or 1) + 4)
+ )
+
+ def _run_in_thread(self, coro: Coroutine[Any, Any, T]) -> T:
+ """Run coroutine in a separate thread to avoid event loop issues"""
+
+ def run_coro() -> T:
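+            # Create a new client for this thread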
+ async def wrapped_coro() -> T:
+ async with httpx.AsyncClient(
+ timeout=self._client._client_wrapper.httpx_client.timeout,
+ ) as client:
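+                    # Replace the client in the coro's context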
+ original_client = self._client._client_wrapper.httpx_client
+ self._client._client_wrapper.httpx_client = client
+ try:
+ return await coro
+ finally:
+ self._client._client_wrapper.httpx_client = original_client
+
+ return asyncio.run(wrapped_coro())
+
+ return self._thread_pool.submit(run_coro).result()
+
+ @property
+ def id(self) -> str:
+ return self._agent.id
+
+ @property
+ def name(self) -> str:
+ return self._agent.name
+
+ @property
+ def data_schema(self) -> dict:
+ return self._agent.data_schema if not self._data_schema else self._data_schema
+
+ @data_schema.setter
+ def data_schema(self, data_schema: SchemaInput) -> None:
+ processed_schema: JSONObjectType
+ if isinstance(data_schema, dict):
+ # TODO: if we expose a get_validated JSON schema method, we can use it here
+ processed_schema = data_schema # type: ignore
+ elif isinstance(data_schema, type) and issubclass(data_schema, BaseModel):
+ processed_schema = data_schema.model_json_schema()
+ else:
+ raise ValueError(
+ "data_schema must be either a dictionary or a Pydantic model"
+ )
+ validated_schema = self._run_in_thread(
+ self._client.llama_extract.validate_extraction_schema(
+ data_schema=processed_schema
+ )
+ )
+ self._data_schema = validated_schema.data_schema
+
+ @property
+ def config(self) -> ExtractConfig:
+ return self._agent.config if not self._config else self._config
+
+ @config.setter
+ def config(self, config: ExtractConfig) -> None:
+ self._config = config
+
+ async def _upload_file(self, file_input: FileInput) -> File:
+ """Upload a file for extraction."""
+ if isinstance(file_input, BufferedIOBase):
+ upload_file = file_input
+ elif isinstance(file_input, bytes):
+ upload_file = BytesIO(file_input)
+ elif isinstance(file_input, (str, Path)):
+ upload_file = open(file_input, "rb")
+ else:
+ raise ValueError(
+ "file_input must be either a file path string, file bytes, or buffer object"
+ )
+
+ try:
+ return await self._client.files.upload_file(
+ project_id=self._project_id, upload_file=upload_file
+ )
+ finally:
+ if isinstance(upload_file, BufferedReader):
+ upload_file.close()
+
+ async def _wait_for_job_result(self, job_id: str) -> Optional[ExtractRun]:
+ """Wait for and return the results of an extraction job."""
+ start = time.perf_counter()
+ tries = 0
+ while True:
+ await asyncio.sleep(self.check_interval)
+ tries += 1
+ job = await self._client.llama_extract.get_job(
+ job_id=job_id,
+ )
+
+ if job.status == StatusEnum.SUCCESS:
+ return await self._client.llama_extract.get_run_by_job_id(
+ job_id=job_id,
+ )
+ elif job.status == StatusEnum.PENDING:
+ end = time.perf_counter()
+ if end - start > self.max_timeout:
+ raise Exception(f"Timeout while extracting the file: {job_id}")
+ if self._verbose and tries % 10 == 0:
+ print(".", end="", flush=True)
+ continue
+ else:
+ warnings.warn(
+ f"Failure in job: {job_id}, status: {job.status}, error: {job.error}"
+ )
+ return await self._client.llama_extract.get_run_by_job_id(
+ job_id=job_id,
+ )
+
+    def save(self) -> None:
+        """Persist the extraction agent's schema and config to the database.
+
+        The agent is updated in place; this method returns None.
+        """
+ self._agent = self._run_in_thread(
+ self._client.llama_extract.update_extraction_agent(
+ extraction_agent_id=self.id,
+ data_schema=self.data_schema,
+ config=self.config,
+ )
+ )
+
+ async def _queue_extraction_test(
+ self,
+ files: Union[FileInput, List[FileInput]],
+ extract_settings: LlamaExtractSettings,
+    ) -> Union[ExtractRun, List[ExtractRun]]:
+ if not isinstance(files, list):
+ files = [files]
+ single_file = True
+ else:
+ single_file = False
+
+ upload_tasks = [self._upload_file(file) for file in files]
+ with augment_async_errors():
+ uploaded_files = await run_jobs(
+ upload_tasks,
+ workers=self.num_workers,
+ desc="Uploading files",
+ show_progress=self.show_progress,
+ )
+
+ async def run_job(file: File) -> ExtractRun:
+ job_queued = await self._client.llama_extract.run_job_test_user(
+ job_create=ExtractJobCreate(
+ extraction_agent_id=self.id,
+ file_id=file.id,
+ data_schema_override=self.data_schema,
+ config_override=self.config,
+ ),
+ extract_settings=extract_settings,
+ )
+ return await self._wait_for_job_result(job_queued.id)
+
+ job_tasks = [run_job(file) for file in uploaded_files]
+ with augment_async_errors():
+ extract_jobs = await run_jobs(
+ job_tasks,
+ workers=self.num_workers,
+ desc="Creating extraction jobs",
+ show_progress=self.show_progress,
+ )
+
+ if self._verbose:
+ for file, job in zip(files, extract_jobs):
+ file_repr = (
+ str(file) if isinstance(file, (str, Path)) else ""
+ )
+ print(
+ f"Queued file extraction for file {file_repr} under job_id {job.id}"
+ )
+
+ return extract_jobs[0] if single_file else extract_jobs
+
+ async def queue_extraction(
+ self,
+ files: Union[FileInput, List[FileInput]],
+ ) -> Union[ExtractJob, List[ExtractJob]]:
+ """
+ Queue multiple files for extraction.
+
+ Args:
+ files (Union[FileInput, List[FileInput]]): The files to extract
+
+ Returns:
+ Union[ExtractJob, List[ExtractJob]]: The queued extraction jobs
+ """
+ """Queue one or more files for extraction concurrently."""
+ if not isinstance(files, list):
+ files = [files]
+ single_file = True
+ else:
+ single_file = False
+
+ upload_tasks = [self._upload_file(file) for file in files]
+ with augment_async_errors():
+ uploaded_files = await run_jobs(
+ upload_tasks,
+ workers=self.num_workers,
+ desc="Uploading files",
+ show_progress=self.show_progress,
+ )
+
+ job_tasks = [
+ self._client.llama_extract.run_job(
+ request=ExtractJobCreate(
+ extraction_agent_id=self.id,
+ file_id=file.id,
+ data_schema_override=self.data_schema,
+ config_override=self.config,
+ ),
+ )
+ for file in uploaded_files
+ ]
+ with augment_async_errors():
+ extract_jobs = await run_jobs(
+ job_tasks,
+ workers=self.num_workers,
+ desc="Creating extraction jobs",
+ show_progress=self.show_progress,
+ )
+
+ if self._verbose:
+ for file, job in zip(files, extract_jobs):
+ file_repr = (
+ str(file) if isinstance(file, (str, Path)) else ""
+ )
+ print(
+ f"Queued file extraction for file {file_repr} under job_id {job.id}"
+ )
+
+ return extract_jobs[0] if single_file else extract_jobs
+
+ async def aextract(
+ self, files: Union[FileInput, List[FileInput]]
+ ) -> Union[ExtractRun, List[ExtractRun]]:
+ """Asynchronously extract data from one or more files using this agent.
+
+ Args:
+ files (Union[FileInput, List[FileInput]]): The files to extract
+
+ Returns:
+ Union[ExtractRun, List[ExtractRun]]: The extraction results
+ """
+ if not isinstance(files, list):
+ files = [files]
+ single_file = True
+ else:
+ single_file = False
+
+ # Queue all files for extraction
+ jobs = await self.queue_extraction(files)
+ # Wait for all results concurrently
+ result_tasks = [self._wait_for_job_result(job.id) for job in jobs]
+ with augment_async_errors():
+ results = await run_jobs(
+ result_tasks,
+ workers=self.num_workers,
+ desc="Extracting files",
+ show_progress=self.show_progress,
+ )
+
+ return results[0] if single_file else results
+
+ def extract(
+ self, files: Union[FileInput, List[FileInput]]
+ ) -> Union[ExtractRun, List[ExtractRun]]:
+ """Synchronously extract data from one or more files using this agent.
+
+ Args:
+ files (Union[FileInput, List[FileInput]]): The files to extract
+
+ Returns:
+ Union[ExtractRun, List[ExtractRun]]: The extraction results
+ """
+ return self._run_in_thread(self.aextract(files))
+
+ def get_extraction_job(self, job_id: str) -> ExtractJob:
+ """
+ Get the extraction job for a given job_id.
+
+ Args:
+ job_id (str): The job_id to get the extraction job for
+
+ Returns:
+ ExtractJob: The extraction job
+ """
+ return self._run_in_thread(self._client.llama_extract.get_job(job_id=job_id))
+
+ def get_extraction_run_for_job(self, job_id: str) -> ExtractRun:
+ """
+ Get the extraction run for a given job_id.
+
+ Args:
+ job_id (str): The job_id to get the extraction run for
+
+ Returns:
+ ExtractRun: The extraction run
+ """
+ return self._run_in_thread(
+ self._client.llama_extract.get_run_by_job_id(
+ job_id=job_id,
+ )
+ )
+
+ def list_extraction_runs(self) -> List[ExtractRun]:
+ """List extraction runs for the extraction agent.
+
+ Returns:
+ List[ExtractRun]: List of extraction runs
+ """
+ return self._run_in_thread(
+ self._client.llama_extract.list_extract_runs(
+ extraction_agent_id=self.id,
+ )
+ )
+
+ def __repr__(self) -> str:
+ return f"ExtractionAgent(id={self.id}, name={self.name})"
+
+
+class LlamaExtract(BaseComponent):
+ """Factory class for creating and managing extraction agents."""
+
+ api_key: str = Field(description="The API key for the LlamaExtract API.")
+ base_url: str = Field(description="The base URL of the LlamaExtract API.")
+ check_interval: int = Field(
+ default=1,
+ description="The interval in seconds to check if the extraction is done.",
+ )
+ max_timeout: int = Field(
+ default=2000,
+ description="The maximum timeout in seconds to wait for the extraction to finish.",
+ )
+ num_workers: int = Field(
+ default=4,
+ gt=0,
+ lt=10,
+ description="The number of workers to use sending API requests for extraction.",
+ )
+ show_progress: bool = Field(
+ default=True, description="Show progress when extracting multiple files."
+ )
+ verbose: bool = Field(
+ default=False, description="Show verbose output when extracting files."
+ )
+ _async_client: AsyncLlamaCloud = PrivateAttr()
+ _thread_pool: ThreadPoolExecutor = PrivateAttr()
+ _project_id: Optional[str] = PrivateAttr()
+ _organization_id: Optional[str] = PrivateAttr()
+
+ def __init__(
+ self,
+ api_key: Optional[str] = None,
+ base_url: Optional[str] = None,
+ check_interval: int = 1,
+ max_timeout: int = 2000,
+ num_workers: int = 4,
+ show_progress: bool = True,
+ project_id: Optional[str] = None,
+ organization_id: Optional[str] = None,
+ verbose: bool = False,
+ ):
+ if not api_key:
+ api_key = os.getenv("LLAMA_CLOUD_API_KEY", None)
+ if api_key is None:
+ raise ValueError("The API key is required.")
+
+ if not base_url:
+ base_url = os.getenv("LLAMA_CLOUD_BASE_URL", None) or DEFAULT_BASE_URL
+
+ super().__init__(
+ api_key=api_key,
+ base_url=base_url,
+ check_interval=check_interval,
+ max_timeout=max_timeout,
+ num_workers=num_workers,
+ show_progress=show_progress,
+ verbose=verbose,
+ )
+
+ self._async_client = AsyncLlamaCloud(
+ token=self.api_key, base_url=self.base_url, timeout=None
+ )
+ self._thread_pool = ThreadPoolExecutor(
+ max_workers=min(10, (os.cpu_count() or 1) + 4)
+ )
+ # Fetch default project id if not provided
+ if not project_id:
+ project_id = os.getenv("LLAMA_CLOUD_PROJECT_ID", None)
+ if not project_id:
+ print("No project_id provided, fetching default project.")
+ projects: List[Project] = self._run_in_thread(
+ self._async_client.projects.list_projects()
+ )
+ default_project = [p for p in projects if p.is_default]
+ if not default_project:
+ raise ValueError(
+ "No default project found. Please provide a project_id."
+ )
+ project_id = default_project[0].id
+
+ self._project_id = project_id
+ self._organization_id = organization_id
+
+ def _run_in_thread(self, coro: Coroutine[Any, Any, T]) -> T:
+ """Run coroutine in a separate thread to avoid event loop issues"""
+
+ def run_coro() -> T:
+ # Create a new client for this thread
+ async def wrapped_coro() -> T:
+ async with httpx.AsyncClient(
+ timeout=self._async_client._client_wrapper.httpx_client.timeout,
+ ) as client:
+ # Replace the client in the coro's context
+ original_client = self._async_client._client_wrapper.httpx_client
+ self._async_client._client_wrapper.httpx_client = client
+ try:
+ return await coro
+ finally:
+ self._async_client._client_wrapper.httpx_client = (
+ original_client
+ )
+
+ return asyncio.run(wrapped_coro())
+
+ return self._thread_pool.submit(run_coro).result()
+
+ def create_agent(
+ self,
+ name: str,
+ data_schema: SchemaInput,
+ config: Optional[ExtractConfig] = None,
+ ) -> ExtractionAgent:
+ """Create a new extraction agent.
+
+ Args:
+ name (str): The name of the extraction agent
+ data_schema (SchemaInput): The data schema for the extraction agent
+ config (Optional[ExtractConfig]): The extraction config for the agent
+
+ Returns:
+ ExtractionAgent: The created extraction agent
+ """
+
+ if isinstance(data_schema, dict):
+ data_schema = data_schema
+ elif issubclass(data_schema, BaseModel):
+ data_schema = data_schema.model_json_schema()
+ else:
+ raise ValueError(
+ "data_schema must be either a dictionary or a Pydantic model"
+ )
+
+ agent = self._run_in_thread(
+ self._async_client.llama_extract.create_extraction_agent(
+ name=name,
+ data_schema=data_schema,
+ config=config or DEFAULT_EXTRACT_CONFIG,
+ project_id=self._project_id,
+ organization_id=self._organization_id,
+ )
+ )
+
+ return ExtractionAgent(
+ client=self._async_client,
+ agent=agent,
+ project_id=self._project_id,
+ organization_id=self._organization_id,
+ check_interval=self.check_interval,
+ max_timeout=self.max_timeout,
+ num_workers=self.num_workers,
+ show_progress=self.show_progress,
+ verbose=self.verbose,
+ )
+
+ def get_agent(
+ self,
+ name: Optional[str] = None,
+ id: Optional[str] = None,
+ ) -> ExtractionAgent:
+ """Get extraction agents by name or extraction agent ID.
+
+ Args:
+ name (Optional[str]): Filter by name
+ extraction_agent_id (Optional[str]): Filter by extraction agent ID
+
+ Returns:
+ ExtractionAgent: The extraction agent
+ """
+ if id is not None and name is not None:
+ warnings.warn(
+ "Both name and extraction_agent_id are provided. Using extraction_agent_id."
+ )
+
+ if id:
+ agent = self._run_in_thread(
+ self._async_client.llama_extract.get_extraction_agent(
+ extraction_agent_id=id,
+ )
+ )
+
+ elif name:
+ agent = self._run_in_thread(
+ self._async_client.llama_extract.get_extraction_agent_by_name(
+ name=name,
+ project_id=self._project_id,
+ )
+ )
+ else:
+            raise ValueError("Either name or id must be provided.")
+
+ return ExtractionAgent(
+ client=self._async_client,
+ agent=agent,
+ project_id=self._project_id,
+ organization_id=self._organization_id,
+ check_interval=self.check_interval,
+ max_timeout=self.max_timeout,
+ num_workers=self.num_workers,
+ show_progress=self.show_progress,
+ verbose=self.verbose,
+ )
+
+ def list_agents(self) -> List[ExtractionAgent]:
+ """List all available extraction agents."""
+ agents = self._run_in_thread(
+ self._async_client.llama_extract.list_extraction_agents(
+ project_id=self._project_id,
+ )
+ )
+
+ return [
+ ExtractionAgent(
+ client=self._async_client,
+ agent=agent,
+ project_id=self._project_id,
+ organization_id=self._organization_id,
+ check_interval=self.check_interval,
+ max_timeout=self.max_timeout,
+ num_workers=self.num_workers,
+ show_progress=self.show_progress,
+ verbose=self.verbose,
+ )
+ for agent in agents
+ ]
+
+ def delete_agent(self, agent_id: str) -> None:
+ """Delete an extraction agent by ID.
+
+ Args:
+ agent_id (str): ID of the extraction agent to delete
+ """
+ self._run_in_thread(
+ self._async_client.llama_extract.delete_extraction_agent(
+ extraction_agent_id=agent_id
+ )
+ )
+
+
+if __name__ == "__main__":
+ from dotenv import load_dotenv
+
+ load_dotenv()
+
+ data_dir = Path(__file__).parent.parent / "tests" / "data"
+ extractor = LlamaExtract()
+ try:
+ agent = extractor.get_agent(name="test-agent")
+ except Exception:
+ agent = extractor.create_agent(
+ "test-agent",
+ {
+ "type": "object",
+ "properties": {
+ "title": {"type": "string"},
+ "summary": {"type": "string"},
+ },
+ },
+ )
+ results = agent.extract(data_dir / "slide" / "conocophilips.pdf")
+ extractor.delete_agent(agent.id)
+ print(results)
diff --git a/llama_cloud_services/extract/utils.py b/llama_cloud_services/extract/utils.py
new file mode 100644
index 0000000..8ac4f6e
--- /dev/null
+++ b/llama_cloud_services/extract/utils.py
@@ -0,0 +1,34 @@
+from typing import Any, Dict, List, Union, Generator
+from contextlib import contextmanager
+
+# Asyncio error messages
+nest_asyncio_err = "cannot be called from a running event loop"
+nest_asyncio_msg = (
+ "The event loop is already running. "
+ "Add `import nest_asyncio; nest_asyncio.apply()` to your code to fix this issue."
+)
+
+
+def is_jupyter() -> bool:
+ """Check if we're running in a Jupyter environment."""
+ try:
+ from IPython import get_ipython
+
+ return get_ipython().__class__.__name__ == "ZMQInteractiveShell"
+ except (ImportError, AttributeError):
+ return False
+
+
+@contextmanager
+def augment_async_errors() -> Generator[None, None, None]:
+ """Context manager to add helpful information for errors due to nested event loops."""
+ try:
+ yield
+ except RuntimeError as e:
+ if nest_asyncio_err in str(e):
+ raise RuntimeError(nest_asyncio_msg)
+ raise
+
+
+JSONType = Union[Dict[str, Any], List[Any], str, int, float, bool, None]
+JSONObjectType = Dict[str, JSONType]
diff --git a/llama_cloud_services/parse/README.md b/llama_cloud_services/parse/README.md
new file mode 100644
index 0000000..68f5332
--- /dev/null
+++ b/llama_cloud_services/parse/README.md
@@ -0,0 +1,165 @@
+# LlamaParse
+
+[![PyPI - Downloads](https://img.shields.io/pypi/dm/llama-parse)](https://pypi.org/project/llama-parse/)
+[![GitHub contributors](https://img.shields.io/github/contributors/run-llama/llama_parse)](https://github.com/run-llama/llama_parse/graphs/contributors)
+[![Discord](https://img.shields.io/discord/1059199217496772688)](https://discord.gg/dGcwcsnxhU)
+
+LlamaParse is a **GenAI-native document parser** that can parse complex document data for any downstream LLM use case (RAG, agents).
+
+It is really good at the following:
+
+- ✅ **Broad file type support**: Parsing a variety of unstructured file types (.pdf, .pptx, .docx, .xlsx, .html) with text, tables, visual elements, weird layouts, and more.
+- ✅ **Table recognition**: Parsing embedded tables accurately into text and semi-structured representations.
+- ✅ **Multimodal parsing and chunking**: Extracting visual elements (images/diagrams) into structured formats and returning image chunks using the latest multimodal models.
+- ✅ **Custom parsing**: Input custom prompt instructions to customize the output the way you want it.
+
+LlamaParse directly integrates with [LlamaIndex](https://github.com/run-llama/llama_index).
+
+The free plan allows up to 1,000 pages a day. The paid plan includes 7k free pages per week, plus 0.3c per additional page by default. There is a sandbox available for testing the API at [**https://cloud.llamaindex.ai/parse ↗**](https://cloud.llamaindex.ai/parse).
+
+Read below for some quickstart information, or see the [full documentation](https://docs.cloud.llamaindex.ai/).
+
+If you're a company interested in enterprise RAG solutions, and/or high volume/on-prem usage of LlamaParse, come [talk to us](https://www.llamaindex.ai/contact).
+
+## Getting Started
+
+First, log in and get an api-key from [**https://cloud.llamaindex.ai/api-key ↗**](https://cloud.llamaindex.ai/api-key).
+
+Then, make sure you have the latest LlamaIndex version installed.
+
+**NOTE:** If you are upgrading from v0.9.X, we recommend following our [migration guide](https://pretty-sodium-5e0.notion.site/v0-10-0-Migration-Guide-6ede431dcb8841b09ea171e7f133bd77), as well as uninstalling your previous version first.
+
+```
+pip uninstall llama-index # run this if upgrading from v0.9.x or older
+pip install -U llama-index --upgrade --no-cache-dir --force-reinstall
+```
+
+Lastly, install the package:
+
+`pip install llama-parse`
+
+Now you can parse your first PDF file using the command line interface. Use the command `llama-parse [file_paths]`. See the help text with `llama-parse --help`.
+
+```bash
+export LLAMA_CLOUD_API_KEY='llx-...'
+
+# output as text
+llama-parse my_file.pdf --result-type text --output-file output.txt
+
+# output as markdown
+llama-parse my_file.pdf --result-type markdown --output-file output.md
+
+# output as raw json
+llama-parse my_file.pdf --output-raw-json --output-file output.json
+```
+
+You can also create simple scripts:
+
+```python
+import nest_asyncio
+
+nest_asyncio.apply()
+
+from llama_parse import LlamaParse
+
+parser = LlamaParse(
+ api_key="llx-...", # can also be set in your env as LLAMA_CLOUD_API_KEY
+ result_type="markdown", # "markdown" and "text" are available
+ num_workers=4, # if multiple files passed, split in `num_workers` API calls
+ verbose=True,
+ language="en", # Optionally you can define a language, default=en
+)
+
+# sync
+documents = parser.load_data("./my_file.pdf")
+
+# sync batch
+documents = parser.load_data(["./my_file1.pdf", "./my_file2.pdf"])
+
+# async
+documents = await parser.aload_data("./my_file.pdf")
+
+# async batch
+documents = await parser.aload_data(["./my_file1.pdf", "./my_file2.pdf"])
+```
+
+## Using with file object
+
+You can parse a file object directly:
+
+```python
+import nest_asyncio
+
+nest_asyncio.apply()
+
+from llama_parse import LlamaParse
+
+parser = LlamaParse(
+ api_key="llx-...", # can also be set in your env as LLAMA_CLOUD_API_KEY
+ result_type="markdown", # "markdown" and "text" are available
+ num_workers=4, # if multiple files passed, split in `num_workers` API calls
+ verbose=True,
+ language="en", # Optionally you can define a language, default=en
+)
+
+file_name = "my_file1.pdf"
+extra_info = {"file_name": file_name}
+
+with open(f"./{file_name}", "rb") as f:
+    # must provide extra_info with a file_name key when passing a file object
+ documents = parser.load_data(f, extra_info=extra_info)
+
+# you can also pass file bytes directly
+with open(f"./{file_name}", "rb") as f:
+ file_bytes = f.read()
+    # must provide extra_info with a file_name key when passing file bytes
+ documents = parser.load_data(file_bytes, extra_info=extra_info)
+```
+
+## Using with `SimpleDirectoryReader`
+
+You can also integrate the parser as the default PDF loader in `SimpleDirectoryReader`:
+
+```python
+import nest_asyncio
+
+nest_asyncio.apply()
+
+from llama_parse import LlamaParse
+from llama_index.core import SimpleDirectoryReader
+
+parser = LlamaParse(
+ api_key="llx-...", # can also be set in your env as LLAMA_CLOUD_API_KEY
+ result_type="markdown", # "markdown" and "text" are available
+ verbose=True,
+)
+
+file_extractor = {".pdf": parser}
+documents = SimpleDirectoryReader(
+ "./data", file_extractor=file_extractor
+).load_data()
+```
+
+Full documentation for `SimpleDirectoryReader` can be found on the [LlamaIndex Documentation](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader.html).
+
+## Examples
+
+Several end-to-end indexing examples can be found in the examples folder:
+
+- [Getting Started](examples/demo_basic.ipynb)
+- [Advanced RAG Example](examples/demo_advanced.ipynb)
+- [Raw API Usage](examples/demo_api.ipynb)
+
+## Documentation
+
+[https://docs.cloud.llamaindex.ai/](https://docs.cloud.llamaindex.ai/)
+
+## Terms of Service
+
+See the [Terms of Service Here](./TOS.pdf).
+
+## Get in Touch (LlamaCloud)
+
+LlamaParse is part of LlamaCloud, our e2e enterprise RAG platform that provides out-of-the-box, production-ready connectors, indexing, and retrieval over your complex data sources. We offer SaaS and VPC options.
+
+LlamaCloud is currently available via waitlist (join by [creating an account](https://cloud.llamaindex.ai/)). If you're interested in state-of-the-art quality and in centralizing your RAG efforts, come [get in touch with us](https://www.llamaindex.ai/contact).
diff --git a/llama_cloud_services/parse/__init__.py b/llama_cloud_services/parse/__init__.py
new file mode 100644
index 0000000..4499d16
--- /dev/null
+++ b/llama_cloud_services/parse/__init__.py
@@ -0,0 +1,3 @@
+from llama_cloud_services.parse.base import LlamaParse, ResultType
+
+__all__ = ["LlamaParse", "ResultType"]
diff --git a/llama_parse/base.py b/llama_cloud_services/parse/base.py
similarity index 99%
rename from llama_parse/base.py
rename to llama_cloud_services/parse/base.py
index 3e19260..52f4c79 100644
--- a/llama_parse/base.py
+++ b/llama_cloud_services/parse/base.py
@@ -18,7 +18,7 @@
from llama_index.core.readers.file.base import get_default_fs
from llama_index.core.schema import Document
-from llama_parse.utils import (
+from llama_cloud_services.parse.utils import (
SUPPORTED_FILE_TYPES,
ResultType,
nest_asyncio_err,
diff --git a/llama_parse/cli/__init__.py b/llama_cloud_services/parse/cli/__init__.py
similarity index 100%
rename from llama_parse/cli/__init__.py
rename to llama_cloud_services/parse/cli/__init__.py
diff --git a/llama_parse/cli/main.py b/llama_cloud_services/parse/cli/main.py
similarity index 98%
rename from llama_parse/cli/main.py
rename to llama_cloud_services/parse/cli/main.py
index 616c901..43de8bf 100644
--- a/llama_parse/cli/main.py
+++ b/llama_cloud_services/parse/cli/main.py
@@ -5,7 +5,7 @@
from pydantic.fields import FieldInfo
from typing import Any, Callable, List
-from llama_parse.base import LlamaParse
+from llama_cloud_services.parse.base import LlamaParse
def pydantic_field_to_click_option(name: str, field: FieldInfo) -> click.Option:
diff --git a/llama_parse/utils.py b/llama_cloud_services/parse/utils.py
similarity index 100%
rename from llama_parse/utils.py
rename to llama_cloud_services/parse/utils.py
diff --git a/llama_cloud_services/report/README.md b/llama_cloud_services/report/README.md
new file mode 100644
index 0000000..e69de29
diff --git a/llama_cloud_services/report/__init__.py b/llama_cloud_services/report/__init__.py
new file mode 100644
index 0000000..4f2fe1d
--- /dev/null
+++ b/llama_cloud_services/report/__init__.py
@@ -0,0 +1,4 @@
+from llama_cloud_services.report.report import ReportClient
+from llama_cloud_services.report.base import LlamaReport
+
+__all__ = ["ReportClient", "LlamaReport"]
diff --git a/llama_cloud_services/report/base.py b/llama_cloud_services/report/base.py
new file mode 100644
index 0000000..7fea6fc
--- /dev/null
+++ b/llama_cloud_services/report/base.py
@@ -0,0 +1,269 @@
+import asyncio
+import httpx
+import os
+import io
+from concurrent.futures import ThreadPoolExecutor
+from typing import Optional, List, Union, Any, Coroutine, TypeVar
+from urllib.parse import urljoin
+
+from llama_cloud.types import ReportMetadata
+from llama_cloud_services.report.report import ReportClient
+
+T = TypeVar("T")
+
+
+class LlamaReport:
+ """Client for managing reports and general report operations."""
+
+ def __init__(
+ self,
+ api_key: str | None = None,
+ project_id: str | None = None,
+ organization_id: str | None = None,
+ base_url: str | None = None,
+ timeout: int | None = None,
+ async_httpx_client: httpx.AsyncClient | None = None,
+ ):
+ self.api_key = api_key or os.getenv("LLAMA_CLOUD_API_KEY", None)
+ if not self.api_key:
+ raise ValueError("No API key provided.")
+
+ self.base_url = base_url or os.getenv(
+ "LLAMA_CLOUD_BASE_URL", "https://api.cloud.llamaindex.ai"
+ )
+ self.timeout = timeout or 60
+
+ # Initialize HTTP clients
+ self._aclient = async_httpx_client or httpx.AsyncClient(timeout=self.timeout)
+
+ # Set auth headers
+ self.headers = {
+ "Authorization": f"Bearer {self.api_key}",
+ }
+
+ self.organization_id = organization_id
+ self.project_id = project_id
+ self._client_params = {
+ "timeout": self._aclient.timeout,
+ "headers": self._aclient.headers,
+ "base_url": self._aclient.base_url,
+ "auth": self._aclient.auth,
+ "event_hooks": self._aclient.event_hooks,
+ "cookies": self._aclient.cookies,
+ "max_redirects": self._aclient.max_redirects,
+ "params": self._aclient.params,
+ "trust_env": self._aclient.trust_env,
+ }
+ self._thread_pool = ThreadPoolExecutor(
+ max_workers=min(10, (os.cpu_count() or 1) + 4)
+ )
+
+ @property
+ def aclient(self) -> httpx.AsyncClient:
+ if self._aclient is None:
+ self._aclient = httpx.AsyncClient(**self._client_params)
+ return self._aclient
+
+ def _run_sync(self, coro: Coroutine[Any, Any, T]) -> T:
+ """Run coroutine in a separate thread to avoid event loop issues"""
+
+ # force a new client for this thread/event loop
+ original_client = self._aclient
+ self._aclient = None
+
+ def run_coro() -> T:
+ async def wrapped_coro() -> T:
+ return await coro
+
+ return asyncio.run(wrapped_coro())
+
+ result = self._thread_pool.submit(run_coro).result()
+
+ # restore the original client
+ self._aclient = original_client
+
+ return result
+
+ async def _get_default_project(self) -> str:
+ response = await self.aclient.get(
+ urljoin(str(self.base_url), "/api/v1/projects"), headers=self.headers
+ )
+ response.raise_for_status()
+ projects = response.json()
+ default_project = [p for p in projects if p.get("is_default")]
+ return default_project[0]["id"]
+
+ async def _build_url(
+ self, endpoint: str, extra_params: Optional[List[str]] = None
+ ) -> str:
+ """Helper method to build URLs with common query parameters."""
+ url = urljoin(str(self.base_url), endpoint)
+
+ if not self.project_id:
+ self.project_id = await self._get_default_project()
+
+ query_params = []
+ if self.organization_id:
+ query_params.append(f"organization_id={self.organization_id}")
+ if self.project_id:
+ query_params.append(f"project_id={self.project_id}")
+ if extra_params:
+ query_params.extend([p for p in extra_params if p is not None])
+
+ if query_params:
+ url += "?" + "&".join(query_params)
+
+ return url
+
+ async def acreate_report(
+ self,
+ name: str,
+ template_instructions: Optional[str] = None,
+ template_text: Optional[str] = None,
+ template_file: Optional[Union[str, tuple[str, bytes]]] = None,
+ input_files: Optional[List[Union[str, tuple[str, bytes]]]] = None,
+ existing_retriever_id: Optional[str] = None,
+ ) -> ReportClient:
+ """Create a new report asynchronously."""
+ url = await self._build_url("/api/v1/reports/")
+ open_files: List[io.BufferedReader] = []
+
+ data = {"name": name}
+ if template_instructions:
+ data["template_instructions"] = template_instructions
+ if template_text:
+ data["template_text"] = template_text
+ if existing_retriever_id:
+ data["existing_retriever_id"] = str(existing_retriever_id)
+
+ files: List[tuple[str, io.BufferedReader | bytes]] = []
+ if template_file:
+ if isinstance(template_file, str):
+ open_files.append(open(template_file, "rb"))
+ files.append(("template_file", open_files[-1]))
+ else:
+ files.append(("template_file", template_file[1]))
+
+ if input_files:
+ for f in input_files:
+ if isinstance(f, str):
+ open_files.append(open(f, "rb"))
+ files.append(("files", open_files[-1]))
+ else:
+ files.append(("files", f[1]))
+
+ response = await self.aclient.post(
+ url, headers=self.headers, data=data, files=files
+ )
+ try:
+ response.raise_for_status()
+ report_id = response.json()["id"]
+ return ReportClient(report_id, name, self)
+ except httpx.HTTPStatusError as e:
+ raise ValueError(
+ f"Failed to create report: {e.response.text}\nError Code: {e.response.status_code}"
+ )
+ finally:
+ for open_file in open_files:
+ open_file.close()
+
+ def create_report(
+ self,
+ name: str,
+ template_instructions: Optional[str] = None,
+ template_text: Optional[str] = None,
+ template_file: Optional[Union[str, tuple[str, bytes]]] = None,
+ input_files: Optional[List[Union[str, tuple[str, bytes]]]] = None,
+ existing_retriever_id: Optional[str] = None,
+ ) -> ReportClient:
+ """Create a new report."""
+ return self._run_sync(
+ self.acreate_report(
+ name=name,
+ template_instructions=template_instructions,
+ template_text=template_text,
+ template_file=template_file,
+ input_files=input_files,
+ existing_retriever_id=existing_retriever_id,
+ )
+ )
+
+ async def alist_reports(
+ self, state: Optional[str] = None, limit: int = 100, offset: int = 0
+ ) -> List[ReportClient]:
+ """List all reports asynchronously."""
+ params = []
+ if state:
+ params.append(f"state={state}")
+ if limit:
+ params.append(f"limit={limit}")
+ if offset:
+ params.append(f"offset={offset}")
+
+ url = await self._build_url(
+ "/api/v1/reports/list",
+ extra_params=params,
+ )
+
+ response = await self.aclient.get(url, headers=self.headers)
+ response.raise_for_status()
+ data = response.json()
+
+ return [
+ ReportClient(r["report_id"], r["name"], self)
+ for r in data["report_responses"]
+ ]
+
+ def list_reports(
+ self, state: Optional[str] = None, limit: int = 100, offset: int = 0
+ ) -> List[ReportClient]:
+ """Synchronous wrapper for listing reports."""
+ return self._run_sync(self.alist_reports(state, limit, offset))
+
+ async def aget_report(self, report_id: str) -> ReportClient:
+ """Get a Report instance for working with a specific report."""
+ url = await self._build_url(f"/api/v1/reports/{report_id}")
+
+ response = await self.aclient.get(url, headers=self.headers)
+ response.raise_for_status()
+ data = response.json()
+
+ return ReportClient(data["report_id"], data["name"], self)
+
+ def get_report(self, report_id: str) -> ReportClient:
+ """Synchronous wrapper for getting a report."""
+ return self._run_sync(self.aget_report(report_id))
+
+ async def aget_report_metadata(self, report_id: str) -> ReportMetadata:
+ """Get metadata for a specific report asynchronously.
+
+ Returns:
+            ReportMetadata containing:
+ - id: Report ID
+ - name: Report name
+ - state: Current report state
+ - report_metadata: Additional metadata
+ - template_file: Name of template file if used
+ - template_instructions: Template instructions if provided
+ - input_files: List of input file names
+ """
+ url = await self._build_url(f"/api/v1/reports/{report_id}/metadata")
+
+ response = await self.aclient.get(url, headers=self.headers)
+ response.raise_for_status()
+ return ReportMetadata(**response.json())
+
+ def get_report_metadata(self, report_id: str) -> ReportMetadata:
+ """Synchronous wrapper for getting report metadata."""
+ return self._run_sync(self.aget_report_metadata(report_id))
+
+ async def adelete_report(self, report_id: str) -> None:
+ """Delete a specific report asynchronously."""
+ url = await self._build_url(f"/api/v1/reports/{report_id}")
+
+ response = await self.aclient.delete(url, headers=self.headers)
+ response.raise_for_status()
+
+ def delete_report(self, report_id: str) -> None:
+ """Synchronous wrapper for deleting a report."""
+ return self._run_sync(self.adelete_report(report_id))
diff --git a/llama_cloud_services/report/report.py b/llama_cloud_services/report/report.py
new file mode 100644
index 0000000..2ff3f10
--- /dev/null
+++ b/llama_cloud_services/report/report.py
@@ -0,0 +1,404 @@
+import asyncio
+import httpx
+import time
+from typing import Optional, List, Literal, TYPE_CHECKING
+from dataclasses import dataclass
+from datetime import datetime
+from enum import Enum
+
+from llama_cloud.types import (
+ ReportEventItemEventData_Progress,
+ ReportMetadata,
+ EditSuggestion,
+ ReportResponse,
+ ReportPlan,
+ ReportBlock,
+ ReportPlanBlock,
+ Report,
+)
+
+if TYPE_CHECKING:
+ from llama_cloud_services.report.base import LlamaReport
+
+
+class MessageRole(str, Enum):
+ USER = "user"
+ ASSISTANT = "assistant"
+
+
+@dataclass
+class Message:
+ role: MessageRole
+ content: str
+ timestamp: datetime
+
+
+@dataclass
+class EditAction:
+ block_idx: int
+ old_content: str
+ new_content: Optional[str]
+ action: Literal["approved", "rejected"]
+ timestamp: datetime
+
+
+DEFAULT_POLL_INTERVAL = 5
+DEFAULT_TIMEOUT = 600
+
+
+class ReportClient:
+ """Client for operations on a specific report."""
+
+ def __init__(self, report_id: str, name: str, parent_client: "LlamaReport"):
+ self.report_id = report_id
+ self.name = name
+ self._client = parent_client
+ self._headers = parent_client.headers
+ self._run_sync = parent_client._run_sync
+ self._build_url = parent_client._build_url
+ self.chat_history: List[Message] = []
+ self.edit_history: List[EditAction] = []
+
+ @property
+ def aclient(self) -> httpx.AsyncClient:
+ return self._client.aclient
+
+ def __str__(self) -> str:
+ return f"Report(id={self.report_id}, name={self.name})"
+
+ def __repr__(self) -> str:
+ return f"Report(id={self.report_id}, name={self.name})"
+
+ def _get_block_content(self, block: ReportBlock | ReportPlanBlock) -> str:
+ if isinstance(block, ReportBlock):
+ return block.template
+ elif isinstance(block, ReportPlanBlock):
+ return block.block.template
+ else:
+ raise ValueError(f"Invalid block type: {type(block)}")
+
+ def _get_block_idx(self, block: ReportBlock | ReportPlanBlock) -> int:
+ if isinstance(block, ReportBlock):
+ return block.idx
+ elif isinstance(block, ReportPlanBlock):
+ return block.block.idx
+ else:
+ raise ValueError(f"Invalid block type: {type(block)}")
+
+ async def aget(self, version: Optional[int] = None) -> ReportResponse:
+ """Get this report's details asynchronously."""
+ extra_params = []
+ if version is not None:
+ extra_params.append(f"version={version}")
+
+ url = await self._build_url(f"/api/v1/reports/{self.report_id}", extra_params)
+
+ response = await self.aclient.get(url, headers=self._headers)
+ response.raise_for_status()
+ return ReportResponse(**response.json())
+
+ def get(self, version: Optional[int] = None) -> ReportResponse:
+ """Synchronous wrapper for getting this report's details."""
+ return self._run_sync(self.aget(version))
+
+ async def aupdate_plan(
+ self,
+ action: Literal["approve", "reject", "edit"],
+ updated_plan: Optional[dict] = None,
+ ) -> ReportResponse:
+ """Update this report's plan asynchronously."""
+ if action == "edit" and not updated_plan:
+ raise ValueError("updated_plan is required when action is 'edit'")
+
+ url = await self._build_url(
+ f"/api/v1/reports/{self.report_id}/plan", [f"action={action}"]
+ )
+
+ data = None
+ if updated_plan:
+ data = {"updated_plan": updated_plan}
+
+ response = await self.aclient.patch(url, headers=self._headers, json=data)
+ response.raise_for_status()
+ return ReportResponse(**response.json())
+
+ def update_plan(
+ self,
+ action: Literal["approve", "reject", "edit"],
+ updated_plan: Optional[dict] = None,
+ ) -> ReportResponse:
+ """Synchronous wrapper for updating this report's plan."""
+ return self._run_sync(self.aupdate_plan(action, updated_plan))
+
+ async def asuggest_edits(
+ self,
+ user_query: str,
+ auto_history: bool = True,
+ chat_history: Optional[List[dict]] = None,
+ ) -> List[EditSuggestion]:
+ """Get AI suggestions for edits to this report asynchronously.
+
+ Args:
+ user_query: The user's request/question about what to edit
+ auto_history: Whether to automatically add the user's message to the chat history
+ chat_history:
+ A list of chat messages to include in the chat history.
+ The format being a list of dictionaries with "role" and "content" keys.
+ """
+ # Add user message to history
+ self.chat_history.append(
+ Message(role=MessageRole.USER, content=user_query, timestamp=datetime.now())
+ )
+
+ # Format chat history with edit summaries
+ chat_history_dicts = []
+ for msg in self.chat_history[:-1]: # Exclude current message
+ content = msg.content
+ if msg.role == MessageRole.USER:
+ # Add edit summary for user messages
+ edit_summary = self._get_edit_summary_after_message(msg.timestamp)
+ if edit_summary:
+ content = f"{content}\n\nActions taken:\n{edit_summary}"
+
+ chat_history_dicts.append({"role": msg.role.value, "content": content})
+
+ # decide whether to include chat history or not
+ if chat_history:
+ chat_history_dicts = chat_history
+ elif auto_history:
+ chat_history_dicts = chat_history_dicts
+ else:
+ chat_history_dicts = []
+
+ # Make the API call
+ url = await self._build_url(f"/api/v1/reports/{self.report_id}/suggest_edits")
+ data = {"user_query": user_query, "chat_history": chat_history_dicts}
+
+ response = await self.aclient.post(url, headers=self._headers, json=data)
+ response.raise_for_status()
+ suggestions = response.json()
+ suggestions = [EditSuggestion(**suggestion) for suggestion in suggestions]
+
+ # Add assistant response to history
+ if suggestions:
+ for suggestion in suggestions:
+ self.chat_history.append(
+ Message(
+ role=MessageRole.ASSISTANT,
+ content=suggestion.justification,
+ timestamp=datetime.now(),
+ )
+ )
+
+ return suggestions
+
+ def suggest_edits(
+ self,
+ user_query: str,
+ auto_history: bool = True,
+ chat_history: Optional[List[dict]] = None,
+ ) -> List[EditSuggestion]:
+ """Synchronous wrapper for getting edit suggestions."""
+ return self._run_sync(
+ self.asuggest_edits(user_query, auto_history, chat_history)
+ )
+
+ async def await_completion(
+ self, timeout: int = DEFAULT_TIMEOUT, poll_interval: int = DEFAULT_POLL_INTERVAL
+ ) -> Report:
+ """Wait for this report to complete processing."""
+ start_time = time.time()
+ while True:
+ report_response = await self.aget()
+ status = report_response.status
+
+ if status == "completed":
+ return report_response.report
+ elif status == "error":
+ events = await self.aget_events()
+ raise ValueError(f"Report entered error state: {events[-1].msg}")
+ elif time.time() - start_time > timeout:
+ raise TimeoutError(f"Report did not complete within {timeout} seconds")
+
+ await asyncio.sleep(poll_interval)
+
+ def wait_for_completion(
+ self, timeout: int = DEFAULT_TIMEOUT, poll_interval: int = DEFAULT_POLL_INTERVAL
+ ) -> Report:
+ """Synchronous wrapper for awaiting report completion."""
+ return self._run_sync(self.await_completion(timeout, poll_interval))
+
+ async def await_for_plan(
+ self, timeout: int = DEFAULT_TIMEOUT, poll_interval: int = DEFAULT_POLL_INTERVAL
+ ) -> ReportPlan:
+ """Wait for this report's plan to be ready for review."""
+ start_time = time.time()
+ while True:
+ report_metadata = await self.aget_metadata()
+ state = report_metadata.state
+
+ if state == "waiting_approval":
+ report_response = await self.aget()
+ return report_response.plan
+ elif state == "error":
+ events = await self.aget_events()
+ raise ValueError(f"Report entered error state: {events[-1].msg}")
+ elif time.time() - start_time > timeout:
+ raise TimeoutError(f"Plan was not ready within {timeout} seconds")
+
+ await asyncio.sleep(poll_interval)
+
+ def wait_for_plan(
+ self, timeout: int = DEFAULT_TIMEOUT, poll_interval: int = DEFAULT_POLL_INTERVAL
+ ) -> ReportPlan:
+ """Synchronous wrapper for awaiting plan readiness."""
+ return self._run_sync(self.await_for_plan(timeout, poll_interval))
+
+ async def aget_metadata(self) -> ReportMetadata:
+ """Get this report's metadata asynchronously."""
+ return await self._client.aget_report_metadata(self.report_id)
+
+ def get_metadata(self) -> ReportMetadata:
+ """Synchronous wrapper for getting this report's metadata."""
+ return self._run_sync(self.aget_metadata())
+
+ async def adelete(self) -> None:
+ """Delete this report asynchronously."""
+ return await self._client.adelete_report(self.report_id)
+
+ def delete(self) -> None:
+ """Synchronous wrapper for deleting this report."""
+ return self._run_sync(self.adelete())
+
+ async def aaccept_edit(self, suggestion: EditSuggestion) -> None:
+ """Accept a suggested edit.
+
+ Args:
+ suggestion: The EditSuggestion to accept, typically from suggest_edits()
+ """
+ # Get current report content
+ report = await self.aget()
+
+ # Track the edit
+ for edit_block in suggestion.blocks:
+ old_content = self._get_block_content(
+ next(
+ b for b in report.blocks if b.idx == self._get_block_idx(edit_block)
+ )
+ )
+ new_content = self._get_block_content(edit_block.block)
+ self.edit_history.append(
+ EditAction(
+ block_idx=self._get_block_idx(edit_block),
+ old_content=old_content,
+ new_content=new_content,
+ action="approved",
+ timestamp=datetime.now(),
+ )
+ )
+
+ # Update the specific block
+ for block in report.blocks:
+ if block.idx == self._get_block_idx(edit_block):
+ block.template = new_content
+ break
+
+ # Update the report
+ url = await self._build_url(f"/api/v1/reports/{self.report_id}")
+ await self.aclient.patch(
+ url, headers=self._headers, json={"content": report.model_dump()}
+ )
+
+ def accept_edit(self, suggestion: EditSuggestion) -> None:
+ """Synchronous wrapper for accepting an edit."""
+ return self._run_sync(self.aaccept_edit(suggestion))
+
+ async def areject_edit(self, suggestion: EditSuggestion) -> None:
+ """Reject a suggested edit.
+
+ Args:
+ suggestion: The EditSuggestion to reject, typically from suggest_edits()
+ """
+ # Track the rejections
+ for edit_block in suggestion.blocks:
+ self.edit_history.append(
+ EditAction(
+ block_idx=self._get_block_idx(edit_block),
+ old_content=self._get_block_content(edit_block),
+ new_content=None,
+ action="rejected",
+ timestamp=datetime.now(),
+ )
+ )
+
+ def reject_edit(self, suggestion: EditSuggestion) -> None:
+ """Synchronous wrapper for rejecting an edit."""
+ return self._run_sync(self.areject_edit(suggestion))
+
+ def _get_edit_summary_after_message(
+ self, message_timestamp: datetime
+ ) -> Optional[str]:
+ """Get a summary of edits that occurred after a specific message."""
+ relevant_edits = [
+ edit for edit in self.edit_history if edit.timestamp > message_timestamp
+ ]
+
+ if not relevant_edits:
+ return None
+
+ approved = [edit for edit in relevant_edits if edit.action == "approved"]
+ rejected = [edit for edit in relevant_edits if edit.action == "rejected"]
+
+ summary = []
+
+ if approved:
+ summary.append("Approved edits:")
+ for edit in approved:
+ summary.append(
+ f'Block {edit.block_idx}: "{edit.old_content}" -> "{edit.new_content}"'
+ )
+
+ if rejected:
+ if approved: # Add spacing if we had approved edits
+ summary.append("")
+ summary.append("Rejected edits:")
+ for edit in rejected:
+ summary.append(f'Block {edit.block_idx}: "{edit.old_content}"')
+
+ return "\n".join(summary)
+
+ async def aget_events(
+ self, last_sequence: Optional[int] = None
+ ) -> List[ReportEventItemEventData_Progress]:
+ """Get all events for this report asynchronously.
+
+ Args:
+ last_sequence: If provided, only get events after this sequence number
+
+ Returns:
+ List of ReportEvent objects
+ """
+ extra_params = []
+ if last_sequence is not None:
+ extra_params.append(f"last_sequence={last_sequence}")
+
+ url = await self._build_url(
+ f"/api/v1/reports/{self.report_id}/events", extra_params
+ )
+
+ response = await self.aclient.get(url, headers=self._headers)
+ response.raise_for_status()
+ progress_events = []
+ for event in response.json():
+ if event["event_type"] == "progress":
+ progress_events.append(
+ ReportEventItemEventData_Progress(**event["event_data"])
+ )
+
+ return progress_events
+
+ def get_events(
+ self, last_sequence: Optional[int] = None
+ ) -> List[ReportEventItemEventData_Progress]:
+ """Synchronous wrapper for getting report events."""
+ return self._run_sync(self.aget_events(last_sequence))
diff --git a/llama_parse/README.md b/llama_parse/README.md
new file mode 100644
index 0000000..68f5332
--- /dev/null
+++ b/llama_parse/README.md
@@ -0,0 +1,165 @@
+# LlamaParse
+
+[![PyPI - Downloads](https://img.shields.io/pypi/dm/llama-parse)](https://pypi.org/project/llama-parse/)
+[![GitHub contributors](https://img.shields.io/github/contributors/run-llama/llama_parse)](https://github.com/run-llama/llama_parse/graphs/contributors)
+[![Discord](https://img.shields.io/discord/1059199217496772688)](https://discord.gg/dGcwcsnxhU)
+
+LlamaParse is a **GenAI-native document parser** that can parse complex document data for any downstream LLM use case (RAG, agents).
+
+It is really good at the following:
+
+- ✅ **Broad file type support**: Parsing a variety of unstructured file types (.pdf, .pptx, .docx, .xlsx, .html) with text, tables, visual elements, weird layouts, and more.
+- ✅ **Table recognition**: Parsing embedded tables accurately into text and semi-structured representations.
+- ✅ **Multimodal parsing and chunking**: Extracting visual elements (images/diagrams) into structured formats and returning image chunks using the latest multimodal models.
+- ✅ **Custom parsing**: Input custom prompt instructions to customize the output the way you want it.
+
+LlamaParse directly integrates with [LlamaIndex](https://github.com/run-llama/llama_index).
+
+The free plan allows up to 1,000 pages a day. The paid plan includes 7k free pages per week, plus 0.3c per additional page by default. There is a sandbox available for testing the API at [**https://cloud.llamaindex.ai/parse ↗**](https://cloud.llamaindex.ai/parse).
+
+Read below for some quickstart information, or see the [full documentation](https://docs.cloud.llamaindex.ai/).
+
+If you're a company interested in enterprise RAG solutions, and/or high volume/on-prem usage of LlamaParse, come [talk to us](https://www.llamaindex.ai/contact).
+
+## Getting Started
+
+First, log in and get an api-key from [**https://cloud.llamaindex.ai/api-key ↗**](https://cloud.llamaindex.ai/api-key).
+
+Then, make sure you have the latest LlamaIndex version installed.
+
+**NOTE:** If you are upgrading from v0.9.X, we recommend following our [migration guide](https://pretty-sodium-5e0.notion.site/v0-10-0-Migration-Guide-6ede431dcb8841b09ea171e7f133bd77), as well as uninstalling your previous version first.
+
+```
+pip uninstall llama-index # run this if upgrading from v0.9.x or older
+pip install -U llama-index --upgrade --no-cache-dir --force-reinstall
+```
+
+Lastly, install the package:
+
+`pip install llama-parse`
+
+Now you can parse your first PDF file using the command line interface. Use the command `llama-parse [file_paths]`. See the help text with `llama-parse --help`.
+
+```bash
+export LLAMA_CLOUD_API_KEY='llx-...'
+
+# output as text
+llama-parse my_file.pdf --result-type text --output-file output.txt
+
+# output as markdown
+llama-parse my_file.pdf --result-type markdown --output-file output.md
+
+# output as raw json
+llama-parse my_file.pdf --output-raw-json --output-file output.json
+```
+
+You can also create simple scripts:
+
+```python
+import nest_asyncio
+
+nest_asyncio.apply()
+
+from llama_parse import LlamaParse
+
+parser = LlamaParse(
+ api_key="llx-...", # can also be set in your env as LLAMA_CLOUD_API_KEY
+ result_type="markdown", # "markdown" and "text" are available
+    num_workers=4,  # if multiple files passed, split into `num_workers` API calls
+ verbose=True,
+ language="en", # Optionally you can define a language, default=en
+)
+
+# sync
+documents = parser.load_data("./my_file.pdf")
+
+# sync batch
+documents = parser.load_data(["./my_file1.pdf", "./my_file2.pdf"])
+
+# async
+documents = await parser.aload_data("./my_file.pdf")
+
+# async batch
+documents = await parser.aload_data(["./my_file1.pdf", "./my_file2.pdf"])
+```
+
+## Using with file object
+
+You can parse a file object directly:
+
+```python
+import nest_asyncio
+
+nest_asyncio.apply()
+
+from llama_parse import LlamaParse
+
+parser = LlamaParse(
+ api_key="llx-...", # can also be set in your env as LLAMA_CLOUD_API_KEY
+ result_type="markdown", # "markdown" and "text" are available
+    num_workers=4,  # if multiple files passed, split into `num_workers` API calls
+ verbose=True,
+ language="en", # Optionally you can define a language, default=en
+)
+
+file_name = "my_file1.pdf"
+extra_info = {"file_name": file_name}
+
+with open(f"./{file_name}", "rb") as f:
+    # must provide extra_info with file_name key when passing a file object
+ documents = parser.load_data(f, extra_info=extra_info)
+
+# you can also pass file bytes directly
+with open(f"./{file_name}", "rb") as f:
+ file_bytes = f.read()
+    # must provide extra_info with file_name key when passing file bytes
+ documents = parser.load_data(file_bytes, extra_info=extra_info)
+```
+
+## Using with `SimpleDirectoryReader`
+
+You can also integrate the parser as the default PDF loader in `SimpleDirectoryReader`:
+
+```python
+import nest_asyncio
+
+nest_asyncio.apply()
+
+from llama_parse import LlamaParse
+from llama_index.core import SimpleDirectoryReader
+
+parser = LlamaParse(
+ api_key="llx-...", # can also be set in your env as LLAMA_CLOUD_API_KEY
+ result_type="markdown", # "markdown" and "text" are available
+ verbose=True,
+)
+
+file_extractor = {".pdf": parser}
+documents = SimpleDirectoryReader(
+ "./data", file_extractor=file_extractor
+).load_data()
+```
+
+Full documentation for `SimpleDirectoryReader` can be found on the [LlamaIndex Documentation](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader.html).
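+
+Once loaded, the documents can feed into the usual LlamaIndex workflow. A minimal sketch (assuming `llama-index` is installed and an embedding model/LLM is configured, e.g. via `OPENAI_API_KEY`), reusing the `documents` from the example above:
+
+```python
+from llama_index.core import VectorStoreIndex
+
+# build a quick query engine over the parsed documents
+index = VectorStoreIndex.from_documents(documents)
+query_engine = index.as_query_engine()
+print(query_engine.query("What is the summary of the first document?"))
+```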
+
+## Examples
+
+Several end-to-end indexing examples can be found in the examples folder:
+
+- [Getting Started](examples/demo_basic.ipynb)
+- [Advanced RAG Example](examples/demo_advanced.ipynb)
+- [Raw API Usage](examples/demo_api.ipynb)
+
+## Documentation
+
+[https://docs.cloud.llamaindex.ai/](https://docs.cloud.llamaindex.ai/)
+
+## Terms of Service
+
+See the [Terms of Service Here](./TOS.pdf).
+
+## Get in Touch (LlamaCloud)
+
+LlamaParse is part of LlamaCloud, our e2e enterprise RAG platform that provides out-of-the-box, production-ready connectors, indexing, and retrieval over your complex data sources. We offer SaaS and VPC options.
+
+LlamaCloud is currently available via waitlist (join by [creating an account](https://cloud.llamaindex.ai/)). If you're interested in state-of-the-art quality and in centralizing your RAG efforts, come [get in touch with us](https://www.llamaindex.ai/contact).
diff --git a/llama_parse/__init__.py b/llama_parse/__init__.py
deleted file mode 100644
index d62b759..0000000
--- a/llama_parse/__init__.py
+++ /dev/null
@@ -1,3 +0,0 @@
-from llama_parse.base import LlamaParse, ResultType
-
-__all__ = ["LlamaParse", "ResultType"]
diff --git a/llama_parse/llama_parse/__init__.py b/llama_parse/llama_parse/__init__.py
new file mode 100644
index 0000000..bf34cb5
--- /dev/null
+++ b/llama_parse/llama_parse/__init__.py
@@ -0,0 +1,3 @@
+from llama_cloud_services.parse import LlamaParse, ResultType
+
+__all__ = ["LlamaParse", "ResultType"]
diff --git a/llama_parse/llama_parse/base.py b/llama_parse/llama_parse/base.py
new file mode 100644
index 0000000..1d63880
--- /dev/null
+++ b/llama_parse/llama_parse/base.py
@@ -0,0 +1,19 @@
+from llama_cloud_services.parse.base import (
+ LlamaParse,
+ ResultType,
+ FileInput,
+ _DEFAULT_SEPARATOR,
+ JOB_RESULT_URL,
+ JOB_STATUS_ROUTE,
+ JOB_UPLOAD_ROUTE,
+)
+
+__all__ = [
+ "LlamaParse",
+ "ResultType",
+ "FileInput",
+ "_DEFAULT_SEPARATOR",
+ "JOB_RESULT_URL",
+ "JOB_STATUS_ROUTE",
+ "JOB_UPLOAD_ROUTE",
+]
diff --git a/llama_parse/llama_parse/cli/__init__.py b/llama_parse/llama_parse/cli/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/llama_parse/llama_parse/cli/main.py b/llama_parse/llama_parse/cli/main.py
new file mode 100644
index 0000000..11d7847
--- /dev/null
+++ b/llama_parse/llama_parse/cli/main.py
@@ -0,0 +1,4 @@
+from llama_cloud_services.parse.cli.main import parse
+
+if __name__ == "__main__":
+ parse()
diff --git a/llama_parse/llama_parse/utils.py b/llama_parse/llama_parse/utils.py
new file mode 100644
index 0000000..39b0165
--- /dev/null
+++ b/llama_parse/llama_parse/utils.py
@@ -0,0 +1,11 @@
+from llama_cloud_services.parse.utils import (
+ SUPPORTED_FILE_TYPES,
+ Language,
+ ResultType,
+)
+
+__all__ = [
+ "SUPPORTED_FILE_TYPES",
+ "Language",
+ "ResultType",
+]
diff --git a/llama_parse/pyproject.toml b/llama_parse/pyproject.toml
new file mode 100644
index 0000000..4463f07
--- /dev/null
+++ b/llama_parse/pyproject.toml
@@ -0,0 +1,24 @@
+[build-system]
+requires = ["poetry-core"]
+build-backend = "poetry.core.masonry.api"
+
+[tool.poetry]
+name = "llama-parse"
+version = "0.5.21"
+description = "Parse files into RAG-Optimized formats."
+authors = ["Logan Markewich "]
+license = "MIT"
+readme = "README.md"
+packages = [{include = "llama_parse"}]
+
+[tool.poetry.dependencies]
+python = ">=3.9,<4.0"
+llama-cloud-services = "*"
+
+[tool.poetry.group.dev.dependencies]
+pytest = "^8.0.0"
+pytest-asyncio = "*"
+ipykernel = "^6.29.0"
+
+[tool.poetry.scripts]
+llama-parse = "llama_parse.cli.main:parse"
diff --git a/pyproject.toml b/pyproject.toml
index cf470da..87919f1 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -2,25 +2,38 @@
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
+[tool.mypy]
+files = ["llama_cloud_services"]
+python_version = "3.10"
+
[tool.poetry]
-name = "llama-parse"
-version = "0.5.20"
-description = "Parse files into RAG-Optimized formats."
-authors = ["Logan Markewich "]
+name = "llama-cloud-services"
+version = "0.1.0"
+description = "Tailored SDK clients for LlamaCloud services."
+authors = ["Logan Markewich "]
license = "MIT"
readme = "README.md"
-packages = [{include = "llama_parse"}]
+packages = [{include = "llama_cloud_services"}]
[tool.poetry.dependencies]
python = ">=3.9,<4.0"
llama-index-core = ">=0.11.0"
+llama-cloud = "^0.1.11"
pydantic = "!=2.10"
click = "^8.1.7"
+python-dotenv = "^1.0.1"
+eval-type-backport = {python = "<3.10", version = "^0.2.0"}
[tool.poetry.group.dev.dependencies]
pytest = "^8.0.0"
pytest-asyncio = "*"
ipykernel = "^6.29.0"
+pre-commit = "3.2.0"
+autoevals = "^0.0.114"
+deepdiff = "^8.1.1"
+ipython = "^8.12.3"
+jupyter = "^1.1.1"
+mypy = "^1.14.1"
[tool.poetry.scripts]
-llama-parse = "llama_parse.cli.main:parse"
+llama-parse = "llama_cloud_services.parse.cli.main:parse"
diff --git a/tests/extract/__init__.py b/tests/extract/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/tests/extract/test_benchmark.py b/tests/extract/test_benchmark.py
new file mode 100644
index 0000000..4f3b27e
--- /dev/null
+++ b/tests/extract/test_benchmark.py
@@ -0,0 +1,148 @@
+import os
+import pytest
+from pathlib import Path
+
+from llama_cloud_services.extract import LlamaExtract, ExtractionAgent
+from dotenv import load_dotenv
+from time import perf_counter
+from collections import namedtuple
+import json
+import uuid
+from llama_cloud.types import (
+ ExtractConfig,
+ ExtractMode,
+ LlamaParseParameters,
+ LlamaExtractSettings,
+)
+
+load_dotenv(Path(__file__).parent.parent / ".env.dev", override=True)
+
+
+TEST_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "data")
+# Get configuration from environment
+LLAMA_CLOUD_API_KEY = os.getenv("LLAMA_CLOUD_API_KEY")
+LLAMA_CLOUD_BASE_URL = os.getenv("LLAMA_CLOUD_BASE_URL")
+LLAMA_CLOUD_PROJECT_ID = os.getenv("LLAMA_CLOUD_PROJECT_ID")
+
+TestCase = namedtuple(
+ "TestCase", ["name", "schema_path", "config", "input_file", "expected_output"]
+)
+
+
+def get_test_cases():
+ """Get all test cases from TEST_DIR.
+
+ Returns:
+ List[TestCase]: List of test cases
+ """
+ test_cases = []
+
+ for data_type in os.listdir(TEST_DIR):
+ data_type_dir = os.path.join(TEST_DIR, data_type)
+ if not os.path.isdir(data_type_dir):
+ continue
+
+ schema_path = os.path.join(data_type_dir, "schema.json")
+ if not os.path.exists(schema_path):
+ continue
+
+ input_files = []
+
+ for file in os.listdir(data_type_dir):
+ file_path = os.path.join(data_type_dir, file)
+ if (
+ not os.path.isfile(file_path)
+ or file == "schema.json"
+ or file.endswith(".test.json")
+ ):
+ continue
+
+ input_files.append(file_path)
+
+ settings = [
+ ExtractConfig(extraction_mode=ExtractMode.FAST),
+ ExtractConfig(extraction_mode=ExtractMode.ACCURATE),
+ ]
+
+ for input_file in sorted(input_files):
+ base_name = os.path.splitext(os.path.basename(input_file))[0]
+ expected_output = os.path.join(data_type_dir, f"{base_name}.test.json")
+
+ if not os.path.exists(expected_output):
+ continue
+
+ test_name = f"{data_type}/{os.path.basename(input_file)}"
+ for setting in settings:
+ test_cases.append(
+ TestCase(
+ name=test_name,
+ schema_path=schema_path,
+ input_file=input_file,
+ config=setting,
+ expected_output=expected_output,
+ )
+ )
+
+ return test_cases
+
+
+@pytest.fixture(scope="session")
+def extractor():
+ """Create a single LlamaExtract instance for all tests."""
+ extract = LlamaExtract(
+ api_key=LLAMA_CLOUD_API_KEY,
+ base_url=LLAMA_CLOUD_BASE_URL,
+ project_id=LLAMA_CLOUD_PROJECT_ID,
+ verbose=True,
+ )
+ yield extract
+ # Cleanup thread pool at end of session
+ extract._thread_pool.shutdown()
+
+
+@pytest.fixture
+def extraction_agent(test_case: TestCase, extractor: LlamaExtract):
+ """Fixture to create and cleanup extraction agent for each test."""
+ # Create unique name with random UUID (important for CI to avoid conflicts)
+ unique_id = uuid.uuid4().hex[:8]
+ agent_name = f"{test_case.name}_{unique_id}"
+
+ with open(test_case.schema_path, "r") as f:
+ schema = json.load(f)
+
+ # Clean up any existing agents with this name
+ try:
+ agents = extractor.list_agents()
+ for agent in agents:
+ if agent.name == agent_name:
+ extractor.delete_agent(agent.id)
+ except Exception as e:
+ print(f"Warning: Failed to cleanup existing agent: {str(e)}")
+
+ # Create new agent
+ agent = extractor.create_agent(agent_name, schema, config=test_case.config)
+ yield agent
+
+
+@pytest.mark.skipif(
+ "CI" in os.environ,
+ reason="CI environment is not suitable for benchmarking",
+)
+@pytest.mark.parametrize("test_case", get_test_cases(), ids=lambda x: x.name)
+@pytest.mark.asyncio(loop_scope="session")
+async def test_extraction(
+ test_case: TestCase, extraction_agent: ExtractionAgent
+) -> None:
+ start = perf_counter()
+ result = await extraction_agent._queue_extraction_test(
+ test_case.input_file,
+ extract_settings=LlamaExtractSettings(
+ llama_parse_params=LlamaParseParameters(
+ invalidate_cache=True,
+ do_not_cache=True,
+ )
+ ),
+ )
+ end = perf_counter()
+ print(f"Time taken: {end - start} seconds")
+ print(result)
diff --git a/tests/extract/test_extract_api.py b/tests/extract/test_extract_api.py
new file mode 100644
index 0000000..42c4408
--- /dev/null
+++ b/tests/extract/test_extract_api.py
@@ -0,0 +1,189 @@
+import os
+import pytest
+from pathlib import Path
+from pydantic import BaseModel
+from dotenv import load_dotenv
+
+from llama_cloud_services.extract import LlamaExtract, ExtractionAgent
+
+# Load environment variables
+load_dotenv(Path(__file__).parent.parent / ".env.dev", override=True)
+
+# Get configuration from environment
+LLAMA_CLOUD_API_KEY = os.getenv("LLAMA_CLOUD_API_KEY")
+LLAMA_CLOUD_BASE_URL = os.getenv("LLAMA_CLOUD_BASE_URL")
+LLAMA_CLOUD_PROJECT_ID = os.getenv("LLAMA_CLOUD_PROJECT_ID")
+
+# Skip all tests if API key is not set
+pytestmark = pytest.mark.skipif(
+ not LLAMA_CLOUD_API_KEY, reason="LLAMA_CLOUD_API_KEY not set"
+)
+
+
+# Test data
+class TestSchema(BaseModel):
+ title: str
+ summary: str
+
+
+# Test data paths
+TEST_DIR = Path(__file__).parent / "data"
+TEST_PDF = TEST_DIR / "slide" / "saas_slide.pdf"
+
+
+@pytest.fixture
+def llama_extract():
+ return LlamaExtract(
+ api_key=LLAMA_CLOUD_API_KEY,
+ base_url=LLAMA_CLOUD_BASE_URL,
+ project_id=LLAMA_CLOUD_PROJECT_ID,
+ verbose=True,
+ )
+
+
+@pytest.fixture
+def test_agent_name():
+ return "test-api-agent"
+
+
+@pytest.fixture
+def test_schema_dict():
+ return {
+ "type": "object",
+ "properties": {
+ "title": {"type": "string"},
+ "summary": {"type": "string"},
+ },
+ }
+
+
+@pytest.fixture
+def test_agent(llama_extract, test_agent_name, test_schema_dict, request):
+ """Creates a test agent and cleans it up after the test"""
+ test_id = request.node.nodeid
+ test_hash = hex(hash(test_id))[-8:]
+ base_name = test_agent_name
+
+ base_name = next(
+ (marker.args[0] for marker in request.node.iter_markers("agent_name")),
+ base_name,
+ )
+ name = f"{base_name}_{test_hash}"
+
+ schema = next(
+ (
+ marker.args[0][0] if isinstance(marker.args[0], tuple) else marker.args[0]
+ for marker in request.node.iter_markers("agent_schema")
+ ),
+ test_schema_dict,
+ )
+
+ # Cleanup existing agent
+ try:
+ for agent in llama_extract.list_agents():
+ if agent.name == name:
+ llama_extract.delete_agent(agent.id)
+ except Exception as e:
+ print(f"Warning: Failed to cleanup existing agent: {e}")
+
+ agent = llama_extract.create_agent(name=name, data_schema=schema)
+ yield agent
+
+ # Cleanup after test
+ try:
+ llama_extract.delete_agent(agent.id)
+ except Exception as e:
+ print(f"Warning: Failed to delete agent {agent.id}: {e}")
+
+
+class TestLlamaExtract:
+ def test_init_without_api_key(self):
+ env_backup = os.getenv("LLAMA_CLOUD_API_KEY")
+ del os.environ["LLAMA_CLOUD_API_KEY"]
+ with pytest.raises(ValueError, match="The API key is required"):
+ LlamaExtract(api_key=None, base_url=LLAMA_CLOUD_BASE_URL)
+ os.environ["LLAMA_CLOUD_API_KEY"] = env_backup
+
+ @pytest.mark.agent_name("test-dict-schema-agent")
+ def test_create_agent_with_dict_schema(self, test_agent):
+ assert isinstance(test_agent, ExtractionAgent)
+
+ @pytest.mark.agent_name("test-pydantic-schema-agent")
+ @pytest.mark.agent_schema((TestSchema,))
+ def test_create_agent_with_pydantic_schema(self, test_agent):
+ assert isinstance(test_agent, ExtractionAgent)
+
+ def test_get_agent_by_name(self, llama_extract, test_agent):
+ agent = llama_extract.get_agent(name=test_agent.name)
+ assert isinstance(agent, ExtractionAgent)
+ assert agent.name == test_agent.name
+ assert agent.id == test_agent.id
+ assert agent.data_schema == test_agent.data_schema
+
+ def test_get_agent_by_id(self, llama_extract, test_agent):
+ agent = llama_extract.get_agent(id=test_agent.id)
+ assert isinstance(agent, ExtractionAgent)
+ assert agent.id == test_agent.id
+ assert agent.name == test_agent.name
+ assert agent.data_schema == test_agent.data_schema
+
+ def test_list_agents(self, llama_extract, test_agent):
+ agents = llama_extract.list_agents()
+ assert isinstance(agents, list)
+ assert any(a.id == test_agent.id for a in agents)
+
+
+class TestExtractionAgent:
+ @pytest.mark.asyncio
+ async def test_extract_single_file(self, test_agent):
+ result = await test_agent.aextract(TEST_PDF)
+ assert result.status == "SUCCESS"
+ assert result.data is not None
+ assert isinstance(result.data, dict)
+ assert "title" in result.data
+ assert "summary" in result.data
+
+ def test_sync_extract_single_file(self, test_agent):
+ result = test_agent.extract(TEST_PDF)
+ assert result.status == "SUCCESS"
+ assert result.data is not None
+ assert isinstance(result.data, dict)
+ assert "title" in result.data
+ assert "summary" in result.data
+
+ @pytest.mark.asyncio
+ async def test_extract_multiple_files(self, test_agent):
+ files = [TEST_PDF, TEST_PDF] # Using same file twice for testing
+ response = await test_agent.aextract(files)
+
+ assert len(response) == 2
+ for result in response:
+ assert result.status == "SUCCESS"
+ assert result.data is not None
+ assert isinstance(result.data, dict)
+ assert "title" in result.data
+ assert "summary" in result.data
+
+ def test_save_agent_updates(
+ self, test_agent: ExtractionAgent, llama_extract: LlamaExtract
+ ):
+ new_schema = {
+ "type": "object",
+ "properties": {
+ "new_field": {"type": "string"},
+ "title": {"type": "string"},
+ "summary": {"type": "string"},
+ },
+ }
+ test_agent.data_schema = new_schema
+ test_agent.save()
+
+ # Verify the update by getting a fresh instance
+ updated_agent = llama_extract.get_agent(name=test_agent.name)
+ assert "new_field" in updated_agent.data_schema["properties"]
+
+ def test_list_extraction_runs(self, test_agent: ExtractionAgent):
+ assert len(test_agent.list_extraction_runs()) == 0
+ test_agent.extract(TEST_PDF)
+ runs = test_agent.list_extraction_runs()
+ assert len(runs) > 0
diff --git a/tests/extract/test_extract_e2e.py b/tests/extract/test_extract_e2e.py
new file mode 100644
index 0000000..e66d47a
--- /dev/null
+++ b/tests/extract/test_extract_e2e.py
@@ -0,0 +1,141 @@
+import os
+import pytest
+from pathlib import Path
+
+from llama_cloud_services.extract import LlamaExtract, ExtractionAgent
+from dotenv import load_dotenv
+from collections import namedtuple
+import json
+import uuid
+from llama_cloud.types import ExtractConfig, ExtractMode
+from deepdiff import DeepDiff
+from tests.extract.util import json_subset_match_score
+
+load_dotenv(Path(__file__).parent.parent / ".env.dev", override=True)
+
+
+TEST_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "data")
+# Get configuration from environment
+LLAMA_CLOUD_API_KEY = os.getenv("LLAMA_CLOUD_API_KEY")
+LLAMA_CLOUD_BASE_URL = os.getenv("LLAMA_CLOUD_BASE_URL")
+LLAMA_CLOUD_PROJECT_ID = os.getenv("LLAMA_CLOUD_PROJECT_ID")
+
+TestCase = namedtuple(
+ "TestCase", ["name", "schema_path", "config", "input_file", "expected_output"]
+)
+
+
+def get_test_cases():
+ """Get all test cases from TEST_DIR.
+
+ Returns:
+ List[TestCase]: List of test cases
+ """
+ test_cases = []
+
+ for data_type in os.listdir(TEST_DIR):
+ data_type_dir = os.path.join(TEST_DIR, data_type)
+ if not os.path.isdir(data_type_dir):
+ continue
+
+ schema_path = os.path.join(data_type_dir, "schema.json")
+ if not os.path.exists(schema_path):
+ continue
+
+ input_files = []
+
+ for file in os.listdir(data_type_dir):
+ file_path = os.path.join(data_type_dir, file)
+ if (
+ not os.path.isfile(file_path)
+ or file == "schema.json"
+ or file.endswith(".test.json")
+ ):
+ continue
+
+ input_files.append(file_path)
+
+ settings = [
+ ExtractConfig(extraction_mode=ExtractMode.FAST),
+ ExtractConfig(extraction_mode=ExtractMode.ACCURATE),
+ ]
+
+ for input_file in sorted(input_files):
+ base_name = os.path.splitext(os.path.basename(input_file))[0]
+ expected_output = os.path.join(data_type_dir, f"{base_name}.test.json")
+
+ if not os.path.exists(expected_output):
+ continue
+
+ test_name = f"{data_type}/{os.path.basename(input_file)}"
+ for setting in settings:
+ test_cases.append(
+ TestCase(
+ name=test_name,
+ schema_path=schema_path,
+ input_file=input_file,
+ config=setting,
+ expected_output=expected_output,
+ )
+ )
+
+ return test_cases
+
+
+@pytest.fixture(scope="session")
+def extractor():
+ """Create a single LlamaExtract instance for all tests."""
+ extract = LlamaExtract(
+ api_key=LLAMA_CLOUD_API_KEY,
+ base_url=LLAMA_CLOUD_BASE_URL,
+ project_id=LLAMA_CLOUD_PROJECT_ID,
+ verbose=True,
+ )
+ yield extract
+ # Cleanup thread pool at end of session
+ extract._thread_pool.shutdown()
+
+
+@pytest.fixture
+def extraction_agent(test_case: TestCase, extractor: LlamaExtract):
+ """Fixture to create and cleanup extraction agent for each test."""
+ # Create unique name with random UUID (important for CI to avoid conflicts)
+ unique_id = uuid.uuid4().hex[:8]
+ agent_name = f"{test_case.name}_{unique_id}"
+
+ with open(test_case.schema_path, "r") as f:
+ schema = json.load(f)
+
+ # Clean up any existing agents with this name
+ try:
+ agents = extractor.list_agents()
+ for agent in agents:
+ if agent.name == agent_name:
+ extractor.delete_agent(agent.id)
+ except Exception as e:
+ print(f"Warning: Failed to cleanup existing agent: {str(e)}")
+
+ # Create new agent
+ agent = extractor.create_agent(agent_name, schema, config=test_case.config)
+ yield agent
+
+ # Cleanup after test
+ try:
+ extractor.delete_agent(agent.id)
+ except Exception as e:
+ print(f"Warning: Failed to delete agent {agent.id}: {str(e)}")
+
+
+@pytest.mark.skipif(
+ os.environ.get("LLAMA_CLOUD_API_KEY", "") == "",
+ reason="LLAMA_CLOUD_API_KEY not set",
+)
+@pytest.mark.parametrize("test_case", get_test_cases(), ids=lambda x: x.name)
+def test_extraction(test_case: TestCase, extraction_agent: ExtractionAgent) -> None:
+ result = extraction_agent.extract(test_case.input_file).data
+ with open(test_case.expected_output, "r") as f:
+ expected = json.load(f)
+ # TODO: fix the saas_slide test
+ assert json_subset_match_score(expected, result) > 0.3, DeepDiff(
+ expected, result, ignore_order=True
+ )
diff --git a/tests/extract/util.py b/tests/extract/util.py
new file mode 100644
index 0000000..1a6278c
--- /dev/null
+++ b/tests/extract/util.py
@@ -0,0 +1,37 @@
+from typing import Any
+
+from autoevals.string import Levenshtein
+from autoevals.number import NumericDiff
+
+
+def json_subset_match_score(expected: Any, actual: Any) -> float:
+ """
+ Adapted from autoevals.JsonDiff to only test on the subset of keys within the expected json.
+ """
+ string_scorer = Levenshtein()
+ number_scorer = NumericDiff()
+ if isinstance(expected, dict) and isinstance(actual, dict):
+ if len(expected) == 0 and len(actual) == 0:
+ return 1
+ keys = set(expected.keys())
+ scores = [json_subset_match_score(expected.get(k), actual.get(k)) for k in keys]
+ scores = [s for s in scores if s is not None]
+ return sum(scores) / len(scores)
+ elif isinstance(expected, list) and isinstance(actual, list):
+ if len(expected) == 0 and len(actual) == 0:
+ return 1
+ scores = [json_subset_match_score(e1, e2) for (e1, e2) in zip(expected, actual)]
+ scores = [s for s in scores if s is not None]
+ return sum(scores) / max(len(expected), len(actual))
+ elif isinstance(expected, str) and isinstance(actual, str):
+ return string_scorer.eval(expected, actual).score
+ elif (isinstance(expected, int) or isinstance(expected, float)) and (
+ isinstance(actual, int) or isinstance(actual, float)
+ ):
+ return number_scorer.eval(expected, actual).score
+ elif expected is None and actual is None:
+ return 1
+ elif expected is None or actual is None:
+ return 0
+ else:
+ return 0
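+
+
+if __name__ == "__main__":
+    # Illustrative sketch only (not exercised by the test suite): keys are taken
+    # from `expected`, so extra keys in `actual` are ignored; strings are scored
+    # with Levenshtein similarity, numbers with NumericDiff, and lists pairwise.
+    expected = {"title": "Q3 Report", "revenue": 1200, "tags": ["ai", "rag"]}
+    actual = {"title": "Q3 report", "revenue": 1180, "tags": ["ai"], "id": 7}
+    print(json_subset_match_score(expected, actual))  # a value between 0 and 1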
diff --git a/tests/parse/__init__.py b/tests/parse/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/tests/test_reader.py b/tests/parse/test_llama_parse.py
similarity index 99%
rename from tests/test_reader.py
rename to tests/parse/test_llama_parse.py
index 3a07b05..131d2cc 100644
--- a/tests/test_reader.py
+++ b/tests/parse/test_llama_parse.py
@@ -4,7 +4,7 @@
from fsspec.implementations.local import LocalFileSystem
from httpx import AsyncClient
-from llama_parse import LlamaParse
+from llama_cloud_services.parse import LlamaParse
@pytest.mark.skipif(
diff --git a/tests/report/__init__.py b/tests/report/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/tests/report/test_llama_report.py b/tests/report/test_llama_report.py
new file mode 100644
index 0000000..55cc22c
--- /dev/null
+++ b/tests/report/test_llama_report.py
@@ -0,0 +1,121 @@
+import os
+import pytest
+import uuid
+from typing import AsyncGenerator
+from pytest_asyncio import fixture as async_fixture
+from llama_cloud_services.report import LlamaReport, ReportClient
+
+# Skip tests if no API key is set
+pytestmark = pytest.mark.skipif(
+ not os.getenv("LLAMA_CLOUD_API_KEY"), reason="No API key provided"
+)
+
+
+@async_fixture(scope="function")
+async def client() -> AsyncGenerator[LlamaReport, None]:
+ """Create a LlamaReport client."""
+ client = LlamaReport()
+ reports_before = await client.alist_reports()
+ reports_before_ids = [r.report_id for r in reports_before]
+ try:
+ yield client
+ finally:
+ # clean up reports
+ reports_after = await client.alist_reports()
+ reports_after_ids = [r.report_id for r in reports_after]
+        for report_id in reports_after_ids:
+            if report_id not in reports_before_ids:
+                await client.adelete_report(report_id)
+
+ await client.aclient.aclose()
+
+
+@pytest.fixture(scope="function")
+def unique_name() -> str:
+ """Generate a unique report name."""
+ return f"test-report-{uuid.uuid4()}"
+
+
+@async_fixture(scope="function")
+async def report(
+ client: LlamaReport, unique_name: str
+) -> AsyncGenerator[ReportClient, None]:
+ """Create a report."""
+ report = await client.acreate_report(
+ name=unique_name,
+ template_text=(
+ "# [Some title]\n\n"
+ " ## TLDR\n"
+ "A quick summary of the paper.\n\n"
+ "## Details\n"
+ "More details about the paper, possible more than one section here.\n"
+ ),
+ input_files=["tests/test_files/paper.md"],
+ )
+ try:
+ yield report
+ finally:
+ await report.adelete()
+
+
+@pytest.mark.asyncio
+async def test_create_and_delete_report(
+ client: LlamaReport, report: ReportClient
+) -> None:
+ """Test basic report creation and deletion."""
+ # Verify the report exists
+ metadata = await report.aget_metadata()
+ assert metadata.name == report.name
+
+ # Test listing reports
+ reports = await client.alist_reports()
+ assert any(r.report_id == report.report_id for r in reports)
+
+ # Test getting report by ID
+ fetched_report = await client.aget_report(report.report_id)
+ assert fetched_report.report_id == report.report_id
+ assert fetched_report.name == report.name
+
+
+@pytest.mark.asyncio
+async def test_report_plan_workflow(report: ReportClient) -> None:
+ """Test the report planning workflow."""
+ # Wait for the plan
+ plan = await report.await_for_plan()
+ assert plan is not None
+
+ # Approve the plan
+ response = await report.aupdate_plan(action="approve")
+ assert response is not None
+
+ # Wait for completion
+ completed_report = await report.await_completion()
+ assert len(completed_report.blocks) > 0
+
+
+@pytest.mark.asyncio
+async def test_report_edit_suggestions(report: ReportClient) -> None:
+ """Test getting and handling edit suggestions."""
+
+ # Wait for the report to be ready
+ completed_report = await report.await_completion()
+ assert len(completed_report.blocks) > 0
+
+ # Get edit suggestions
+ suggestions = await report.asuggest_edits(
+ "Make the text more formal.", auto_history=True
+ )
+ assert len(suggestions) > 0
+
+ # Test accepting an edit
+ await report.aaccept_edit(suggestions[0])
+
+ # Get more suggestions and test rejecting
+ more_suggestions = await report.asuggest_edits(
+ "Add a section about machine learning.", auto_history=True
+ )
+ assert len(more_suggestions) > 0
+ await report.areject_edit(more_suggestions[0])
+
+ # Verify chat history is maintained
+ assert len(report.chat_history) >= 4 # 2 user messages + 2 assistant responses
diff --git a/tests/test_files/images/00382f27-1511-44df-ad92-36bad2cadca1-page_1.jpg b/tests/test_files/images/00382f27-1511-44df-ad92-36bad2cadca1-page_1.jpg
new file mode 100644
index 0000000..56b3dbf
Binary files /dev/null and b/tests/test_files/images/00382f27-1511-44df-ad92-36bad2cadca1-page_1.jpg differ
diff --git a/tests/test_files/paper.md b/tests/test_files/paper.md
new file mode 100644
index 0000000..c14704f
--- /dev/null
+++ b/tests/test_files/paper.md
@@ -0,0 +1,269 @@
+# HEALTH-PARIKSHA: Assessing RAG Models for Health Chatbots in
+
+Varun Gumma♠ Anandhita Raghunath\*♢ Mohit Jain†♠ Sunayana Sitaram†♠
+
+♠Microsoft Corporation ♢University of Washington
+
+varun230999@gmail.com, sunayana.sitaram@microsoft.com
+
+# Abstract
+
+Assessing the capabilities and limitations of large language models (LLMs) has garnered significant interest, yet the evaluation of multiple models in real-world scenarios remains rare. Multilingual evaluation often relies on translated benchmarks, which typically do not capture linguistic and cultural nuances present in the source language. This study provides an extensive assessment of 24 LLMs on real world data collected from Indian patients interacting with a medical chatbot in Indian English and 4 other Indic languages. We employ a uniform Retrieval Augmented Generation framework to generate responses, which are evaluated using both automated techniques and human evaluators on four specific metrics relevant to our application. We find that models vary significantly in their performance and that instruction tuned Indic models do not always perform well on Indic language queries. Further, we empirically show that factual correctness is generally lower for responses to Indic queries compared to English queries. Finally, our qualitative work shows that code-mixed and culturally relevant queries in our dataset pose challenges to evaluated models.
+
+# 1 Introduction
+
+Large Language Models (LLMs) have demonstrated impressive proficiency across various domains. Nonetheless, their full spectrum of capabilities and limitations remains unclear, resulting in unpredictable performance on certain tasks. Additionally, there is now a wide selection of LLMs available. Therefore, evaluation has become crucial for comprehending the internal mechanisms of LLMs and for comparing them against each other.
+
+Despite the importance of evaluation, significant challenges still persist. Many widely-used benchmarks for assessing LLMs are contaminated (Ahuja et al., 2024; Oren et al., 2024; Xu et al., 2024), meaning that they often appear in LLM training data. Some of these benchmarks were originally created for conventional Natural Language Processing tasks and may not fully represent current practical applications of LLMs (Conneau et al., 2018; Pan et al., 2017). Recently, there has been growing interest in assessing LLMs within multilingual and multicultural contexts (Ahuja et al., 2023, 2024; Faisal et al., 2024; Watts et al., 2024; Chiu et al., 2024). Traditionally, these benchmarks were developed by translating English versions into various languages. However, due to the loss of linguistic and cultural context during translation, new benchmarks specific to different languages and cultures are now being created. However, such benchmarks are few in number, and several of the older ones are contaminated in training data (Ahuja et al., 2024; Oren et al., 2024). Thus, there is a need for new benchmarks that can test the abilities of models in real-world multilingual settings.
+
+LLMs are employed in various fields, including critical areas like healthcare. Jin et al. (2024) translate an English healthcare dataset into Spanish, Chinese, and Hindi, and demonstrate that performance declines in these languages compared to English. This highlights the necessity of examining LLMs more thoroughly in multilingual contexts for these important uses.
+
+In this study, we conduct the first comprehensive assessment of multilingual models within a real-world healthcare context. We evaluate responses from 24 multilingual and Indic models using 750 questions posed by users of a health chatbot in five languages (Indian English and four Indic languages). All the models being evaluated function within the same RAG framework, and their outputs are compared to doctor-verified ground truth responses. We evaluate LLM responses on four metrics curated for our application, including factual correctness, semantic similarity, coherence, and conciseness, and present leaderboards for each metric, as well as an overall leaderboard. We use human evaluation and automated methods (LLMs-as-a-judge) to compute these metrics by comparing LLM responses with ground-truth reference responses or assessing the responses in a reference-free manner.
+
+\* Work done during an internship at Microsoft
+
+† Equal Advising
+
+Our results suggest that models vary significantly in their performance, with some smaller models outperforming larger ones. Factual Correctness is generally lower for non-English queries compared to English queries. We observe that instruction-tuned Indic models do not always perform well on Indic language queries. Our dataset contains several instances of code-mixed and culturally-relevant queries, which models sometimes struggle to answer. The contributions of our work are as follows:
+
+- We evaluate 24 models (proprietary as well as open weights) in a healthcare setting using queries provided by patients using a medical chatbot. This guarantees that our dataset is not contaminated in the training data of any of the models we evaluate.
+- We curate a dataset of queries from multilingual users that spans multiple languages. The queries feature language typical of multilingual communities, such as code-switching, which is rarely found in translated datasets, making ours a more realistic dataset for model evaluation.
+- We evaluate several models in an identical RAG setting, making it possible to compare models in a fair manner. The RAG setting is a popular configuration that numerous models are being deployed in for real-world applications.
+- We establish relevant metrics for our application and determine an overall combined metric by consulting domain experts - doctors working on the medical chatbot project.
+- We perform assessments (with and without ground truth references) using LLM-as-a-judge and conduct human evaluations on a subset of the models and data to confirm the validity of the LLM assessment.
+
+# 2 Related Works
+
+# Healthcare Chatbots in India
+
+Within the Indian context, the literature has documented great diversity in health seeking and health communication behaviors based on gender (Das et al., 2018), varying educational status, poor functional literacy, cultural context (Islary, 2018), stigmas (Wang et al.) etc. This diversity in behavior may translate to people’s use of medical chatbots, which are increasingly reaching hundreds of Indian patients at the margins of the healthcare system (Mishra et al., 2023). These bots solicit personal health information directly from patients in their native Indic languages or in Indic English. For example, (Ramjee et al., 2024) find that their CataractBot deployed in Bangalore, India yields patient questions on topics such as surgery, preoperative preparation, diet, exercise, discharge, medication, pain management, etc. Mishra et al. (2023) find that Indian people share “deeply personal questions and concerns about sexual and reproductive health” with their chatbot SnehAI. Yadav et al. (2019) find that queries to chatbots are “embedded deeply into a communities myths and existing belief systems” while (Xiao et al., 2023) note that patients have difficulties finding health information at an appropriate level for them to comprehend. Therefore, LLMs powering medical chatbots in India and other Low and Middle Income Countries are challenged to respond lucidly to medical questions that are asked in ways that may be hyperlocal to patient context. Few works have documented how LLMs react to this linguistic diversity in the medical domain. Our work begins to bridge this gap.
+
+# Multilingual and RAG evaluation
+
+Several previous studies have conducted in-depth evaluation of Multilingual capabilities of LLMs by evaluating across standard tasks (Srivastava et al., 2022; Liang et al., 2023; Ahuja et al., 2023, 2024; Asia et al., 2024; Lai et al., 2023; Robinson et al., 2023), with a common finding that current LLMs only have a limited multilingual capacity. Other works (Watts et al., 2024; Leong et al., 2023) include evaluating LLMs on creative and generative tasks. Salemi and Zamani (2024) state that evaluating RAG models requires a joint evaluation of the retrieval and generated output. Recent works such as Chen et al. (2024); Chirkova et al. (2024) benchmark LLMs as RAG models in bilingual and multilingual setups. Lastly, several tools and benchmarks have also been built for automatic evaluation of RAG, even in medical domains (Es et al., 2024; Tang and Yang, 2024; Xiong et al., 2024a,b), and we refer the readers to Yu et al. (2024) for a comprehensive list and survey.
+
+# LLM-based Evaluators
+
+With the advent of large-scale instruction following capabilities in LLMs, automatic evaluations with the help of these models are being preferred (Kim et al., 2024a,b; Liu et al., 2024; Shen et al., 2023; Kocmi and Federmann, 2023). However, it has been shown that it is optimal to assess these evaluations in tandem with human annotations as LLMs can provide inflated scores (Hada et al., 2024b,a; Watts et al., 2024). Other works (Zheng et al., 2023; Watts et al., 2024) have employed GPT-4 alongside human evaluators to build leaderboards to assess other LLMs. Ning et al. (2024) proposed an innovative approach using LLMs for peer review, where models evaluate each other’s outputs. However, a recent study by Doddapaneni et al. (2024) highlighted the limitations of LLM-based evaluators, revealing their inability to reliably detect subtle drops in input quality during evaluations, raising concerns about their precision and dependability for fine-grained assessments. In this work, we use LLM-based evaluators both with and without ground-truth references and also use human evaluation to validate LLM-based evaluation.
+
+# 3 Methodology
+
+In this study, we leveraged a dataset collected from a deployed medical chatbot. Here, we provide an overview of the question dataset, the knowledge base employed for answering those questions, the process for generating responses, and the evaluation framework.
+
+# 3.1 Data
+
+The real-world test data was collected by our collaborators as part of an ongoing research effort that designed and deployed a medical chatbot, hereafter referred to as HEALTHBOT, to patients scheduled for cataract surgery at a large hospital in urban India. An Ethics approval was obtained from our institution prior to conducting this work, and once enrolled in the study and consent was obtained, both the patient and their accompanying family member or attendant were instructed on how to use HEALTHBOT on WhatsApp. Through this instructional phase, they were informed that questions could be asked by voice or by text, in one of 5 languages - English, Hindi, Kannada, Tamil, Telugu. The workflow of chatting with HEALTHBOT was as follows: Patients sent questions through the WhatsApp interface to HEALTHBOT. Their questions were transcribed automatically (using a speech recognition system) and translated (using an off-the-shelf translator) into English if needed, after which GPT-4 was used to produce an initial response by performing RAG on the documents in the knowledge base (KB, see below). This initial response was passed to doctors who reviewed, validated, and if needed, edited the answer. The doctor approved answer is henceforth referred to as the ground truth (GT) response associated with the patient query.
+
+Our evaluation dataset was curated from this data by including all questions sent to HEALTHBOT along with their associated GT response. Exclusion criteria removed exact duplicate questions, those with personally identifying information, and those not relevant to health. Additionally, for this work, we only consider questions to which the GPT-4 answer was directly approved by the expert as the “correct and complete answer" without additional editing on the doctors’ part. The final dataset contained 749 question and GT answer pairs that were sent in to HEALTHBOT between December 2023 to June 2024. In the pool, 666 questions were in English, 19 in Hindi, 27 in Tamil, 14 in Telugu, and 23 in Kannada. Note that, queries written in the script of a specific language were classified as belonging to that language. For code-mixed and Romanized queries, we determined whether they were English or non-English based on the matrix language of the query.
+
+The evaluation dataset consists of queries that (1) have misspelled English words, (2) are code-mixed, (3) represent non-native English, (4) are relevant to the patient’s cultural context and (5) are specific to the patient’s condition. We provide some examples of each of these categories.
+
+Examples of misspelled queries include questions such as “How long should saving not be done after surgery?” where the patient intended to ask about shaving, and “Sarjere is don mam?” which the attendant used to inquire about the patient’s discharge status. Instances of code mixing can be seen in phrases like “Agar operation ke baad pain ho raha hai, to kya karna hai?” meaning “If there is pain after the surgery, what should I do?” in Hindi-English (Hinglish). Other examples include “Can I eat before the kanna operation?” where
+“kanna” means eye in Tamil, and “kanna operation” is a well understood, common way of referring to cataract surgery, and “In how many days can a patient take Karwat?” where “Karwat” means turning over in sleep in Hindi.
+
+Indian English was used in a majority of the English queries, making the phrasing of questions different from what they would be with native English speech. Examples are as follows - “Because I have diabetes sugar problem I am worried much”, “Why to eat light meal only? What comes under light meal?” and “Is the patient should be in dark room after surgery?” Taking a shower was commonly referred to as “taking a bath”, and eye glasses were commonly referred to as “goggles”, “spex” or “spectacles”.
+
+Culturally-relevant questions were also many in number, for example questions about specific foods were asked like “Can he take chapati, Puri etc on the day of surgery?” and “Can I eat non veg after surgery?” (“non-veg” is a term used in Indian English to denote eating meat). Questions about yoga were asked, like “How long after the surgery should the Valsalva maneuver be avoided?” and “Are there any specific yoga poses I can do?”. The notion of a patient’s native place or village was brought up in queries such as “If a person gets operated here and then goes to his native place and if some problem occurs what shall he do?” or “Can she travel by car with AC for 100 kms?”.
+
+# 3.3 Models
+
+We chose 24 models including proprietary multilingual models, as well as Open-weights multilingual and Indic language models for our evaluation. A full list of models can be found in Table 1.
+
+# 3.4 Response Generation
+
+We use the standard Retrieval-Augmented-Generation (RAG) strategy to elicit responses from all the models. Each model is asked to respond to the given query by extracting the appropriate pieces of text from the knowledge-base chunks. During prompting, we segregate the chunks into RAWCHUNKS and KBUPDATECHUNKS symbolizing the data from the standard sources, and the KB updates. The model is then explicitly instructed to prioritize the information from the latest sources, i.e. the KBUPDATECHUNKS (if they are available). The exact prompt used for generation is provided in Appendix X. Note that each model gets the same RAWCHUNKS and KBUPDATECHUNKS, which are also the same as those given to the GPT-4 model in the HEALTHBOT, based on which the GT responses are verified.
+
+# 3.5 Response Evaluation
+
+We used both human and automated evaluation to assess the performance of models in the setup described above. GPT-4o was employed as an LLM evaluator. We prompted the model separately to judge each metric, as Hada et al. (2024b,a) show that individual calls reduce interaction and influence among the metrics and their evaluations.
+
+# 3.5.1 LLM Evaluation
+
+In consultation with domain experts working on the HEALTHBOT, we curated metrics that are relevant for our application. We limit ourselves to 3 classes (Good - 2, Medium - 1, Bad - 0) for each metric, as a larger number of classes could hurt interpretability and lower LLM-evaluator performance. The prompts used for each of our metrics are available in Appendix A.2, and a general overview is provided below.
+
+1 https://www.trychroma.com
+
+2 https://platform.openai.com/docs/guides/embeddings/embedding-models
+
+3 https://openai.com/index/hello-gpt-4o/
+
+# Models
+
+- GPT-4
+- GPT-4o
+- microsoft/Phi-3.5-MoE-instruct
+- CohereForAI/c4ai-command-r-plus-08-2024
+- Qwen/Qwen2.5-72B-Instruct
+- CohereForAI/aya-23-35B
+- mistralai/Mistral-Large-Instruct-2407
+- google/gemma-2-27b-it
+- meta-llama/Meta-Llama-3.1-70B-Instruct
+- GenVRadmin/llama38bGenZ_Vikas-Merged
+- GenVRadmin/AryaBhatta-GemmaOrca-Merged
+- GenVRadmin/AryaBhatta-GemmaUltra-Merged
+- GenVRadmin/AryaBhatta-GemmaGenZ-Vikas-Merged
+- Telugu-LLM-Labs/Indic-gemma-7b-finetuned-sft-Navarasa-2.0
+- ai4bharat/Airavata
+- Cognitive-Lab/LLama3-Gaja-Hindi-8B-v0.1
+- BhabhaAI/Gajendra-v0.1
+- manishiitg/open-aditi-hi-v4
+- abhinand/tamil-llama-7b-instruct-v0.2
+- abhinand/telugu-llama-7b-instruct-v0.1
+- Telugu-LLM-Labs/Telugu-Llama2-7B-v0-Instruct
+- Tensoic/Kan-Llama-7B-SFT-v0.5
+- Cognitive-Lab/Ambari-7B-Instruct-v0.2
+- GenVRadmin/Llamavaad
+
+| Languages | Availability |
+| --------- | ------------ |
+| All | Proprietary |
+| All | Proprietary |
+| All | Open-weights |
+| All | Open-weights |
+| All | Open-weights |
+| All | Open-weights |
+| All | Open-weights |
+| All | Open-weights |
+| All | Indic |
+| All | Indic |
+| All | Indic |
+| All | Indic |
+| All | Indic |
+| En, Hi | Indic |
+| En, Hi | Indic |
+| En, Hi | Indic |
+| En, Hi | Indic |
+| En, Ta | Indic |
+| En, Te | Indic |
+| En, Te | Indic |
+| En, Ka | Indic |
+| En, Ka | Indic |
+| En, Hi | Indic |
+
+Table 1: List of models tested. “En” for English, “Hi” for Hindi, “Ka” for Kannada, “Ta” for Tamil, “Te” for Telugu, and “All” refers to all the aforementioned languages. All Indic models are open-weights.
+
+# Metrics
+
+- FACTUAL CORRECTNESS (FC): As Doddapaneni et al. (2024) have shown that LLM-based evaluators fail to identify subtle factual inaccuracies, we curate a separate metric to double-check facts like dates, numbers, procedure and medicine names.
+- SEMANTIC SIMILARITY (SS): Similarly, we formulate another metric to specifically analyse if both the prediction and the ground-truth response convey the same information semantically, especially when they are in different languages.
+- COHERENCE (COH): This metric evaluates if the model was able to stitch together appropriate pieces of information from the three data chunks provided to yield a coherent response.
+- CONCISENESS (CON): Since the knowledge base chunks extracted and provided to the model can be quite large, with important facts embedded at different positions, we build this metric to assess the ability of the model to extract and compress all these bits of information relevant to the query into a crisp response.
+
+# 3.5.2 Human Evaluation
+
+Following previous works (Hada et al., 2024b,a; Watts et al., 2024), we augment the LLM evaluation with human evaluation and draw correlations between the LLM evaluator and human evaluation for a subset of the models (PHI-3.5-MOE-INSTRUCT, MISTRAL-LARGE-INSTRUCT-2407, GPT-4O, META-LLAMA-3.1-70B-INSTRUCT, INDIC-GEMMA-7B-FINETUNED-SFT-NAVARASA-2.0). These models were selected based on results from early automated evaluations, covering a range of scores and representing models of interest.
+
+The human annotators were employed by KARYA, a data annotation company, and were all native speakers of the Indian languages that we evaluated. We selected a sample of 100 queries from English, and all the queries from Indic languages for annotation, yielding a total of 183 queries. Each instance was annotated by one annotator for SEMANTIC SIMILARITY between the model’s response and the GT response provided by the doctor. The annotations began with a briefing about the task; each annotator was given a sample test task and provided some guidance based on their difficulties and mistakes. Finally, the annotators were asked to evaluate the model response based on the metric, query, and ground-truth response on a scale of 0 to 2, similar to the LLM-evaluator.
+
+# 4 Results
+
+In this section, we present the outcomes of both the LLM and human evaluations. We begin by examining the average scores across all our metrics including the combined metric for English queries, followed by results for queries in other languages. Next, we examine the ranking of models based on the human and LLM-evaluator, details of which can be found in Appendix A.1, and find the agreement to be consistently higher than 0.7 on average across all languages and models. This shows the reliability of our LLM-based evaluation for SEMANTIC SIMILARITY which uses the GT response as a reference.
+
+# 4.1 LLM evaluator results
+
+We see from Table 2 that for English, the best performing model is QWEN2.5-72B-INSTRUCT across all metrics. Note that it is expected that GPT-4 performs well, as the ground truth responses are based on responses generated by GPT-4. The PHI-3.5-MOE-INSTRUCT model also performs well on all metrics, followed by MISTRAL-LARGE-INSTRUCT-2407 and OPEN-ADITI-HI-V4, which is the only Indic model that performs near the top even for English queries. Surprisingly, the META-LLAMA-3.1-70B-INSTRUCT model performs worse than expected on this task, frequently regurgitating the entire prompt that was provided. In general, all models get higher scores on conciseness and many models do well on coherence.
+
+# 4.2 Comparison of human and LLM evaluators
+
+We perform human evaluation on five models on the SEMANTIC SIMILARITY (SS) task and compare human and LLM evaluation by inspecting the ranking of the models in Appendix A.3. We find that for all languages except Telugu, we get identical rankings of all models. Additionally, we also measure the Percentage Agreement (PA) between the human and LLM-evaluator.
+
+Figure 1: Percentage Agreement between human and LLM-evaluators for English, shown per model (gpt-4o, Indic-gemma-7b-finetuned-sft-Navarasa-2.0, Mistral-Large-Instruct-2407, Meta-Llama-3.1-70B-Instruct, Phi-3.5-MoE-instruct). The red line indicates the average PA across models.
+
+# 4.3 Qualitative Analysis
+
+One of the authors of the paper performed a qualitative analysis of responses from the evaluated LLMs on 100 selected patient questions. The questions were chosen to cover a range of medical topics and languages. Thematic analysis involved (1) initial familiarization with the queries and associated LLM responses, (2) theme identification where 5 themes were generated and (3) thematic coding where the generated themes were applied to the 100 question-answer pairs. We briefly summarize these results below.
+
+# Table 2: Metric-wise scores for English. The Proprietary, Open-Weights and Indic models are highlighted appropriately. All Indic models are open-weights.
+
+| Model | AGG | COH | CON | FC | SS |
+| ----------------------------------------- | ---- | ---- | ---- | ---- | ---- |
+| QWEN2.5-72B-INSTRUCT | 1.46 | 1.86 | 1.96 | 1.62 | 1.43 |
+| GPT-4 | 1.40 | 1.71 | 1.95 | 1.56 | 1.36 |
+| PHI-3.5-MOE-INSTRUCT | 1.29 | 1.65 | 1.93 | 1.43 | 1.22 |
+| MISTRAL-LARGE-INSTRUCT-2407 | 1.29 | 1.60 | 1.95 | 1.42 | 1.24 |
+| OPEN-ADITI-HI-V4 | 1.27 | 1.69 | 1.85 | 1.37 | 1.22 |
+| LLAMAVAAD | 1.16 | 1.34 | 0.97 | 1.36 | 1.20 |
+| ARYABHATTA-GEMMAGENZ-VIKAS-MERGED | 1.12 | 1.48 | 1.65 | 1.22 | 1.07 |
+| KAN-LLAMA-7B-SFT-V0.5 | 1.01 | 1.39 | 1.64 | 1.07 | 0.97 |
+| GEMMA-2-27B-IT | 1.00 | 1.28 | 1.88 | 1.07 | 0.91 |
+| ARYABHATTA-GEMMAORCA-MERGED | 0.97 | 1.32 | 1.62 | 1.03 | 0.92 |
+| LLAMA3-GAJA-HINDI-8B-V0.1 | 0.91 | 0.63 | 1.65 | 1.09 | 0.98 |
+| GPT-4O | 0.91 | 1.08 | 1.78 | 0.98 | 0.87 |
+| AYA-23-35B | 0.91 | 1.09 | 1.65 | 1.00 | 0.83 |
+| GAJENDRA-V0.1 | 0.88 | 1.21 | 1.38 | 0.93 | 0.85 |
+| C4AI-COMMAND-R-PLUS-08-2024 | 0.82 | 1.15 | 1.48 | 0.85 | 0.74 |
+| TAMIL-LLAMA-7BINSTRUCT-V0.2 | - | 0.81 | 1.13 | 1.50 | 0.83 |
+| AIRAVATA | 0.80 | 1.03 | 1.38 | 0.85 | 0.78 |
+| AMBARI-7B-INSTRUCTV0.2 | - | 0.73 | 0.86 | 1.11 | 0.76 |
+| META-LLAMA-3.1-70B-INSTRUCT | 0.65 | 0.55 | 1.12 | 0.77 | 0.67 |
+| TELUGU-LLAMA2-7B-V0-INSTRUCT | 0.51 | 0.60 | 1.12 | 0.53 | 0.53 |
+| LLAMA38BGENZ_VIKAS-MERGED | 0.51 | 0.52 | 1.09 | 0.55 | 0.53 |
+| INDIC-GEMMA-7B-FINETUNED-SFT-NAVARASA-2.0 | 0.35 | 0.32 | 0.53 | 0.40 | 0.39 |
+| ARYABHATTA-GEMMAULTRA-MERGED | 0.32 | 0.38 | 1.19 | 0.31 | 0.27 |
+| TELUGU-LLAMA-7B-INSTRUCTV0.1 | - | 0.04 | 0.00 | 0.58 | 0.03 |
+
+The five themes were: (1) misspelling of English words, (2) code-mixing, (3) non-native English, (4) relevance to cultural context, and (5) specificity to the patient’s condition.
+
+For queries that involve misspellings (such as “saving” and “sarjere” mentioned in Section 3.1), many evaluated LLMs were not able to come up with an appropriate response. For the query with the word “saving”, responses varied from “The patient should not be saved for more than 15 days after the surgery” to “Saving should not be done after surgery” to “You should not strain to pass motion for 15 days after the surgery. If you are constipated, it is recommended to consult the doctor”. All of these responses deviate from the GPT-4 generated GT, which said “You can have a shave after the cataract surgery. However, you should avoid having a head bath or shampoo for 15 days post-surgery.”
+
+In cases of code mixing and Indian English, LLMs were more robust in their responses than to misspellings. The term “Kanna operation” was well understood by most models, and Hinglish queries were responded to in a manner that mirrored the GT answer. However, less common terms were not understood when mixed with English. The query with the word “Karwat” as mentioned in 3.1 received responses ranging from “you can start cooking after 1 week” to “I’m sorry, but I cannot provide an answer to your question. The information you are seeking is not relevant to cataract surgery or the guidelines for post-operative care” to “be careful when children get near you”. Most of the evaluated LLMs understood the use of “sugar” in reference to diabetes, as well as sentences following different syntax than would be common in native English.
+
+The responses for culturally-relevant questions varied greatly between evaluated LLMs. For example, to the question on appropriateness of chapati and puri on the day of surgery, some LLMs approved, saying “Yes, he can take chapati, Puri etc on the day of cataract surgery” while others were against this, saying “You should have a light meal before the surgery. Avoid heavy or oily foods like chapati and Puri on the day of your cataract surgery. It’s best to stick to easily digestible foods. If you have any specific dietary concerns, please discuss them with your healthcare team.” Questions relating to returning to a “native place” were met with refusals by around half of the evaluated LLMs.
+
+Questions that were specific to the patient’s condition were also responded to in a diverse manner by the evaluated LLMs. For example, the query “Can aztolet20 (atorvastatin and clopidogrel) tablet be taken post surgery” had the GT response “I do not know the answer to your question. If this needs to be answered by a doctor, please schedule a consultation” as there was no mention of this medication in the KB. However, some LLMs approved its use, responding “Yes, you can take the aztolet20 (atorvastatin and clopidogrel) tablet post cataract surgery. However, it is important to follow your doctor’s prescription and instructions” while others responded with mentions of medication that was referred to in the KB, “If you are referring to the IMOL tablet, it is usually taken when you experience pain. However, for other medications, please share the name so I can provide a more accurate answer. Always remember to follow your doctor’s prescription.” Around half refused to answer the question, mirroring the GT.
+
+# 5 Discussion
+
+In this study, we evaluated 24 models on healthcare-related queries in the RAG setting. Our findings reveal several insights, which we share below:
+
+# Difference in model scores
+
+We find that the models we evaluate vary widely in their scores. This indicates that not all models are suitable for use in the healthcare setting, and some perform worse than expected: for example, GPT-4O and META-LLAMA-3.1-70B-INSTRUCT score lower than smaller models on this task.
+
+# English vs. Multilingual Queries
+
+Although the number of non-English queries is small, we find that some Indic models perform better on English queries than on non-English queries. We also observe that the Factual Correctness score is, on average, lower for non-English queries than for English queries, indicating that models find it difficult to answer non-English queries accurately. This may be due to the cultural and linguistic nuances present in our queries.
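+
+As a rough sketch of this per-language comparison (assuming a hypothetical `scores.csv` with one row per model response and columns `model`, `language`, and `factual_correctness`; the file and column names are illustrative placeholders, not the format of our actual pipeline):
+
+```python
+import pandas as pd
+
+# Hypothetical per-response scores; the column names are illustrative
+# placeholders rather than the paper's actual evaluation output.
+df = pd.read_csv("scores.csv")  # columns: model, language, factual_correctness
+
+# Flag English vs. non-English queries.
+df["is_english"] = df["language"].eq("en")
+
+# Average Factual Correctness per model, split by English vs. non-English.
+summary = (
+    df.groupby(["model", "is_english"])["factual_correctness"]
+    .mean()
+    .unstack("is_english")
+    .rename(columns={True: "english", False: "non_english"})
+)
+
+# A positive gap means the model scores higher on English queries.
+summary["gap"] = summary["english"] - summary["non_english"]
+print(summary.sort_values("gap", ascending=False))
+```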
+
+# Multilingual vs. Indic models
+
+We evaluate several models that are specifically fine-tuned on Indic languages and Indic data and observe that they do not always perform well on non-English queries. This could be because several instruction-tuned models are tuned on synthetic instruction data that is usually a translation of English instruction data. A notable exception is the AYA-23-35B model, which is trained on manually created instruction-tuning data for different languages and performs well for Hindi. Additionally, several multilingual instruction-tuning datasets have short instructions, which may not be suitable for complex RAG settings, which typically involve longer prompts and large chunks of data.
+
+# Human vs. LLM-based evaluation
+
+We conduct human evaluation on a subset of models and data points and observe strong alignment with the LLM-evaluator overall, especially regarding the final ranking of the models. However, for certain models like MISTRAL-LARGE-INSTRUCT-2407 (for Telugu) and META-LLAMA-3.1-70B-INSTRUCT (for other languages), the agreement is low. It is important to note that we use LLM-evaluators both with and without references, and assess human agreement for SEMANTIC SIMILARITY, which uses ground-truth references. This suggests that LLM-evaluators should be used cautiously in a multilingual context, and we plan to broaden human evaluation to include more metrics in future work.
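+
+For reference, ranking agreement of this kind can be quantified with a rank correlation between per-model human and LLM-evaluator scores. The sketch below uses made-up model names and scores purely to illustrate the computation; it is not the exact procedure followed in our evaluation:
+
+```python
+from scipy.stats import kendalltau, spearmanr
+
+# Hypothetical mean SEMANTIC SIMILARITY scores per model; the model
+# names and values below are made up for illustration only.
+human_scores = {"model_a": 0.82, "model_b": 0.74, "model_c": 0.61, "model_d": 0.55}
+llm_scores = {"model_a": 0.85, "model_b": 0.70, "model_c": 0.66, "model_d": 0.52}
+
+models = sorted(human_scores)
+human = [human_scores[m] for m in models]
+llm = [llm_scores[m] for m in models]
+
+# Rank-level agreement between the human and LLM evaluators.
+tau, _ = kendalltau(human, llm)
+rho, _ = spearmanr(human, llm)
+print(f"Kendall tau: {tau:.2f}, Spearman rho: {rho:.2f}")
+```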
+
+# Evaluation in controlled settings with uncontaminated datasets
+
+We evaluate all 24 models in an identical setting, enabling a fair comparison between them. Our dataset is curated from questions asked by users of a real application and does not appear in the training data of any of the models we evaluate, lending credibility to the results and insights we gather.
+
+# Locally-grounded, non-translated datasets
+
+Our dataset includes various instances of code-switching, Indian English colloquialisms, and culturally specific questions that cannot be obtained by translating datasets, particularly with automated translations. While models were able to handle code-switching to a certain extent, responses to culturally-relevant questions varied greatly. This underscores the importance of collecting datasets from target populations when building models or systems for real-world use.
+
+# 6 Limitations
+
+Our work is subject to several limitations.
+
+- Because our dataset is derived from actual users of a healthcare bot, we could not regulate the ratio of English to non-English queries. Consequently, the volume of non-English queries in our dataset is significantly lower than that of English queries, meaning the results on non-English queries should not be considered definitive. Similarly, since the HEALTHBOT is available only in four Indian languages, we could not evaluate languages beyond these. The scope of our HEALTHBOT setting is currently confined to queries from patients at one hospital in India, resulting in less varied data. We intend to expand this study as HEALTHBOT extends its reach to other parts of the country.
+- While we evaluated numerous models in this work, some were excluded from this study for various reasons, such as ease of access. We aim to incorporate more models in future research.
+- Research has indicated that LLM-based evaluators tend to prefer their own responses. Since we use GPT-4O as the evaluator, there may be a bias leading to higher scores for the GPT-4O model and other models within the GPT family. Although not investigated in prior research, it is also conceivable that models fine-tuned on synthetic data generated by GPT-4O might receive elevated scores. We urge readers to keep these possibilities in mind while interpreting the scores. In future work, we plan to use multiple LLM-evaluators to obtain more robust results (a simple aggregation sketch follows this list).
+- Finally, our human evaluation was limited to a subset of models and data, and a single metric due to time and budget constraints. In future work, we plan to incorporate more human evaluation, as well as qualitative analysis of the results.
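+
+As a simple illustration of how scores from multiple LLM-evaluators could be aggregated to reduce single-judge bias (the judge names and values below are hypothetical, not results from our study):
+
+```python
+from statistics import mean
+
+# Hypothetical per-response scores from several LLM judges; names and
+# values are placeholders, not measurements from this work.
+judge_scores = {
+    "judge_gpt": [0.90, 0.70, 0.80],
+    "judge_b": [0.80, 0.60, 0.90],
+    "judge_c": [0.85, 0.65, 0.75],
+}
+
+# Average across judges for each response so that no single judge's
+# preferences dominate the final score.
+n_responses = len(next(iter(judge_scores.values())))
+aggregated = [
+    mean(scores[i] for scores in judge_scores.values()) for i in range(n_responses)
+]
+print(aggregated)
+```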
+
+# 7 Ethical Considerations
+
+We use the framework by Bender and Friedman (2018) to discuss the ethical considerations for our work.
+
+# 8 Acknowledgements
+
+We thank Aditya Yadavalli, Vivek Seshadri, the Operations team and Annotators from KARYA for the streamlined annotation process. We also extend our gratitude to Bhuvan Sachdeva for helping us with the HEALTHBOT deployment, data collection and organization process.