Refactor into llama-cloud-services (#597)
logan-markewich authored Feb 6, 2025
1 parent ae38f40 commit 1ae4d2b
Showing 116 changed files with 5,775 additions and 983 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build_package.yml
@@ -45,4 +45,4 @@ jobs:
- name: Test import
shell: bash
working-directory: ${{ vars.RUNNER_TEMP }}
run: python -c "import llama_parse"
run: python -c "import llama_cloud_services"
18 changes: 17 additions & 1 deletion .github/workflows/publish_release.yml
@@ -23,16 +23,31 @@ jobs:
uses: actions/setup-python@v4
with:
python-version: ${{ env.PYTHON_VERSION }}

- name: Install Poetry
uses: snok/install-poetry@v1
with:
version: ${{ env.POETRY_VERSION }}

- name: Install deps
shell: bash
run: pip install -e .
- name: Build and publish to pypi

- name: Build and publish llama-cloud-services
uses: JRubics/poetry-publish@v2.1
with:
poetry_version: ${{ env.POETRY_VERSION }}
python_version: ${{ env.PYTHON_VERSION }}
working_directory: "llama_cloud_services"
pypi_token: ${{ secrets.LLAMA_PARSE_PYPI_TOKEN }}
poetry_install_options: "--without dev"

- name: Build and publish llama-parse
uses: JRubics/poetry-publish@v2.1
with:
poetry_version: ${{ env.POETRY_VERSION }}
python_version: ${{ env.PYTHON_VERSION }}
working_directory: "llama_parse"
pypi_token: ${{ secrets.LLAMA_PARSE_PYPI_TOKEN }}
poetry_install_options: "--without dev"

@@ -52,6 +67,7 @@ jobs:
export PKG=$(ls dist/ | grep tar)
set -- $PKG
echo "name=$1" >> $GITHUB_ENV
- name: Upload Release Asset (sdist) to GitHub
id: upload-release-asset
uses: actions/upload-release-asset@v1
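The workflow above now publishes two packages from one repository: the new `llama-cloud-services` and the existing `llama-parse`. One plausible reason to keep publishing both is import compatibility — the legacy package can be a thin re-export of the new one. A runnable toy sketch of that pattern (the `_demo` module names are stand-ins, not the actual packages):

```python
import sys
import types

# Stand-in module for the new `llama-cloud-services` package
# (illustrative only, not the real module contents).
new_pkg = types.ModuleType("llama_cloud_services_demo")


class LlamaParse:
    """Toy stub standing in for the real parser class."""

    def __init__(self, api_key: str = "") -> None:
        self.api_key = api_key


new_pkg.LlamaParse = LlamaParse
sys.modules["llama_cloud_services_demo"] = new_pkg

# Stand-in for the legacy `llama-parse` package: a thin re-export,
# which is one way the old import path can keep working after a refactor.
legacy = types.ModuleType("llama_parse_demo")
exec("from llama_cloud_services_demo import LlamaParse", legacy.__dict__)
sys.modules["llama_parse_demo"] = legacy

from llama_parse_demo import LlamaParse as LegacyLlamaParse

# Both import paths resolve to the same class object.
assert LegacyLlamaParse is new_pkg.LlamaParse
```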
3 changes: 2 additions & 1 deletion .pre-commit-config.yaml
@@ -33,6 +33,7 @@ repos:
rev: v1.0.1
hooks:
- id: mypy
exclude: ^tests/
additional_dependencies:
[
"types-requests",
@@ -46,7 +47,7 @@ repos:
[
--disallow-untyped-defs,
--ignore-missing-imports,
--python-version=3.8,
--python-version=3.10,
]
- repo: https://github.com/adamchainz/blacken-docs
rev: 1.16.0
157 changes: 21 additions & 136 deletions README.md
@@ -1,165 +1,50 @@
# LlamaParse

[![PyPI - Downloads](https://img.shields.io/pypi/dm/llama-parse)](https://pypi.org/project/llama-parse/)
[![GitHub contributors](https://img.shields.io/github/contributors/run-llama/llama_parse)](https://github.com/run-llama/llama_parse/graphs/contributors)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/llama-cloud-services)](https://pypi.org/project/llama-cloud-services/)
[![GitHub contributors](https://img.shields.io/github/contributors/run-llama/llama_cloud_services)](https://github.com/run-llama/llama_cloud_services/graphs/contributors)
[![Discord](https://img.shields.io/discord/1059199217496772688)](https://discord.gg/dGcwcsnxhU)

LlamaParse is a **GenAI-native document parser** that can parse complex document data for any downstream LLM use case (RAG, agents).

It is really good at the following:

- **Broad file type support**: Parsing a variety of unstructured file types (.pdf, .pptx, .docx, .xlsx, .html) with text, tables, visual elements, weird layouts, and more.
- **Table recognition**: Parsing embedded tables accurately into text and semi-structured representations.
- **Multimodal parsing and chunking**: Extracting visual elements (images/diagrams) into structured formats and returning image chunks using the latest multimodal models.
- **Custom parsing**: Input custom prompt instructions to customize the output the way you want it.
# Llama Cloud Services

LlamaParse directly integrates with [LlamaIndex](https://github.com/run-llama/llama_index).
This repository contains the code for hand-written SDKs and clients for interacting with LlamaCloud.

The free plan covers up to 1,000 pages a day. The paid plan includes 7k free pages per week, plus 0.3c per additional page by default. There is a sandbox available to test the API: [**https://cloud.llamaindex.ai/parse**](https://cloud.llamaindex.ai/parse).
This includes:

Read below for some quickstart information, or see the [full documentation](https://docs.cloud.llamaindex.ai/).

If you're a company interested in enterprise RAG solutions, and/or high volume/on-prem usage of LlamaParse, come [talk to us](https://www.llamaindex.ai/contact).
- [LlamaParse](./parse.md) - A GenAI-native document parser that can parse complex document data for any downstream LLM use case (Agents, RAG, data processing, etc.).
- [LlamaReport (beta/invite-only)](./report.md) - A prebuilt agentic report builder that can be used to build reports from a variety of data sources.
- [LlamaExtract (coming soon!)]() - A prebuilt agentic data extractor that can be used to transform data into a structured JSON representation.

## Getting Started

First, login and get an api-key from [**https://cloud.llamaindex.ai/api-key**](https://cloud.llamaindex.ai/api-key).

Then, make sure you have the latest LlamaIndex version installed.

**NOTE:** If you are upgrading from v0.9.X, we recommend following our [migration guide](https://pretty-sodium-5e0.notion.site/v0-10-0-Migration-Guide-6ede431dcb8841b09ea171e7f133bd77), as well as uninstalling your previous version first.

```
pip uninstall llama-index # run this if upgrading from v0.9.x or older
pip install -U llama-index --upgrade --no-cache-dir --force-reinstall
```

Lastly, install the package:

`pip install llama-parse`

Now you can parse your first PDF file using the command line interface. Use the command `llama-parse [file_paths]`. See the help text with `llama-parse --help`.
Install the package:

```bash
export LLAMA_CLOUD_API_KEY='llx-...'

# output as text
llama-parse my_file.pdf --result-type text --output-file output.txt

# output as markdown
llama-parse my_file.pdf --result-type markdown --output-file output.md

# output as raw json
llama-parse my_file.pdf --output-raw-json --output-file output.json
pip install llama-cloud-services
```

You can also create simple scripts:

```python
import nest_asyncio

nest_asyncio.apply()

from llama_parse import LlamaParse

parser = LlamaParse(
api_key="llx-...", # can also be set in your env as LLAMA_CLOUD_API_KEY
result_type="markdown", # "markdown" and "text" are available
num_workers=4, # if multiple files passed, split in `num_workers` API calls
verbose=True,
language="en", # Optionally you can define a language, default=en
)

# sync
documents = parser.load_data("./my_file.pdf")

# sync batch
documents = parser.load_data(["./my_file1.pdf", "./my_file2.pdf"])
Then, get your API key from [LlamaCloud](https://cloud.llamaindex.ai/).

# async
documents = await parser.aload_data("./my_file.pdf")

# async batch
documents = await parser.aload_data(["./my_file1.pdf", "./my_file2.pdf"])
```

## Using with file object

You can parse a file object directly:
Then, you can use the services in your code:

```python
import nest_asyncio

nest_asyncio.apply()

from llama_parse import LlamaParse
from llama_cloud_services import LlamaParse, LlamaReport

parser = LlamaParse(
api_key="llx-...", # can also be set in your env as LLAMA_CLOUD_API_KEY
result_type="markdown", # "markdown" and "text" are available
num_workers=4, # if multiple files passed, split in `num_workers` API calls
verbose=True,
language="en", # Optionally you can define a language, default=en
)

file_name = "my_file1.pdf"
extra_info = {"file_name": file_name}

with open(f"./{file_name}", "rb") as f:
    # must provide extra_info with a file_name key when passing a file object
documents = parser.load_data(f, extra_info=extra_info)

# you can also pass file bytes directly
with open(f"./{file_name}", "rb") as f:
file_bytes = f.read()
    # must provide extra_info with a file_name key when passing file bytes
documents = parser.load_data(file_bytes, extra_info=extra_info)
parser = LlamaParse(api_key="YOUR_API_KEY")
report = LlamaReport(api_key="YOUR_API_KEY")
```

## Using with `SimpleDirectoryReader`
See the quickstart guides for each service for more information:

You can also integrate the parser as the default PDF loader in `SimpleDirectoryReader`:

```python
import nest_asyncio

nest_asyncio.apply()

from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader

parser = LlamaParse(
api_key="llx-...", # can also be set in your env as LLAMA_CLOUD_API_KEY
result_type="markdown", # "markdown" and "text" are available
verbose=True,
)

file_extractor = {".pdf": parser}
documents = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()
```

Full documentation for `SimpleDirectoryReader` can be found on the [LlamaIndex Documentation](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader.html).

## Examples

Several end-to-end indexing examples can be found in the examples folder:

- [Getting Started](examples/demo_basic.ipynb)
- [Advanced RAG Example](examples/demo_advanced.ipynb)
- [Raw API Usage](examples/demo_api.ipynb)
- [LlamaParse](./parse.md)
- [LlamaReport (beta/invite-only)](./report.md)
- [LlamaExtract (coming soon!)]()

## Documentation

[https://docs.cloud.llamaindex.ai/](https://docs.cloud.llamaindex.ai/)
You can see complete SDK and API documentation for each service on [our official docs](https://docs.cloud.llamaindex.ai/).

## Terms of Service

See the [Terms of Service Here](./TOS.pdf).

## Get in Touch (LlamaCloud)

LlamaParse is part of LlamaCloud, our e2e enterprise RAG platform that provides out-of-the-box, production-ready connectors, indexing, and retrieval over your complex data sources. We offer SaaS and VPC options.

LlamaCloud is currently available via waitlist (join by [creating an account](https://cloud.llamaindex.ai/)). If you're interested in state-of-the-art quality and in centralizing your RAG efforts, come [get in touch with us](https://www.llamaindex.ai/contact).
You can get in touch with us by following our [contact link](https://www.llamaindex.ai/contact).
@@ -53,7 +53,7 @@
"source": [
"!pip install llama-index\n",
"!pip install llama-index-core\n",
"!pip install llama-parse"
"!pip install llama-cloud-services"
]
},
{
@@ -190,7 +190,7 @@
"metadata": {},
"outputs": [],
"source": [
"from llama_parse import LlamaParse\n",
"from llama_cloud_services import LlamaParse\n",
"\n",
"parser = LlamaParse(result_type=\"markdown\")"
]
@@ -22,7 +22,7 @@
"metadata": {},
"outputs": [],
"source": [
"!pip install llama-parse llama-index llama-index-postprocessor-sbert-rerank"
"!pip install llama-cloud-services llama-index llama-index-postprocessor-sbert-rerank"
]
},
{
@@ -82,7 +82,7 @@
"metadata": {},
"outputs": [],
"source": [
"from llama_parse import LlamaParse\n",
"from llama_cloud_services import LlamaParse\n",
"\n",
"parser = LlamaParse(\n",
" result_type=\"markdown\",\n",
@@ -81,7 +81,7 @@
}
],
"source": [
"from llama_parse import LlamaParse\n",
"from llama_cloud_services import LlamaParse\n",
"\n",
"docs = LlamaParse(result_type=\"text\").load_data(\"./caltrain_schedule_weekend.pdf\")"
]
File renamed without changes.
@@ -26,7 +26,7 @@
"!pip install llama-index-embeddings-openai\n",
"!pip install llama-index-postprocessor-flag-embedding-reranker\n",
"!pip install git+https://github.com/FlagOpen/FlagEmbedding.git\n",
"!pip install llama-parse"
"!pip install llama-cloud-services"
]
},
{
@@ -108,7 +108,7 @@
}
],
"source": [
"from llama_parse import LlamaParse\n",
"from llama_cloud_services import LlamaParse\n",
"\n",
"documents = LlamaParse(result_type=\"markdown\").load_data(\"./apple_2021_10k.pdf\")"
]
@@ -22,7 +22,7 @@
"%pip install llama-index-embeddings-openai\n",
"%pip install llama-index-postprocessor-flag-embedding-reranker\n",
"%pip install git+https://github.com/FlagOpen/FlagEmbedding.git\n",
"%pip install llama-parse\n",
"%pip install llama-cloud-services\n",
"%pip install llama-index-vector-stores-astra-db"
]
},
@@ -107,7 +107,7 @@
}
],
"source": [
"from llama_parse import LlamaParse\n",
"from llama_cloud_services import LlamaParse\n",
"\n",
"documents = LlamaParse(result_type=\"markdown\").load_data(\"./uber_10q_march_2022.pdf\")"
]
@@ -176,7 +176,7 @@
}
],
"source": [
"from llama_parse import LlamaParse\n",
"from llama_cloud_services import LlamaParse\n",
"\n",
"documents = LlamaParse(result_type=\"markdown\").load_data(\"./uber_10q_march_2022.pdf\")"
]
File renamed without changes.
@@ -130,7 +130,7 @@
}
],
"source": [
"from llama_parse import LlamaParse\n",
"from llama_cloud_services import LlamaParse\n",
"\n",
"documents = LlamaParse(result_type=\"text\").load_data(file_path)"
]
4 changes: 2 additions & 2 deletions examples/demo_basic.ipynb → examples/parse/demo_basic.ipynb
@@ -73,7 +73,7 @@
}
],
"source": [
"from llama_parse import LlamaParse\n",
"from llama_cloud_services import LlamaParse\n",
"\n",
"documents = LlamaParse(result_type=\"text\").load_data(\"./attention.pdf\")"
]
@@ -120,7 +120,7 @@
}
],
"source": [
"from llama_parse import LlamaParse\n",
"from llama_cloud_services import LlamaParse\n",
"\n",
"documents = LlamaParse(result_type=\"markdown\").load_data(\"./attention.pdf\")"
]
@@ -142,7 +142,7 @@
}
],
"source": [
"from llama_parse import LlamaParse\n",
"from llama_cloud_services import LlamaParse\n",
"\n",
"documents = LlamaParse(result_type=\"text\").load_data(file_path)"
]
4 changes: 2 additions & 2 deletions examples/demo_excel.ipynb → examples/parse/demo_excel.ipynb
@@ -21,7 +21,7 @@
"outputs": [],
"source": [
"%pip install llama-index\n",
"%pip install llama-parse"
"%pip install llama-cloud-services"
]
},
{
@@ -41,7 +41,7 @@
"\n",
"nest_asyncio.apply()\n",
"\n",
"from llama_parse import LlamaParse\n",
"from llama_cloud_services import LlamaParse\n",
"\n",
"api_key = \"llx-\" # get from cloud.llamaindex.ai"
]