Refactor into llama-cloud-services (#597)
logan-markewich authored Feb 6, 2025
1 parent ae38f40 commit 1ae4d2b
Showing 116 changed files with 5,775 additions and 983 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/build_package.yml
@@ -45,4 +45,4 @@ jobs:
- name: Test import
shell: bash
working-directory: ${{ vars.RUNNER_TEMP }}
run: python -c "import llama_parse"
run: python -c "import llama_cloud_services"
18 changes: 17 additions & 1 deletion .github/workflows/publish_release.yml
@@ -23,16 +23,31 @@ jobs:
uses: actions/setup-python@v4
with:
python-version: ${{ env.PYTHON_VERSION }}

- name: Install Poetry
uses: snok/install-poetry@v1
with:
version: ${{ env.POETRY_VERSION }}

- name: Install deps
shell: bash
run: pip install -e .
- name: Build and publish to pypi

- name: Build and publish llama-cloud-services
uses: JRubics/poetry-publish@v2.1
with:
poetry_version: ${{ env.POETRY_VERSION }}
python_version: ${{ env.PYTHON_VERSION }}
working_directory: "llama_cloud_services"
pypi_token: ${{ secrets.LLAMA_PARSE_PYPI_TOKEN }}
poetry_install_options: "--without dev"

- name: Build and publish llama-parse
uses: JRubics/poetry-publish@v2.1
with:
poetry_version: ${{ env.POETRY_VERSION }}
python_version: ${{ env.PYTHON_VERSION }}
working_directory: "llama_parse"
pypi_token: ${{ secrets.LLAMA_PARSE_PYPI_TOKEN }}
poetry_install_options: "--without dev"

@@ -52,6 +67,7 @@ jobs:
export PKG=$(ls dist/ | grep tar)
set -- $PKG
echo "name=$1" >> $GITHUB_ENV
- name: Upload Release Asset (sdist) to GitHub
id: upload-release-asset
uses: actions/upload-release-asset@v1
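The workflow above now publishes two packages from one repository: the new `llama-cloud-services` and the existing `llama-parse`. One plausible reason to keep publishing both is import compatibility — the legacy package can be a thin re-export of the new one. A runnable toy sketch of that pattern (the `_demo` module names are stand-ins, not the actual packages):

```python
import sys
import types

# Stand-in module for the new `llama-cloud-services` package
# (illustrative only, not the real module contents).
new_pkg = types.ModuleType("llama_cloud_services_demo")


class LlamaParse:
    """Toy stub standing in for the real parser class."""

    def __init__(self, api_key: str = "") -> None:
        self.api_key = api_key


new_pkg.LlamaParse = LlamaParse
sys.modules["llama_cloud_services_demo"] = new_pkg

# Stand-in for the legacy `llama-parse` package: a thin re-export,
# which is one way the old import path can keep working after a refactor.
legacy = types.ModuleType("llama_parse_demo")
exec("from llama_cloud_services_demo import LlamaParse", legacy.__dict__)
sys.modules["llama_parse_demo"] = legacy

from llama_parse_demo import LlamaParse as LegacyLlamaParse

# Both import paths resolve to the same class object.
assert LegacyLlamaParse is new_pkg.LlamaParse
```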
3 changes: 2 additions & 1 deletion .pre-commit-config.yaml
@@ -33,6 +33,7 @@ repos:
rev: v1.0.1
hooks:
- id: mypy
exclude: ^tests/
additional_dependencies:
[
"types-requests",
@@ -46,7 +47,7 @@ repos:
[
--disallow-untyped-defs,
--ignore-missing-imports,
--python-version=3.8,
--python-version=3.10,
]
- repo: https://github.com/adamchainz/blacken-docs
rev: 1.16.0
157 changes: 21 additions & 136 deletions README.md
@@ -1,165 +1,50 @@
# LlamaParse

[![PyPI - Downloads](https://img.shields.io/pypi/dm/llama-parse)](https://pypi.org/project/llama-parse/)
[![GitHub contributors](https://img.shields.io/github/contributors/run-llama/llama_parse)](https://github.com/run-llama/llama_parse/graphs/contributors)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/llama-cloud-services)](https://pypi.org/project/llama-cloud-services/)
[![GitHub contributors](https://img.shields.io/github/contributors/run-llama/llama_cloud_services)](https://github.com/run-llama/llama_cloud_services/graphs/contributors)
[![Discord](https://img.shields.io/discord/1059199217496772688)](https://discord.gg/dGcwcsnxhU)

LlamaParse is a **GenAI-native document parser** that can parse complex document data for any downstream LLM use case (RAG, agents).

It is really good at the following:

- **Broad file type support**: Parsing a variety of unstructured file types (.pdf, .pptx, .docx, .xlsx, .html) with text, tables, visual elements, weird layouts, and more.
- **Table recognition**: Parsing embedded tables accurately into text and semi-structured representations.
- **Multimodal parsing and chunking**: Extracting visual elements (images/diagrams) into structured formats and returning image chunks using the latest multimodal models.
- **Custom parsing**: Input custom prompt instructions to customize the output the way you want it.
# Llama Cloud Services

LlamaParse directly integrates with [LlamaIndex](https://github.com/run-llama/llama_index).
This repository contains the code for hand-written SDKs and clients for interacting with LlamaCloud.

The free plan covers up to 1,000 pages a day. The paid plan includes 7k free pages per week, plus 0.3c per additional page by default. There is a sandbox available to test the API: [**https://cloud.llamaindex.ai/parse**](https://cloud.llamaindex.ai/parse).
This includes:

Read below for some quickstart information, or see the [full documentation](https://docs.cloud.llamaindex.ai/).

If you're a company interested in enterprise RAG solutions, and/or high volume/on-prem usage of LlamaParse, come [talk to us](https://www.llamaindex.ai/contact).
- [LlamaParse](./parse.md) - A GenAI-native document parser that can parse complex document data for any downstream LLM use case (Agents, RAG, data processing, etc.).
- [LlamaReport (beta/invite-only)](./report.md) - A prebuilt agentic report builder that can be used to build reports from a variety of data sources.
- [LlamaExtract (coming soon!)]() - A prebuilt agentic data extractor that can be used to transform data into a structured JSON representation.

## Getting Started

First, login and get an api-key from [**https://cloud.llamaindex.ai/api-key**](https://cloud.llamaindex.ai/api-key).

Then, make sure you have the latest LlamaIndex version installed.

**NOTE:** If you are upgrading from v0.9.X, we recommend following our [migration guide](https://pretty-sodium-5e0.notion.site/v0-10-0-Migration-Guide-6ede431dcb8841b09ea171e7f133bd77), as well as uninstalling your previous version first.

```
pip uninstall llama-index # run this if upgrading from v0.9.x or older
pip install -U llama-index --upgrade --no-cache-dir --force-reinstall
```

Lastly, install the package:

`pip install llama-parse`

Now you can parse your first PDF file using the command line interface. Use the command `llama-parse [file_paths]`. See the help text with `llama-parse --help`.
Install the package:

```bash
export LLAMA_CLOUD_API_KEY='llx-...'

# output as text
llama-parse my_file.pdf --result-type text --output-file output.txt

# output as markdown
llama-parse my_file.pdf --result-type markdown --output-file output.md

# output as raw json
llama-parse my_file.pdf --output-raw-json --output-file output.json
pip install llama-cloud-services
```

You can also create simple scripts:

```python
import nest_asyncio

nest_asyncio.apply()

from llama_parse import LlamaParse

parser = LlamaParse(
api_key="llx-...", # can also be set in your env as LLAMA_CLOUD_API_KEY
result_type="markdown", # "markdown" and "text" are available
num_workers=4, # if multiple files passed, split in `num_workers` API calls
verbose=True,
language="en", # Optionally you can define a language, default=en
)

# sync
documents = parser.load_data("./my_file.pdf")

# sync batch
documents = parser.load_data(["./my_file1.pdf", "./my_file2.pdf"])
Then, get your API key from [LlamaCloud](https://cloud.llamaindex.ai/).

# async
documents = await parser.aload_data("./my_file.pdf")

# async batch
documents = await parser.aload_data(["./my_file1.pdf", "./my_file2.pdf"])
```

## Using with file object

You can parse a file object directly:
Then, you can use the services in your code:

```python
import nest_asyncio

nest_asyncio.apply()

from llama_parse import LlamaParse
from llama_cloud_services import LlamaParse, LlamaReport

parser = LlamaParse(
api_key="llx-...", # can also be set in your env as LLAMA_CLOUD_API_KEY
result_type="markdown", # "markdown" and "text" are available
num_workers=4, # if multiple files passed, split in `num_workers` API calls
verbose=True,
language="en", # Optionally you can define a language, default=en
)

file_name = "my_file1.pdf"
extra_info = {"file_name": file_name}

with open(f"./{file_name}", "rb") as f:
    # must provide extra_info with a file_name key when passing a file object
documents = parser.load_data(f, extra_info=extra_info)

# you can also pass file bytes directly
with open(f"./{file_name}", "rb") as f:
file_bytes = f.read()
    # must provide extra_info with a file_name key when passing file bytes
documents = parser.load_data(file_bytes, extra_info=extra_info)
parser = LlamaParse(api_key="YOUR_API_KEY")
report = LlamaReport(api_key="YOUR_API_KEY")
```

## Using with `SimpleDirectoryReader`
See the quickstart guides for each service for more information:

You can also integrate the parser as the default PDF loader in `SimpleDirectoryReader`:

```python
import nest_asyncio

nest_asyncio.apply()

from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader

parser = LlamaParse(
api_key="llx-...", # can also be set in your env as LLAMA_CLOUD_API_KEY
result_type="markdown", # "markdown" and "text" are available
verbose=True,
)

file_extractor = {".pdf": parser}
documents = SimpleDirectoryReader(
"./data", file_extractor=file_extractor
).load_data()
```

Full documentation for `SimpleDirectoryReader` can be found on the [LlamaIndex Documentation](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader.html).

## Examples

Several end-to-end indexing examples can be found in the examples folder:

- [Getting Started](examples/demo_basic.ipynb)
- [Advanced RAG Example](examples/demo_advanced.ipynb)
- [Raw API Usage](examples/demo_api.ipynb)
- [LlamaParse](./parse.md)
- [LlamaReport (beta/invite-only)](./report.md)
- [LlamaExtract (coming soon!)]()

## Documentation

[https://docs.cloud.llamaindex.ai/](https://docs.cloud.llamaindex.ai/)
You can see complete SDK and API documentation for each service on [our official docs](https://docs.cloud.llamaindex.ai/).

## Terms of Service

See the [Terms of Service Here](./TOS.pdf).

## Get in Touch (LlamaCloud)

LlamaParse is part of LlamaCloud, our e2e enterprise RAG platform that provides out-of-the-box, production-ready connectors, indexing, and retrieval over your complex data sources. We offer SaaS and VPC options.

LlamaCloud is currently available via waitlist (join by [creating an account](https://cloud.llamaindex.ai/)). If you're interested in state-of-the-art quality and in centralizing your RAG efforts, come [get in touch with us](https://www.llamaindex.ai/contact).
You can get in touch with us by following our [contact link](https://www.llamaindex.ai/contact).
@@ -53,7 +53,7 @@
"source": [
"!pip install llama-index\n",
"!pip install llama-index-core\n",
"!pip install llama-parse"
"!pip install llama-cloud-services"
]
},
{
@@ -190,7 +190,7 @@
"metadata": {},
"outputs": [],
"source": [
"from llama_parse import LlamaParse\n",
"from llama_cloud_services import LlamaParse\n",
"\n",
"parser = LlamaParse(result_type=\"markdown\")"
]
@@ -22,7 +22,7 @@
"metadata": {},
"outputs": [],
"source": [
"!pip install llama-parse llama-index llama-index-postprocessor-sbert-rerank"
"!pip install llama-cloud-services llama-index llama-index-postprocessor-sbert-rerank"
]
},
{
@@ -82,7 +82,7 @@
"metadata": {},
"outputs": [],
"source": [
"from llama_parse import LlamaParse\n",
"from llama_cloud_services import LlamaParse\n",
"\n",
"parser = LlamaParse(\n",
" result_type=\"markdown\",\n",
@@ -81,7 +81,7 @@
}
],
"source": [
"from llama_parse import LlamaParse\n",
"from llama_cloud_services import LlamaParse\n",
"\n",
"docs = LlamaParse(result_type=\"text\").load_data(\"./caltrain_schedule_weekend.pdf\")"
]
File renamed without changes.
@@ -26,7 +26,7 @@
"!pip install llama-index-embeddings-openai\n",
"!pip install llama-index-postprocessor-flag-embedding-reranker\n",
"!pip install git+https://github.com/FlagOpen/FlagEmbedding.git\n",
"!pip install llama-parse"
"!pip install llama-cloud-services"
]
},
{
@@ -108,7 +108,7 @@
}
],
"source": [
"from llama_parse import LlamaParse\n",
"from llama_cloud_services import LlamaParse\n",
"\n",
"documents = LlamaParse(result_type=\"markdown\").load_data(\"./apple_2021_10k.pdf\")"
]
@@ -22,7 +22,7 @@
"%pip install llama-index-embeddings-openai\n",
"%pip install llama-index-postprocessor-flag-embedding-reranker\n",
"%pip install git+https://github.com/FlagOpen/FlagEmbedding.git\n",
"%pip install llama-parse\n",
"%pip install llama-cloud-services\n",
"%pip install llama-index-vector-stores-astra-db"
]
},
@@ -107,7 +107,7 @@
}
],
"source": [
"from llama_parse import LlamaParse\n",
"from llama_cloud_services import LlamaParse\n",
"\n",
"documents = LlamaParse(result_type=\"markdown\").load_data(\"./uber_10q_march_2022.pdf\")"
]
@@ -176,7 +176,7 @@
}
],
"source": [
"from llama_parse import LlamaParse\n",
"from llama_cloud_services import LlamaParse\n",
"\n",
"documents = LlamaParse(result_type=\"markdown\").load_data(\"./uber_10q_march_2022.pdf\")"
]
File renamed without changes.
@@ -130,7 +130,7 @@
}
],
"source": [
"from llama_parse import LlamaParse\n",
"from llama_cloud_services import LlamaParse\n",
"\n",
"documents = LlamaParse(result_type=\"text\").load_data(file_path)"
]
4 changes: 2 additions & 2 deletions examples/demo_basic.ipynb → examples/parse/demo_basic.ipynb
@@ -73,7 +73,7 @@
}
],
"source": [
"from llama_parse import LlamaParse\n",
"from llama_cloud_services import LlamaParse\n",
"\n",
"documents = LlamaParse(result_type=\"text\").load_data(\"./attention.pdf\")"
]
@@ -120,7 +120,7 @@
}
],
"source": [
"from llama_parse import LlamaParse\n",
"from llama_cloud_services import LlamaParse\n",
"\n",
"documents = LlamaParse(result_type=\"markdown\").load_data(\"./attention.pdf\")"
]
@@ -142,7 +142,7 @@
}
],
"source": [
"from llama_parse import LlamaParse\n",
"from llama_cloud_services import LlamaParse\n",
"\n",
"documents = LlamaParse(result_type=\"text\").load_data(file_path)"
]
4 changes: 2 additions & 2 deletions examples/demo_excel.ipynb → examples/parse/demo_excel.ipynb
@@ -21,7 +21,7 @@
"outputs": [],
"source": [
"%pip install llama-index\n",
"%pip install llama-parse"
"%pip install llama-cloud-services"
]
},
{
@@ -41,7 +41,7 @@
"\n",
"nest_asyncio.apply()\n",
"\n",
"from llama_parse import LlamaParse\n",
"from llama_cloud_services import LlamaParse\n",
"\n",
"api_key = \"llx-\" # get from cloud.llamaindex.ai"
]