readme improvements
emcf committed Mar 22, 2024
1 parent 16c5945 commit 6443f37
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -6,7 +6,7 @@ The pipe is a tool for feeding complex real-world data into large language model

## 🛠️ How it works

-The pipe is accessible from the command line or from python. The input source is either a file path, a URL, or a directory (or zip file) path. The pipe will extract information from the source and process it for downstream use with LLMs. The output from the pipe is an opinionated, sensible, text-based (or multimodal) representation of the extracted information, carefully crafted to work well with LLMs such as GPT or Claude. It uses a variety of heuristics to optimize the output for tasks such as [AI-native extraction](https://docs.mathpix.com/#process-a-pdf), [LLMLingua](https://arxiv.org/abs/2403.12968), [Ctags](https://en.wikipedia.org/wiki/Ctags), automatic image encoding, and more.
+The pipe is accessible from the command line or from [Python](https://www.python.org/downloads/). The input source is either a file path, a URL, or a directory (or zip file) path. The pipe will extract information from the source and process it for downstream use with [LLMs](https://en.wikipedia.org/wiki/Large_language_model). The output from the pipe is an opinionated, sensible, text-based (or multimodal) representation of the extracted information, carefully crafted to scale well in performance for any model size from [GPT-4](https://openai.com/gpt-4) to [gemma-7b](https://huggingface.co/google/gemma-7b). It uses a variety of heuristics to optimize the output for LLMs, including [AI-native document extraction](https://docs.mathpix.com/#process-a-pdf), [efficient token compression](https://arxiv.org/abs/2403.12968), [code compression with Ctags](https://en.wikipedia.org/wiki/Ctags), automatic [image encoding](https://en.wikipedia.org/wiki/Base64), reranking for [LITM](https://arxiv.org/abs/2307.03172) effects, and more, all pre-built to work out-of-the-box.

## 📂 Supported input sources

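The README line added in this commit lists "reranking for LITM effects" among the heuristics but does not say how the reranking is done. As a rough illustration only, and not the pipe's actual implementation, the sketch below shows one common way to counter the lost-in-the-middle effect: given chunks already sorted from most to least relevant, it places the strongest chunks at the start and end of the context and lets the weakest fall in the middle. The function name `reorder_for_long_context` and the sample chunk labels are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reorder_for_long_context(ranked: Sequence[T]) -> List[T]:
    """Reorder items so the most relevant ones sit at the edges.

    `ranked` is assumed to be sorted from most to least relevant.
    Even-ranked items fill the front in order; odd-ranked items fill
    the back in reverse, so the least relevant items end up in the middle.
    """
    front: List[T] = []
    back: List[T] = []
    for position, item in enumerate(ranked):
        if position % 2 == 0:
            front.append(item)
        else:
            back.append(item)
    # Reverse the back half so the second-most relevant item comes last.
    return front + back[::-1]


if __name__ == "__main__":
    # Hypothetical chunk labels, "a" being the most relevant.
    chunks = ["a", "b", "c", "d", "e"]
    print(reorder_for_long_context(chunks))  # ['a', 'c', 'e', 'd', 'b']
```

With five chunks ranked `a` through `e`, the two most relevant (`a`, `b`) land first and last, and the least relevant (`e`) lands in the center, where a model is assumed to attend least.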
