Add to docs, update arg
nfcampos committed Mar 28, 2023
1 parent 1b3771a commit 4c71c14
Showing 3 changed files with 20 additions and 5 deletions.
@@ -4,15 +4,15 @@ hide_table_of_contents: true

# PDF files

-This example goes over how to load data from PDF files. One document will be created for each PDF file.
+This example goes over how to load data from PDF files. By default, one document will be created for each page in the PDF file, you can change this behavior by setting the `splitPages` option to `false`.

# Setup

```bash npm2yarn
-npm install pdf-parse
+npm install pdfjs-dist
```

-# Usage
+# Usage, one document per page

```typescript
import { PDFLoader } from "langchain/document_loaders";
@@ -21,3 +21,15 @@ const loader = new PDFLoader("src/document_loaders/example_data/example.pdf");

const docs = await loader.load();
```
+
+# Usage, one document per file
+
+```typescript
+import { PDFLoader } from "langchain/document_loaders";
+
+const loader = new PDFLoader("src/document_loaders/example_data/example.pdf", {
+splitPages: false,
+});
+
+const docs = await loader.load();
+```
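
The two documented usage modes differ only in how the parsed pages are grouped into documents. The following is a minimal sketch of that grouping under stated assumptions, not the loader's actual implementation: `SketchDoc`, `groupPages`, the 1-based `page` metadata field, and the blank-line separator are all hypothetical names and choices introduced for illustration.

```typescript
// Hypothetical document shape for this sketch only; the commit itself
// imports the real Document class from "../document.js".
interface SketchDoc {
  pageContent: string;
  metadata: { source: string; page?: number };
}

// Group already-extracted page texts into documents.
// splitPages = true  -> one document per page (the documented default)
// splitPages = false -> a single document for the whole file
function groupPages(
  source: string,
  pageTexts: string[],
  { splitPages = true }: { splitPages?: boolean } = {}
): SketchDoc[] {
  if (splitPages) {
    return pageTexts.map((text, i) => ({
      pageContent: text,
      metadata: { source, page: i + 1 }, // 1-based numbering is an assumption
    }));
  }
  // Joining pages with a blank line is an assumption of this sketch.
  return [{ pageContent: pageTexts.join("\n\n"), metadata: { source } }];
}

const samplePages = ["first page", "second page", "third page"];
const perPage = groupPages("example.pdf", samplePages);
const perFile = groupPages("example.pdf", samplePages, { splitPages: false });
console.log(perPage.length, perFile.length); // prints "3 1"
```

With three input pages, the default call yields three documents and the `splitPages: false` call yields one, which mirrors the `expect(docs.length).toBe(1)` assertion in the updated test.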
5 changes: 4 additions & 1 deletion langchain/src/document_loaders/pdf.ts
@@ -3,8 +3,11 @@ import { Document } from "../document.js";
import { BufferLoader } from "./buffer.js";

export class PDFLoader extends BufferLoader {
-constructor(filePathOrBlob: string | Blob, public splitPages = true) {
+private splitPages: boolean;
+
+constructor(filePathOrBlob: string | Blob, { splitPages = true } = {}) {
super(filePathOrBlob);
+this.splitPages = splitPages;
}

public async parse(
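
The constructor change in this file replaces a positional boolean with an options object whose parameter defaults to `{}` and whose `splitPages` key defaults to `true`. A minimal sketch of how that pattern resolves its defaults is shown below; `SketchLoader` and its `splitsPages` getter are hypothetical stand-ins written for illustration and are not part of the library.

```typescript
// Stand-in class for illustration only; this is not the real PDFLoader.
class SketchLoader {
  readonly source: string;

  private splitPages: boolean;

  // The parameter default `= {}` lets callers omit the options argument,
  // and the destructuring default supplies `true` when the key is absent.
  constructor(source: string, { splitPages = true } = {}) {
    this.source = source;
    this.splitPages = splitPages;
  }

  // Hypothetical accessor so the sketch can show the resolved value.
  get splitsPages(): boolean {
    return this.splitPages;
  }
}

const byDefault = new SketchLoader("example.pdf");
const wholeFile = new SketchLoader("example.pdf", { splitPages: false });
const emptyOptions = new SketchLoader("example.pdf", {});
console.log(byDefault.splitsPages, wholeFile.splitsPages, emptyOptions.splitsPages);
// prints "true false true"
```

Compared with a bare `false` in second position, the named key keeps call sites readable and leaves room for further options without changing the constructor's arity.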
2 changes: 1 addition & 1 deletion langchain/src/document_loaders/tests/pdf.test.ts
@@ -20,7 +20,7 @@ test("Test PDF loader from file to single document", async () => {
path.dirname(url.fileURLToPath(import.meta.url)),
"./example_data/1706.03762.pdf"
);
-const loader = new PDFLoader(filePath, false);
+const loader = new PDFLoader(filePath, { splitPages: false });
const docs = await loader.load();

expect(docs.length).toBe(1);