Add Dropbox Document Loader #7301

Open
wants to merge 12 commits into
base: main
from
Original file line number Diff line number Diff line change
@@ -0,0 +1,138 @@
---
hide_table_of_contents: true
sidebar_class_name: node-only
---

# Dropbox Loader

The `DropboxLoader` loads files or directories from your Dropbox account into your LangChain applications. It downloads each file and passes it to the `UnstructuredLoader`, which converts it into documents ready for processing.

## Overview

Dropbox is a file hosting service that brings traditional documents, cloud content, and web shortcuts together in one place. The `DropboxLoader` retrieves those files, individually or by folder, so you can use them in your projects.

## Setup

1. Create a Dropbox app using the [Dropbox App Console](https://www.dropbox.com/developers/apps/create).
2. Ensure the app has the `files.metadata.read` and `files.content.read` scopes.
3. Generate an access token from the Dropbox App Console.
4. Set up Unstructured. This loader sends downloaded files to Unstructured for parsing, so Unstructured must be reachable at a URL endpoint, either hosted or running locally.
   See the [Unstructured documentation](https://docs.unstructured.io) for information on how to do that.
5. Install the necessary packages:

```bash npm2yarn
npm install @langchain/community @langchain/core dropbox
```

## Usage

### Loading Specific Files

To load specific files from Dropbox, specify the file paths:

```typescript
import { DropboxLoader } from "@langchain/community/document_loaders/web/dropbox";

const loader = new DropboxLoader({
  clientOptions: {
    accessToken: "your-dropbox-access-token",
  },
  unstructuredOptions: {
    apiUrl: "http://localhost:8000/general/v0/general", // Replace with your Unstructured API URL
  },
  filePaths: ["/path/to/file1.txt", "/path/to/file2.pdf"], // Replace with file paths on Dropbox.
});

const docs = await loader.load();
console.log(docs);
```

Review comments on the `apiUrl` line above:

Collaborator:
Let's emphasize somewhere that this wraps Unstructured

Should we call this DropboxUnstructuredLoader instead?

Author:
Yes, we can rename the loader class to DropboxUnstructuredLoader
I want to confirm if I need to rename the file to say dropbox_unstructured.ts as well?

Also, I noticed that a few preexisting loaders utilize unstructured as well. Would they need to be renamed as well in the future?

### Loading Files from a Directory

To load all files from a specific directory, provide the `folderPath` and set the `mode` to `"directory"`. Set `recursive` to `true` to traverse subdirectories:

```typescript
import { DropboxLoader } from "@langchain/community/document_loaders/web/dropbox";

const loader = new DropboxLoader({
  clientOptions: {
    accessToken: "your-dropbox-access-token",
  },
  unstructuredOptions: {
    apiUrl: "http://localhost:8000/general/v0/general",
  },
  folderPath: "/path/to/folder",
  recursive: true, // Load documents found in subdirectories
  mode: "directory",
});

const docs = await loader.load();
console.log(docs);
```

### Streaming Documents

To process large datasets efficiently, use the `loadLazy` method to stream documents asynchronously:

```typescript
import { DropboxLoader } from "@langchain/community/document_loaders/web/dropbox";

const loader = new DropboxLoader({
  clientOptions: {
    accessToken: "your-dropbox-access-token",
  },
  unstructuredOptions: {
    apiUrl: "http://localhost:8000/general/v0/general",
  },
  folderPath: "/large/dataset",
  recursive: true,
  mode: "directory",
});

for await (const doc of loader.loadLazy()) {
  // Process each document as it's loaded
  console.log(doc);
}
```
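
When streaming, it can help to hand documents to downstream steps in small groups rather than one at a time. The following is a minimal sketch of that idea, not part of the loader: `inBatches`, `fakeDocs`, and `collectBatchSizes` are hypothetical names introduced here, and `fakeDocs` stands in for `loader.loadLazy()` so the example runs without Dropbox or Unstructured access.

```typescript
// Hypothetical helper (not part of the loader): groups items from any
// async iterable into arrays of at most `size` elements.
async function* inBatches<T>(
  source: AsyncIterable<T>,
  size: number
): AsyncGenerator<T[]> {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  // Flush the final, possibly smaller, batch.
  if (batch.length > 0) {
    yield batch;
  }
}

// Stand-in for `loader.loadLazy()` so the sketch runs offline.
async function* fakeDocs(
  count: number
): AsyncGenerator<{ pageContent: string }> {
  for (let i = 0; i < count; i += 1) {
    yield { pageContent: `doc ${i}` };
  }
}

// Collects the size of each batch, to show how the stream is grouped.
async function collectBatchSizes(
  count: number,
  size: number
): Promise<number[]> {
  const sizes: number[] = [];
  for await (const batch of inBatches(fakeDocs(count), size)) {
    sizes.push(batch.length);
  }
  return sizes;
}
```

With the real loader, the same helper could presumably be applied as `inBatches(loader.loadLazy(), 50)`, which keeps at most one batch of documents in memory at a time.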

### Authentication with Environment Variables

You can set the `DROPBOX_ACCESS_TOKEN` environment variable instead of passing the access token in `clientOptions`:

```bash
export DROPBOX_ACCESS_TOKEN=your-dropbox-access-token
```

Then initialize the loader without specifying `accessToken`:

```typescript
import { DropboxLoader } from "@langchain/community/document_loaders/web/dropbox";

const loader = new DropboxLoader({
  clientOptions: {},
  unstructuredOptions: {
    apiUrl: "http://localhost:8000/general/v0/general",
  },
  filePaths: ["/important/notes.txt"],
});

const docs = await loader.load();
console.log(docs[0].pageContent);
```
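
The fallback described in this section can be pictured roughly as follows. This is a sketch under stated assumptions rather than the loader's actual code: `resolveAccessToken` and `EnvLike` are hypothetical names, and the precedence (explicit option first, then `DROPBOX_ACCESS_TOKEN`) is an assumption that this page does not confirm.

```typescript
// Hypothetical shape for an environment map such as `process.env`.
type EnvLike = Record<string, string | undefined>;

// Hypothetical helper; the loader's real resolution logic may differ.
function resolveAccessToken(
  clientOptions: { accessToken?: string },
  env: EnvLike
): string {
  // Assumption: an explicit, non-empty option wins over the environment variable.
  const token = clientOptions.accessToken || env.DROPBOX_ACCESS_TOKEN;
  if (!token || token.trim() === "") {
    throw new Error(
      "Dropbox access token missing: set clientOptions.accessToken or DROPBOX_ACCESS_TOKEN."
    );
  }
  return token;
}
```

In Node.js this could be called as `resolveAccessToken({}, process.env)` before constructing the loader, so that a missing token fails early with a clear message instead of surfacing later as a failed request.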

## Configuration Options

Review comments on this section:

Collaborator:
Just link to API refs instead

Author (@Ser0n-ath, Jan 2, 2025):
Is this fine? dccce35

Here are the configuration options for the `DropboxLoader`:

| Option | Type | Description |
| --------------------- | ------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `clientOptions` | `DropboxOptions` | Configuration options for initializing the Dropbox client, including authentication details. Refer to the [Dropbox SDK Documentation](https://dropbox.github.io/dropbox-sdk-js/Dropbox.html#Dropbox__anchor) for more information. |
| `unstructuredOptions` | `UnstructuredLoaderOptions` | Options for the `UnstructuredLoader` used to process downloaded files. Includes the `apiUrl` for your Unstructured server. |
| `folderPath` | `string` (optional) | The path to the folder in Dropbox from which to load files. Defaults to the root folder (`""`) if not specified. |
| `filePaths` | `string[]` (optional) | An array of specific file paths in Dropbox to load. Required if `mode` is set to `"file"`. |
| `recursive` | `boolean` (optional) | Indicates whether to recursively traverse folders when `mode` is `"directory"`. Defaults to `false`. |
| `mode` | `"file"` or `"directory"` (optional) | The mode of operation. Set to `"file"` to load specific files or `"directory"` to load all files in a directory. Defaults to `"file"`. |

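The defaults and the `filePaths` requirement listed in the table above can be summarized in code. The sketch below is illustrative only: `LoaderPathOptions`, `NormalizedPathOptions`, and `normalizePathOptions` are hypothetical names that do not come from the loader, and the error message is invented for the example.

```typescript
type DropboxLoaderMode = "file" | "directory";

// Hypothetical subset of the loader options, taken from the table above.
interface LoaderPathOptions {
  folderPath?: string;
  filePaths?: string[];
  recursive?: boolean;
  mode?: DropboxLoaderMode;
}

// The same options with every default filled in.
interface NormalizedPathOptions {
  folderPath: string;
  filePaths: string[];
  recursive: boolean;
  mode: DropboxLoaderMode;
}

// Hypothetical helper: applies the documented defaults and checks that
// `filePaths` is present when `mode` is "file".
function normalizePathOptions(
  options: LoaderPathOptions
): NormalizedPathOptions {
  const mode = options.mode ?? "file"; // documented default
  const filePaths = options.filePaths ?? [];
  if (mode === "file" && filePaths.length === 0) {
    throw new Error('filePaths is required when mode is "file"');
  }
  return {
    folderPath: options.folderPath ?? "", // "" is the Dropbox root folder
    filePaths,
    recursive: options.recursive ?? false, // documented default
    mode,
  };
}
```

For example, `normalizePathOptions({ mode: "directory" })` resolves to the root folder with `recursive` set to `false`, while `normalizePathOptions({})` throws because the default mode is `"file"` and no file paths were given.
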
## API References

- [Dropbox SDK for JavaScript](https://github.com/dropbox/dropbox-sdk-js)
1 change: 1 addition & 0 deletions langchain/.env.example
@@ -102,3 +102,4 @@ GOOGLE_ROUTES_API_KEY=ADD_YOURS_HERE
CONFLUENCE_USERNAME=ADD_YOURS_HERE
CONFLUENCE_PASSWORD=ADD_YOURS_HERE
CONFLUENCE_PATH=ADD_YOURS_HERE
DROPBOX_ACCESS_TOKEN=ADD_YOURS_HERE
4 changes: 4 additions & 0 deletions libs/langchain-community/.gitignore
@@ -894,6 +894,10 @@ document_loaders/web/cheerio.cjs
document_loaders/web/cheerio.js
document_loaders/web/cheerio.d.ts
document_loaders/web/cheerio.d.cts
document_loaders/web/dropbox.cjs
document_loaders/web/dropbox.js
document_loaders/web/dropbox.d.ts
document_loaders/web/dropbox.d.cts
document_loaders/web/html.cjs
document_loaders/web/html.js
document_loaders/web/html.d.ts
2 changes: 2 additions & 0 deletions libs/langchain-community/langchain.config.js
@@ -278,6 +278,7 @@ export const config = {
"document_loaders/web/azure_blob_storage_file",
"document_loaders/web/browserbase": "document_loaders/web/browserbase",
"document_loaders/web/cheerio": "document_loaders/web/cheerio",
"document_loaders/web/dropbox": "document_loaders/web/dropbox",
"document_loaders/web/html": "document_loaders/web/html",
"document_loaders/web/puppeteer": "document_loaders/web/puppeteer",
"document_loaders/web/playwright": "document_loaders/web/playwright",
@@ -499,6 +500,7 @@ export const config = {
"document_loaders/web/azure_blob_storage_file",
"document_loaders/web/browserbase",
"document_loaders/web/cheerio",
"document_loaders/web/dropbox",
"document_loaders/web/puppeteer",
"document_loaders/web/playwright",
"document_loaders/web/college_confidential",
18 changes: 18 additions & 0 deletions libs/langchain-community/package.json
@@ -156,6 +156,7 @@
"dotenv": "^16.0.3",
"dpdm": "^3.12.0",
"dria": "^0.0.3",
"dropbox": "^10.34.0",
"duck-duck-scrape": "^2.2.5",
"epub2": "^3.0.1",
"eslint": "^8.33.0",
@@ -299,6 +300,7 @@
"d3-dsv": "^2.0.0",
"discord.js": "^14.14.1",
"dria": "^0.0.3",
"dropbox": "^10.34.0",
"duck-duck-scrape": "^2.2.5",
"epub2": "^3.0.1",
"faiss-node": "^0.5.1",
@@ -575,6 +577,9 @@
"dria": {
"optional": true
},
"dropbox": {
"optional": true
},
"duck-duck-scrape": {
"optional": true
},
@@ -2728,6 +2733,15 @@
"import": "./document_loaders/web/cheerio.js",
"require": "./document_loaders/web/cheerio.cjs"
},
"./document_loaders/web/dropbox": {
"types": {
"import": "./document_loaders/web/dropbox.d.ts",
"require": "./document_loaders/web/dropbox.d.cts",
"default": "./document_loaders/web/dropbox.d.ts"
},
"import": "./document_loaders/web/dropbox.js",
"require": "./document_loaders/web/dropbox.cjs"
},
"./document_loaders/web/html": {
"types": {
"import": "./document_loaders/web/html.d.ts",
@@ -4042,6 +4056,10 @@
"document_loaders/web/cheerio.js",
"document_loaders/web/cheerio.d.ts",
"document_loaders/web/cheerio.d.cts",
"document_loaders/web/dropbox.cjs",
"document_loaders/web/dropbox.js",
"document_loaders/web/dropbox.d.ts",
"document_loaders/web/dropbox.d.cts",
"document_loaders/web/html.cjs",
"document_loaders/web/html.js",
"document_loaders/web/html.d.ts",