Add Dropbox Document Loader #7301
Open
Ser0n-ath wants to merge 12 commits into langchain-ai:main from Ser0n-ath:add-dropbox-document-loader
Changes from 8 commits
Commits
e0dd398 Add document loader skeleton (Ser0n-ath)
53253fe Add dropbox document loader (Ser0n-ath)
81a7db0 add integration tests base (yashankxy)
459d306 Merge branch 'add-dropbox-document-loader' of https://github.com/Ser0… (yashankxy)
9efc17e added test cases
b2a1dda Add dropbox document loader docs (Ismail-Bashir)
0447aec Merge branch 'langchain-ai:main' into add-dropbox-document-loader (Ser0n-ath)
7c8bd04 remove direct dependency (Ser0n-ath)
a95ffa0 Merge branch 'langchain-ai:main' into add-dropbox-document-loader (Ser0n-ath)
7027578 remove fs usage (Ser0n-ath)
dccce35 update doc api ref (Ser0n-ath)
cd6dfe9 Merge branch 'langchain-ai:main' into add-dropbox-document-loader (Ser0n-ath)
138 changes: 138 additions & 0 deletions
docs/core_docs/docs/integrations/document_loaders/web_loaders/dropbox.mdx
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,138 @@ | ||
--- | ||
hide_table_of_contents: true | ||
sidebar_class_name: node-only | ||
--- | ||
|
||
# Dropbox Loader | ||
|
||
The `DropboxLoader` allows you to load documents from Dropbox into your LangChain applications. It retrieves files or directories from your Dropbox account and converts them into documents ready for processing. | ||
|
||
## Overview | ||
|
||
Dropbox is a file hosting service that brings all your files—traditional documents, cloud content, and web shortcuts—together in one place. With the `DropboxLoader`, you can seamlessly integrate Dropbox file retrieval into your projects. | ||
|
||
## Setup | ||
|
||
1. Create a Dropbox app using the [Dropbox App Console](https://www.dropbox.com/developers/apps/create). | ||
2. Ensure the app has the `files.metadata.read` and `files.content.read` scope permissions. | ||
3. Generate the access token from the Dropbox App Console. | ||
4. Set up Unstructured so that it is reachable at a URL endpoint; it can also be configured to run locally. The loader uses Unstructured to process the files it downloads from Dropbox. | ||
See the Unstructured documentation for information on how to do that. | ||
5. Install the necessary packages: | ||
|
||
```bash npm2yarn | ||
npm install @langchain/community @langchain/core dropbox | ||
``` | ||
|
||
## Usage | ||
|
||
### Loading Specific Files | ||
|
||
To load specific files from Dropbox, specify the file paths: | ||
|
||
```typescript | ||
import { DropboxLoader } from "@langchain/community/document_loaders/web/dropbox"; | ||
|
||
const loader = new DropboxLoader({ | ||
clientOptions: { | ||
accessToken: "your-dropbox-access-token", | ||
}, | ||
unstructuredOptions: { | ||
apiUrl: "http://localhost:8000/general/v0/general", // Replace with your Unstructured API URL | ||
}, | ||
filePaths: ["/path/to/file1.txt", "/path/to/file2.pdf"], // Replace with file paths on Dropbox. | ||
}); | ||
|
||
const docs = await loader.load(); | ||
console.log(docs); | ||
``` | ||
|
||
### Loading Files from a Directory | ||
|
||
To load all files from a specific directory, provide the `folderPath` and set the `mode` to `"directory"`. Set `recursive` to `true` to traverse subdirectories: | ||
|
||
```typescript | ||
import { DropboxLoader } from "@langchain/community/document_loaders/web/dropbox"; | ||
|
||
const loader = new DropboxLoader({ | ||
clientOptions: { | ||
accessToken: "your-dropbox-access-token", | ||
}, | ||
unstructuredOptions: { | ||
apiUrl: "http://localhost:8000/general/v0/general", | ||
}, | ||
folderPath: "/path/to/folder", | ||
recursive: true, // Load documents found in subdirectories | ||
mode: "directory", | ||
}); | ||
|
||
const docs = await loader.load(); | ||
console.log(docs); | ||
``` | ||
|
||
### Streaming Documents | ||
|
||
To process large datasets efficiently, use the `loadLazy` method to stream documents asynchronously: | ||
|
||
```typescript | ||
import { DropboxLoader } from "@langchain/community/document_loaders/web/dropbox"; | ||
|
||
const loader = new DropboxLoader({ | ||
clientOptions: { | ||
accessToken: "your-dropbox-access-token", | ||
}, | ||
unstructuredOptions: { | ||
apiUrl: "http://localhost:8000/general/v0/general", | ||
}, | ||
folderPath: "/large/dataset", | ||
recursive: true, | ||
mode: "directory", | ||
}); | ||
|
||
for await (const doc of loader.loadLazy()) { | ||
// Process each document as it's loaded | ||
console.log(doc); | ||
} | ||
``` | ||
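When a downstream step works better on groups of documents (for example, an indexing call), the stream can be consumed in fixed-size batches. The following is a minimal sketch and not part of this PR: `inBatches`, `fakeDocs`, and `collectBatchSizes` are hypothetical names, and `fakeDocs` stands in for `loader.loadLazy()` so the snippet runs without Dropbox or Unstructured.

```typescript
// Hypothetical helper (not part of LangChain): groups items from any async
// iterable, such as the one returned by loader.loadLazy(), into batches.
async function* inBatches<T>(
  source: AsyncIterable<T>,
  batchSize: number
): AsyncGenerator<T[]> {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === batchSize) {
      yield batch;
      batch = [];
    }
  }
  // Emit whatever is left once the source is exhausted.
  if (batch.length > 0) {
    yield batch;
  }
}

// Stand-in source so the sketch runs without any external service.
async function* fakeDocs(
  count: number
): AsyncGenerator<{ pageContent: string }> {
  for (let i = 0; i < count; i += 1) {
    yield { pageContent: `doc ${i}` };
  }
}

// Consumes the stream batch by batch and records each batch size.
async function collectBatchSizes(): Promise<number[]> {
  const sizes: number[] = [];
  for await (const batch of inBatches(fakeDocs(5), 2)) {
    sizes.push(batch.length);
  }
  return sizes;
}
```

With a real loader, `inBatches(loader.loadLazy(), 50)` would take the place of `inBatches(fakeDocs(5), 2)`; the batch size of 50 is an arbitrary illustration.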
|
||
### Authentication with Environment Variables | ||
|
||
You can set the `DROPBOX_ACCESS_TOKEN` environment variable instead of passing the access token in `clientOptions`: | ||
|
||
```bash | ||
export DROPBOX_ACCESS_TOKEN=your-dropbox-access-token | ||
``` | ||
|
||
Then initialize the loader without specifying `accessToken`: | ||
|
||
```typescript | ||
import { DropboxLoader } from "@langchain/community/document_loaders/web/dropbox"; | ||
|
||
const loader = new DropboxLoader({ | ||
clientOptions: {}, | ||
unstructuredOptions: { | ||
apiUrl: "http://localhost:8000/general/v0/general", | ||
}, | ||
filePaths: ["/important/notes.txt"], | ||
}); | ||
|
||
const docs = await loader.load(); | ||
console.log(docs[0].pageContent); | ||
``` | ||
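The fallback described above can be pictured roughly as follows. This is a sketch under stated assumptions, not the loader's source: `resolveAccessToken` and `ClientOptionsSketch` are hypothetical names, and the precedence (an explicit `accessToken` wins over `DROPBOX_ACCESS_TOKEN`) is an assumption that the documentation does not confirm.

```typescript
// Hypothetical shape: only the field relevant to authentication.
interface ClientOptionsSketch {
  accessToken?: string;
}

// Hypothetical helper illustrating one plausible lookup order.
function resolveAccessToken(
  clientOptions: ClientOptionsSketch,
  env: Record<string, string | undefined>
): string {
  // Assumption: an explicitly passed token takes precedence over the
  // DROPBOX_ACCESS_TOKEN environment variable.
  const token = clientOptions.accessToken ?? env["DROPBOX_ACCESS_TOKEN"];
  if (!token) {
    throw new Error(
      "No Dropbox access token: pass clientOptions.accessToken or set DROPBOX_ACCESS_TOKEN"
    );
  }
  return token;
}
```

In a Node.js program the second argument would typically be `process.env`; it is passed in explicitly here so the sketch has no dependency on the runtime.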
|
||
## Configuration Options | ||
Review comment: Just link to API refs instead
Reply: Is this fine? dccce35
||
|
||
Here are the configuration options for the `DropboxLoader`: | ||
|
||
| Option | Type | Description | | ||
| --------------------- | ------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||
| `clientOptions` | `DropboxOptions` | Configuration options for initializing the Dropbox client, including authentication details. Refer to the [Dropbox SDK Documentation](https://dropbox.github.io/dropbox-sdk-js/Dropbox.html#Dropbox__anchor) for more information. | | ||
| `unstructuredOptions` | `UnstructuredLoaderOptions` | Options for the `UnstructuredLoader` used to process downloaded files. Includes the `apiUrl` for your Unstructured server. | | ||
| `folderPath` | `string` (optional) | The path to the folder in Dropbox from which to load files. Defaults to the root folder (`""`) if not specified. | | ||
| `filePaths` | `string[]` (optional) | An array of specific file paths in Dropbox to load. Required if `mode` is set to `"file"`. | | ||
| `recursive` | `boolean` (optional) | Indicates whether to recursively traverse folders when `mode` is `"directory"`. Defaults to `false`. | | ||
| `mode` | `"file"` or `"directory"` (optional) | The mode of operation. Set to `"file"` to load specific files or `"directory"` to load all files in a directory. Defaults to `"file"`. | | ||
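
The defaults in the table can be summarized in code. The sketch below is illustrative only: `DropboxLoaderOptionsSketch`, `ResolvedOptionsSketch`, and `resolveOptions` are hypothetical names, `clientOptions` and `unstructuredOptions` are left out because their types come from external packages, and throwing when `filePaths` is missing in `"file"` mode is an assumption about how the "required" rule might be enforced.

```typescript
type DropboxLoaderMode = "file" | "directory";

// Hypothetical subset of the loader options described in the table.
interface DropboxLoaderOptionsSketch {
  folderPath?: string;
  filePaths?: string[];
  recursive?: boolean;
  mode?: DropboxLoaderMode;
}

// The same options after the documented defaults have been applied.
interface ResolvedOptionsSketch {
  folderPath: string;
  filePaths: string[];
  recursive: boolean;
  mode: DropboxLoaderMode;
}

function resolveOptions(
  options: DropboxLoaderOptionsSketch
): ResolvedOptionsSketch {
  const mode = options.mode ?? "file"; // documented default
  const filePaths = options.filePaths ?? [];
  // Assumption: a missing or empty filePaths in "file" mode is an error.
  if (mode === "file" && filePaths.length === 0) {
    throw new Error('filePaths is required when mode is "file"');
  }
  return {
    folderPath: options.folderPath ?? "", // "" is the Dropbox root folder
    filePaths,
    recursive: options.recursive ?? false, // documented default
    mode,
  };
}
```

For example, `resolveOptions({ mode: "directory" })` resolves to the root folder without recursion, while `resolveOptions({})` fails because the default mode is `"file"` and no paths were given.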
|
||
## API References | ||
|
||
- [Dropbox SDK for JavaScript](https://github.com/dropbox/dropbox-sdk-js) |
Review comment: Let's emphasize somewhere that this wraps Unstructured. Should we call this `DropboxUnstructuredLoader` instead?
Reply: Yes, we can rename the loader class to `DropboxUnstructuredLoader`. I want to confirm whether I need to rename the file to, say, `dropbox_unstructured.ts` as well. Also, I noticed that a few preexisting loaders use Unstructured too. Would they need to be renamed in the future?