[Feature Request] Connect S3 as file loader #829

Open
vkehfdl1 opened this issue Oct 11, 2024 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@vkehfdl1
Contributor

Is your feature request related to a problem? Please describe.
I want to use Amazon S3 as a file source, so that documents stored there can be parsed and loaded into the VectorDB.

Describe the solution you'd like
Use LangChain or LlamaIndex (or something better) to connect many document sources to the parsing step.

Describe alternatives you've considered
We could use another library, such as liteLLM, to fetch documents.

@vkehfdl1 vkehfdl1 added the enhancement New feature or request label Oct 11, 2024
@vkehfdl1 vkehfdl1 self-assigned this Oct 11, 2024
@vkehfdl1
Contributor Author

As an alternative, we can build an example Jupyter notebook.

@vkehfdl1
Contributor Author

To support AWS well, I think it is better to use fsspec, which gives a unified interface for loading files!
We currently support only PDF, so the goal is to load PDF files from all kinds of file systems.
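
A minimal sketch of what that unified interface could look like, not the project's actual loader. The helper name `iter_pdf_bytes` and the bucket and path names are hypothetical. The demo uses fsspec's built-in `memory://` filesystem so it runs without credentials. The same call with an `s3://` URL should behave the same way, assuming the `s3fs` backend is installed and AWS credentials are configured.

```python
from typing import Iterator, Tuple

import fsspec
from fsspec.core import url_to_fs


def iter_pdf_bytes(root_url: str, **storage_options) -> Iterator[Tuple[str, bytes]]:
    """Yield (path, raw bytes) for every PDF found under root_url.

    Hypothetical helper: the filesystem backend is chosen from the URL
    protocol (s3://, gcs://, file://, memory://, ...).
    """
    fs, root_path = url_to_fs(root_url, **storage_options)
    # find() walks the tree recursively and returns file paths only
    for path in fs.find(root_path):
        if path.lower().endswith(".pdf"):
            with fs.open(path, "rb") as f:
                yield path, f.read()


if __name__ == "__main__":
    # Demo against the in-memory filesystem (no network, no credentials)
    mem = fsspec.filesystem("memory")
    mem.pipe("/demo-bucket/docs/report.pdf", b"%PDF-1.4 demo")
    mem.pipe("/demo-bucket/docs/notes.txt", b"not a pdf")
    for path, data in iter_pdf_bytes("memory://demo-bucket/docs"):
        print(path, len(data))
```

The raw bytes can then be handed to whichever PDF parser is already in use. Only the URL changes between storage backends.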

Below is the full list of protocols that fsspec supports.

It includes Dropbox, Google Drive, S3, and even Jupyter and GitHub!

['abfs',
 'adl',
 'arrow_hdfs',
 'asynclocal',
 'az',
 'blockcache',
 'box',
 'cached',
 'dask',
 'data',
 'dbfs',
 'dir',
 'dropbox',
 'dvc',
 'file',
 'filecache',
 'ftp',
 'gcs',
 'gdrive',
 'generic',
 'git',
 'github',
 'gs',
 'hdfs',
 'hf',
 'http',
 'https',
 'jlab',
 'jupyter',
 'lakefs',
 'libarchive',
 'local',
 'memory',
 'oci',
 'ocilake',
 'oss',
 'reference',
 'root',
 's3',
 's3a',
 'sftp',
 'simplecache',
 'smb',
 'ssh',
 'tar',
 'wandb',
 'webdav',
 'webhdfs',
 'zip']
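
For reference, a list like the one above can be reproduced with `fsspec.available_protocols()`. This is a sketch; the wrapper name `supported_protocols` is hypothetical, and the exact entries depend on the installed fsspec version.

```python
import fsspec


def supported_protocols() -> list[str]:
    """Return the protocol names fsspec knows about, sorted alphabetically."""
    return sorted(fsspec.available_protocols())


protocols = supported_protocols()
print(protocols)
```

A protocol being listed does not mean its backend is installed. For example, `s3` still needs the separate `s3fs` package before it can be used.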
