Error when importing larger PDF files #50
Comments
Interesting! Thanks for the issue. It looks like the chunks are too big for OpenAI's ADA model (max 8192 tokens, but somehow one chunk seems to be 23941 tokens). What are your chunking settings?
@thomashacker Chunking settings were 250 tokens with an overlap of 50. The error should be easy to reproduce by uploading long PDF files.
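For reference, a minimal sketch of token-based chunking with those settings, using tiktoken to count tokens the way the ADA embedder would. The 250/50 values mirror the settings above; the function name and the choice of the cl100k_base encoding are assumptions for illustration, not Verba's actual implementation:

```python
import tiktoken

# cl100k_base is the tokenizer used by text-embedding-ada-002.
encoding = tiktoken.get_encoding("cl100k_base")

ADA_TOKEN_LIMIT = 8192  # max input tokens for the ADA embedding model


def chunk_text(text: str, chunk_size: int = 250, overlap: int = 50) -> list[str]:
    """Split text into windows of `chunk_size` tokens with `overlap` tokens of overlap."""
    tokens = encoding.encode(text)
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start : start + chunk_size]
        chunks.append(encoding.decode(window))
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Sanity check: with these settings no individual chunk should come anywhere
# near the 8192-token limit, let alone 23941 tokens.
for chunk in chunk_text("some long document text ..."):
    assert len(encoding.encode(chunk)) <= ADA_TOKEN_LIMIT
```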
@thomashacker I opened #55, which ended up being a duplicate of this issue. I added some of my investigation notes there, but TL;DR: the issue isn't in the chunks, it's in the document upload itself, before any chunking can occur.
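To illustrate that failure mode: if the class that holds the raw documents is configured with a text2vec module, a single insert of the full document text is sent to the embedding model in one request, which fails once the text exceeds the model's token limit. A rough sketch with the v3 weaviate-client; the class name "Document" and the property names are assumptions for illustration, not necessarily Verba's actual schema:

```python
import weaviate

client = weaviate.Client("http://localhost:8080")

# Hypothetical: the complete extracted text of a long PDF, well over 8192 tokens.
full_text = open("long_document.txt", encoding="utf-8").read()

# If the "Document" class was created with a text2vec module as its vectorizer,
# this single insert asks the embedding model to vectorize the entire text at
# once -- before any chunking has happened -- and the request is rejected.
client.data_object.create(
    data_object={"text": full_text, "doc_name": "long_document.pdf"},
    class_name="Document",
)
```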
This is interesting, @cam-barts! Thank you so much for your investigation, this really helped. The Document schema should not be vectorized in the first place, since it only acts as a document store for Verba. This is definitely a bug; I'll look into it and fix it for the next release.
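A minimal sketch of what such a fix could look like with the v3 weaviate-client: create the document-store class with its vectorizer explicitly set to "none", so Weaviate never sends the stored text to an embedding module. The property names here are illustrative assumptions:

```python
import weaviate

client = weaviate.Client("http://localhost:8080")

# Document store only: no vectorizer, so inserts never hit the embedding model.
document_class = {
    "class": "Document",
    "vectorizer": "none",
    "properties": [
        {"name": "text", "dataType": ["text"]},
        {"name": "doc_name", "dataType": ["text"]},
    ],
}

client.schema.create_class(document_class)
```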
I encounter the same problem when starting Verba in a virtualenv with […]. No problem when using the embedded Weaviate. I had exactly the same problem with Verba 0.2.
I have the same issue |
Thanks everyone, I found the issue! The Docker configuration was set to automatically use the […].
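If you want to verify which vectorizer each class ended up with after pulling the fix (for example, that the document-store class now reports "none" instead of an OpenAI module), a quick check against a running Weaviate instance, assuming the v3 weaviate-client:

```python
import weaviate

client = weaviate.Client("http://localhost:8080")

# List every class and the vectorizer it is configured with.
for cls in client.schema.get().get("classes", []):
    print(f'{cls["class"]}: {cls.get("vectorizer")}')
```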
Now it works for me. I had to clean all volumes, images, etc., and set up a fresh environment; then the new code did the trick.
That's great to hear! |
When trying to import larger PDF files (tested with upwards of 20 pages) using the ADAEmbedder, I'm seeing the following error in the frontend and in the console. However, the embeddings somehow seem to be generated, since asking questions about that context works. But the uploaded document isn't showing in the frontend under the Documents section.
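As a rough way to gauge the size of the problem, one can extract the text of a 20+ page PDF and count its tokens; the whole document easily exceeds the embedding model's 8192-token limit even though each individual chunk would be small. A sketch assuming pypdf and tiktoken are installed; the file name is a placeholder:

```python
import tiktoken
from pypdf import PdfReader

encoding = tiktoken.get_encoding("cl100k_base")

reader = PdfReader("large_file.pdf")  # placeholder: any 20+ page PDF
full_text = "\n".join(page.extract_text() or "" for page in reader.pages)

# The full document far exceeds the 8192-token embedding limit,
# consistent with a single object being vectorized before chunking.
print("total tokens:", len(encoding.encode(full_text)))
```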