[Bug] Chunking failed #4975

Open
wangxiaodong1021 opened this issue Dec 11, 2024 · 4 comments
Labels
🐛 Bug (Something isn't working), files (file upload / knowledge base)

Comments

@wangxiaodong1021

📦 Environment

Zeabur

📌 Version

v1.36.6

💻 Operating System

Ubuntu

🌐 Browser

Edge

🐛 Bug Description

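I saved a web page as a PDF, and chunking failed. The PDF opens and displays normally in the knowledge base, and other PDFs I uploaded earlier were all chunked without problems. I have attached the PDF.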
image
Diffusion Meets Flow Matching.pdf

📷 Recurrence Steps

No response

🚦 Expected Behavior

No response

📝 Additional Information

No response

@wangxiaodong1021 added the 🐛 Bug (Something isn't working) label Dec 11, 2024
@github-project-automation bot moved this to Roadmap - Chat 1.x in Lobe Chat Routine Dec 11, 2024
@lobehubbot
Member

👀 @wangxiaodong1021

Thank you for raising an issue. We will look into the matter and get back to you as soon as possible.
Please make sure you have given us as much context as possible.


dosubot bot commented Dec 11, 2024

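@wangxiaodong1021 Hello! I'm Dosu, a bot that can help you solve bugs, answer questions, and even become a contributor. I'll be helping out here while you wait for a human maintainer. Looking forward to working through this with you! 🤖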

The error message "invalid byte sequence for encoding 'UTF8': 0x00" suggests that there might be an issue with the character encoding of the PDF file when attempting to process it. This could be due to non-UTF8 characters or null bytes in the file.

Here are a few steps you can take to troubleshoot and potentially resolve this issue:

  1. Check the PDF File: Ensure that the PDF file does not contain any corrupted data or non-standard characters that might cause encoding issues.

  2. Encoding Conversion: Try converting the PDF file to ensure it is encoded in UTF-8. You can use tools like iconv or similar to convert the file encoding.

  3. Update Dependencies: Make sure that all dependencies related to PDF processing in your environment are up to date. The lobe-chat project uses libraries such as pdf-parse, pdfjs-dist, and react-pdf for PDF handling, which might need updates [1].

  4. Inspect Logs: Check the application logs for any additional error messages or stack traces that might provide more context about the failure.

  5. Test with Other PDFs: Since other PDFs are working fine, compare the problematic PDF with those that work to identify any differences in structure or content.

If the issue persists, you might need to delve deeper into the specific PDF processing logic in the lobe-chat codebase or consider reaching out to the maintainers for more detailed support.
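
For context, PostgreSQL raises this exact message when a text value contains a NUL character (U+0000), which its text types cannot store. Text extracted from a PDF can sometimes carry such characters even when the file renders fine in a viewer. The sketch below shows one possible workaround, assuming the failure happens when extracted chunk text is written to the database. The `Chunk` shape and the helper names `stripNullChars`, `sanitizeChunks`, and `findChunksWithNull` are hypothetical and are not taken from the lobe-chat codebase.

```typescript
// Hypothetical helpers for illustration only; not part of lobe-chat.

interface Chunk {
  index: number;
  text: string;
}

/** Remove NUL (U+0000) characters, which PostgreSQL text columns reject. */
function stripNullChars(text: string): string {
  return text.split("\u0000").join("");
}

/** List the indices of chunks that contain at least one NUL character. */
function findChunksWithNull(chunks: Chunk[]): number[] {
  return chunks
    .filter((chunk) => chunk.text.includes("\u0000"))
    .map((chunk) => chunk.index);
}

/** Clean every chunk and drop those that are empty after cleaning. */
function sanitizeChunks(chunks: Chunk[]): Chunk[] {
  return chunks
    .map((chunk) => ({ ...chunk, text: stripNullChars(chunk.text) }))
    .filter((chunk) => chunk.text.trim().length > 0);
}

// Usage: inspect the extracted chunks first, then clean them before saving.
const sample: Chunk[] = [
  { index: 0, text: "Diffusion\u0000 Meets Flow Matching" },
  { index: 1, text: "\u0000" },
  { index: 2, text: "A chunk without any NUL characters" },
];
console.log("chunks containing NUL:", findChunksWithNull(sample));
console.log("chunks safe to insert:", sanitizeChunks(sample));
```

Running `findChunksWithNull` on a failing PDF and on a working one is a quick way to confirm or rule out this cause before changing any storage code.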


@dosubot bot added the files (file upload / knowledge base) label Dec 11, 2024
@157374010

Seconded. I ran into the same problem with the same error. Windows 11, Chrome browser.

