fix: Fix sequence item 2: expected str instance, NoneType found exception when table output is set to markdown. #27

Merged
merged 3 commits into from
Apr 19, 2024

@ic-xu (Contributor) commented Apr 19, 2024

Behavior:

I get the following exception:

/python3.10/site-packages/openparse/tables/pymupdf/parse.py", line 25, in output_to_markdown
 markdown_output = "| " + " | ".join(headers) + " |\n"
TypeError: sequence item 2: expected str instance, NoneType found

This happens when parsing PDF tables with the output format set to markdown:

table_args = {
    "parsing_algorithm": "pymupdf",
    "table_output_format": "markdown",
}

After some analysis, I believe the cause is the following. The error occurs when the table headers contain None values, for example:

headers = ['(See Note 11)', '', None, None]

Executing the following code with these headers then fails:

 markdown_output = "| " + " | ".join(headers) + " |\n"
 markdown_output += "|---" * len(headers) + "|\n"

Because str.join accepts only strings and item 2 of the list is None, it raises the following error:

/python3.10/site-packages/openparse/tables/pymupdf/parse.py", line 25, in output_to_markdown
 markdown_output = "| " + " | ".join(headers) + " |\n"
TypeError: sequence item 2: expected str instance, NoneType found
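
The failure can be reproduced outside the library with plain Python. This is a minimal sketch that uses only the headers list and the join expression quoted above. It does not call openparse, and the variable name error_message is introduced here purely for illustration.

```python
# Standalone reproduction; no openparse import is needed.
headers = ['(See Note 11)', '', None, None]

try:
    # Same expression as in the traceback: str.join rejects non-str items.
    markdown_output = "| " + " | ".join(headers) + " |\n"
except TypeError as exc:
    # error_message is an illustrative name, not part of the library.
    error_message = str(exc)
    print(error_message)
```

The index in the message (item 2) is the position of the first None in the list.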

My fix is to replace each None header with ' ' before joining, which avoids the error.
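
A minimal sketch of this approach is shown below. The helper name headers_to_markdown and its signature are hypothetical and are not taken from the merged commit. Only the two markdown_output lines come from the code quoted above.

```python
from typing import Optional, Sequence


def headers_to_markdown(headers: Sequence[Optional[str]]) -> str:
    """Hypothetical helper: build the markdown header rows, tolerating None cells."""
    # Replace each None with a single space so str.join only sees strings.
    cleaned = [" " if h is None else h for h in headers]
    markdown_output = "| " + " | ".join(cleaned) + " |\n"
    markdown_output += "|---" * len(cleaned) + "|\n"
    return markdown_output


out = headers_to_markdown(['(See Note 11)', '', None, None])
print(out)
```

Empty strings are left unchanged, since they are already valid input to str.join. Only None values are substituted.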

@Filimoa Filimoa merged commit 106465d into Filimoa:main Apr 19, 2024
4 of 5 checks passed