feat: allowing offset xref table with pdf parser #277

Closed
wants to merge 1 commit into from

galkahana
Owner

Attempting to solve #274.
In the presented case the xref is offset: it is positioned 8 bytes later than its declared position. The same is true for the rest of the positions within the xref.

This solution scans forward, up to 1024 bytes past the declared position, for the next position that could be an xref start, judged by what counts as an xref (the xref keyword or an integer). The difference is stored as an additional offset (on top of any initial header offset) so that the remaining read positions can be translated to the correct ones, again assuming that the offset is constant.
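
The forward scan described above can be sketched roughly as follows. This is an illustration in Python, not the PDFWriter code from this PR: the function names, the regular expression, and the reading of "integer" as the object header of a cross-reference stream ("12 0 obj") are all assumptions made for the example.

```python
import re
from typing import Optional

MAX_SCAN = 1024  # scan window mentioned in the PR description

# Hypothetical test for "looks like an xref start": either the `xref`
# keyword (classic table) or an integer opening an object header such as
# "12 0 obj" (assumed here to stand for a cross-reference stream).
_XREF_START = re.compile(rb"xref\b|\d+\s+\d+\s+obj\b")
_PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def find_xref_extra_offset(data: bytes, declared_pos: int,
                           max_scan: int = MAX_SCAN) -> Optional[int]:
    """Return how many bytes past declared_pos an xref start was found, or None."""
    for extra in range(max_scan + 1):
        pos = declared_pos + extra
        if pos >= len(data):
            break
        # Only accept a match at a token boundary, so that the tail of
        # "startxref" or the middle of a longer number is not mistaken
        # for an xref start.
        at_boundary = pos == 0 or data[pos - 1] in _PDF_WHITESPACE
        if at_boundary and _XREF_START.match(data, pos):
            return extra
    return None


def translate_position(declared_pos: int, extra_offset: int) -> int:
    """Apply the stored extra offset, assuming it is constant across the file."""
    return declared_pos + extra_offset
```

For a file whose xref sits 8 bytes after the declared position, `find_xref_extra_offset` would return 8, and every later read position would be passed through `translate_position`. The constant-offset assumption is the weak point of the approach.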

@huangtiansama

[3 screenshots attached]
There is a small error when calling the AppendPDFPagesFromPDF function with this version.

@galkahana
Owner Author

I see. It fails on parsing object 1, which the xref places at offset 15 and which actually is at offset 15.
OK, so the file is inconsistent in its offset: the xref itself is shifted, but this object is not.
@huangtiansama I think I'm abandoning this and will be closing the PR. It's not helpful.
This PDF is faulty. It has a bad xref, which needs to be corrected.
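
To illustrate the inconsistency, here is a minimal Python sketch of checking whether one constant shift fits every xref entry. It is not part of PDFWriter; the function names and the `entries` mapping (object number to declared offset) are hypothetical.

```python
import re
from typing import Dict


def object_starts_at(data: bytes, obj_num: int, pos: int) -> bool:
    """True if "<obj_num> <generation> obj" begins at byte pos."""
    if pos < 0 or pos >= len(data):
        return False
    header = re.compile(rb"%d\s+\d+\s+obj\b" % obj_num)
    return header.match(data, pos) is not None


def shift_is_consistent(data: bytes, entries: Dict[int, int], extra: int) -> bool:
    """True if every declared offset, moved by `extra`, lands on its object header."""
    return all(object_starts_at(data, num, off + extra)
               for num, off in entries.items())
```

In the file discussed here, object 1 is found at its declared offset 15 with no shift, so a check with an extra offset of 8 fails for it even though the xref itself sits 8 bytes late.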

There are two options I propose (either or both):

  1. You can fix the PDF xref by opening the file in Adobe Acrobat. When you later close it, Acrobat will offer to save the file, because the file is faulty and Acrobat "fixed" it. Save the corrected file, and you can then use it with PDFWriter with no changes required.
  2. Contact whoever provided you with the PDF and let them know that the tool they use creates faulty PDFs.

@galkahana closed this Aug 20, 2024
@huangtiansama

OK, thanks.
