feat: allowing offset xref table with pdf parser #277

galkahana · 2024-08-19T13:18:29Z

attempting to solve #274.
in the presented case the xref is offset: it is positioned 8 bytes later than the position declared by startxref. The same 8-byte shift applies to the other positions listed within the xref.

this solution scans forward from the declared position, up to 1024 bytes, for the next position that could be an xref start, based on what is considered an xref start (the "xref" keyword or an integer). The distance found is stored as an additional offset, on top of any existing header offset, so that all the other positions read from the xref can be translated to the correct ones. This again assumes that the offset is constant throughout the file.
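The scan-and-translate idea can be sketched roughly as follows. This is a hypothetical C++ illustration, not PDFWriter's actual PDFParser code: it assumes the whole file is in memory as a std::string, and the names IsPdfWhitespace, LooksLikeXrefStart, FindXrefShift and OffsetCorrection are made up for this example. The xref-start test is deliberately simplified to a token boundary followed by either "xref" or a digit.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// PDF whitespace characters: NUL, HT, LF, FF, CR, SP.
bool IsPdfWhitespace(char c) {
    return c == '\0' || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
}

// True if a token that could open a cross-reference section starts at pos:
// the "xref" keyword (classic table) or an integer (xref stream "N G obj").
// Simplified: only the first character of the integer is checked.
bool LooksLikeXrefStart(const std::string& data, std::size_t pos) {
    if (pos >= data.size()) return false;
    // Require a token boundary so we don't match inside a word or number.
    if (pos > 0 && !IsPdfWhitespace(data[pos - 1])) return false;
    if (data.compare(pos, 4, "xref") == 0) return true;
    return std::isdigit(static_cast<unsigned char>(data[pos])) != 0;
}

// Scan forward from the declared xref position, at most maxScan bytes,
// returning the distance to the first plausible xref start (0 = no shift).
std::optional<std::size_t> FindXrefShift(const std::string& data,
                                         std::size_t declaredPos,
                                         std::size_t maxScan = 1024) {
    for (std::size_t delta = 0; delta <= maxScan; ++delta) {
        if (LooksLikeXrefStart(data, declaredPos + delta)) return delta;
    }
    return std::nullopt;
}

// Hypothetical holder for both corrections, each assumed constant for the file:
// headerOffset (bytes before "%PDF-") and xrefShift (found by FindXrefShift).
struct OffsetCorrection {
    std::size_t headerOffset = 0;
    std::size_t xrefShift = 0;

    // Map a position read from the file (startxref or an xref entry) to the real one.
    std::size_t Translate(std::size_t declared) const {
        return declared + headerOffset + xrefShift;
    }
};
```

Translate applies the same shift to every entry, so the whole approach stands or falls on the constant-offset assumption.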

huangtiansama · 2024-08-20T01:43:30Z

There's still a small error when calling the AppendPDFPagesFromPDF function with this version.

galkahana · 2024-08-20T04:55:02Z

I see. It fails on parsing object 1, which is at offset 15 according to the xref and is also actually at offset 15.
OK, so the file is inconsistent in its offset.
@huangtiansama I think I'm abandoning this and will be closing the PR. It's not helpful.
This PDF is faulty: it has a bad xref that needs to be corrected.
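The inconsistency can be checked mechanically: a single constant shift only works if every in-use xref entry, moved by that shift, lands on its "N G obj" header. Below is a hypothetical C++ sketch of such a check (ObjectHeaderAt, ShiftIsConsistent and XrefEntry are illustrative names, not PDFWriter API), assuming the file is in memory and only in-use entries are passed. In the reported file, object 1 passes at shift 0 while the xref itself needed a shift of 8, so no single value satisfies both.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// PDF whitespace characters: NUL, HT, LF, FF, CR, SP.
bool IsPdfWhitespace(char c) {
    return c == '\0' || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
}

// True if data at pos starts with an indirect object header "objNum gen obj".
bool ObjectHeaderAt(const std::string& data, std::size_t pos, int objNum) {
    const std::string num = std::to_string(objNum);
    if (pos > data.size() || data.compare(pos, num.size(), num) != 0) return false;
    std::size_t i = pos + num.size();
    // Skip whitespace; returns true only if at least one character was skipped.
    auto skipWs = [&] {
        const std::size_t start = i;
        while (i < data.size() && IsPdfWhitespace(data[i])) ++i;
        return i > start;
    };
    if (!skipWs()) return false;
    const std::size_t genStart = i;
    while (i < data.size() && std::isdigit(static_cast<unsigned char>(data[i]))) ++i;
    if (i == genStart || !skipWs()) return false;
    return data.compare(i, 3, "obj") == 0;
}

// One in-use xref entry: object number and the byte offset the xref declares.
struct XrefEntry {
    int objNum;
    std::size_t declaredOffset;
};

// A constant shift is usable only if every entry, moved by it, lands on its header.
bool ShiftIsConsistent(const std::string& data, const std::vector<XrefEntry>& entries,
                       std::size_t shift) {
    for (const XrefEntry& e : entries) {
        if (!ObjectHeaderAt(data, e.declaredOffset + shift, e.objNum)) return false;
    }
    return true;
}
```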

I propose two options (you can do either, or both):

1. Fix the PDF's xref by opening it in Adobe Acrobat. When you later close it, Acrobat will offer to save the file, because the file is faulty and Acrobat has "fixed" it. Save the corrected file, and you can then use it with PDFWriter with no changes required.
2. Contact whoever provided you with the PDF and let them know that the tool they use creates faulty PDFs.

huangtiansama · 2024-08-20T05:39:44Z

OK, thanks.

feat: allowing offset xref table with pdf parser

92015a0

galkahana mentioned this pull request Aug 19, 2024

PDFParser::ParseFileDirectory,Unexpected object at xref start #274

Closed

galkahana closed this Aug 20, 2024
