Unable to modify pdf while parsing rotation of pages #154

Open
gnat42 opened this issue Feb 15, 2019 · 1 comment

gnat42 commented Feb 15, 2019

I'm 99% sure I'm doing something wrong here, but I can't seem to combine the multiple examples you have into a working piece of code.

Background: I'm trying to take a given PDF and write text and images to it. I ran into a situation where some of the documents we receive were created rotated. I tried rotating them manually before processing, but learned that this doesn't change the PDF content itself; it only sets the rotation in some kind of page property. The effect for me is that when I place text, it comes out rotated with respect to the page, if you get my meaning. So I need to check the rotation of the page and write my text with the same rotation for it all to work out. I tried modifying my currently working code to just output the page rotation, but that causes a segfault. I've included my code and the backtraces.
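For reference, the page property in question is the `/Rotate` entry, which the PDF specification defines as a clockwise rotation in multiples of 90 degrees applied when the page is displayed. One way to make new content follow it is to prepend a transformation matrix (the six numbers of the PDF `cm` operator) that maps coordinates of the page as displayed onto the unrotated page. The sketch below is a minimal illustration under that assumption: `uprightMatrixForRotation`, `applyMatrix`, and `Point` are hypothetical names, not part of PDF-Writer, and the page box is assumed to have its origin at (0, 0).

```cpp
#include <array>

// Hypothetical helper, not part of PDF-Writer.
// Returns {a, b, c, d, e, f} for the PDF `cm` operator so that content drawn
// afterwards uses the coordinate system of the page as displayed.
// `rotate` is the page's /Rotate value; `width` and `height` are the
// dimensions of the unrotated page box (origin assumed at 0,0).
std::array<double, 6> uprightMatrixForRotation(int rotate, double width, double height) {
    int r = ((rotate % 360) + 360) % 360;  // normalize negatives and values >= 360
    switch (r) {
        case 90:  return {0.0, 1.0, -1.0, 0.0, width, 0.0};
        case 180: return {-1.0, 0.0, 0.0, -1.0, width, height};
        case 270: return {0.0, -1.0, 1.0, 0.0, 0.0, height};
        default:  return {1.0, 0.0, 0.0, 1.0, 0.0, 0.0};  // 0, or not a multiple of 90
    }
}

struct Point {
    double x;
    double y;
};

// Applies the matrix to a point (u, v) given in displayed-page coordinates
// and returns the matching point on the unrotated page.
Point applyMatrix(const std::array<double, 6>& m, double u, double v) {
    return {m[0] * u + m[2] * v + m[4], m[1] * u + m[3] * v + m[5]};
}
```

With a 612 x 792 page and a rotation of 90, the displayed point (10, 20) maps to (592, 10) on the unrotated page.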

Relevant code

void PdfWriter::writePdf(Php::Parameters &params) {
    PDFParser parser = writer.GetModifiedFileParser();
    InputFile file = writer.GetModifiedInputFile();

    // Iterate over the entries of the pageText unordered_map (page index -> text items)
    for (const auto &n : pageText) {
        RefCountPtr<PDFDictionary> page = parser.ParsePage(n.first);

        if (!page) {
            Php::out << "Failed to parse page" << std::endl;
        } else {
            PDFPageInput input(&parser, page);
            Php::out << "Rot: " << input.GetRotate() << std::endl;
        }

        PDFModifiedPage thePage(&writer, n.first, true);
        AbstractContentContext* contentContext = thePage.StartContentContext();

        if (contentContext) {
            // iterate over each PdfText point
            for (const auto &iter : n.second) {
                this->writeText(iter, contentContext);
                delete iter; // deleted because it was new PdfText() in our writeTextToPage calls
            }

            // see if there are images destined for this page and write them at the same time
            auto images = pageImages.find(n.first);

            if (images != pageImages.end()) {
                for (const auto &i : images->second) {
                    this->writeImage(i, contentContext);
                    delete i;
                }

                pageImages.erase(images);
            }

            thePage.EndContentContext();
        }
    }

    // causes segfault #1
    file.CloseFile();
    pageText.clear();
    pageImages.clear();
    // causes segfault #2
    writer.EndPDF();
}

Segfault - 1

#0  0x00007fa052ba8df2 in PDFParser::MovePositionInStream (this=this@entry=0x55ba00c06750, inPosition=46930)
    at /usr/src/debug/pdf-writer-4.0-1.fc29.x86_64/PDFWriter/PDFParser.cpp:1557
#1  0x00007fa052baacc4 in PDFParser::ParseExistingInDirectObject (this=0x55ba00c06750, inObjectID=25) at /usr/src/debug/pdf-writer-4.0-1.fc29.x86_64/PDFWriter/PDFParser.cpp:714
#2  0x00007fa052b71f70 in PDFHummus::DocumentContext::GetOriginalDocumentPageTreeRoot(PDFParser*) ()
    at /usr/src/debug/pdf-writer-4.0-1.fc29.x86_64/PDFWriter/DocumentContext.cpp:2354
#3  0x00007fa052b7786b in PDFHummus::DocumentContext::FinalizeModifiedPDF (this=this@entry=0x55ba00c06048, inModifiedFileParser=inModifiedFileParser@entry=0x55ba00c06750, 
    inModifiedPDFVersion=ePDFVersion14) at /usr/src/debug/pdf-writer-4.0-1.fc29.x86_64/PDFWriter/DocumentContext.cpp:2262
#4  0x00007fa052bb2e01 in PDFWriter::EndPDF (this=0x55ba00c05fd0) at /usr/src/debug/pdf-writer-4.0-1.fc29.x86_64/PDFWriter/PDFWriter.cpp:99
#5  0x00007fa052d36d54 in PdfWriter::writePdf(Php::Parameters&) () from /usr/lib64/php/modules/pdf.so

Segfault - 2

#0  0x00007f09dfe02998 in fclose@@GLIBC_2.2.5 () from /lib64/libc.so.6
#1  0x00007f09cebed711 in InputFileStream::Close (this=0x55bb8ead2f60) at /usr/src/debug/pdf-writer-4.0-1.fc29.x86_64/PDFWriter/InputFileStream.cpp:52
#2  0x00007f09cebed2dc in InputFile::CloseFile (this=0x7ffd8e6cc550) at /usr/src/debug/pdf-writer-4.0-1.fc29.x86_64/PDFWriter/InputFile.cpp:75
#3  InputFile::CloseFile (this=0x7ffd8e6cc550) at /usr/src/debug/pdf-writer-4.0-1.fc29.x86_64/PDFWriter/InputFile.cpp:67
#4  0x00007f09cebed331 in InputFile::~InputFile (this=0x7ffd8e6cc550, __in_chrg=<optimized out>) at /usr/src/debug/pdf-writer-4.0-1.fc29.x86_64/PDFWriter/InputFile.cpp:36
#5  0x00007f09ced9e8d2 in PdfWriter::writePdf(Php::Parameters&) () from /usr/lib64/php/modules/pdf.so
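
One thing that stands out in the code above: `parser` and `file` are declared as values, so they are copies of whatever `GetModifiedFileParser()` and `GetModifiedInputFile()` return. If those getters return references to objects owned by the writer (an assumption here, not verified against the 4.0 headers), the copy and the original would share one underlying file and both would try to close it. That would be consistent with the `fclose` crash inside `~InputFile` in the second backtrace. The sketch below reproduces the pattern with hypothetical stand-in classes (`FakeHandle`, `FakeInputFile`, `FakeWriter`); it does not use the PDF-Writer library itself.

```cpp
// Hypothetical stand-in for an OS file handle; it only counts close calls.
struct FakeHandle {
    int closeCount = 0;
};

// Hypothetical stand-in for a class that closes its file on destruction and
// relies on the implicit (shallow) copy constructor.
class FakeInputFile {
public:
    explicit FakeInputFile(FakeHandle* handle) : mHandle(handle) {}
    ~FakeInputFile() { CloseFile(); }

    void CloseFile() {
        if (mHandle) {
            ++mHandle->closeCount;  // a real class would call fclose() here
            mHandle = nullptr;
        }
    }

private:
    FakeHandle* mHandle;
};

// Hypothetical stand-in for a writer that owns the file and hands out a reference.
class FakeWriter {
public:
    explicit FakeWriter(FakeHandle* handle) : mFile(handle) {}
    FakeInputFile& GetModifiedInputFile() { return mFile; }

private:
    FakeInputFile mFile;
};

// Returns how many times the shared handle was closed once the writer is gone.
int totalCloses(bool bindByReference) {
    FakeHandle handle;
    {
        FakeWriter writer(&handle);
        if (bindByReference) {
            FakeInputFile& file = writer.GetModifiedInputFile();  // no copy made
            (void)file;
        } else {
            FakeInputFile file = writer.GetModifiedInputFile();   // shallow copy
            (void)file;
        }  // in the copy case, the copy's destructor closes the handle here
    }      // the writer's own file object closes the handle here
    return handle.closeCount;
}
```

In this sketch `totalCloses(false)` reports two closes of the same handle, while `totalCloses(true)` reports one. Under the stated assumption, the change to try in `writePdf` would be binding with `PDFParser&` and `InputFile&` instead of copying.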

gnat42 commented Feb 15, 2019

I re-reviewed my code and ended up with the following, which fails as well. I added:

    InputFile file   = writer.GetModifiedInputFile();
    status             = parser.StartPDFParsing(file.GetInputStream());

    if(status != PDFHummus::eSuccess)
    {
        Php::out<<"unable to parse input file: "<< status <<std::endl;
        throw Php::Exception("Unable to parse input file");
    }

which results in a failed status (-1). Not sure why. The PDFWriter object is initialized via writer.ModifyPDF(..).
