Change xref representation to cope better with sparse xrefs.
Currently each xref in the file results in an array spanning 0 to
num_objects. If a file has been updated many times, this wastes a
huge amount of memory.

Instead we now hold each xref as a list of non-overlapping subsections
(exactly as the file holds them).

Lookup is therefore potentially slower, but only on files where the
xrefs are highly fragmented (i.e. where we would be saving in memory
terms).
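
As a rough illustration of the lookup described above, the sketch below walks
a list of subsections to find the entry for one object number. It is not the
code from this commit: xref_entry, xref_subsec and lookup_entry are simplified,
hypothetical stand-ins for pdf_xref_entry, pdf_xref_subsec and the real lookup
path, and the entry fields are assumed for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for pdf_xref_entry (fields assumed for illustration). */
typedef struct { int gen; long ofs; } xref_entry;

/* Mirrors the shape of pdf_xref_subsec_s in this commit. */
typedef struct xref_subsec {
    struct xref_subsec *next;
    int len;            /* number of entries in this subsection */
    int start;          /* object number stored at table[0] */
    xref_entry *table;
} xref_subsec;

/* Hypothetical helper: return the entry for object number num, or NULL
 * if no subsection in the list covers it. */
static xref_entry *lookup_entry(xref_subsec *sub, int num)
{
    assert(num >= 0);
    for (; sub != NULL; sub = sub->next)
    {
        if (num >= sub->start && num < sub->start + sub->len)
            return &sub->table[num - sub->start];
    }
    return NULL;
}
```

The cost is linear in the number of subsections, so an xref with a single
subsection is looked up in one step, and only highly fragmented xrefs pay
for the walk.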

Some parts of our code (notably the file writing code that does
garbage collection etc.) assume that looking up an object entry
pointer will not invalidate entry pointers that have already been
looked up. To cope with this, and to cope with the case where we are
updating/creating new objects, we introduce the idea of a 'solid'
xref.

A solid xref is one that has a single subsection record spanning
the entire range of valid object numbers for a file. Once we have
ensured that an xref is 'solid', we can safely work on the pointers
within it without fear of them moving.
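
One way to picture making an xref solid is sketched below: the subsections
are copied into a single table covering 0 to num_objects - 1 and the old list
is freed. This is an assumption-laden sketch, not the body of
pdf_ensure_solid_xref: make_solid and the simplified types are hypothetical,
plain malloc/free stands in for the library's allocator, and every subsection
is assumed to be heap-allocated and to end at or below num_objects.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for pdf_xref_entry (fields assumed for illustration). */
typedef struct { int gen; long ofs; } xref_entry;

typedef struct xref_subsec {
    struct xref_subsec *next;
    int len;
    int start;
    xref_entry *table;
} xref_subsec;

/* Hypothetical helper: replace a list of non-overlapping subsections with a
 * single subsection covering object numbers 0..num_objects-1. Returns the
 * solid subsection, or NULL on allocation failure (the input list is then
 * left untouched). */
static xref_subsec *make_solid(xref_subsec *list, int num_objects)
{
    xref_subsec *solid, *next;

    assert(num_objects > 0);

    /* Already solid: one subsection starting at 0 that is long enough. */
    if (list && !list->next && list->start == 0 && list->len >= num_objects)
        return list;

    solid = malloc(sizeof *solid);
    if (!solid)
        return NULL;
    /* calloc zeroes the entries that no subsection covered. */
    solid->table = calloc((size_t)num_objects, sizeof *solid->table);
    if (!solid->table)
    {
        free(solid);
        return NULL;
    }
    solid->next = NULL;
    solid->start = 0;
    solid->len = num_objects;

    /* Copy each old subsection into its place, then release it. */
    for (; list; list = next)
    {
        next = list->next;
        assert(list->start >= 0 && list->start + list->len <= num_objects);
        if (list->len > 0)
            memcpy(&solid->table[list->start], list->table,
                   (size_t)list->len * sizeof *list->table);
        free(list->table);
        free(list);
    }
    return solid;
}
```

After this step every valid object number maps to &solid->table[num], so
pointers taken before and after further lookups stay valid as long as the
table itself is not reallocated.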

We ensure that any 'incremental' xref is solid.

We also ensure that any non-incremental write makes the xref solid.
robinwatts committed Nov 26, 2014
1 parent 37779f9 commit e767bd7
Showing 5 changed files with 345 additions and 115 deletions.
1 change: 1 addition & 0 deletions include/mupdf/pdf/document.h
@@ -210,6 +210,7 @@ struct pdf_document_s
pdf_ocg_descriptor *ocg;
pdf_hotspot hotspot;

int max_xref_len;
int num_xref_sections;
pdf_xref *xref_sections;
int xref_altered;
14 changes: 12 additions & 2 deletions include/mupdf/pdf/xref.h
@@ -48,10 +48,20 @@ enum
PDF_OBJ_FLAG_MARK = 1,
};

struct pdf_xref_s
typedef struct pdf_xref_subsec_s pdf_xref_subsec;

struct pdf_xref_subsec_s
{
pdf_xref_subsec *next;
int len;
int start;
pdf_xref_entry *table;
};

struct pdf_xref_s
{
int num_objects;
pdf_xref_subsec *subsec;
pdf_obj *trailer;
pdf_obj *pre_repair_trailer;
};
@@ -89,7 +99,7 @@ int pdf_xref_is_incremental(pdf_document *doc, int num);
void pdf_repair_xref(pdf_document *doc, pdf_lexbuf *buf);
void pdf_repair_obj_stms(pdf_document *doc);
pdf_obj *pdf_new_ref(pdf_document *doc, pdf_obj *obj);

void pdf_ensure_solid_xref(pdf_document *doc, int num);
void pdf_mark_xref(pdf_document *doc);
void pdf_clear_xref(pdf_document *doc);
void pdf_clear_xref_to_mark(pdf_document *doc);
6 changes: 4 additions & 2 deletions source/pdf/pdf-repair.c
@@ -204,7 +204,7 @@ pdf_repair_obj_stm(pdf_document *doc, int num, int gen)
fz_warn(ctx, "ignoring object with invalid object number (%d %d R)", n, i);
continue;
}
else if (n > MAX_OBJECT_NUMBER)
else if (n >= pdf_xref_len(doc))
{
fz_warn(ctx, "ignoring object with invalid object number (%d %d R)", n, i);
continue;
@@ -455,7 +455,9 @@ pdf_repair_xref(pdf_document *doc, pdf_lexbuf *buf)
Dummy access to entry to assure sufficient space in the xref table
and avoid repeated reallocs in the loop
*/
(void)pdf_get_populating_xref_entry(doc, maxnum);
/* Ensure that the first xref table is a 'solid' one from
* 0 to maxnum. */
pdf_ensure_solid_xref(doc, maxnum);

for (i = 0; i < listlen; i++)
{
8 changes: 8 additions & 0 deletions source/pdf/pdf-write.c
@@ -2632,9 +2632,17 @@ void pdf_write_document(pdf_document *doc, char *filename, fz_write_options *fz_
opts.rev_gen_list[num] = pdf_get_xref_entry(doc, num)->gen;
}

if (opts.do_incremental && opts.do_garbage)
fz_throw(ctx, FZ_ERROR_GENERIC, "Can't do incremental writes with garbage collection");
if (opts.do_incremental && opts.do_linear)
fz_throw(ctx, FZ_ERROR_GENERIC, "Can't do incremental writes with linearisation");

/* Make sure any objects hidden in compressed streams have been loaded */
if (!opts.do_incremental)
{
pdf_ensure_solid_xref(doc, xref_len);
preloadobjstms(doc);
}

/* Sweep & mark objects from the trailer */
if (opts.do_garbage >= 1)