-
Notifications
You must be signed in to change notification settings - Fork 335
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optional Content Groups improvements #51
Closed
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Also create separate make target to build docs without having to install them.
Use surrogate pairs in output as required. Also (thanks to Tor!) fix fmtquote_xml output. Thanks to Xybei for the report, and the first version of the patch.
Required for use of std::min in some custom methods.
First, add a comment. Second, all outline entries inserted into PDF are closed, regardless of the is_open setting. Thirdly, any insertion into an empty outline object no longer opens it.
The parser encounters an object that it hasn't read from an objstm before, loads it and finds it's a null object. It then thinks that because it's a NULL pointer, that it didn't actually read that object, and throws an error. To fix this, rewrite null 'o' objects as 'f' objects.
…ces." This reverts commit 2586767. The bug that this commit patches over was apparently fixed in another way later in commit 6f8cb56 for bug 699629. The check causes certain (broken) files to extract text worse than they could. Example file has code space ranges and character ranges given with space padded rather than zero padded hex strings: begincodespacerange < 4> <8f> endcodespacerange This is parsed (as per spec) as <40> <8f> which then means we lost all character mappings < 0x40. By ignoring the range checks, we now at least get characters 0x10 and up correct.
Incorrectly changed from MAX_OBJECT_NUMBER to pdf_xref_len in e767bd7 The pdf_xref_len is not valid during repair, because the current object being repaired may extend it! Compare with the implementation limit instead.
Instead of calling pdf_repair_obj_stms implicitly inside pdf_repair_trailer, call it explicitly at the appropriate call sites.
We should (in these cases at least) be checking for the use of the WIN32 API, not the particular compiler.
Currently, we are relying on the accurate bbox for a char and checking to see if a line crosses it. This will go wrong both for chars that are displaced from the origin (e.g. '^') and for underlines that don't cross chars without descenders. As a fix for this, we calculate an 'expanded' rectangle for each char. We extend the rectangle down to the baseline of the glyph, and then 25% of the size of the font below it. This fixes lines such as "Title 1" (underlined!), which contains no descenders, correctly being spotted as underlined chars.
The offsets in the loca table must be increasing, all offsets must be filled in, and the length of the glyph data is the difference between an offset and the next offset. The previous code didn't fill in the offsets of unused glyphs between used glyphs' offsets, this caused complaints when opening the font in freetype or fontforge. In addition the length computed for one glyph's data was not applied to the remapped glyph id, but to the glyph id after that one, causing the glyph appearance to be applied to another random glyph.
A hmtx table consists of two parts: * an array of entries with both advance widths and left bearing values * an array of entries with only left bearing values Combined, both arrays are meant to provide horizontal metrics for all the glyphs. The number of glyphs is parsed out from the font file's maxp table. The length of the first array is parsed out from the font file's hhea table. The length of the second array is computed by taking the number of glyphs and subtracting the number of entries in the first array. Previously, the first array length, stored in orig_num_long_hor_metrics and was capped by the hmtx table size, while the second array length, stored in max16, was capped both by the hmtx table size and the number of glyphs. This meant that max16 could end up being a lower value than orig_num_log_hor_metrics, in which case the second array entries were never updated. Three changes are provided in this commit: * Rename array lengths in the code to long_metrics and short_metrics respectively. * Use separate counter to count the number of entries in the first and the second arrays. * Make sure to cap both array lengths by the number of glyphs in the font.
Both 1487_-_right_margin_cut_off_when_printing.pdf font objects 7 0 R and 14 0 R and PDFIA1.7_SUBSET/CATX1641.pdf font objects 93 0 R, 99 0 R and 105 0 R contain symbolic fonts that all contain a single (3,0) cmap, while MuPDF previously only accepted (1,0) cmaps. The fix is to simply add a check for (3,0) cmaps and these two files can now be subset correctly.
2201_-_transparent_image_covers_background.pdf object 68 0 R has only a single (0,1) cmap using format 4. Now we can cope with that!
This fixes Coverity issue 430251.
Move vectors into line with chars. We now return a consistent representation. 'color' has been deliberately changed to 'argb' as a) it's clearer, and b) it forces people to update their code and not suddenly see failures with alpha being unexpectedly in the top 8 bits.
PyMuPDF builds use build directories containing `Py_LIMITED_API=*` which results in auto-dependency files containing things like: build/PyMuPDF-amd64-shared-tesseract-Py_LIMITED_API=0x030a0000-release/source/fitz/context.o: foo.h Unfortunately the `=` breaks thngs because it causes make to treat this as setting variable `build/PyMuPDF-amd64-shared-tesseract-Py_LIMITED_API` to `0x030a0000-release/source/fitz/context.o: foo.h`, not as a `<target>: <prerequisites>` rule. PyMuPDF will be changed to use build directories without the `=` such as build/PyMuPDF-amd64-shared-tesseract-Py_LIMITED_API_0x030a0000-release/, which we now accept.
Document.search(), Page.search(), and DisplayList.search() all suffered from the same bug where they are declared to return Quad[][] but the last call in the JNI wrappers called ArrayList.toArray() which returns Object[]. Desktop java accepted type of casting without any compilation or runtime complaints. Compiling for Android worked equally well, but searching in the app, the app crashed at runtime with: JNI DETECTED ERROR IN APPLICATION: attempt to return an instance of java.lang.Object[] from fitz.Quad[][] fitz.Page.search(java.lang.String). The fix for this problem is to at the end of the JNI wrappers call <T> T[] ArrayList.toArray(T[]). But even with this code change the app still crashes at runtime with: JNI DETECTED ERROR IN APPLICATION: the return type of CallVoidMethod does not match boolean java.util.ArrayList.add(java.lang.Object) Which was resolved by calling the function using CallBooleanMethod and handling the return value suitably. With these two changes searching now works well in both desktop Java and Android.
I had failed to update the code that calculates bboxes and does bidirectional ordering in the presence of structure tags.
Previously the segment length was represented by a signed integer, so when pdf_parse_jbig2_segment_header() parsed the segment length the value read from the file could be interpreted as negative. Later on in pdf_copy_jbig2_segments() a data pointer is incremented by the segment length, which could lead to an out of bounds pointer and a subsequent crash. This fix is to represent the parsed value as an 32-bit unsigned integer as mentioned in the specification.
This only affects document handlers that insist on file backed streams (i.e. sodochandler, currently). If we ask to convert a stream to be file backed, we may get the same stream back instantly (indicating that it is already file backed). Do NOT drop it in this case.
This enables people to spot 'invisible' text.
fz_convert_pixmap() returns a kept reference, but the C++ wrappers did not know this so did an additional keep, which results in a leak. The fix is to add name 'convert' in function_name_implies_kept_references().
The PDF operator stream in this file does: W <path> n /Image Do The filter code therefore currently tries to deal with the 'W' without a path being defined. Currently it bounds it, finds it is an empty rect, and therefore suppresses the image. According to the PDF spec though, W and W* are not supposed to look at the path at the point they are called. Instead they merely set it up so that a side effect of the next 'path painting operator' (in this case 'n') is to intersect the clip path AFTER than operator has taken place. This commit therefore changes the behaviour of the filter to follow this. All we do when we get a 'W' or a 'W*' is to set a new clip_op field. Whenever we then process a path painting operator, we look at this field, and process W or W* as appropriate. We are at pains to forward the W or W* operator AFTER the path has been processed, but before the path painting operator is sent. This should keep bugged renderers such as Preview happy. The has meant that the culler handling has changed slightly in that the culler now doesn't deal with CLIP paths separately to FILL or STROKE paths, but rather now as a combination (e.g. CLIP_FILL_PATH). Trying to deal differently with CLIP and FILL paths (for example) would probably have gone wrong in the past so there is no real downside here.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR solves a number of issues around Optional Content Groups (OCGs, aka "Layers"), including those reported in Bug 707999:
2e3298b — OCMD visibility is now computed correctly for
AllOn
andAnyOff
as well asAnyOn
andAllOff
(issue 2 in the bug report above).a313d32 — I fixed a bug in the
find_ocg
function that caused it to always return0
.e3dba5c — I added functions to query and select the default OCG configuration (issue 1 in the bug report above), in particular:
void pdf_default_layer_config_info(fz_context *ctx, pdf_document *doc, pdf_layer_config *info)
fetches the name and creator of the default configuration (likepdf_layer_config_info
).void pdf_select_default_layer_config(fz_context *ctx, pdf_document *doc)
activates the default layer configuration (likepdf_select_layer_config
).Furthermore:
pdf_layer_config_info
andpdf_select_layer_config
, so that when theconfig_num
/config
argument is-1
, they act on the default configuration.mudraw.c
so thatmutool draw -y l
also lists the default configuration, andmutool draw -y d
explicitly selects the default configuration.2c2cd91 — I added the missing logic to interpret visibility expressions (
VE
entry) inOCMD
dictionaries (issue 3 in the bug report).static
functions related to this as well (ve_new_stack
,ve_stack_pop
,ve_stack_push
,ve_stack_contains
,ve_stack_free
).pdf_is_ocg_hidden_imp
and passed around to nested calls topdf_is_ve_hidden
orpdf_is_ocg_hidden_imp
.da41599 — Wherever OCG objects are being compared, I replaced
pdf_objcmp_resolve
withpdf_objcmp
(so that the references are compared rather than the contents of the dictionary).2c54b56 — The logic to deal with radio button OCG UI elements was quite broken, I fixed it.
2c54b56 — I added functions to check whether link actions are visible and to "activate" actions with subtype
SetOCGState
.int pdf_is_link_hidden(fz_context *ctx, const char *usage, fz_link* link)
checks the visibility of the OCG defined by theOC
entry of the link dictionary.int pdf_is_link_set_ocg_state(fz_link* link)
checks whether apdf_link
object represents an action with subtypeSetOCGState
or a "normal" link.void pdf_activate_link_set_ocg_state(fz_context *ctx, pdf_document *doc, fz_link *link)
activates a link that represents an action with subtypeSetOCGState
, thus enabling/disabling/toggling OCGs as specified in theState
entry of the action dictionary.mupdf-gl
andmupdf-x11
to use these new functions, so that hidden links do not react to mouse events andSetOCGState
actions do what they are supposed to do when clicking on them.Here are a few test PDFs:
OptionalContentGroups.pdf
: basic example, with three OCGs and objects whose visibility is controlled by a single OCG or a visibility expression.OptionalContentGroupsTree.pdf
: a document with an OCG UI tree with radio buttons.OptionalContentGroupsControls.pdf
: a document with "buttons" and "check boxes" to control OCG visibility (implemented asSetOCGState
actions).NoughtsCrosses.pdf
: a document with complex logic that can be used by two humans to play noughts and crosses (tic-tac-toe).