Fix magic detection for HTML with <svg #74

Merged
merged 1 commit into from
Aug 10, 2022

ursm
Contributor

@ursm ursm commented Aug 5, 2022

Fixed a problem in which HTML files containing <svg were misidentified as SVG.

At first glance, it looks as though this could be solved by adjusting the value of the priority attribute in custom.xml. However, because script/generate_tables.rb treats common_types specially, it was actually necessary to reorder common_types itself.
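
To illustrate why ordering matters, here is a minimal sketch of first-match-wins magic detection. It is not Marcel's actual implementation: `matches?`, `detect`, the two table entries, and the byte ranges are all hypothetical, and the nested child matchers seen in the real generated tables are left out.

```ruby
# Hypothetical, simplified magic table entry: [mime_type, [[offset_or_range, bytes], ...]].
# Entries are tried in order and the first entry with a matching pattern wins.

# True if `bytes` occurs in `data` at `offset` (or at any offset within a Range).
def matches?(data, offset, bytes)
  range = offset.is_a?(Range) ? offset : (offset..offset)
  range.any? { |start| data.byteslice(start, bytes.bytesize) == bytes }
end

# Returns the type of the first table entry that matches, or nil.
def detect(table, data)
  data = data.b # compare as raw bytes
  table.each do |type, matchers|
    return type if matchers.any? { |offset, bytes| matches?(data, offset, bytes) }
  end
  nil
end

# Assumed patterns, for illustration only.
SVG_ENTRY  = ['image/svg+xml', [[0..4096, '<svg'.b]]]
HTML_ENTRY = ['text/html', [[0..64, '<!DOCTYPE html'.b], [0..64, '<html'.b]]]

SVG_FIRST  = [SVG_ENTRY, HTML_ENTRY]
HTML_FIRST = [HTML_ENTRY, SVG_ENTRY]

HTML_WITH_INLINE_SVG = '<!DOCTYPE html><html><body><svg></svg></body></html>'
PLAIN_SVG = '<svg xmlns="http://www.w3.org/2000/svg"></svg>'

puts detect(SVG_FIRST, HTML_WITH_INLINE_SVG)  # image/svg+xml (misidentified)
puts detect(HTML_FIRST, HTML_WITH_INLINE_SVG) # text/html
puts detect(HTML_FIRST, PLAIN_SVG)            # image/svg+xml
```

In this sketch the entry order alone decides the result. That is consistent with the description above, where changing a priority value has no effect unless the order of the generated entries changes as well.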

Fixes #67

Comment on lines -2414 to +2415
['application/msword', [[0..8, b["\320\317\021\340\241\261\032\341"], [[546, b['jbjb']], [546, b['bjbj']]]]]],
['application/msword', [[2080, b['Microsoft Word 6.0 Document']], [2080, b['Documento Microsoft Word 6']], [2112, b['MSWordDoc']], [0, b["1\276\000\000"]], [0, b['PO^Q`']], [0, b["\3767\000#"]], [0, b["\333\245-\000\000\000"]], [0, b["\224\246."]], [0..8, b["\320\317\021\340\241\261\032\341"], [[1152..4096, b["W\000o\000r\000d\000D\000o\000c\000u\000m\000e\000n\000t"]]]]]],
Contributor Author

These lines appear to have been replaced with L2502-L2503. I have no idea why this is happening.

@rafaelfranca rafaelfranca merged commit 8e28563 into rails:main Aug 10, 2022
@tricknotes

@rafaelfranca
Could you release a new version?
I would like to use this change in a released version.

sjoulbak added a commit to sjoulbak/marcel that referenced this pull request Mar 4, 2024
This issue was introduced by merging rails#74 and released in 1.0.3.
The title element provides an accessible, short-text description
of the SVG and so cannot be used to determine whether an element
is text/html.
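
A small sketch of the failure mode that commit message describes, assuming a sniffing rule that treats `<title` near the start of a document as evidence of HTML. The `naive_html?` helper and the 256-byte window are hypothetical and not taken from Marcel's tables.

```ruby
# Hypothetical rule: call a document HTML if "<title" appears in its first 256 bytes.
def naive_html?(data)
  data.byteslice(0, 256).to_s.downcase.include?('<title')
end

# A valid SVG document whose <title> is an accessible description of the image.
SVG_WITH_TITLE = <<~XML
  <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10">
    <title>A red circle</title>
    <circle cx="5" cy="5" r="4" fill="red"/>
  </svg>
XML

puts naive_html?(SVG_WITH_TITLE) # true, although the document is SVG
```

Because `<title>` is legitimate in both formats, a rule like this one cannot tell them apart by itself.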
@jeremy
Member

jeremy commented Mar 6, 2024

Note this PR modifies the autogenerated tables directly. It'll be overwritten the next time Tika data is updated.

I'll see whether I can fix it but will revert otherwise.

Development

Successfully merging this pull request may close these issues.

image/svg+xml returned for an html document with svg in it
4 participants