Allow Non HTML tags #91

Closed
MohanVijayakumar opened this issue Oct 10, 2016 · 2 comments


MohanVijayakumar commented Oct 10, 2016

Hi,

I have a string like the one below in my input:
The diameter starts <MM 78 > radial
For the above string I get only "The diameter starts". Is there any way to get the full string back, since it does not contain any HTML tags?

Thanks

Owner

mganss commented Oct 11, 2016

The HTML parser parses the input as The diameter starts <mm 78=""> radial</mm>, and the sanitizer then strips the mm element together with its content. If there were a space before the mm, i.e. < mm, you'd get The diameter starts &lt; MM 78 &gt; radial.

I currently see no elegant way out of this dilemma. You could pre-process your input by inserting a zero width space (&#8203;) after every < that is not followed by a slash or by what you consider an HTML tag name.
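As a rough illustration of that pre-processing step, here is a minimal Python sketch (not the library's own code, which is C#). The names KNOWN_TAGS and break_unknown_tags are hypothetical, and the tag list is deliberately tiny; a real list would need every tag name you want to accept.

```python
import re

# Hypothetical, deliberately short list of tag names to treat as real HTML.
KNOWN_TAGS = {"a", "b", "br", "div", "em", "i", "li", "ol", "p", "span", "strong", "ul"}

ZWSP = "\u200b"  # zero width space, the character behind &#8203;

# A "<", an optional slash, and an optional run of tag-name characters.
_LT = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)?")


def break_unknown_tags(text: str) -> str:
    """Insert a zero width space after every "<" that is not followed
    by a slash or by a name from KNOWN_TAGS."""

    def repl(m):
        slash, name = m.group(1), m.group(2)
        if slash or (name and name.lower() in KNOWN_TAGS):
            return m.group(0)  # looks like markup, leave it alone
        return "<" + ZWSP + (name or "")

    return _LT.sub(repl, text)


cleaned = break_unknown_tags("The diameter starts <MM 78 > radial")
```

The result would be passed to the sanitizer in place of the raw input. Note that the zero width space stays in the output text, which is one reason this approach is a bit of a hack.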

Owner

mganss commented Nov 10, 2016

I came across this problem myself recently so I hacked up a gist.

A few things to note:

  • The HTML elements recognized are a combination of HTML 4 and HTML 5.
  • Tags are matched with a regex, which may fail if you've got < within attributes (although I'm not sure this is really a problem in this particular case since we're only replacing < with &lt;).
  • When a "non-HTML" element is encountered, the < is replaced by &lt;, so the sanitizer no longer recognizes it as an HTML element. I think that's less hacky than the approach I suggested above.
  • Obviously this will not take care of cases where the input contains actual HTML elements which are not intended to be recognized as such.
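
The idea in the notes above can be sketched in a few lines of Python. This is not the gist itself, only an illustration under assumptions: HTML_TAGS and escape_non_html_tags are hypothetical names, and the tag set is a small sample rather than the full HTML 4 plus HTML 5 list.

```python
import re

# Hypothetical sample of recognized element names; the real list would be
# the combined HTML 4 and HTML 5 element names.
HTML_TAGS = frozenset({"a", "b", "br", "div", "em", "i", "img", "li", "ol", "p", "span", "strong", "ul"})

# "<", an optional slash, then something shaped like a tag name.
_TAG_START = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)")


def escape_non_html_tags(text: str) -> str:
    """Replace "<" with "&lt;" wherever it starts a tag whose name
    is not in HTML_TAGS, and leave recognized tags untouched."""

    def repl(m):
        if m.group(2).lower() in HTML_TAGS:
            return m.group(0)
        return "&lt;" + m.group(0)[1:]

    return _TAG_START.sub(repl, text)


escaped = escape_non_html_tags("The diameter starts <MM 78 > radial")
```

Because only the opening < is rewritten, the regex never needs to find the end of the tag, which is probably why stray characters inside attributes matter less here.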
