Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vulnerable Regular Expressions #212

Open
cristianstaicu opened this issue Sep 6, 2017 · 31 comments
Open

Vulnerable Regular Expressions #212

cristianstaicu opened this issue Sep 6, 2017 · 31 comments

Comments

@cristianstaicu
Copy link

The following regular expressions used in underscore and unescapeHTML methods are vulnerable to ReDoS:

/([A-Z\d]+)([A-Z][a-z])/g
/\&([^;]+);/g

The slowdown is moderately low (for 50,000 characters around 2 seconds matching time). I would suggest one of the following:

  • remove the regex,
  • anchor the regex,
  • limit the number of characters that can be matched by the repetition,
  • limit the input size.

If needed, I can provide an actual example showing the slowdown.

@kvanbere
Copy link

anchor the regex

Could you please elaborate on the fix, I'm curious. Thank you!

@cristianstaicu
Copy link
Author

Sorry, that was a standard comment I sent around. In this case it would not work! The anchoring solution is well suited when the regex is used for validation purposes. Let us say you want to check an input string is an email address. Instead of having a simple regex like:

/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]/

you should have something like:

/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]$/

Word bounadries can also be used as explained here:
https://www.regular-expressions.info/email.html

This will decrease the complexity of the matching since the engine will only try to match the regex once, not n (length of input string) times like in the case of the first regex.

@kvanbere
Copy link

Thanks for the explanation!

I hope this gets fixed soon, it's giving my package red lights :(

@burt202
Copy link

burt202 commented Sep 26, 2017

This issue now seems to being picked up by an nsp check making our builds red too. See https://nodesecurity.io/advisories/536

@danzGentrack
Copy link

danzGentrack commented Sep 29, 2017

The slowdown is moderately low (for 50,000 characters around 2 seconds matching time).

Just wondering why it is a problem? For example, the string "underscore" algorithm could be a very complex one and may need lots of seconds to process a 50,000 characters long string.

@cristianstaicu
Copy link
Author

cristianstaicu commented Sep 29, 2017

I recommend the following two articles for understanding the issue:
https://blog.liftsecurity.io/2014/11/03/regular-expression-dos-and-node.js/
https://dl.acm.org/citation.cfm?id=3065916

Basically, the problem comes from the fact that the regex matching is done in the main loop of Node.js and thus blocking this loop for 2 seconds is equivalent to blocking all the incoming requests for 2 seconds.

@danzGentrack
Copy link

Thanks for the info. The example in the first link is a classic evil regex and it can be written in a more efficient way.
var emailExpression = /^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$/;

The regex used in the "underscore" case is fine: /([A-Z\d]+)([A-Z][a-z])/g , which may not incur much performance penalty compared to an alternative implementation. I understand the impact of slowing regex matching to an application. My point is that if a regular expression is used correctly, then it is not a problem to be slow.

@revelt
Copy link

revelt commented Oct 1, 2017

For posterity, Snyk also reports it:
https://snyk.io/vuln/npm:string:20170907

I'm looking forward to solution. The html-stripping features, for example, at the competition are very poor and I can't find the alternative for String.js..

@ericnorris
Copy link

ericnorris commented Oct 2, 2017

Hey there! I haven't looked too closely at this issue but it was linked by @revelt above. I'd definitely suggest using the striptags library for removing HTML. It is based on the PHP function strip_tags, which is widely known to be safe.

In the README I link to two articles, one indicating the regular expressions are not safe for removing dangerous HTML (here), and the second demonstrating that PHP's strip_tags function is not vulnerable (here).

https://github.com/ericnorris/striptags take a look to see if that works for you - it uses a state machine very similar to the one that is in the PHP internals instead of a regular expression. As a result it is super fast and just as secure!

@stevenvachon
Copy link

This is a critical issue. The author and the contributors don't seem to care at all. Kill this project with fire.

@revelt
Copy link

revelt commented Oct 11, 2017

@Steve come on, it's not that bad :) team is just regrouping, I'm sure guys will fix that eventually

smarusa pushed a commit to smarusa/swagger-tools that referenced this issue Oct 17, 2017
smarusa added a commit to smarusa/swagger-tools that referenced this issue Oct 17, 2017
@yieme
Copy link

yieme commented Oct 18, 2017

@jprichardson, string.js is a nice library and I would prefer to continue using it; however, such a vulnerability isn't something that can be used in production. Are you aware of this issue or has the project been abandoned as the last update was about 11 months ago?

@nconf-base
Copy link

@revelt, I can appreciate the concerns raised by @Steve and @yieme correctly points out lack of update activity for almost a year. This vulnerability has been live for over a month. At what point should we consider "eventually" unacceptable?

whitlockjc pushed a commit to apigee-127/swagger-tools that referenced this issue Oct 18, 2017
#542)

* Remove stringjs dependency due to vulnerability in string 3.3. It is used so little there is no need for the extra dependency in Swagger-tools.

Source: CERT
Name: https://nodesecurity.io/advisories/536
Url: https://nodesecurity.io/advisories/536
Source: CERT
Name: jprichardson/string.js#212
Url: jprichardson/string.js#212

* Suggested changes.
@revelt
Copy link

revelt commented Oct 18, 2017

@nconf-base I think now, because if you have string.js in production, your code is vulnerable now. Personally, I'm going to switch to alternatives asap and equally come back asap when it's fixed.

As far as my stuff is concerned, I already replaced every string.js methods with alternatives on my libraries except I can't find a good quality, nothing-assumed tool to strip HTML tags from strings. Within October I plan to release the alternative, string-only, no-regex library which strips HTML, which will untie me completely from string.js.

I'm thinking, it can't be that all the methods of string.js are affected, can it? If some are OK, maybe we could split string.js into separate libs like lodash does with lodash.* and so on? This way, affected methods/libraries would be isolated and it would be easier to pinpoint the culprits. You know, divide and conquer approach...

@nconf-base
Copy link

@revelt, I think you are correct the issue only affects a couple of the methods, specifically underscore and unescapeHTML, as per https://nodesecurity.io/advisories/536

I think your idea of decomposing the larger string.js library into smaller pieces might help those who use the library without those particular methods. Splitting out the problem methods on a fork might be a start.

How does your strip tags method differ from https://github.com/ericnorris/striptags that eric pointed out?

@revelt
Copy link

revelt commented Oct 18, 2017

@nconf-base it's assuming too much what results in bunch of false positives. I just scratched the surface with that issue, many of my unit tests fail because of striptags if I switch. Same with other similarly-named libraries from the first npm results page, I raised issues on at least two others.

For the record, string.js copes fine stripping HTML from stuff like aaa<bbb.

@ericnorris
Copy link

@revelt, I'm not sure I would agree that striptags results in "false positives". Any function that ostensibly strips HTML must assume all input is HTML, and as such must remove tags, incomplete or not.

Leaving aaa<bbb may result in unsafe HTML if it is concatenated with stuff on the page. A more concrete example is if it is fed aaa<script src="evil.js". The tag is incomplete, but when placed on a page (or concatenated with other safe HTML) may end up producing valid HTML that results in evil.js executing. That's not acceptable for a library responsible for removing HTML.

If the input is "safe", then < should be HTML encoded, e.g. aaa&lt;bbb.

@nconf-base I would strongly suggest using striptags! I am, of course, biased, but it is based on the very popular function from PHP (http://php.net/manual/en/function.strip-tags.php) and is battle-tested.

@revelt
Copy link

revelt commented Oct 18, 2017

@ericnorris I see your point. Basically, there are two kinds of HTML-stripping cases and we both are proponents of each.

My case is Detergent.js, email text copy cleaning app - HTML can come in the input, there is high risk of false positives, but zero chance of rogue code being executed and code if does come, it comes unescaped. This way, code a < b and c > d is very likely and legit (consider even b as valid tag name).

Your case is a regular web-dev ops where you clean HTML in order to defuse potentially malicious code. Prioritising security over risk of false positives. All code assumed to be escaped and unescaped-one a no-no.

String.js was this middle-ground solution, being able to detect aaa<bbb correctly, yet, at the same time, reliably strip malicious HTML for years in everybody's apps. It tended both sides. I'd gladly use striptags if it tended both sides as well.

Do you see my point?

@ghost
Copy link

ghost commented Nov 22, 2017

@revelt that's not tending both sides. < is never safe in HTML and under some legacy parsing modes I think the browser may even auto-complete what it thinks is an incorrectly truncated tag with >. Any library that leaves any < inside something to be embedded into HTML is simply unsafe to use for HTML output, plain and simple. (since < outside of a HTML tag is never valid)

Edit: If you refer to the use case of stripping first and then replacing any remaining < and > afterwards with &lt and &gt anyway, I suppose that's fair enough.

@revelt
Copy link

revelt commented Nov 22, 2017

@Jonast Yes, you are right, it's not safe and not valid in HTML. But... Detergent accepts both HTML and pure text and outputs text with sprinkled HTML or without. There is a thin path in here where > is valid both as input and output. For example, if people want to remove invisible characters from some text copy. Like a < b and c > d\u0003 (invisible ETX character in the end). That's a legit request and Detergent will do that. I want to produce a < b and c > d, not a d\u0003 or a d. Do you see my point?

Since user pastes the copy into Detergent, input is assumed to be safe, so I see no problems letting them output unencoded < if they wish (encoding is turned on by default, but it can be turned off).

Having said that, I'm about to publish my type of HTML stripping tool which behaves like that.

@StorytellerCZ
Copy link

@az7arul Are you currently maintaining this or who is in charge here?

@az7arul
Copy link
Collaborator

az7arul commented Dec 7, 2017

Hi @StorytellerCZ, I wasn't maintaining this for a while. But for this I can take a look this weekend, meanwhile any PR that address this is welcome.

@ghost
Copy link

ghost commented Dec 7, 2017

// This function converts a binary string (can be HTML or JavaScript) to a safe string that can be used in a HTML file within JavaScript tags.
function EncodeHTML(S) { var i, C; S = S.split(""); for (i = 0; i < S.length; i++) { C = S[i].charCodeAt(0); if (C == 7) S[i] = "\t"; else if (C == 10) S[i] = "\r"; else if (C == 13) S[i] = "\n"; else if (C == 92) S[i] = "\\"; else if (C == 34 || C == 38 || C == 39 || C == 60 || C == 62 || C < 32 || C > 126) S[i] = "\x" + toHex(C); } return '"' + S.join("") + '"'; }

// This function converts a Js safe string with quotation marks around it to its original source. It's the opposite of EncodeHTML()
function DecodeHTML(S) { S += ""; if (S.length > 1) { if (S.charAt(0) == '"' || S.charAt(0) == "'") S = S.slice(1, -1); } S = S.split("\t").join("\t").split("\r").join("\r").split("\n").join("\n").split("\\").join("\").split("\x"); for (var i = 1; i < S.length; i++) S[i] = String.fromCharCode(parseInt(S[i].substr(0, 2), 16)) + S[i].slice(2); return S.join(""); }

@ghost
Copy link

ghost commented Dec 7, 2017

Oops. I just noticed that the above code didn't come out right. I should have put that code within CODE section, not as plain text, because the double backslash characters have been replaced with single. :/

@najibk
Copy link

najibk commented Dec 26, 2017

Hello, any news about this ?
Thanks

@angleman
Copy link

@az7arul, thank you for looking into this. We have nsp check as part of our deploy process and this security issue breaks the build.

@revelt
Copy link

revelt commented Mar 1, 2018

@ankurloriya No, the all-in-one monolithic replacement doesn't exist.

But you can replace its methods one-by-one with libraries from npm. For example, I replaced replaceAll() and with replace-string, lines() with split-lines. There were no suitable HTML tag stripping libraries to replace stripTags() so I coded my own. If your unit tests are thorough they will pick up any inconsistencies (or their absence) of a replacement library. I found it difficult to judge replacements by their readmes, - my unit tests were the judge.

Performance-wise, switching from monolithic string.js to separate libs will reduce your bundle's footprint too. You might even get some of them as Rollup'ed 3-in-1 builds today, where your Webpack/Rollup will tap their ES module's build directly. On other hand, it will take time for string.js to migrate to Rollup, it doesn't have an ES Module build at the moment.

@yieme
Copy link

yieme commented Jul 27, 2018

We changed to using various other libraries due to this security issue. Appears the project may be abandoned with #218 the http://stringjs.com/ domain apparently going away. Sad, it was a good library to have a lot of useful tools in one place. Hopefully it gets turned around for people. We just couldn't wait any longer for the security issue to be resolved. Cheers.

yamgent added a commit to yamgent/vue-strap that referenced this issue Aug 3, 2018
We use string@3.3.3 to generate our header id.

string@3.3.3 is reported to be unsafe and contains vulnerabilities [1].

As a step towards removing string@3.3.3 as a dependencies, let's replace
all usage of the 'string' methods with suitable equivalents.

[1] - jprichardson/string.js#212
yamgent added a commit to yamgent/vue-strap that referenced this issue Aug 3, 2018
We no longer use string@3.3.3 in our codebase.

string@3.3.3 is reported to be unsafe and contains vulnerabilities [1].

Let's remove string@3.3.3 as one of our dependencies.

[1] - jprichardson/string.js#212
yamgent added a commit to yamgent/vue-strap that referenced this issue Aug 3, 2018
We use string@3.3.3 to generate our header id.

string@3.3.3 is reported to be unsafe and contains vulnerabilities [1].

As a step towards removing string@3.3.3 as a dependencies, let's replace
all usage of the 'string' methods with suitable equivalents.

[1] - jprichardson/string.js#212
yamgent added a commit to yamgent/vue-strap that referenced this issue Aug 3, 2018
We no longer use string@3.3.3 in our codebase.

string@3.3.3 is reported to be unsafe and contains vulnerabilities [1].

Let's remove string@3.3.3 as one of our dependencies.

[1] - jprichardson/string.js#212
yamgent added a commit to yamgent/markbind that referenced this issue Aug 3, 2018
markdown-it-anchor@4.0.0 uses string@3.3.3.

string@3.3.3 is reported to be unsafe and contains vulnerabilities [1].
It is also no longer in use in markdown-it-anchor@5.0.0.

Let's upgrade markdown-it-anchor, so that it no longer uses
string@3.3.3.

[1] - jprichardson/string.js#212
yamgent added a commit to yamgent/markbind that referenced this issue Aug 3, 2018
markdown-it-table-of-contents@0.3.2 uses string@3.3.3.

string@3.3.3 is reported to be unsafe and contains vulnerabilities [1].
It is also no longer in use in markdown-it-table-of-contents@0.4.0.

Let's upgrade markdown-it-table-of-contents, so that it no longer uses
string@3.3.3.

[1] - jprichardson/string.js#212
daohoangson added a commit to xfrocks/node_pubhubsubbub_pushserver that referenced this issue Sep 25, 2018
yamgent added a commit to yamgent/vue-strap that referenced this issue Jul 15, 2019
We no longer use string@3.3.3 in our codebase.

string@3.3.3 is reported to be unsafe and contains vulnerabilities [1].

Let's remove string@3.3.3 as one of our dependencies.

[1] - jprichardson/string.js#212
yamgent added a commit to yamgent/vue-strap that referenced this issue Jul 15, 2019
We no longer use string@3.3.3 in our codebase.

string@3.3.3 is reported to be unsafe and contains vulnerabilities [1].

Let's remove string@3.3.3 as one of our dependencies.

[1] - jprichardson/string.js#212
yamgent added a commit to MarkBind/vue-strap that referenced this issue Jul 15, 2019
We no longer use string@3.3.3 in our codebase.

string@3.3.3 is reported to be unsafe and contains vulnerabilities [1].

Let's remove string@3.3.3 as one of our dependencies.

[1] - jprichardson/string.js#212
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests