Offering my REDOS tools #208

davisjam · 2018-04-05T20:45:20Z

Summary

I'm a PhD student at Virginia Tech. I've just finished polishing up a release of the tools I used in a recent project studying the incidence of REDOS in practice. I'd like to suggest that these tools should be used during CI in any server-side module or application.

Motivation

To give a sense of why we should think about this as a best practice, let me summarize what I found in my study:

Two vulnerabilities in Node v4 (vuln 1, vuln 2).
Three vulnerabilities in Python core (vulns 1 and 2, vuln 3)
A bunch more in npm and pypi (e.g. affecting Marked, Hapi, Django, and MongoDB).

I've got an academic paper under submission with all the gory details.

Tools

Here are the tools I've developed:

General repo (includes detectors and a bin/* for testing a regex/file/tree/URL).
The vuln-regex-detector npm module to test a regex. This is a "dumb client" and queries a server, see below.
The eslint-plugin-vuln-regex-detector "eslint plugin". Uses the vuln-regex-detector module. This "plugin" is really intended for use as a separate CI stage. I just found eslint's infrastructure useful.

Infrastructure

I'm hosting a server at toybox.cs.vt.edu:8000 that answers queries from the npm modules.
The server is documented here.

What I'd like to see

I envision adding this to package.json of any module that might be used server-side (potential REDOS):

"test:regex": "eslint --plugin vuln-regex-detector --rule '\"vuln-regex-detector/no-vuln-regex\": 2' FILES_YOU_CARE_ABOUT"

and including it during the CI process. Then REDOS issues would be caught by Travis instead of by @ChALkeR.

Issues

While I would love to see these tools adopted by everyone, there are a few potential hiccups if a lot of people start using it:

The server does not support batched queries, so users have to run separate queries for every regex. This may be a performance issue for clients, especially those with many regexes (like useragent modules).
The server validates results with a single thread. Some regexes take awhile to validate (performance issue); malicious clients can deliberately craft hard-to-validate regexes (DoS concern).
The server is currently a single desktop here at Virginia Tech. It might not scale well.

I welcome PRs and suggestions/help with DevOps.

In addition, there are some "whodunnit" questions that might arise if a project starts using these tools during CI. See this discussion, especially this point by @styfle. One concern is that incorporating this into CI might accidentally disclose previously-undiscovered REDOS issues that exist in production.

Documentation

I've tried to be thorough in the documentation, but let me know if anything is confusing or seems wrong.

How is this different from `safe-regex`?

Existing safe regex detectors (e.g. eslint-plugin-security's detect-unsafe-regex rule) all rely on the safe-regex module.

The safe-regex module uses "star height" (nested quantifiers, e.g. /(a+)+/) as a heuristic to identify vulnerable regexes.

Unfortunately, safe-regex has three issues:

It's unmaintained but implemented incorrectly.
It is prone to false positives. all regexes with star height > 1 are vulnerable. In my research I found that, in practice, most aren't.
More critically, it is also prone to false negatives. In my research I found that most regexes that suffer from catastrophic backtracking do not have star height > 1.

My project only uses detectors that suggest an attack string, and it confirms that the attack string triggers catastrophic backtracking in the language you request (JS-Node, Perl, PHP, Ruby, Python).
Look ma, no false positives! But note that if the regex in question is not reachable by client input, then it's a possible time bomb but not a current REDOS problem.

My project may have false negatives:

If you extract regexes statically (like the eslint plugin does) then any dynamically-declared regexes (e.g. new RegExp(p)) won't be tested.
If the detectors can't handle the regex, then the server currently marks the regex as safe.

The text was updated successfully, but these errors were encountered:

davisjam · 2018-04-05T20:58:02Z

/cc @lirantal @grnd @ChALkeR @mhdawson

mhdawson · 2018-04-05T22:22:56Z

I had a talk with @davisjam earlier today and this looks interesting. We should discuss if scanning using some of these tools is something we should suggest as best practice when putting together a module.

zeke · 2018-04-06T01:52:08Z

cc @substack would you be willing to move safe-regex to a GitHub org?

davisjam · 2018-04-06T02:04:50Z

move safe-regex to a GitHub org?

If so, I have several other lightweight analyses that I can contribute to safe-regex.

They are in pure JS and can run locally.
They can identify nearly all of the vulnerable regexes the heavyweight detectors can (plus a bunch of bonus non-vulnerable ones!). i.e. high false positives, low false negatives.

Since the heavyweight detectors exist and are available, I'm not sure the false positive rates of safe-regex-style heuristics are something I would embrace. But to each their own.

davisjam · 2018-04-07T02:14:48Z

We should discuss if scanning using some of these tools is something we should suggest as best practice when putting together a module.

These tools could also be incorporated into Node's CI. They would have prevented the REDOS issue in Node v4 since those regexes were statically declared.

If some kind of persistent FS mount is available across CI runs, then overhead can be minimized using the "persistent cache" feature of the vuln-regex-detector module. Relevant discussion here.

If this sounds interesting I'm happy to contribute a PR. I would need to be pointed in the right direction though. @Trott?

Trott · 2018-04-07T02:55:20Z

These tools could also be incorporated into Node's CI.

...

If some kind of persistent FS mount is available across CI runs, then overhead can be minimized using the "persistent cache" feature of the vuln-regex-detector module. Relevant discussion here.

If this sounds interesting I'm happy to contribute a PR. I would need to be pointed in the right direction though. @Trott?

Seems do-able to me, but someone who understands our Jenkins set-up better than I do would likely need to point you in the right direction. /ping @nodejs/build

mhdawson · 2018-04-09T16:43:27Z

I'm not sure it would need to be run on every PR, a nightly run with an email to the security team might be enough? If that was the case we might be able to select a smaller set of machines where they would build up a cache after the first run.

davisjam · 2018-10-15T18:20:17Z

@zeke

cc substack would you be willing to move safe-regex to a GitHub org?

I now have publish rights to safe-regex.

mikesamuel · 2018-12-04T16:50:30Z

@davisjam

Given a source tree, /.../ style RegExps are easy to find but I wonder if we could get dynamically created RegExps in test-covered code to your analyzer.

If a project uses mocha to run tests, might a wrapper around mocha do something like the below before running tests so that a posttest script could fire up your analyzer:

const { console, RegExp: builinRegExp, Reflect } = global;

function onNewRegExp(re) {
  // Instead dump pattern, flags, and call stack to a file for later analysis.
  console.log(`Constructed regexp ${ re.source }`);
}

const LoggingRegExp = new Proxy(
    RegExp,
    {
        apply(...args) {
            const re = Reflect.apply(...args);
            onNewRegExp(re);
            return re;
        },
        construct(...args) {
            const re = Reflect.construct(...args);
            onNewRegExp(re);
            return re;
        },
    });

global.RegExp = LoggingRegExp;

Using a proxy makes it transparent to instanceof so that things like the following behave the same regardless of whether it's installed.

const a = new RegExp('^a*$');
const b = /^a*$/;
const c = RegExp('^a*$');

class MyRegExp extends RegExp {
    constructor(...args) {
        super(...args);
    }
}

const d = new MyRegExp('^a*$');

console.log(`a=${ a } : ${ a instanceof RegExp }`);
console.log(`b=${ b } : ${ b instanceof RegExp }`);
console.log(`c=${ c } : ${ c instanceof RegExp }`);
console.log(`d=${ d } : ${ d instanceof RegExp }, ${ d instanceof MyRegExp }`);
console.log(a.constructor === b.constructor);
console.log(c.constructor === b.constructor);
console.log(d.constructor === MyRegExp);

This won't cover regular expressions created by attacker controlled strings.

It also won't cover builtins that use the builtin RegExp under the hood like "string".match('[str]*') but they could be with more instrumenting.

@MarcinHoppe fyi.

davisjam · 2018-12-04T17:11:14Z

@mikesamuel

Yes, that should work.
In many projects I think regexes are statically declared and easy to get at, but this idea would be helpful in projects that use dynamically-generated regexes a lot -- for example, marked.

mikesamuel · 2018-12-04T17:17:38Z

Do you have a sense of how much of the ReDoS attack surface is application-defined RegExps vs RegExps in common dependencies vs attacker-controlled RegExps?

davisjam · 2018-12-04T17:33:09Z

I have only studied common dependencies (i.e. npm modules). Applications are hard to come by. If you happen to have a large set of applications I'd love to chat.
I do not know the frequency of attacker-controlled RegExps. I would guess it is pretty low based on typical coding practices.

MarcinHoppe · 2018-12-04T17:44:08Z

My observations are similar, static regexp patterns are easy to catch using static analysis but developers tend to get creative and generate regexp patterns on the fly. In this case instrumenting the test suite sounds like an effective approach.

mikesamuel · 2018-12-04T20:20:59Z

I'm afraid I mostly have library code as well.

These use dynamic RegExps.

fraxken · 2022-07-18T10:34:10Z

👋

I have just discovered this topic, and the problem with dynamic RegExps is something I don't take into account in the NodeSecure/js-x-ray SAST project 😱. I will work on it :)

However not sure if the initial subject of integrating it in the Node CI is still relevant here.

On the side of ecosystem we have a good support in the NodeSecure projects.
For example in the CLI/UI:

And our CI project will support it too (not ready for production yet).

We also have links to your projects in the Awesome Node.js Security repo and I think we will probably add some links when we are going to build the user documentation (after the security model).

github-actions · 2022-10-17T00:23:02Z

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

davisjam mentioned this issue Apr 12, 2018

Update reference for bug bounty program for node core #169

Merged

3 tasks

fraxken mentioned this issue Jul 18, 2022

Detect usage of dynamic RegExps NodeSecure/js-x-ray#42

Open

github-actions bot added the stale label Oct 17, 2022

github-actions bot closed this as completed Nov 1, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Offering my REDOS tools #208

Offering my REDOS tools #208

davisjam commented Apr 5, 2018 •

edited

Loading

davisjam commented Apr 5, 2018

mhdawson commented Apr 5, 2018

zeke commented Apr 6, 2018

davisjam commented Apr 6, 2018 •

edited

Loading

davisjam commented Apr 7, 2018

Trott commented Apr 7, 2018

mhdawson commented Apr 9, 2018

davisjam commented Oct 15, 2018

mikesamuel commented Dec 4, 2018

davisjam commented Dec 4, 2018

mikesamuel commented Dec 4, 2018

davisjam commented Dec 4, 2018 •

edited

Loading

MarcinHoppe commented Dec 4, 2018

mikesamuel commented Dec 4, 2018

fraxken commented Jul 18, 2022

github-actions bot commented Oct 17, 2022

Offering my REDOS tools #208

Offering my REDOS tools #208

Comments

davisjam commented Apr 5, 2018 • edited Loading

Summary

Motivation

Tools

Infrastructure

What I'd like to see

Issues

Documentation

How is this different from safe-regex?

davisjam commented Apr 5, 2018

mhdawson commented Apr 5, 2018

zeke commented Apr 6, 2018

davisjam commented Apr 6, 2018 • edited Loading

davisjam commented Apr 7, 2018

Trott commented Apr 7, 2018

mhdawson commented Apr 9, 2018

davisjam commented Oct 15, 2018

mikesamuel commented Dec 4, 2018

davisjam commented Dec 4, 2018

mikesamuel commented Dec 4, 2018

davisjam commented Dec 4, 2018 • edited Loading

MarcinHoppe commented Dec 4, 2018

mikesamuel commented Dec 4, 2018

fraxken commented Jul 18, 2022

github-actions bot commented Oct 17, 2022

davisjam commented Apr 5, 2018 •

edited

Loading

How is this different from `safe-regex`?

davisjam commented Apr 6, 2018 •

edited

Loading

davisjam commented Dec 4, 2018 •

edited

Loading