feat: add rule parser for advertiser hosts #304

jrconlin · 2021-10-15T15:21:07Z

This will allow simple pattern matches against `advertiser_hosts` (see #303). It does not use regular expressions, but rather a very primitive string match function that matches the final portion of the host and (optionally) the leading portion of the path, e.g.:

    "advertiser_hosts": ["acme.biz/ca", "acme.biz/uk", "acme.tv"]

Closes #303
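A minimal Rust sketch of the matching described above, assuming the host and path have already been separated from the incoming URL. `HostRule`, `parse` and `matches` are hypothetical names, not Contile's actual code. Two details go beyond the description and are assumptions: the host must equal the rule's host or end in `.` plus the rule's host (so `notacme.biz` does not match `acme.biz`), and the path prefix only matches on a `/` segment boundary (so `/canada` does not match `/ca`).

```rust
/// A parsed advertiser-host rule such as `acme.biz/ca` or `acme.tv`.
/// Hypothetical type for illustration; not Contile's actual implementation.
#[derive(Debug, PartialEq)]
pub struct HostRule {
    host: String,                // lowercased host, e.g. "acme.biz"
    path_prefix: Option<String>, // e.g. Some("/ca"); None for a bare host
}

impl HostRule {
    /// Split a rule on its first '/' into a host and an optional path prefix.
    pub fn parse(rule: &str) -> HostRule {
        let (host, path) = rule.split_once('/').unwrap_or((rule, ""));
        let path = path.trim_matches('/');
        HostRule {
            host: host.to_ascii_lowercase(),
            path_prefix: if path.is_empty() {
                None
            } else {
                Some(format!("/{}", path))
            },
        }
    }

    /// True when `host` is the rule's host or one of its subdomains, and
    /// `path` equals the rule's path prefix or continues it with a '/'.
    pub fn matches(&self, host: &str, path: &str) -> bool {
        let host = host.to_ascii_lowercase();
        // Require a label boundary so "notacme.biz" does not match "acme.biz".
        let host_ok =
            host == self.host || host.ends_with(format!(".{}", self.host).as_str());
        let path_ok = match &self.path_prefix {
            None => true,
            // Require a segment boundary so "/canada" does not match "/ca".
            Some(prefix) => {
                path == prefix.as_str()
                    || path
                        .strip_prefix(prefix.as_str())
                        .map_or(false, |rest| rest.starts_with('/'))
            }
        };
        host_ok && path_ok
    }
}
```

With these rules, `HostRule::parse("acme.biz/ca")` accepts `www.acme.biz` with path `/ca/shoes` but rejects the same host with `/uk`.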

mythmon · 2021-10-15T15:37:11Z

I think using regex here is the wrong tool. I've heard a lot of stories of bugs because URLs were parsed too permissively. The biggest issue is the special behavior of `.` in regex (it matches any character), which can cause bugs. I think having these settings be sometimes regex and sometimes fixed strings complicates things. I see you've made some efforts to fix this, but they seem incomplete compared to a proper regex escaping scheme.

An exploitable version of this would be a rule like `r#foo.example.com`: an attacker could register `foo-example.com`, and it would still pass the filter.

I think something like globs would be more fitting, or to define that any subdomain of a matched domain is valid. I would also like to see incoming URLs parsed as URLs instead of treated as unstructured strings, but that might be out of scope for this change.

If we do keep regex, I think it would be beneficial to consider using RegexSet from the same crate, which can more efficiently match against many regexes at once.
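To make the `.` hazard concrete without pulling in the `regex` crate, here is a toy matcher (hypothetical, for illustration only) that gives `.` its regex meaning of "any single character", treats every other character literally, and is anchored at both ends. Under those semantics the rule `foo.example.com` also accepts `foo-example.com`, while a plain literal comparison does not.

```rust
/// Toy matcher, for illustration only: '.' matches any single character,
/// every other character matches itself, and the whole text must be
/// consumed, like an anchored regex built only from literals and bare dots.
pub fn dot_wildcard_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    p.len() == t.len() && p.iter().zip(t.iter()).all(|(pc, tc)| *pc == '.' || pc == tc)
}

/// What the rule author actually meant: a literal, case-insensitive comparison.
pub fn literal_match(pattern: &str, text: &str) -> bool {
    pattern.eq_ignore_ascii_case(text)
}
```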

jrconlin · 2021-10-18T23:20:26Z

Yeah, not really going to complain. Regex is really powerful, a resource hog, and made out of pure explodium. I think we're probably NOT going to land this PR. I created it mostly as a draft to see what all would be involved in actually creating it, and realized I had accidentally built a patch.

There's a working document that I'm using with our customer to create a simpler string based rule set that should solve their requirements without invoking the great satan that regex can be.


src/adm/filter.rs

ncloudioj · 2021-10-26T21:20:12Z

src/adm/filter.rs

+                filter.clone()
+            };
+            // make sure to do the same s/./*/ to the filter, because you never know...
+            let filter_parts = filter.splitn(2, '/').collect::<Vec<&str>>();


Perhaps add a guard here to check the vector length? Otherwise, `filter_parts[1]` might be problematic on malformed filters.

Also, I don't quite follow the special handling of `.`. Can you show me some examples?

With the guaranteed trailing `/` above, `splitn` would always give us 2 elements. Maybe worth a comment, though.

We're forcing the string to add a / at the end, so there should always be at least 2 parts, even if one of them is an empty string.
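A small sketch of the two options discussed here. `split_filter` follows the approach described in the thread (append a trailing `/`, here only when one is missing, which is an assumption, then `splitn(2, '/')`, which always yields two parts). `split_filter_once` shows `str::split_once` as an alternative that makes the two-part shape explicit without padding. Both function names are hypothetical.

```rust
/// Pad-then-split: with a guaranteed trailing '/', `splitn(2, '/')` always
/// yields exactly two parts (the second may be empty), so indexing
/// `parts[1]` cannot panic.
pub fn split_filter(filter: &str) -> (String, String) {
    let padded = if filter.ends_with('/') {
        filter.to_owned()
    } else {
        format!("{}/", filter)
    };
    let parts: Vec<&str> = padded.splitn(2, '/').collect();
    debug_assert_eq!(parts.len(), 2);
    (parts[0].to_owned(), parts[1].to_owned())
}

/// Alternative: `split_once` returns `None` when there is no '/', and
/// `unwrap_or` turns that into an empty path instead of padding the input.
pub fn split_filter_once(filter: &str) -> (&str, &str) {
    filter.split_once('/').unwrap_or((filter, ""))
}
```

Note that the padded version leaves the trailing `/` on a non-empty path (`"ca/"`), so callers need to account for it.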

I have a test case that illustrates why I convert `.` to `*` in the path, but basically it's to keep URLs like `https://evil.com/good.com/etc` from passing.
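One way to rule out `https://evil.com/good.com/etc` structurally, rather than by rewriting dots, is to extract the host before matching, in the spirit of the earlier suggestion to parse incoming URLs as URLs. The sketch below uses a deliberately simplified, std-only host extractor (hypothetical `https_host` and `host_allowed` helpers; userinfo and IPv6 literals are not handled), so `good.com` appearing in the path never reaches the host check. Production code would more likely use the `url` crate's `Url::parse` and `host_str`.

```rust
/// Deliberately simplified host extraction for `https://` URLs: returns the
/// text between the scheme and the first '/', '?' or '#', minus any ":port".
/// Not a real URL parser: userinfo and IPv6 literals are not handled.
pub fn https_host(url: &str) -> Option<&str> {
    let rest = url.strip_prefix("https://")?;
    let end = rest
        .find(|c: char| c == '/' || c == '?' || c == '#')
        .unwrap_or(rest.len());
    let host = rest[..end].split(':').next().unwrap_or("");
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// Match the extracted host (never the path) against an allowed domain,
/// accepting the domain itself or any subdomain of it.
pub fn host_allowed(url: &str, allowed: &str) -> bool {
    match https_host(url) {
        Some(host) => {
            let host = host.to_ascii_lowercase();
            let allowed = allowed.to_ascii_lowercase();
            host == allowed || host.ends_with(format!(".{}", allowed).as_str())
        }
        None => false,
    }
}
```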

I see, but this doesn't fully convince me. Since we control the filters, we will make sure they are valid, so Contile will drop those bad hosts even if they manage to sneak into the partner's API.

If we still want this extra check on filters, I'd recommend doing the dot-escaping only once, perhaps at startup. Doing it over and over again for each request seems wasteful and unnecessary to me.

Does that make sense?
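A sketch of the "do it once" suggestion, assuming the filters are available as plain strings when settings are loaded. All per-rule normalization (lowercasing, splitting off the path, precomputing the `.host` suffix) happens once in `AdvertiserFilter::new`, and the per-request `allows` only compares. The types and fields are hypothetical, and the sketch does not reproduce the PR's `.`-to-`*` rewrite; it only shows where such a one-time transformation would live.

```rust
/// One pre-normalized rule (hypothetical type).
struct CompiledRule {
    host: String,                // lowercased, e.g. "acme.biz"
    dot_suffix: String,          // ".acme.biz", precomputed for subdomain checks
    path_prefix: Option<String>, // e.g. Some("/ca"); None for a bare host
}

/// Hypothetical filter whose rules are normalized once, when settings load.
pub struct AdvertiserFilter {
    rules: Vec<CompiledRule>,
}

impl AdvertiserFilter {
    /// Do all per-rule work here, once, instead of on every request.
    pub fn new<'a>(raw: impl IntoIterator<Item = &'a str>) -> Self {
        let rules = raw
            .into_iter()
            .map(|r| {
                let (host, path) = r.split_once('/').unwrap_or((r, ""));
                let host = host.to_ascii_lowercase();
                let path = path.trim_matches('/');
                CompiledRule {
                    dot_suffix: format!(".{}", host),
                    host,
                    path_prefix: if path.is_empty() {
                        None
                    } else {
                        Some(format!("/{}", path))
                    },
                }
            })
            .collect();
        AdvertiserFilter { rules }
    }

    /// Per-request check: only comparisons against the precomputed rules.
    pub fn allows(&self, host: &str, path: &str) -> bool {
        // The incoming host still has to be normalized once per request.
        let host = host.to_ascii_lowercase();
        self.rules.iter().any(|rule| {
            let host_ok = host == rule.host || host.ends_with(rule.dot_suffix.as_str());
            let path_ok = match &rule.path_prefix {
                None => true,
                Some(p) => {
                    path == p.as_str()
                        || path
                            .strip_prefix(p.as_str())
                            .map_or(false, |rest| rest.starts_with('/'))
                }
            };
            host_ok && path_ok
        })
    }
}
```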

src/adm/filter.rs

pjenvey · 2021-10-26T21:44:39Z

src/error.rs

@@ -198,6 +198,12 @@ impl From<HandlerError> for HttpResponse {
    }
 }

+impl From<regex::Error> for HandlerError {


This looks like a leftover from the last version; might as well kill it.

src/adm/filter.rs

pjenvey · 2021-10-26T22:57:25Z

Almost forgot a couple more things we're missing per the new spec:

1. It would be an error to provide a filter that contains a bare host and one that contains a path for the same host (e.g. `example.com` and `example.com/ca`).
2. Contile would also enforce `https` as the only valid scheme.

I think we need #1 to go along with this PR. #2 should be a few lines of easy code.
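The two spec items above could look roughly like this. `validate_rules` and `is_https` are hypothetical helpers; the sketch assumes rules are plain `host[/path]` strings, compares hosts case-insensitively, and would naturally run once when settings are loaded rather than per request.

```rust
use std::collections::HashMap;

/// Reject rule sets that list the same host both bare and with a path,
/// e.g. `example.com` together with `example.com/ca`.
pub fn validate_rules(rules: &[&str]) -> Result<(), String> {
    // host -> (seen as a bare host?, seen with a path?)
    let mut seen: HashMap<String, (bool, bool)> = HashMap::new();
    for &rule in rules {
        let (host, path) = rule.split_once('/').unwrap_or((rule, ""));
        let entry = seen
            .entry(host.to_ascii_lowercase())
            .or_insert((false, false));
        if path.trim_matches('/').is_empty() {
            entry.0 = true;
        } else {
            entry.1 = true;
        }
        if entry.0 && entry.1 {
            return Err(format!("host {:?} is listed both bare and with a path", host));
        }
    }
    Ok(())
}

/// Only `https` is accepted as a scheme (case-insensitive check).
pub fn is_https(url: &str) -> bool {
    url.get(..8)
        .map_or(false, |scheme| scheme.eq_ignore_ascii_case("https://"))
}
```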


src/adm/filter.rs

ncloudioj

r+ with one more comment.

Thanks!


ncloudioj

🚢

hackebrot · 2021-11-03T17:10:53Z

Hi @jrconlin! 👋🏻 Would you say this new feature is sufficiently covered by unit tests? Do you think we can add at least one integration test that uses settings which rely on this feature?

jrconlin requested a review from a team October 15, 2021 15:21

jrconlin added 3 commits October 20, 2021 14:44

f remove regex for simple string search.

9aa7428

Based on conversations and requirement clarifications

f handle dots in filter paths

b963729

jrconlin force-pushed the 303/prefix branch from f175c8b to b963729 October 20, 2021 22:12

jrconlin marked this pull request as ready for review October 20, 2021 22:13

jrconlin changed the title ~~feat: add regex rule parser for advertiser hosts~~ feat: add rule parser for advertiser hosts Oct 20, 2021

jrconlin and others added 2 commits October 20, 2021 15:35

f update exceptions

725fd59

Merge branch 'main' into 303/prefix

194432b

data-sync-user mentioned this pull request Oct 25, 2021

Add subdomain allowlist to Contile #303

Closed

ncloudioj reviewed Oct 26, 2021

src/adm/filter.rs (outdated)

ncloudioj reviewed Oct 26, 2021

src/adm/filter.rs (outdated)

ncloudioj reviewed Oct 26, 2021

pjenvey reviewed Oct 26, 2021

jrconlin added 6 commits October 26, 2021 17:15

f r's

6db6269

Merge branch '303/prefix' of github.com:mozilla-services/contile into 303/prefix

f934b53

f fix test to match spec def.

ed729ba

e.g allow `example.com/ca` & `example.co.uk`

f simplify advertiser check

e99182b

f remove comment

c53c141

f remove extra dbg

5adf246

ncloudioj reviewed Oct 27, 2021

src/adm/filter.rs (outdated)

ncloudioj previously approved these changes Oct 27, 2021

f suggested r

2150154

Co-authored-by: Nan Jiang <njiang028@gmail.com>

jrconlin dismissed ncloudioj’s stale review via 2150154 October 28, 2021 17:36

Merge branch 'main' into 303/prefix

e507eb9

ncloudioj self-requested a review October 28, 2021 18:10

ncloudioj approved these changes Oct 28, 2021

jrconlin merged commit 0a28722 into main Oct 28, 2021

jrconlin deleted the 303/prefix branch October 28, 2021 20:47
