Skip to content
This repository has been archived by the owner on Jul 15, 2019. It is now read-only.

URL whitelist filter and baseURL processing #53

Open
wants to merge 11 commits into
base: improve-performance
Choose a base branch
from

Conversation

adon-at-work
Copy link
Contributor

first step to facilitate further discussions

@adon-at-work adon-at-work force-pushed the whitelist-url-filters branch 2 times, most recently from b786772 to 2767614 Compare October 2, 2015 19:17
@adon-at-work
Copy link
Contributor Author

hi @maditya , pls see if you have early comments on the whitelist xss filtering design.
the yuwlFactory() allows a number of options as stated in the comment block.
say, to use the most conservative one, we can simply do yuwl = yuwlFactory().
or, to check against if a URL must carry example.com and www.example.com as hostnames, we can do yuwl = yuwlFactory(false, false, false, ['example.com', 'www.example.com'])

finally, yuwl('http://example.com')

(!reHosts || (reHosts && reHosts.test(url.slice(result[0].length)))))) {
return url;
}
return 'x-' + url;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may consider returning 'unsafe:' + url, otherwise we'd actually disable a relative url

@maditya
Copy link

maditya commented Oct 6, 2015

what if a user wants to allow https://www.yahoo.com but disallow http://www.yahoo.com ?

@maditya
Copy link

maditya commented Oct 6, 2015

have we considered implemeting it with a state machine as mentioned in the specs? with regexes, we might miss out on something.

@adon-at-work
Copy link
Contributor Author

can we also allow configurable set of schemes? users might want to have other common schemes like mailto, xmpp or even their own custom schemes
what if a user wants to allow https://www.yahoo.com but disallow http://www.yahoo.com ?

similarly, can you suggest an interface to config this? I can think of:

  • ['https://*.yahoo.com', 'https://*.flickr.com', 'http://*.flickr.com'] as allowHosts, or call it allowOrigins;
  • a separate array to take allowProtocols?

@adon-at-work
Copy link
Contributor Author

have we considered implemeting it with a state machine as mentioned in the specs? with regexes, we might miss out on something.

I've thought of that too. the most favorable reason of using regexp is that it consumes much less code size, since this lib can be used on client-side. The current whitelist-based regexp let us assert a stricter grammar than the specs. Say, the specs allows ipv6, but I suspect we may not need that in the near future, and feel like that's what we can miss out. thoughts?

@adon-at-work adon-at-work force-pushed the whitelist-url-filters branch from 7809b14 to d11a4e4 Compare October 6, 2015 09:33
@adon-at-work adon-at-work force-pushed the whitelist-url-filters branch from d11a4e4 to b542b7e Compare October 6, 2015 09:34
@adon-at-work
Copy link
Contributor Author

@maditya 747f938 handled the comments.

some example use cases:

  • yuwlFactory({subdomain:true, hosts:['yahoo.com']})('http://www.yahoo.com') returns http://www.yahoo.com. Without the config, it returns unsafe:http://www.yahoo.com
  • yuwlFactory({protocols:['https', 'http', 'mailto', 'xmpp']})('mailto:test@example.com') returns test@example.com. Without the config, it returns unsafe:mailto:test@example.com

allow protocol config, allow subdomain for hosts
@adon-at-work adon-at-work force-pushed the whitelist-url-filters branch from b542b7e to 747f938 Compare October 6, 2015 09:43
@adon-at-work adon-at-work force-pushed the whitelist-url-filters branch 3 times, most recently from 9370e7e to 3d337da Compare October 29, 2015 12:07
@adon-at-work adon-at-work force-pushed the whitelist-url-filters branch 5 times, most recently from b867bfd to d8654ca Compare November 2, 2015 15:11
- enhanced yuwlFactory to take an optional callback to process scheme, host, port, and path
- enabled safe image data URIs
- supported ipv6
- added relPath and relPathOnly options
@adon-at-work adon-at-work force-pushed the whitelist-url-filters branch from d8654ca to 1a2deb4 Compare November 2, 2015 15:12
- added url filter
- added base url resolver
- broken down functions into separate files
- removed browserify
@adon-at-work adon-at-work force-pushed the whitelist-url-filters branch from 1a2deb4 to 04041bf Compare November 2, 2015 15:17
@adon-at-work adon-at-work changed the title Draft: URL whitelist filter URL whitelist filter and baseURL processing Nov 2, 2015
@adon-at-work
Copy link
Contributor Author

Hi @maditya, here's the major update to the URL processor. The URL filter now observes the URL spec, and is well-documented (for what has been simplified, and the non-standard behaviors the are required) in this latest commit.

This covers all the URL use cases so far you shared with me. pls check/review. thanks :)

p.s. This PR is designed for reviews only, and should be merged separately with a new PR to the master branch.

@adon-at-work adon-at-work force-pushed the whitelist-url-filters branch from 3582b52 to 3944f6c Compare November 3, 2015 09:13
- added polyfill for Array.prototype.indexOf during tests
- added polyfill for JSON.stringify during tests
- added deleteCount for array.splice
- changed urlFilterFactory's callbacks to always have string parameters
@adon-at-work adon-at-work force-pushed the whitelist-url-filters branch 2 times, most recently from fe7ad57 to e4c897e Compare November 9, 2015 03:15
@adon-at-work adon-at-work force-pushed the whitelist-url-filters branch 2 times, most recently from 878b878 to 6ff981c Compare November 9, 2015 07:44
/*
findLastFile-a: 964ms
findLastFile-b: 2583ms
findQueryOrFragment-a: 523ms
findQueryOrFragment-b: 729ms
findFragment-a: 690ms
findFragment-b: 362ms
symbol-a: 58ms
symbol-b: 64ms
*/

var basePath = '/..#?';
var pos = -1;

console.time('findLastFile-a');
var re = /(?:[\/\\](?!(?:\.|%2[eE]){2})[^\/\\?#]*)?(?:$|[?#])/g;
for (var i = 0; i < 10000000; i++) {
	pos = re.exec(basePath).index;
}
console.timeEnd('findLastFile-a');

console.time('findLastFile-b');
var _resolvePathDoubleDots = /^(?:\.|%2[eE]){2}$/;
var pathEnd, t;
for (var i = 0; i < 10000000; i++) {
	var qPos = basePath.indexOf('?'), hashPos = basePath.indexOf('#');
	pos = (qPos === -1 || hashPos !== -1 && hashPos < qPos) ? hashPos :
qPos;
	pathEnd = pos === -1 ? undefined : pos;

	// _composeOriginSchemePath() normalized path to have at least the
first /
	t = Math.max(basePath.lastIndexOf('/', pathEnd),
			basePath.lastIndexOf('\\', pathEnd));

	// update pos as t only when the filename (after slash and until ?/#)
is not .. or equiv.
	!_resolvePathDoubleDots.test(basePath.slice(t + 1, pathEnd)) && (pos =
t);
}
console.timeEnd('findLastFile-b');

console.time('findQueryOrFragment-a');
var t, _reQueryOrFragment = /[?#]/;
for (var i = 0; i < 10000000; i++) {
	(t = _reQueryOrFragment.exec(basePath)) && (pos = t.index);
}
console.timeEnd('findQueryOrFragment-a');

console.time('findQueryOrFragment-b');
for (var i = 0; i < 10000000; i++) {
	var qPos = basePath.indexOf('?'), hashPos = basePath.indexOf('#');
	pos = (qPos === -1 || hashPos !== -1 && hashPos < qPos) ? hashPos :
qPos;
}
console.timeEnd('findQueryOrFragment-b');

console.time('findFragment-a');
var t, _reFragment = /#/;
for (var i = 0; i < 10000000; i++) {
	(t = _reFragment.exec(basePath)) && (pos = t.index);
}
console.timeEnd('findFragment-a');

console.time('findFragment-b');
for (var i = 0; i < 10000000; i++) {
	pos = basePath.indexOf('#');
}
console.timeEnd('findFragment-b');

var len = basePath.length;
console.time('symbol-a');
function symbolB (path, i) {
    switch(path.charCodeAt(i)) {
        case 47: case 92: return 1;
        case 35: case 63: return 2;
    }
    return 0;
}
for (var i = 0; i < 10000000; i++) {
	symbolB(basePath, i % len);
}
console.timeEnd('symbol-a');

console.time('symbol-b');
function symbolA (path, i) {
    var charCode = path.charCodeAt(i);
    return charCode === 47 || charCode === 92 ? 1 :
        charCode === 35 || charCode === 63 ? 2 :
        0;
}
for (var i = 0; i < 10000000; i++) {
	symbolA(basePath, i % len);
}
console.timeEnd('symbol-b');
@adon-at-work adon-at-work force-pushed the whitelist-url-filters branch from 6ff981c to 490d444 Compare November 9, 2015 08:27
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants