URL whitelist filter and baseURL processing #53
base: improve-performance
hi @maditya, pls see if you have early comments on the whitelist XSS filtering design. finally,
    (!reHosts || (reHosts && reHosts.test(url.slice(result[0].length)))))) {
        return url;
    }
    return 'x-' + url;
we may consider returning 'unsafe:' + url, otherwise we'd actually disable a relative url
what if a user wants to allow https://www.yahoo.com but disallow http://www.yahoo.com?
have we considered implementing it with a state machine, as described in the specs? with regexes, we might miss something.
similarly, can you suggest an interface to configure this? I can think of:
I've thought of that too. The strongest reason for using regexps is that they take much less code, which matters since this lib can be used on the client side. The current whitelist-based regexp lets us assert a stricter grammar than the specs. Say, the specs allow ipv6, but I suspect we may not need that in the near future, and I feel that's the kind of thing we can afford to leave out. thoughts?
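To make the trade-off in this exchange concrete, here is a minimal sketch that extracts only the scheme in both styles. The helper names are hypothetical and not taken from the PR, and the state machine covers just the two scheme states (it skips the whitespace stripping the spec performs first), so treat it as an illustration rather than the library's code.

```javascript
// Hypothetical helpers for illustration only; not the PR's implementation.
function isAlpha(code) {
    return (code >= 65 && code <= 90) || (code >= 97 && code <= 122);
}

function isDigit(code) {
    return code >= 48 && code <= 57;
}

// State-machine style: "scheme start" accepts a letter, "scheme" accepts
// letters, digits, '+', '-', '.', and ends successfully at ':'.
function parseSchemeWithStateMachine(input) {
    var state = 'schemeStart';
    for (var i = 0; i < input.length; i++) {
        var code = input.charCodeAt(i);
        if (state === 'schemeStart') {
            if (!isAlpha(code)) { return null; }
            state = 'scheme';
        } else if (code === 58) { // ':'
            return input.slice(0, i).toLowerCase();
        } else if (!(isAlpha(code) || isDigit(code) ||
                code === 43 || code === 45 || code === 46)) { // '+', '-', '.'
            return null;
        }
    }
    return null; // ran out of input before reaching ':'
}

// Regexp style: the same grammar in a single expression.
var reScheme = /^([a-zA-Z][a-zA-Z0-9+.\-]*):/;

function parseSchemeWithRegExp(input) {
    var m = reScheme.exec(input);
    return m ? m[1].toLowerCase() : null;
}
```

For a grammar this small the two forms accept the same inputs; the regexp is shorter, while the state machine makes every transition explicit and is easier to compare against the spec step by step.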
@maditya 747f938 handled the comments. some example use cases:
allow protocol config, allow subdomain for hosts
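As an illustration of the behavior discussed in the review (per-scheme configuration, optional subdomains, and an 'unsafe:' prefix for rejected URLs), here is a minimal sketch. The factory name and the option names `schemes`, `hosts`, and `subdomain` are assumptions made for this example; they are not the interface introduced by the commit, and the sketch is not a vetted security filter.

```javascript
// Hypothetical factory: names and options are illustrative assumptions.
function makeUrlWhitelistFilter(options) {
    var schemes = options.schemes || ['http', 'https'];
    var hosts = options.hosts; // undefined means "any host"
    var allowSubdomain = !!options.subdomain;

    // Deliberately stricter than the URL spec: scheme://host[:port], where the
    // host may only contain letters, digits, dots and hyphens, and must be
    // followed by '/', '?', '#' or the end of the string. Userinfo ('@'),
    // backslashes and IPv6 literals are therefore rejected.
    var reAbsolute =
        /^([a-zA-Z][a-zA-Z0-9+.\-]*):\/\/([a-zA-Z0-9.\-]+)(?::\d+)?(?=[\/?#]|$)/;

    function hostAllowed(host) {
        if (!hosts) { return true; }
        host = host.toLowerCase();
        for (var i = 0; i < hosts.length; i++) {
            var allowed = hosts[i].toLowerCase();
            if (host === allowed) { return true; }
            // Subdomain match requires a dot boundary before the allowed host.
            if (allowSubdomain &&
                    host.slice(-(allowed.length + 1)) === '.' + allowed) {
                return true;
            }
        }
        return false;
    }

    return function (url) {
        var m = reAbsolute.exec(url);
        if (m && schemes.indexOf(m[1].toLowerCase()) !== -1 && hostAllowed(m[2])) {
            return url;
        }
        // In this sketch everything else, including relative URLs, is disabled.
        return 'unsafe:' + url;
    };
}

// Usage: allow https on yahoo.com and its subdomains, but not plain http.
var httpsYahooOnly = makeUrlWhitelistFilter({
    schemes: ['https'],
    hosts: ['yahoo.com'],
    subdomain: true
});
```

Listing schemes separately from hosts is what lets a caller allow https://www.yahoo.com while rejecting http://www.yahoo.com.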
- enhanced yuwlFactory to take an optional callback to process scheme, host, port, and path
- enabled safe image data URIs
- supported ipv6
- added relPath and relPathOnly options
- added url filter
- added base url resolver
- broken down functions into separate files
- removed browserify
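For readers unfamiliar with base URL resolution, the sketch below shows the expected behavior using the standard WHATWG `URL` constructor available in modern browsers and Node.js. It is not the resolver added in this commit, which is written by hand; the wrapper name `resolveUrl` is hypothetical.

```javascript
// Reference behavior via the built-in URL API; not the PR's own resolver.
function resolveUrl(relative, base) {
    try {
        return new URL(relative, base).href;
    } catch (e) {
        // Thrown as a TypeError when the input cannot be parsed,
        // e.g. when the base itself is not an absolute URL.
        return null;
    }
}

console.log(resolveUrl('../c?x=1#f', 'http://example.com/a/b/d'));
// prints "http://example.com/a/c?x=1#f"
```

A hand-written resolver can be checked against this built-in for the cases it is meant to support.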
Hi @maditya, here's the major update to the URL processor. The URL filter now observes the URL spec, and is well-documented (for what has been simplified, and the non-standard behaviors that are required) in this latest commit.
This covers all the URL use cases you have shared with me so far. Please check/review. Thanks :) P.S. This PR is designed for review only, and should be merged separately with a new PR to the master branch.
- added polyfill for Array.prototype.indexOf during tests
- added polyfill for JSON.stringify during tests
- added deleteCount for array.splice
- changed urlFilterFactory's callbacks to always have string parameters
/*
findLastFile-a: 964ms
findLastFile-b: 2583ms
findQueryOrFragment-a: 523ms
findQueryOrFragment-b: 729ms
findFragment-a: 690ms
findFragment-b: 362ms
symbol-a: 58ms
symbol-b: 64ms
*/
var basePath = '/..#?';
var pos = -1;

// findLastFile: a single regexp (a) vs. indexOf/lastIndexOf plus a small regexp (b)
console.time('findLastFile-a');
// note: re has the g flag, so exec() resumes from re.lastIndex on each call
var re = /(?:[\/\\](?!(?:\.|%2[eE]){2})[^\/\\?#]*)?(?:$|[?#])/g;
for (var i = 0; i < 10000000; i++) {
    pos = re.exec(basePath).index;
}
console.timeEnd('findLastFile-a');

console.time('findLastFile-b');
var _resolvePathDoubleDots = /^(?:\.|%2[eE]){2}$/;
var pathEnd, t;
for (var i = 0; i < 10000000; i++) {
    var qPos = basePath.indexOf('?'), hashPos = basePath.indexOf('#');
    pos = (qPos === -1 || hashPos !== -1 && hashPos < qPos) ? hashPos : qPos;
    pathEnd = pos === -1 ? undefined : pos;
    // _composeOriginSchemePath() normalized path to have at least the first /
    t = Math.max(basePath.lastIndexOf('/', pathEnd), basePath.lastIndexOf('\\', pathEnd));
    // update pos as t only when the filename (after slash and until ?/#) is not .. or equiv.
    !_resolvePathDoubleDots.test(basePath.slice(t + 1, pathEnd)) && (pos = t);
}
console.timeEnd('findLastFile-b');

// findQueryOrFragment: regexp (a) vs. two indexOf calls (b)
console.time('findQueryOrFragment-a');
var t, _reQueryOrFragment = /[?#]/;
for (var i = 0; i < 10000000; i++) {
    (t = _reQueryOrFragment.exec(basePath)) && (pos = t.index);
}
console.timeEnd('findQueryOrFragment-a');

console.time('findQueryOrFragment-b');
for (var i = 0; i < 10000000; i++) {
    var qPos = basePath.indexOf('?'), hashPos = basePath.indexOf('#');
    pos = (qPos === -1 || hashPos !== -1 && hashPos < qPos) ? hashPos : qPos;
}
console.timeEnd('findQueryOrFragment-b');

// findFragment: regexp (a) vs. a single indexOf call (b)
console.time('findFragment-a');
var t, _reFragment = /#/;
for (var i = 0; i < 10000000; i++) {
    (t = _reFragment.exec(basePath)) && (pos = t.index);
}
console.timeEnd('findFragment-a');

console.time('findFragment-b');
for (var i = 0; i < 10000000; i++) {
    pos = basePath.indexOf('#');
}
console.timeEnd('findFragment-b');

var len = basePath.length;

// symbol: switch vs. chained ternaries
// note: the 'symbol-a' timer measures symbolB (switch), and 'symbol-b' measures symbolA (ternary)
console.time('symbol-a');
function symbolB (path, i) {
    switch(path.charCodeAt(i)) {
        case 47: case 92: return 1;
        case 35: case 63: return 2;
    }
    return 0;
}
for (var i = 0; i < 10000000; i++) {
    symbolB(basePath, i % len);
}
console.timeEnd('symbol-a');

console.time('symbol-b');
function symbolA (path, i) {
    var charCode = path.charCodeAt(i);
    return charCode === 47 || charCode === 92 ? 1 : charCode === 35 || charCode === 63 ? 2 : 0;
}
for (var i = 0; i < 10000000; i++) {
    symbolA(basePath, i % len);
}
console.timeEnd('symbol-b');
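Timings like these only matter if the competing variants return the same answer, so a quick equivalence check is a useful companion. The sketch below wraps the two findQueryOrFragment approaches from the benchmark in hypothetical functions (the names and the sample paths are not from the PR) and records any input on which they disagree. One assumption differs from the benchmark: the regexp variant here returns -1 when nothing matches, whereas the benchmark leaves `pos` unchanged.

```javascript
// Hypothetical wrappers around the two benchmarked approaches.
function findQueryOrFragmentRegExp(path) {
    var m = /[?#]/.exec(path);
    return m ? m.index : -1;
}

function findQueryOrFragmentIndexOf(path) {
    var qPos = path.indexOf('?'), hashPos = path.indexOf('#');
    // Pick whichever of '?' and '#' comes first; -1 if neither is present.
    return (qPos === -1 || hashPos !== -1 && hashPos < qPos) ? hashPos : qPos;
}

// Illustrative sample paths, chosen to cover every ordering of '?' and '#'.
var samplePaths = ['/..#?', '/a?b#c', '/a#b?c', '/a?b', '/a#b', '/a', ''];
var mismatches = [];
for (var i = 0; i < samplePaths.length; i++) {
    var a = findQueryOrFragmentRegExp(samplePaths[i]);
    var b = findQueryOrFragmentIndexOf(samplePaths[i]);
    if (a !== b) {
        mismatches.push(samplePaths[i]);
    }
}
console.log(mismatches.length === 0 ? 'variants agree' : 'mismatches: ' + mismatches.join(', '));
```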
relaxed ipv6 to take percent-encoded input too
first step to facilitate further discussions