-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
url: fix surrogate handling in encodeAuth() #11161
Conversation
const hexTable = []; | ||
for (var i = 0; i < 256; ++i) | ||
hexTable[i] = '%' + ((i < 16 ? '0' : '') + i.toString(16)).toUpperCase(); | ||
|
||
// These characters do not need escaping when generating query strings: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about moving other private functions to internal/querystring
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only moved this array because it is used in multiple places (querystring and url). The only other entity in querystring that is somewhat shared is ParsedQueryString
, a copy of which is in url (as ParsedQueryString
as well) and in internal/url (as StorageObject
). I've moved this as well. (There is also an analog of it in events as EventHandlers
, but I kept it as it is since it'd be weird for that module to import from internal/querystring
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think placing private bits intointernal
would make the boundaries clearer, whether its shared or not, like BufferList
for _stream_readable.js
(only used once there).
lib/url.js
Outdated
@@ -948,3 +948,67 @@ function spliceOne(list, index) { | |||
list[i] = list[k]; | |||
list.pop(); | |||
} | |||
|
|||
function encodeAuth(str) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this originally came from querystring.escape()
, but that function's implementation has recently improved performance-wise and we should probably incorporate those changes here (specifically the ASCII value checking -- it uses a lookup table now).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. 3% improvement on url/legacy-vs-whatwg-url-serialize.js
with type=auth method=legacy n=5e5
.
lib/querystring.js
Outdated
@@ -15,6 +15,7 @@ const QueryString = module.exports = { | |||
decode: parse | |||
}; | |||
const Buffer = require('buffer').Buffer; | |||
const hexTable = require('internal/querystring').hexTable; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be more consistent about how we import single items like this. Here we're using the pre-ES6 style, but in other places we are destructuring like const { hexTable } = require('internal/querystring');
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially followed the style of existing imports in the file (see Buffer
above), but since I'm importing multiple items from this module now I started using destructuring.
@joyeecheung and @mscdex, comments addressed. New CI: https://ci.nodejs.org/job/node-test-pull-request/6213/ @mscdex, I also factored out the entire percent encoding function so that it can be shared between querystring's Benchmark results
|
lib/internal/querystring.js
Outdated
} | ||
} | ||
|
||
// TODO support returning arbitrary buffers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Off-topic: what does this TODO mean? Take a buffer and escape its content?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume it means using a user-provided buffer. However, at the same time, this comment goes all the way back to 57d8172 (v0.3.2), so whether this TODO is still applicable isn't clear by any means.
Anyone is welcome to initiate a formal deprecation process of these exports in another PR.
lib/internal/querystring.js
Outdated
|
||
// QueryString.escape() replaces encodeURIComponent() | ||
// http://www.ecma-international.org/ecma-262/5.1/#sec-15.1.3.4 | ||
function percentEscape(noEscape, str) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: s/noEscape/noEscapeTable
would be more intuitive, noEscape
sounds like a boolean..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks.
lib/querystring.js
Outdated
StorageObject, | ||
percentEscape, | ||
unescapeBuffer | ||
} = require('internal/querystring'); | ||
const QueryString = module.exports = { | ||
unescapeBuffer, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Off-topic: unescapeBuffer
, encode
and decode
are all removed from the docs, mind adding a comment?
(Not sure about which stage of deprecation they are in)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm..mind adding a comment about that they have never been documented, so technically not public APIs, just exposed by accident?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd almost rather just document them IMHO, at least unescapeBuffer()
since it's not an alias to an already documented method. I've personally had uses for unescapeBuffer()
in my own projects. *shrug*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to deprecations or anything, just now that we are refactoring things, might as well :P
I just benchmarked the current changes in this PR and there is a 3-4% performance regression (with high "confidence" ratings) in the querystring-stringify benchmarks. With that in mind, I think we should avoid extracting the logic into a separate function ( |
Ouch, yeah, with numbers like that I have to agree. |
@mscdex, okay for now I left percent encoding alone in their respective files. |
lib/internal/querystring.js
Outdated
StorageObject.prototype = Object.create(null); | ||
|
||
// a safe fast alternative to decodeURIComponent | ||
function unescapeBuffer(s, decodeSpaces) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be moved back to lib/querystring.js as well, since it's not used anywhere else, along with unhexTable
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one reason to keep it in internal
is that unescapeBuffer
is exported to user land by accident, putting it in internal
should make it clear(if there is no performance regression caused by this though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think we should just document unescapeBuffer()
, since it can be useful (and I've used it in some of my projects before for decoding binary/latin1 strings).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, so probably move it back to lib/querystring.js
for now, and let another PR sort this issue out, no matter we want to make it really internal or start to document it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear, this PR does not modify the exports on querystring
: unescapeBuffer()
is still exported on require('querystring')
, just the physical location is changed.
That said, I thought this function would be of interest for other modules like url as well, which was why I moved it to internal/url.js
in the first place. As it doesn't seem like this decision is popular, I've reverted it. PTAL.
lib/internal/querystring.js
Outdated
return out.slice(0, outIndex); | ||
} | ||
|
||
exports.hexTable = hexTable; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be switched to the module.exports = { ... }
syntax instead (for faster property access)?
@mscdex, any more comments? Note, I am not against documenting |
@TimothyGu I recognize that, I was just advocating for returning it back to lib/querystring.js. Current changes LGTM. |
Also factor out common parts in querystring and url. PR-URL: nodejs#11161 Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Brian White <mscdex@mscdex.net>
Landed in c6b12d0. |
Also factor out common parts in querystring and url. Backport-of: nodejs#11161
Also factor out common parts in querystring and url. Backport-of: nodejs#11161
Also factor out common parts in querystring and url. Backport-of: #11161
Currently, the legacy URL stringifier miscalculates the offset of an extra surrogate, causing the high surrogate to be included unescaped: > url.format({ auth: '\uD83D\uDE00', hostname: 'a' }) '//%F0%9F%98%80�@A' PR-URL: nodejs#11387 Backport-of: nodejs#11161
Currently, the legacy URL stringifier miscalculates the offset of an extra surrogate, causing the high surrogate to be included unescaped:
Also included in the PR is a commit that moves the function, which is only used in the legacy
url
module, to that file. Certain generic percent encoding handling functions and tables are also moved to a separate lib/internal/querystring.js.CI: https://ci.nodejs.org/job/node-test-pull-request/6212/Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passesAffected core subsystem(s)
url, querystring