Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: some refactoring of node_url.cc #17470

Closed
wants to merge 4 commits into from

Conversation

addaleax
Copy link
Member

@addaleax addaleax commented Dec 5, 2017

  • src: minor cleanups to node_url.cc

    • Remove pointless pointers
    • Make WriteHost take a const argument so that it’s functionality
      is clear from the signature
    • Make FindLongestZeroSequence templated to accommodate the constness
      in WriteHost and because using uint16_t is an articifial,
      unnecessary restriction
    • Remove string copying when no copies are needed
    • Make PercentDecode just return its return value
    • Make ParseHost (string-only version) take its constant argument
      as a constant reference
  • src: move url internals into anonymous namespace

    This helps because static doesn’t work for C++ classes,
    but refactoring url_host into a proper C++ class seems the
    most reasonable soluation for the memory leak fixed by the next commit.

  • src: make url host a proper C++ class

    • Gives URLHost a proper destructor that clears memory
      depending on the type of the host (This fixes a memory leak)
    • Hide the host type enums and class layout as implementation details
    • Make the Parse methods members of URLHost
    • Turn WriteHost into a ToString() method on the URLHost class
    • Verify that at the beginning of a parse attempt, the type is set
      to “failed”
    • Remove a lot of gotos from the source code 🐢🚀

    Fixes: Why use new URL() may lead to memory leak ? #17448

  • src: use correct OOB check for IPv6 parsing

    last_piece pointed to the end of the 8×16 bit array,
    so piece_pointer == last_piece already means that the pointer
    is not writable any longer.

    Previously, this still worked most of the time but could
    result in an out-of-bounds-write.

    Also, rename last_piece to buffer_end to avoid this pitfall.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines
Affected core subsystem(s)

src/node_url.cc

Not sure how to write tests but it makes valgrind happy about @TimothyGu’s example from #17448

@addaleax addaleax added c++ Issues and PRs that require attention from people who are familiar with C++. whatwg-url Issues and PRs related to the WHATWG URL implementation. labels Dec 5, 2017
@addaleax addaleax requested a review from TimothyGu December 5, 2017 13:56
@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. dont-land-on-v4.x whatwg-url Issues and PRs related to the WHATWG URL implementation. labels Dec 5, 2017
- Remove pointless pointers
- Make `WriteHost` take a const argument so that it’s functionality
  is clear from the signature
- Make `FindLongestZeroSequence` templated to accommodate the constness
  in `WriteHost` and because using `uint16_t` is an articifial,
  unnecessary restriction
- Remove string copying when no copies are needed
- Make `PercentDecode` just return its return value
- Make `ParseHost` (string-only version) take its constant argument
  as a constant reference
This helps because `static` doesn’t work for C++ classes,
but refactoring `url_host` into a proper C++ class seems the
most reasonable soluation for the memory leak fixed by the next commit.
@addaleax
Copy link
Member Author

addaleax commented Dec 5, 2017

@addaleax
Copy link
Member Author

addaleax commented Dec 5, 2017

Some of the test failures are definitely related & a bit curious … going to look into those later

- Gives `URLHost` a proper destructor that clears memory
  depending on the type of the host (This fixes a memory leak)
- Hide the host type enums and class layout as implementation details
- Make the `Parse` methods members of `URLHost`
- Turn `WriteHost` into a `ToString()` method on the `URLHost` class
- Verify that at the beginning of a parse attempt, the type is set
  to “failed”
- Remove a lot of `goto`s from the source code 🐢🚀

Fixes: nodejs#17448
`last_piece` pointed to the end of the 8×16 bit array,
so `piece_pointer == last_piece` already means that the pointer
is not writable any longer.

Previously, this still worked most of the time but could
result in an out-of-bounds-write.

Also, rename `last_piece` to `buffer_end` to avoid this pitfall.
@addaleax
Copy link
Member Author

addaleax commented Dec 5, 2017

Copy link
Member

@TimothyGu TimothyGu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splendid work! Good to see this whole thing properly C++-ized, and the memory leak fixed. LGTM.

pointer += 2;
ch = pointer < end ? pointer[0] : kEOL;
piece_pointer++;
compress_pointer = piece_pointer;
}
while (ch != kEOL) {
if (piece_pointer > last_piece)
goto end;
if (piece_pointer >= buffer_end)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, nice catch!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TimothyGu well, it was breaking CI for me, soooo... ;)

Wouldn't have seen it without the tests breaking.

@addaleax addaleax added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Dec 8, 2017
BridgeAR pushed a commit to BridgeAR/node that referenced this pull request Dec 12, 2017
- Remove pointless pointers
- Make `WriteHost` take a const argument so that it’s functionality
  is clear from the signature
- Make `FindLongestZeroSequence` templated to accommodate the constness
  in `WriteHost` and because using `uint16_t` is an articifial,
  unnecessary restriction
- Remove string copying when no copies are needed
- Make `PercentDecode` just return its return value
- Make `ParseHost` (string-only version) take its constant argument
  as a constant reference

PR-URL: nodejs#17470
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
BridgeAR pushed a commit to BridgeAR/node that referenced this pull request Dec 12, 2017
This helps because `static` doesn’t work for C++ classes,
but refactoring `url_host` into a proper C++ class seems the
most reasonable soluation for the memory leak fixed by the next commit.

PR-URL: nodejs#17470
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
BridgeAR pushed a commit to BridgeAR/node that referenced this pull request Dec 12, 2017
- Gives `URLHost` a proper destructor that clears memory
  depending on the type of the host (This fixes a memory leak)
- Hide the host type enums and class layout as implementation details
- Make the `Parse` methods members of `URLHost`
- Turn `WriteHost` into a `ToString()` method on the `URLHost` class
- Verify that at the beginning of a parse attempt, the type is set
  to “failed”
- Remove a lot of `goto`s from the source code 🐢🚀

PR-URL: nodejs#17470
Fixes: nodejs#17448
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
BridgeAR pushed a commit to BridgeAR/node that referenced this pull request Dec 12, 2017
`last_piece` pointed to the end of the 8×16 bit array,
so `piece_pointer == last_piece` already means that the pointer
is not writable any longer.

Previously, this still worked most of the time but could
result in an out-of-bounds-write.

Also, rename `last_piece` to `buffer_end` to avoid this pitfall.

PR-URL: nodejs#17470
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
@BridgeAR
Copy link
Member

Landed in e8a5f7b, 9236dfe, 89b3746, 2050bab

@BridgeAR BridgeAR closed this Dec 12, 2017
@addaleax addaleax removed the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Dec 13, 2017
@addaleax addaleax deleted the url-refactor branch December 13, 2017 05:20
@MylesBorins
Copy link
Contributor

@TimothyGu does this need to be included in the backport?

@gibfahn
Copy link
Member

gibfahn commented Dec 20, 2017

@trygve-lie , in reply to #17774 (comment)

Normally we don't backport to LTS until things have been in a current release (9.x) for at least two weeks. This hasn't gone into a 9.x release yet, so we'd be talking 6 weeks before this could land in v8.x.

If this fixes a serious bug we should consider including it in the v8.9.4 release candidate build going out today, so that it goes out with v8.9.4 in two weeks.

What would be helpful is an idea of:

  1. How much pain this bug causes
  2. How many people are going to hit this
  3. How minor the fix is (how much risk is there in rushing it)

Thoughts @nodejs/lts ?

MylesBorins pushed a commit that referenced this pull request Feb 5, 2018
- Remove pointless pointers
- Make `WriteHost` take a const argument so that it’s functionality
  is clear from the signature
- Make `FindLongestZeroSequence` templated to accommodate the constness
  in `WriteHost` and because using `uint16_t` is an articifial,
  unnecessary restriction
- Remove string copying when no copies are needed
- Make `PercentDecode` just return its return value
- Make `ParseHost` (string-only version) take its constant argument
  as a constant reference

Backport-PR-URL: #18324
PR-URL: #17470
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
MylesBorins pushed a commit that referenced this pull request Feb 5, 2018
This helps because `static` doesn’t work for C++ classes,
but refactoring `url_host` into a proper C++ class seems the
most reasonable soluation for the memory leak fixed by the next commit.

Backport-PR-URL: #18324
PR-URL: #17470
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
MylesBorins pushed a commit that referenced this pull request Feb 5, 2018
- Gives `URLHost` a proper destructor that clears memory
  depending on the type of the host (This fixes a memory leak)
- Hide the host type enums and class layout as implementation details
- Make the `Parse` methods members of `URLHost`
- Turn `WriteHost` into a `ToString()` method on the `URLHost` class
- Verify that at the beginning of a parse attempt, the type is set
  to “failed”
- Remove a lot of `goto`s from the source code 🐢🚀

Backport-PR-URL: #18324
PR-URL: #17470
Fixes: #17448
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
MylesBorins pushed a commit that referenced this pull request Feb 5, 2018
`last_piece` pointed to the end of the 8×16 bit array,
so `piece_pointer == last_piece` already means that the pointer
is not writable any longer.

Previously, this still worked most of the time but could
result in an out-of-bounds-write.

Also, rename `last_piece` to `buffer_end` to avoid this pitfall.

Backport-PR-URL: #18324
PR-URL: #17470
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
@MylesBorins MylesBorins mentioned this pull request Feb 7, 2018
MylesBorins pushed a commit that referenced this pull request Feb 11, 2018
- Remove pointless pointers
- Make `WriteHost` take a const argument so that it’s functionality
  is clear from the signature
- Make `FindLongestZeroSequence` templated to accommodate the constness
  in `WriteHost` and because using `uint16_t` is an articifial,
  unnecessary restriction
- Remove string copying when no copies are needed
- Make `PercentDecode` just return its return value
- Make `ParseHost` (string-only version) take its constant argument
  as a constant reference

Backport-PR-URL: #18324
PR-URL: #17470
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
MylesBorins pushed a commit that referenced this pull request Feb 11, 2018
This helps because `static` doesn’t work for C++ classes,
but refactoring `url_host` into a proper C++ class seems the
most reasonable soluation for the memory leak fixed by the next commit.

Backport-PR-URL: #18324
PR-URL: #17470
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
MylesBorins pushed a commit that referenced this pull request Feb 11, 2018
- Gives `URLHost` a proper destructor that clears memory
  depending on the type of the host (This fixes a memory leak)
- Hide the host type enums and class layout as implementation details
- Make the `Parse` methods members of `URLHost`
- Turn `WriteHost` into a `ToString()` method on the `URLHost` class
- Verify that at the beginning of a parse attempt, the type is set
  to “failed”
- Remove a lot of `goto`s from the source code 🐢🚀

Backport-PR-URL: #18324
PR-URL: #17470
Fixes: #17448
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
MylesBorins pushed a commit that referenced this pull request Feb 11, 2018
`last_piece` pointed to the end of the 8×16 bit array,
so `piece_pointer == last_piece` already means that the pointer
is not writable any longer.

Previously, this still worked most of the time but could
result in an out-of-bounds-write.

Also, rename `last_piece` to `buffer_end` to avoid this pitfall.

Backport-PR-URL: #18324
PR-URL: #17470
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
MylesBorins pushed a commit that referenced this pull request Feb 12, 2018
- Remove pointless pointers
- Make `WriteHost` take a const argument so that it’s functionality
  is clear from the signature
- Make `FindLongestZeroSequence` templated to accommodate the constness
  in `WriteHost` and because using `uint16_t` is an articifial,
  unnecessary restriction
- Remove string copying when no copies are needed
- Make `PercentDecode` just return its return value
- Make `ParseHost` (string-only version) take its constant argument
  as a constant reference

Backport-PR-URL: #18324
PR-URL: #17470
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
MylesBorins pushed a commit that referenced this pull request Feb 12, 2018
This helps because `static` doesn’t work for C++ classes,
but refactoring `url_host` into a proper C++ class seems the
most reasonable soluation for the memory leak fixed by the next commit.

Backport-PR-URL: #18324
PR-URL: #17470
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
MylesBorins pushed a commit that referenced this pull request Feb 12, 2018
- Gives `URLHost` a proper destructor that clears memory
  depending on the type of the host (This fixes a memory leak)
- Hide the host type enums and class layout as implementation details
- Make the `Parse` methods members of `URLHost`
- Turn `WriteHost` into a `ToString()` method on the `URLHost` class
- Verify that at the beginning of a parse attempt, the type is set
  to “failed”
- Remove a lot of `goto`s from the source code 🐢🚀

Backport-PR-URL: #18324
PR-URL: #17470
Fixes: #17448
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
MylesBorins pushed a commit that referenced this pull request Feb 12, 2018
`last_piece` pointed to the end of the 8×16 bit array,
so `piece_pointer == last_piece` already means that the pointer
is not writable any longer.

Previously, this still worked most of the time but could
result in an out-of-bounds-write.

Also, rename `last_piece` to `buffer_end` to avoid this pitfall.

Backport-PR-URL: #18324
PR-URL: #17470
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
MylesBorins pushed a commit that referenced this pull request Feb 13, 2018
- Remove pointless pointers
- Make `WriteHost` take a const argument so that it’s functionality
  is clear from the signature
- Make `FindLongestZeroSequence` templated to accommodate the constness
  in `WriteHost` and because using `uint16_t` is an articifial,
  unnecessary restriction
- Remove string copying when no copies are needed
- Make `PercentDecode` just return its return value
- Make `ParseHost` (string-only version) take its constant argument
  as a constant reference

Backport-PR-URL: #18324
PR-URL: #17470
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
MylesBorins pushed a commit that referenced this pull request Feb 13, 2018
This helps because `static` doesn’t work for C++ classes,
but refactoring `url_host` into a proper C++ class seems the
most reasonable soluation for the memory leak fixed by the next commit.

Backport-PR-URL: #18324
PR-URL: #17470
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
MylesBorins pushed a commit that referenced this pull request Feb 13, 2018
- Gives `URLHost` a proper destructor that clears memory
  depending on the type of the host (This fixes a memory leak)
- Hide the host type enums and class layout as implementation details
- Make the `Parse` methods members of `URLHost`
- Turn `WriteHost` into a `ToString()` method on the `URLHost` class
- Verify that at the beginning of a parse attempt, the type is set
  to “failed”
- Remove a lot of `goto`s from the source code 🐢🚀

Backport-PR-URL: #18324
PR-URL: #17470
Fixes: #17448
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
MylesBorins pushed a commit that referenced this pull request Feb 13, 2018
`last_piece` pointed to the end of the 8×16 bit array,
so `piece_pointer == last_piece` already means that the pointer
is not writable any longer.

Previously, this still worked most of the time but could
result in an out-of-bounds-write.

Also, rename `last_piece` to `buffer_end` to avoid this pitfall.

Backport-PR-URL: #18324
PR-URL: #17470
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. whatwg-url Issues and PRs related to the WHATWG URL implementation.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Why use new URL() may lead to memory leak ?
8 participants