Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: refactor EndsInANumber in node_url.cc and adds IsIPv4NumberValid #46227

Merged
merged 14 commits into from
Jan 23, 2023
Merged
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
65 changes: 51 additions & 14 deletions src/node_url.cc
Original file line number Diff line number Diff line change
Expand Up @@ -407,29 +407,66 @@ int64_t ParseIPv4Number(const char* start, const char* end) {
return strtoll(start, nullptr, R);
}

// https://url.spec.whatwg.org/#ipv4-number-parser
bool IsIPv4NumberValid(std::string_view input) {
anonrig marked this conversation as resolved.
Show resolved Hide resolved
if (input.empty()) {
return false;
}

if (input.size() >= 2 && input[0] == '0') {
anonrig marked this conversation as resolved.
Show resolved Hide resolved
if (input[1] == 'X' || input[1] == 'x') {
if (input.size() == 2) {
return true;
}

return input.find_first_not_of("0123456789abcdefABCDEF", 2) ==
anonrig marked this conversation as resolved.
Show resolved Hide resolved
std::string_view::npos;

} else {
anonrig marked this conversation as resolved.
Show resolved Hide resolved
if (input.size() == 1) {
anonrig marked this conversation as resolved.
Show resolved Hide resolved
return true;
}

return input.find_first_not_of("01234567", 1) == std::string_view::npos;
}
}

return std::all_of(input.begin(), input.end(), ::isdigit);
anonrig marked this conversation as resolved.
Show resolved Hide resolved
}

// https://url.spec.whatwg.org/#ends-in-a-number-checker
bool EndsInANumber(const std::string& input) {
std::vector<std::string> parts = SplitString(input, '.', false);
bool EndsInANumber(const std::string_view input) {
anonrig marked this conversation as resolved.
Show resolved Hide resolved
if (input.empty()) {
return false;
}

char delimiter = '.';
auto pointer_start = input.begin();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use std::string_view::iterator it might solve the issues related to std::string_view construct on windows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows build is still failing : /

D:\a\node\node\src\node_url.cc(465,66): error C2440: '<function-style-cast>': cannot convert from 'initializer list' to 'std::string_view' [D:\a\node\node\libnode.vcxproj]
D:\a\node\node\src\node_url.cc(465,66): message : No constructor could take the source type, or constructor overload resolution was ambiguous [D:\a\node\node\libnode.vcxproj]
  node_util.cc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to know if I can go with &(*it); again

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@addaleax any recommendations?

Copy link
Contributor Author

@miguelteixeiraa miguelteixeiraa Jan 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'd look like this:
std::string_view(&(*pointer_start), pointer_end - pointer_start));
just to make the build pass on windows

Copy link
Contributor Author

@miguelteixeiraa miguelteixeiraa Jan 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What i understand is that iterators don't need to be pointers although they could be implemented as one. And some implementations of the STL (MSVC++, Mingw) implement interators with special classes (with no pre-difined casts to pointer), so that we cannot use them directly in the string_view constructor that we are using, which needs a charT*.

then we could use * which is overloaded to do what logically mean: convert the iterator type to its pointed-to object, then pass the address of this &(*it).

https://en.cppreference.com/w/cpp/named_req/RandomAccessIterator
https://stackoverflow.com/questions/32654108/c-stdvectoriterator-is-not-a-pointer-why/64910815#64910815

auto pointer_end = input.end();

if (parts.empty()) return false;
uint8_t parts_size = std::count(pointer_start, pointer_end, delimiter);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why uint8_t?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it's an IPv4, so there shouldn't be more parts than, lets say, 4. In short.. it would always be a small number (I think uint8_t still quite big for this)
but yeah.. maybe I should change to what std::count returns so it don't mess up the number..
Do you think it should be changed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uint8_t is correct. referencing spec:

A valid IPv4-address string must be four shortest possible strings of [ASCII digits]
(https://infra.spec.whatwg.org/#ascii-digit), representing a decimal number in the 
range 0 to 255, inclusive, separated from each other by U+002E (.).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe I should change to what std::count returns so it don't mess up the number..

I think that’s a good idea, yeah. Reading this code it’s really not obvious that anything in the code guarantees that the return value fits into an uint8_t, even if that’s a type that can hold the all valid results. As an extreme case, if input contains 256 dots, parts_size would be 0 here, and I assume we don’t want to treat that the same as a string that doesn’t contain any dots.

++parts_size;

if (parts.back().empty()) {
if (parts.size() == 1) return false;
parts.pop_back();
if (input.back() == delimiter) {
--pointer_end;
--parts_size;
}

const std::string& last = parts.back();
if (parts_size > 1) {
pointer_start +=
input.rfind(delimiter, std::distance(pointer_start, pointer_end) - 1) +
1;
}

// If last is non-empty and contains only ASCII digits, then return true
if (!last.empty() && std::all_of(last.begin(), last.end(), ::isdigit)) {
return true;
if (std::distance(pointer_start, pointer_end) == 0) {
return false;
}

const char* last_str = last.c_str();
int64_t num = ParseIPv4Number(last_str, last_str + last.size());
if (num >= 0) return true;
if (std::all_of(pointer_start, pointer_end, ::isdigit)) {
return true;
}

return false;
return IsIPv4NumberValid(std::string(pointer_start, pointer_end));
addaleax marked this conversation as resolved.
Show resolved Hide resolved
}

void URLHost::ParseIPv4Host(const char* input, size_t length) {
Expand Down