Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reject octal zeros in IPv4 addresses #86984

Merged
merged 6 commits into from
Oct 21, 2021
Merged

Conversation

syvb
Copy link
Contributor

@syvb syvb commented Jul 8, 2021

This fixes #86964 by rejecting octal zeros in IP addresses, such that 192.168.00.00000000 is rejected with a parse error, since having leading zeros in front of another zero indicates it is a zero written in octal notation, which is not allowed in the strict mode specified by RFC 6943 3.1.1. Octal rejection was implemented in #83652, but due to the way it was implemented octal zeros were still allowed.

@rust-highfive
Copy link
Collaborator

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 8, 2021
@cuviper
Copy link
Member

cuviper commented Jul 8, 2021

I understand we're currently a bit "loose" here, but is there any practical difference? It's a numeric 0 either way you look at it. So with your change, we would stop accepting octal zeros just for the sake of strictness?

@zacknewman
Copy link

zacknewman commented Jul 8, 2021

The documentation for std::net::Ipv4Addr pretty clearly states that the octets are in "decimal notation"; thus I personally don't even understand why the octet 001 is rejected since that is a perfectly reasonable base-10 representation of the integer 1. While that octet can be interpreted in any numeral system (including base-8), the octet 127 can also be interpreted in any numeral system starting from base-8. To me, this is inconsistent and very much confusing. It's like the parser says "127 can be in base-8, base-9, base-10, etc.; but we will default to interpreting it as base-10"; but for some reason, the parser doesn't apply that same logic to the octet 001.

The documentation also clearly cites IETF RFC 6943, but Section 3.1.1. in that RFC defines the "strict" form—the form that forbids other numeral systems (e.g., base-8)—as 4 octets in decimal notation separated by a "." such that each octet is between one and three digits. The current behavior doesn't conform to any obvious standard.

Apparently other languages default to interpreting any integer with padded 0s as base-8, but that is not true in most areas of Rust except here (e.g., assert_eq!("010".parse::<u32>().unwrap(), 10u32);).

To me if for some reason we want to forbid 0-padded octets, then it should be applied consistently for all integers not all integers except 0.

@cuviper
Copy link
Member

cuviper commented Jul 9, 2021

AIUI, it was defined in POSIX that a leading 0 in an IPv4 address should indicate octal. Rust was always parsing as decimal, but this could lead to bugs, even security issues, if you have system that combines Rust and POSIX ideas of a given input address. You might validate "010" in one mode and then use it in the other with a different interpretation. The Rust parser was changed to reject octal to avoid this confusion altogether.

You could debate whether "0" alone really has a leading zero to make it octal rather than decimal, but it usually doesn't matter. Longer strings of zeros are more plausibly octal, I guess, but it still seems pedantic to reject it.

@zacknewman
Copy link

zacknewman commented Jul 9, 2021

Hm, maybe this is yet another area where my math background is biting me in the butt because it's rather foreign to me to treat leading 0s as "more likely" to be base-8 than any other base since all numeral systems are based on the same fundamental algorithm. Seems overly restrictive to assume leading 0s means base-8 (with the exception of 0 apparently) too since clearly something like 008 is not base-8 due to the 8 not being a symbol in the base-8 numeral system; and since the documentation states each octet is in base-10, the problem of "how to interpret each octet" is already answered: base-10. I'm clearly in the minority with this point-of-view though, so I'll drop it.

@cuviper
Copy link
Member

cuviper commented Jul 9, 2021

Octal 0... goes back to C, at least: https://en.cppreference.com/w/c/language/integer_constant

  • decimal-constant is a non-zero decimal digit (1, 2, 3, 4, 5, 6, 7, 8, 9), followed by zero or more decimal digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
  • octal-constant is the digit zero (0) followed by zero or more octal digits (0, 1, 2, 3, 4, 5, 6, 7)

(which does mean a lone 0 is technically octal as well in C, not that it matters)

@RalfJung
Copy link
Member

RalfJung commented Jul 9, 2021

The documentation for std::net::Ipv4Addr pretty clearly states that the octets are in "decimal notation"; thus I personally don't even understand why the octet 001 is rejected since that is a perfectly reasonable base-10 representation of the integer 1.

The docs should probably be clarified: the octets are in decimal notation, and leading 0s are disallowed (except, as usual, for a plain 0).

This almost matches current behavior, except for 0000. So with this PR, those docs would be correct. Without this PR, the docs should say: the octets are in decimal notation, and leading 0s are disallowed, except for the number 0 itself which may have an arbitrary amount of leading 0s.

(I don't have an opinion between these two options, but this PR does not update the docs so IMO it is incomplete either way.)

Seems overly restrictive to assume leading 0s means base-8

I'm not disagreeing, but this is what other parts of the ecosystem do. It is much better for Rust to reject 011 in IP addresses than for both Rust and POSIX to accept it and interpret it differently -- would you agree to that? Rust does not exist in a vacuum, we have to take into account the surrounding reality here, and reality often uses a leading 0 to indicate octal notation (and yes this makes no mathematical sense; whoever decided that decades ago didn't ask mathematicians it seems, but without a time machine no amount of good arguments is going to change this). We could probably stretch the limits here and reject 011 but accept 008 (since POSIX could not accept the latter), but that seems even more surprising and weird than just rejecting leading 0s outright.

The docs possibly should say that this is why leading 0s are rejected.

@zacknewman
Copy link

zacknewman commented Jul 9, 2021

The docs should probably be clarified: the octets are in decimal notation, and leading 0s are disallowed (except, as usual, for a plain 0).

That is perfectly OK to me. While I would prefer something like 001.001.001.001 to be parseable because that is technically valid base-10 (at least mathematically)—I know my Nintendo Switch requires exactly 3 base-10 digits when typing in the IP—I realize that POSIX compatibility is very important.

I'm not disagreeing, but this is what other parts of the ecosystem do. It is much better for Rust to reject 011 in IP addresses than for both Rust and POSIX to accept it and interpret it differently -- would you agree to that? Rust does not exist in a vacuum, we have to take into account the surrounding reality here, and reality often uses a leading 0 to indicate octal notation (and yes this makes no mathematical sense; whoever decided that decades ago didn't ask mathematicians it seems, but without a time machine no amount of good arguments is going to change this). We could probably stretch the limits here and reject 011 but accept 008 (since POSIX could not accept the latter), but that seems even more surprising and weird than just rejecting leading 0s outright.

I absolutely agree. I was not aware POSIX stated that leading 0s meant base-8 until @cuviper pointed that out (i.e., my "complaint" would have been purely about the documentation and nothing else had I known this). I admit that I am probably in the minority when it comes to my background as a Rustacean. I've never written a line of assembly, C, C++, or kernel code in my life—something I'm not proud of, but c'est la vie—so it very well could be the case that "decimal notation" obviously means "mathematical base-10 with the added restriction that leading 0s are not allowed with the exception of 0 itself which is allowed to have any non-negative integer amount of leading 0s" to the majority of other Rustaceans.

Also want to add that what's considered "pedantry" to one may not be "pedantry" to another. While my 30+ years on this planet has shown me that my brain is wired very differently than most, I don't think this is an example of my brain thinking that differently. Case in point, I have a .CSV file of IPv4 addresses. Each row contains exactly 16 UTF-8 code units—with the last Unicode scalar value being a newline (i.e., U+000A). Based on the documentation, I wrote code that used this parsing algorithm; but of course, I was dismayed when I saw it fail. Ideally when one reads the documentation, they should have a good sense of the invariants and what's supposed to work and not. Obviously it didn't take long for me to change my implementation, but it would have been nice if I knew something like 001.001.001.001 would lead to a parsing failure a priori. Honestly even if two examples were added to the documentation that highlighted something like 001.001.001.001 leading to a parsing failure and 0000000.0.0000.00 leading to a parsing success would be enough for someone like me to deduce what "decimal" notation means in this context.

@RalfJung
Copy link
Member

RalfJung commented Jul 9, 2021

so it very well could be the case that "decimal notation" obviously means "mathematical base-10 with the added restriction that leading 0s are not allowed with the exception of 0 itself which is allowed to have any non-negative integer amount of leading 0s" to the majority of other Rustaceans.

I don't think it does. :) Leading 0s being code for "base 8" is a weird C thing that Rust did not do for its own integer literals, but that doesn't help us here. And either way the docs should be understandable for people with lots of different backgrounds, so if it is confusing for some, we should fix that! It's near impossible for one person (whoever writes the docs) to foresee all the ways it can be misleading, which is why documentation feedback and bugreports are extremely valuable. :)

So, we are in full agreement that the docs should document this more clearly. Usually, lading 0s are allowed, as you pointed out in your bugreport.

@RalfJung
Copy link
Member

RalfJung commented Jul 9, 2021

While I would prefer something like 001.001.001.001 to be parseable because that is technically valid base-10 (at least mathematically)—I know my Nintendo Switch requires exactly 3 base-10 digits when typing in the IP—I realize that POSIX compatibility is very important.

What I could imagine being useful (but I am not a libs person) is a parsing function that explicitly is not POSIX compatible, and that does interpret 010.010.010.010 entirely in decimal. But that should not be the default parsing function. I am not sure if the Rust stdlib is the right place for this, but it might be worth suggesting.

I am honestly quite shocked that POSIX would interpret this as 8.8.8.8, and I did not know that this is the case... but my ping agrees:

$ ping 010.010.010.010
PING 010.010.010.010 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=118 time=11.5 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=118 time=10.4 ms

The way octal literals are written in IP addresses differs from the way
they are written in Rust code, so the way that octal/hex literals in IPs
are written is explictly mentioned.
@syvb
Copy link
Contributor Author

syvb commented Jul 9, 2021

I've updated the documentation for Ipv4Addr to be more clear as what octal/hex literals are, and added some examples of which ones are rejected.

Now that there can't be a bunch of leading zeros, parsing can be
optimized a bit.
@zacknewman
Copy link

zacknewman commented Jul 9, 2021

Now this probably is a pedantic request—thus something I am not that passionate about and OK with not doing—but an example like the parsing of 0000.0.0.0 leading to a successful result despite having leading 0s is informative as that result does not obviously follow from the documentation (also has the property that more than 3 digits are allowed despite going against the "strict" form mentioned in RFC 6943 Section 3.1.1.).

Once I know how to do "pull requests", I won't ask others to do such a thing.

@marmeladema
Copy link
Contributor

Does this require libs team signoff or an fcp or something? This is another breaking change after the one that rejected the octal notation. I've encountered the regression at work and will probably encounter this one too when its available in stable.

That being said, I think the PR does make things more consistent and coherent.

@Mark-Simulacrum
Copy link
Member

r? @joshtriplett for libs-api(?) examination of this issue. I think it probably merits an FCP, at least...

@camelid camelid added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jul 29, 2021
Co-authored-by: Cheng XU <3105373+xu-cheng@users.noreply.github.com>
@camelid camelid added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 29, 2021
@joshtriplett joshtriplett added the relnotes Marks issues that should be documented in the release notes of the next release. label Oct 4, 2021
@joshtriplett
Copy link
Member

I think we should go ahead with this change.

@rfcbot merge

@rfcbot
Copy link

rfcbot commented Oct 4, 2021

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Oct 4, 2021
@joshtriplett joshtriplett added S-waiting-on-fcp Status: PR is in FCP and is awaiting for FCP to complete. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 4, 2021
@BurntSushi
Copy link
Member

BurntSushi commented Oct 5, 2021

I was personally somewhat on the side of just interpreting octets as decimal, even with a leading zero, POSIX be damned. But one thing that convinced me otherwise was the fact that, well, those leading zeroes are indeed interpreted as indicating octal base in many other places. So if our routine accepts the same IP but gives a different result than others, that could indeed be quite surprising. So I think it's the right decision to fail here. (Or alternatively, just implement octal support.)

(I am not aware of the history of rejecting octal in this routine, so I may just be re-treading old ground, but wanted to mention it since there was some discussion above.)

@rfcbot rfcbot added the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Oct 5, 2021
@rfcbot
Copy link

rfcbot commented Oct 5, 2021

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Oct 5, 2021
@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. to-announce Announce this issue on triage meeting and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Oct 15, 2021
@rfcbot
Copy link

rfcbot commented Oct 15, 2021

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@m-ou-se
Copy link
Member

m-ou-se commented Oct 20, 2021

@bors r+

@bors
Copy link
Contributor

bors commented Oct 20, 2021

📌 Commit 403d269 has been approved by m-ou-se

@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Oct 20, 2021
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Oct 20, 2021
Reject octal zeros in IPv4 addresses

This fixes rust-lang#86964 by rejecting octal zeros in IP addresses, such that `192.168.00.00000000` is rejected with a parse error, since having leading zeros in front of another zero indicates it is a zero written in octal notation, which is not allowed in the strict mode specified by RFC 6943 3.1.1. Octal rejection was implemented in rust-lang#83652, but due to the way it was implemented octal zeros were still allowed.
JohnTitor added a commit to JohnTitor/rust that referenced this pull request Oct 21, 2021
Reject octal zeros in IPv4 addresses

This fixes rust-lang#86964 by rejecting octal zeros in IP addresses, such that `192.168.00.00000000` is rejected with a parse error, since having leading zeros in front of another zero indicates it is a zero written in octal notation, which is not allowed in the strict mode specified by RFC 6943 3.1.1. Octal rejection was implemented in rust-lang#83652, but due to the way it was implemented octal zeros were still allowed.
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 21, 2021
Rollup of 14 pull requests

Successful merges:

 - rust-lang#86984 (Reject octal zeros in IPv4 addresses)
 - rust-lang#87440 (Remove unnecessary condition in Barrier::wait())
 - rust-lang#88644 (`AbstractConst` private fields)
 - rust-lang#89292 (Stabilize CString::from_vec_with_nul[_unchecked])
 - rust-lang#90010 (Avoid overflow in `VecDeque::with_capacity_in()`.)
 - rust-lang#90029 (Add test for debug logging during incremental compilation)
 - rust-lang#90031 (config: add the option to enable LLVM tests)
 - rust-lang#90048 (Add test for line-number setting)
 - rust-lang#90071 (Remove hir::map::blocks and use FnKind instead)
 - rust-lang#90074 (2229 migrations small cleanup)
 - rust-lang#90077 (Make `From` impls of NonZero integer const.)
 - rust-lang#90097 (Add test for duplicated sidebar entries for reexported macro)
 - rust-lang#90098 (Add test to ensure that the missing_doc_code_examples is not triggered on foreign trait implementations)
 - rust-lang#90099 (Fix MIRI UB in `Vec::swap_remove`)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 09de34c into rust-lang:master Oct 21, 2021
@rustbot rustbot added this to the 1.58.0 milestone Oct 21, 2021
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Oct 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. S-waiting-on-fcp Status: PR is in FCP and is awaiting for FCP to complete. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

std::net::Ipv4Addr parsing violates the "strict" form in IETF RFC 6943 Section 3.1.1.