-
Notifications
You must be signed in to change notification settings - Fork 36.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restore atoi64/GetIntArg saturating behavior #23841
Conversation
Nice catch. CScript::ParseScript shouldn't be consensus-critical though. |
Some discussion from IRC:
|
Indeed good catch, I wasn't aware it is safe and saturating to produce an overflow. Still, an explicit error would be better than silent saturation. See also https://github.com/bitcoin/bitcoin/pull/17385/files#diff-19427b0dd1a791adc728c82e88f267751ba4f1c751e19262cac03cccd2822216R509 |
I've pushed a change that restores compatibility by introducing a new function, I've also added tests for When this is merged I'll make a follow-up issue to surface InitErrors instead of saturating, which would be a good first issue I think. |
76242dd
to
6c7803d
Compare
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
If you want this merged, please make sure that each commit compiles and passes all tests. Otherwise it will break git bisect. |
6c7803d
to
c0be750
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can squash the remaining test commit into the feature commit. No need for a fresh commit.
Also, left a question.
(Note to myself: commit 5b78055 was removed in the latest rebase)
src/test/getarg_tests.cpp
Outdated
ResetArgs("-foo -bar"); | ||
BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-foo", 11), 0); | ||
BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-bar", 11), 0); | ||
BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-foo", 0), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this different from the line preceding it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, this looks like a mistaken edit.
The new locale-independent atoi64 method introduced in bitcoin#20452 behaves differently for values passed which are greater than the uint64_t max. This commit is proof of that, meant to spur discussion on how to handle such an incompatibility. Introduce LocaleIndependentAtoi64 which behaves the same way that previous versions of Bitcoin Core has when faced with under- and overflow. This behavior was implicitly changed in bitcoin#20452, but has not yet been included in a release. Attempts to use LocaleIndependentAtoi for int64_t return values will result in a compilation error.
c0be750
to
ac1a5b1
Compare
Good catch. But concept NACK on reintroducing |
@@ -144,6 +145,11 @@ BOOST_AUTO_TEST_CASE(intarg) | |||
BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-foo", 11), 0); | |||
BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-bar", 11), 0); | |||
|
|||
// Check under-/overflow behavior. | |||
ResetArgs("-foo=-9223372036854775809 -bar=9223372036854775808"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not overflowing. It is equal to the min/max?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
10:29:37 james@slug james/tmp % cat int64max.cpp
#include <limits>
#include <iostream>
#include <cinttypes>
int main() {
std::cout << std::numeric_limits<int64_t>::max() << '\n';
std::cout << std::numeric_limits<int64_t>::min() << '\n';
}
10:29:42 james@slug james/tmp % ./a.out
9223372036854775807
-9223372036854775808
@laanwj I think there is some confusion here: nowhere in the live code do I actually use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK ac1a5b1
Edited title |
@laanwj Are you still NACK on this? Otherwise I think this is rfm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review ACK ac1a5b1
According to Microsoft docs:
In the case of overflow with large positive integral values, _atoi64 returns I64_MAX and I64_MIN in the case of overflow with large negative, integral values.
However, this was not true for the LocaleIndependentAtoi<int64_t> function, which intended to replicate _atoi64 behavior. This function returns 0 both in case of underflow and overflow.
This PR fixes it by introducing a slightly modified LocaleIndependentAtoi function specifically for int64_t which correctly replicates _atoi64 behavior of return INT64_MAX in case of overflow, and INT64_MIN in case of the underflow.
The added tests make sense and work perfectly, which, as expected, failed on the Master branch.
Thanks for catching this and preventing a potentially major future issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ~0. I don't understand why this PR is forking the parse function and only removing the overflow footgun specifically in the int64_t
case and letting it remain in other cases, instead of parsing sanely in all cases. I also think the commit descriptions and function descriptions are handwavy and confusing. They should just say what actual behaviors are instead of gesturing vaguely at other functions and PRs without saying what the differences are.
Code review ACK ac1a5b1, though. This does seem like the smallest possible change that will fix a few int64 parses, and it has good test coverage.
* LocaleIndependentAtoi64 is provided for backwards compatibility reasons. | ||
* Since its behavior differs from standard atoi functionality, it is not a | ||
* template-specialization of the above function. | ||
* | ||
* New code should use ToIntegral or the ParseInt* functions which provide | ||
* parse error feedback. | ||
* | ||
* This aims to replicate the exact behaviour of atoi64 in previous versions of | ||
* Bitcoin Core. The old atoi64 utility function actually made use of `strtoll` | ||
* in its implementation (or Microsoft's `_atoi64`), which saturates over- and | ||
* underflows. That behavior is reproduced here. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Restore atoi64 compatibility with old versions of Bitcoin Core" (ac1a5b1)
I find this description hard to follow. What is the point of referencing all these other functions and vaguely handwaving about what this function does of simply saying "Parse a decimal integer string. On positive overflow, return the maximum integer value, on negative overflow return the minimum integer value, and on error, return 0."?
template <typename T> | ||
// of atoi as it behaves under the "C" locale. | ||
// | ||
// Prevent use of LocaleIndependentAtoi for int64_t, since the function below |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Restore atoi64 compatibility with old versions of Bitcoin Core" (ac1a5b1)
I don't understand why this function exists at all. What use could there be for a parsing function that returns 0 in the case of overflow? It just seems like a footgun. Returning the biggest possible value when a bigger value is given, and the smallest possible value when a smaller value is given seems like a more useful and less dangerous way to parse values.
I guess it precedes this PR, but the naming also seems strange. Why call this one util function locale independent, when basically every util function we have should be locale independent? If we had use for a parsing function that did vary by locale it would make more sense to draw attention and mention locales in that function's name, not this function's name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that the name is long and ugly. It makes developers less likely to pick it, when there are better named and safer alternatives available that come with error reporting instead of implicitly mapping errors to random (in-range) return values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #23841 (comment)
I like that the name is long and ugly. It makes developers less likely to pick it, when there are better named and safer alternatives available that come with error reporting instead of implicitly mapping errors to random (in-range) return values.
It's just seems a weird way to make the function name long by adding a random fact about locale independence. You could call it FrogsAreGreenAtoi
and that would be better name than LocaleIndependentAtoi
because it wouldn't falsely suggest that other functions depend on the current locale and this function doesn't. I think an ideal name would be something more like ParseIntUnchecked
or AtoiIgnoreErrors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FrogsAreGreenAtoi
Ryan, not all frogs are green. I'd suggest to use the name MyFrogIsGreenAtoi
or ParseSaturatedInt
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to MyFrogIsGreenAtoi
@@ -115,6 +118,21 @@ T LocaleIndependentAtoi(const std::string& str) | |||
return result; | |||
} | |||
|
|||
/** | |||
* LocaleIndependentAtoi64 is provided for backwards compatibility reasons. | |||
* Since its behavior differs from standard atoi functionality, it is not a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Restore atoi64 compatibility with old versions of Bitcoin Core" (ac1a5b1)
I can't figure how this function is not a valid implementation of the standard function https://en.cppreference.com/w/cpp/string/byte/atoi. If its behavior is actually nonstandard, it would be good to say how.
In the int case it previously used Though, given that it is UB I can also see the argument to prefer less code and turn the UB into saturation. (In master it was turned to return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #23841 (comment)
Though, given that it is UB I can also see the argument to prefer less code and turn the UB into saturation. (In master it was turned to return
0
IIRC).
Yes, that's what I'm trying to figure out. I don't understand why you would return 0 when the provided value is bigger than the maximum value, instead of returning the maximum value. Also why you would want two error-ignoring parsing functions with different and not clearly documented behavior instead of one error-ignoring parsing function with clearly documented behavior? And is there actually something different about the int64_t type that it needs to be treated differently than all the other integer types?
Anyway, I acked this PR since it seems OK to merge, but I think it would be nicer if the two functions were consolidated, given nicer behavior and documentation, and probably renamed.
template <typename T> | ||
// of atoi as it behaves under the "C" locale. | ||
// | ||
// Prevent use of LocaleIndependentAtoi for int64_t, since the function below |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #23841 (comment)
I like that the name is long and ugly. It makes developers less likely to pick it, when there are better named and safer alternatives available that come with error reporting instead of implicitly mapping errors to random (in-range) return values.
It's just seems a weird way to make the function name long by adding a random fact about locale independence. You could call it FrogsAreGreenAtoi
and that would be better name than LocaleIndependentAtoi
because it wouldn't falsely suggest that other functions depend on the current locale and this function doesn't. I think an ideal name would be something more like ParseIntUnchecked
or AtoiIgnoreErrors
No, not that I am aware of. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Withdraw my ACK. Seems more minimalistic to have one method instead of two.
I think there is some navel-gazing going on here. We are in a situation where a change has inadvertently broken compatibility with a prior release (albeit in a minor way). This change proposes to remedy that and is well-tested. If there are style concerns, I think they can be done in a follow-up. Otherwise I have other things to do and someone else can fix this in a way that reviewers find preferable. Edit:
The prior behavior of Bitcoin was to saturate only for int64s - are you proposing to special case this single method on the basis of type? I think having a single method that either saturates or doesn't on the basis of type is very misleading. |
No, simply make the existing method saturate instead of creating yet another one. See #23841 (review) If not done here, there will be a follow-up doing it. And I think that we should avoid follow-ups that completely remove code that was added previously. (They are fine when only a subset is followed-up on). If you don't have time or energy to fix it that is fine, just let us know so that it can be picked up. |
The code here is non-trivial to review, based on the bug that was accidentally introduced. So I don't think it is navel-gazing to consolidate the tricky code to just one function instead of two. Also, I don't think it is navel-gazing to prefer a +63-17 patch over a two patches (each of which is larger than the single one), first +88-13, then +42-71. |
Closing in favor of #24041 |
if i understand it correctly, i do prefer the approach here as a stop-gap compared to #24041. What I think makes the most sense is to do both of these (this PR first) and then over the course of a few releases:
Perhaps it's more trouble than it's worth to DTRT here, but I think it is, personally, since config files breaking is a big deal. |
I don't think there are any differences between this PR and #24041 with respect to config files. Config behavior unintentionally changed in #20452, and both this PR and #24041 restore the previous behavior. |
ok i was reading off of #24041 (comment) but it seems that is not the case, in which case i'll concept ack #24041 |
No, I'm not, using it in the tests is fine sorry for misunderstanding it, my brain doesn't work anymore. |
b5c9bb5 util: Restore GetIntArg saturating behavior (James O'Beirne) Pull request description: The new locale-independent atoi64 method introduced in #20452 parses large integer values higher than maximum representable value as 0 instead of the maximum value, which breaks backwards compatibility. This commit restores compatibility and adds test coverage for this case in terms of the related GetIntArg and strtoll functions. Specifically, command line or bitcoin.conf integer values greater than `9223372036854775807` (`2**63-1`) used to be parsed as `9223372036854775807` before #20452. Then #20452 caused them to be parsed as `0`. And after this PR they will be parsed as `9223372036854775807` again. This change is a stripped-down alternative version of #23841 by jamesob ACKs for top commit: jamesob: Github ACK b5c9bb5 vincenzopalazzo: ACK b5c9bb5 MarcoFalke: review ACK b5c9bb5 🌘 Tree-SHA512: 4e8abdbabf3cf4713cf5a7c5169539159f359ab4109a4e7e644cc2e9b2b0c3c532fad9f6b772daf015e1c5340ce59280cd9a41f2730afda6099cbf636b7d23ae
The new locale-independent atoi64 method introduced in #20452 behaves
differently for values passed which are greater than the uint64_t max.
This commit is proof of that (the attached test fails), and is meant to spur discussion
on how to handle such an incompatibility.
The change as committed in #20542 may break some end-usages of
bitcoind, but more than that may have consensus implications as this
deserialization mechanism is used in CScript::ParseScript.
I think thismeans it's possible that #20542 could have been an accidental soft fork?
Edit:
CScript::ParseScript
isn't consensus-critical, but is used in e.g. script descriptor parsing.