Restore atoi64/GetIntArg saturating behavior #23841

jamesob · 2021-12-22T17:18:54Z

The new locale-independent atoi64 method introduced in #20452 behaves
differently for values passed which are greater than the uint64_t max.
This commit is proof of that (the attached test fails), and is meant to spur discussion
on how to handle such an incompatibility.

The change as committed in #20542 may break some end-usages of
bitcoind, but more than that may have consensus implications as this
deserialization mechanism is used in CScript::ParseScript. I think this
means it's possible that #20542 could have been an accidental soft fork?

Edit: CScript::ParseScript isn't consensus-critical, but is used in e.g. script descriptor parsing.

sipa · 2021-12-22T17:25:13Z

Nice catch.

CScript::ParseScript shouldn't be consensus-critical though.

jamesob · 2021-12-22T17:29:55Z

Some discussion from IRC:

12:21 <MarcoFalke> jamesob: Integer overflow inside atoi is UB or at least unspecified behavior, last time I checked
12:22 <MarcoFalke> That is: Parsing 999999999999 can give you any value (it doesn't have to be 0 or max or 999999999999 mod max)
12:23 <MarcoFalke> jamesob: It is also not used in consensus code, so this is not a consensus change
12:24 <jamesob> this is not consensus? https://github.com/bitcoin/bitcoin/pull/20452/files#diff-1db27ed1bfbf61ea0fe64447413ef9f24238be710e7ca4ae9c7bc7a5c994eca0L72-R72
12:26 <jamesob> so here's the tricky thing: our atoi64 wasn't actually using atoi, but strtoll - for which oveflow behavior is well defined: "The strtol() function returns the result of the conversion, unless the value would underflow or overflow. If an underflow occurs, strtol() returns LONG_MIN. If an overflow occurs, strtol() returns LONG_MAX."
12:26 <jamesob> ("The strtoll() function works just like the strtol() function but returns a long long integer value.")
12:28 <MarcoFalke> Yeah, not consensus. Could be used by bitcoin-util or some RPC function.
12:29 <jamesob> Phew

maflcko · 2021-12-22T17:36:36Z

Indeed good catch, I wasn't aware it is safe and saturating to produce an overflow. Still, an explicit error would be better than silent saturation. See also https://github.com/bitcoin/bitcoin/pull/17385/files#diff-19427b0dd1a791adc728c82e88f267751ba4f1c751e19262cac03cccd2822216R509

jamesob · 2021-12-22T19:25:46Z

I've pushed a change that restores compatibility by introducing a new function, LocaleIndependentAtoi64. This is specifically not a template specialization of its more generic cousin because the under-/overflow behavior differs (as was the case previously with our atoi64 implementation). A compile error is rendered if someone attempts to use the generic version with int64_t types.

I've also added tests for GetIntArg that assert the saturation behavior.

When this is merged I'll make a follow-up issue to surface InitErrors instead of saturating, which would be a good first issue I think.

DrahtBot · 2021-12-23T07:03:04Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#24041 (Restore atoi compatibility with old versions of Bitcoin Core by ryanofsky)
#17783 (util: Fix -norpcwhitelist, -norpcallowip, and similar corner case behavior by ryanofsky)
#17581 (refactor: Remove settings merge reverse precedence code by ryanofsky)
#17580 (refactor: Add ALLOW_LIST flags and enforce usage in CheckArgFlags by ryanofsky)
#17493 (util: Forbid ambiguous multiple assignments in config file by ryanofsky)
#16545 (refactor: Implement missing error checking for ArgsManager flags by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

maflcko · 2022-01-03T14:59:36Z

If you want this merged, please make sure that each commit compiles and passes all tests. Otherwise it will break git bisect.

maflcko

I think you can squash the remaining test commit into the feature commit. No need for a fresh commit.

Also, left a question.

(Note to myself: commit 5b78055 was removed in the latest rebase)

maflcko · 2022-01-04T07:59:34Z

src/test/getarg_tests.cpp

-    ResetArgs("-foo -bar");
-    BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-foo", 11), 0);
-    BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-bar", 11), 0);
+    BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-foo", 0), 0);


How is this different from the line preceding it?

Oops, this looks like a mistaken edit.

The new locale-independent atoi64 method introduced in bitcoin#20452 behaves differently for values passed which are greater than the uint64_t max. This commit is proof of that, meant to spur discussion on how to handle such an incompatibility. Introduce LocaleIndependentAtoi64 which behaves the same way that previous versions of Bitcoin Core has when faced with under- and overflow. This behavior was implicitly changed in bitcoin#20452, but has not yet been included in a release. Attempts to use LocaleIndependentAtoi for int64_t return values will result in a compilation error.

laanwj · 2022-01-05T15:15:00Z

Good catch. But concept NACK on reintroducing strtoll usage. I would prefer implementing the desired compatible behavior in an explicit way using new functions, instead of delegating it to the C library's potentially locale-dependent function.
(or maybe even better: explicit parse error feedback in case of out-of-range)

maflcko · 2022-01-05T15:22:20Z

src/test/getarg_tests.cpp

@@ -144,6 +145,11 @@ BOOST_AUTO_TEST_CASE(intarg)
    BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-foo", 11), 0);
    BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-bar", 11), 0);

+    // Check under-/overflow behavior.
+    ResetArgs("-foo=-9223372036854775809 -bar=9223372036854775808");


This is not overflowing. It is equal to the min/max?

10:29:37 james@slug james/tmp % cat int64max.cpp #include <limits> #include <iostream> #include <cinttypes> int main() { std::cout << std::numeric_limits<int64_t>::max() << '\n'; std::cout << std::numeric_limits<int64_t>::min() << '\n'; } 10:29:42 james@slug james/tmp % ./a.out 9223372036854775807 -9223372036854775808

jamesob · 2022-01-05T15:22:46Z

@laanwj I think there is some confusion here: nowhere in the live code do I actually use strtoll; it's just used in tests to model compatibility with the old implementation in previous versions.

maflcko

ack

src/util/strencodings.cpp

vincenzopalazzo

ACK ac1a5b1

maflcko · 2022-01-10T07:55:00Z

Edited title

maflcko · 2022-01-10T07:55:19Z

@laanwj Are you still NACK on this? Otherwise I think this is rfm

shaavan

Code Review ACK ac1a5b1

According to Microsoft docs:

In the case of overflow with large positive integral values, _atoi64 returns I64_MAX and I64_MIN in the case of overflow with large negative, integral values.

However, this was not true for the LocaleIndependentAtoi<int64_t> function, which intended to replicate _atoi64 behavior. This function returns 0 both in case of underflow and overflow.

This PR fixes it by introducing a slightly modified LocaleIndependentAtoi function specifically for int64_t which correctly replicates _atoi64 behavior of return INT64_MAX in case of overflow, and INT64_MIN in case of the underflow.

The added tests make sense and work perfectly, which, as expected, failed on the Master branch.
Thanks for catching this and preventing a potentially major future issue.

ryanofsky

Concept ~0. I don't understand why this PR is forking the parse function and only removing the overflow footgun specifically in the int64_t case and letting it remain in other cases, instead of parsing sanely in all cases. I also think the commit descriptions and function descriptions are handwavy and confusing. They should just say what actual behaviors are instead of gesturing vaguely at other functions and PRs without saying what the differences are.

Code review ACK ac1a5b1, though. This does seem like the smallest possible change that will fix a few int64 parses, and it has good test coverage.

ryanofsky · 2022-01-10T17:07:06Z

src/util/strencodings.h

+ * LocaleIndependentAtoi64 is provided for backwards compatibility reasons.
+ * Since its behavior differs from standard atoi functionality, it is not a
+ * template-specialization of the above function.
+ *
+ * New code should use ToIntegral or the ParseInt* functions which provide
+ * parse error feedback.
+ *
+ * This aims to replicate the exact behaviour of atoi64 in previous versions of
+ * Bitcoin Core. The old atoi64 utility function actually made use of `strtoll`
+ * in its implementation (or Microsoft's `_atoi64`), which saturates over- and
+ * underflows. That behavior is reproduced here.
+ */


In commit "Restore atoi64 compatibility with old versions of Bitcoin Core" (ac1a5b1)

I find this description hard to follow. What is the point of referencing all these other functions and vaguely handwaving about what this function does of simply saying "Parse a decimal integer string. On positive overflow, return the maximum integer value, on negative overflow return the minimum integer value, and on error, return 0."?

ryanofsky · 2022-01-10T17:15:18Z

src/util/strencodings.h

-template <typename T>
+// of atoi as it behaves under the "C" locale.
+//
+// Prevent use of LocaleIndependentAtoi for int64_t, since the function below


In commit "Restore atoi64 compatibility with old versions of Bitcoin Core" (ac1a5b1)

I don't understand why this function exists at all. What use could there be for a parsing function that returns 0 in the case of overflow? It just seems like a footgun. Returning the biggest possible value when a bigger value is given, and the smallest possible value when a smaller value is given seems like a more useful and less dangerous way to parse values.

I guess it precedes this PR, but the naming also seems strange. Why call this one util function locale independent, when basically every util function we have should be locale independent? If we had use for a parsing function that did vary by locale it would make more sense to draw attention and mention locales in that function's name, not this function's name.

I like that the name is long and ugly. It makes developers less likely to pick it, when there are better named and safer alternatives available that come with error reporting instead of implicitly mapping errors to random (in-range) return values.

re: #23841 (comment)

I like that the name is long and ugly. It makes developers less likely to pick it, when there are better named and safer alternatives available that come with error reporting instead of implicitly mapping errors to random (in-range) return values.

It's just seems a weird way to make the function name long by adding a random fact about locale independence. You could call it FrogsAreGreenAtoi and that would be better name than LocaleIndependentAtoi because it wouldn't falsely suggest that other functions depend on the current locale and this function doesn't. I think an ideal name would be something more like ParseIntUnchecked or AtoiIgnoreErrors

FrogsAreGreenAtoi

Ryan, not all frogs are green. I'd suggest to use the name MyFrogIsGreenAtoi or ParseSaturatedInt.

+1 to MyFrogIsGreenAtoi

ryanofsky · 2022-01-10T17:29:39Z

src/util/strencodings.h

@@ -115,6 +118,21 @@ T LocaleIndependentAtoi(const std::string& str)
    return result;
 }

+/**
+ * LocaleIndependentAtoi64 is provided for backwards compatibility reasons.
+ * Since its behavior differs from standard atoi functionality, it is not a


In commit "Restore atoi64 compatibility with old versions of Bitcoin Core" (ac1a5b1)

I can't figure how this function is not a valid implementation of the standard function https://en.cppreference.com/w/cpp/string/byte/atoi. If its behavior is actually nonstandard, it would be good to say how.

maflcko · 2022-01-11T09:06:01Z

I don't understand why this PR is forking the parse function and only removing the overflow footgun specifically in the int64_t case

In the int case it previously used atoi, which runs into UB (not saturation). See #23841 (comment) . So I presume the goal was to make minimal changes?

Though, given that it is UB I can also see the argument to prefer less code and turn the UB into saturation. (In master it was turned to return 0 IIRC).

ryanofsky

re: #23841 (comment)

Though, given that it is UB I can also see the argument to prefer less code and turn the UB into saturation. (In master it was turned to return 0 IIRC).

Yes, that's what I'm trying to figure out. I don't understand why you would return 0 when the provided value is bigger than the maximum value, instead of returning the maximum value. Also why you would want two error-ignoring parsing functions with different and not clearly documented behavior instead of one error-ignoring parsing function with clearly documented behavior? And is there actually something different about the int64_t type that it needs to be treated differently than all the other integer types?

Anyway, I acked this PR since it seems OK to merge, but I think it would be nicer if the two functions were consolidated, given nicer behavior and documentation, and probably renamed.

ryanofsky · 2022-01-11T15:06:30Z

src/util/strencodings.h

-template <typename T>
+// of atoi as it behaves under the "C" locale.
+//
+// Prevent use of LocaleIndependentAtoi for int64_t, since the function below


re: #23841 (comment)

I like that the name is long and ugly. It makes developers less likely to pick it, when there are better named and safer alternatives available that come with error reporting instead of implicitly mapping errors to random (in-range) return values.

It's just seems a weird way to make the function name long by adding a random fact about locale independence. You could call it FrogsAreGreenAtoi and that would be better name than LocaleIndependentAtoi because it wouldn't falsely suggest that other functions depend on the current locale and this function doesn't. I think an ideal name would be something more like ParseIntUnchecked or AtoiIgnoreErrors

maflcko · 2022-01-11T15:37:27Z

is there actually something different about the int64_t type that it needs to be treated differently than all the other integer types?

No, not that I am aware of.

maflcko

Withdraw my ACK. Seems more minimalistic to have one method instead of two.

jamesob · 2022-01-11T16:21:33Z

I think there is some navel-gazing going on here. We are in a situation where a change has inadvertently broken compatibility with a prior release (albeit in a minor way). This change proposes to remedy that and is well-tested.

If there are style concerns, I think they can be done in a follow-up. Otherwise I have other things to do and someone else can fix this in a way that reviewers find preferable.

Edit:

Seems more minimalistic to have one method instead of two.

The prior behavior of Bitcoin was to saturate only for int64s - are you proposing to special case this single method on the basis of type? I think having a single method that either saturates or doesn't on the basis of type is very misleading.

maflcko · 2022-01-11T16:56:54Z

are you proposing to special case this single method on the basis of type?

No, simply make the existing method saturate instead of creating yet another one. See #23841 (review)

If not done here, there will be a follow-up doing it. And I think that we should avoid follow-ups that completely remove code that was added previously. (They are fine when only a subset is followed-up on).

If you don't have time or energy to fix it that is fine, just let us know so that it can be picked up.

ryanofsky · 2022-01-11T17:08:31Z

I think there is some navel-gazing going on here.

It's true, and to make the navel-gazing even worse, it was mostly about the already merged PR #20452, not even this PR!

Anyway posted #24041 as an alternative

maflcko · 2022-01-11T17:15:58Z

The code here is non-trivial to review, based on the bug that was accidentally introduced.

So I don't think it is navel-gazing to consolidate the tricky code to just one function instead of two.

Also, I don't think it is navel-gazing to prefer a +63-17 patch over a two patches (each of which is larger than the single one), first +88-13, then +42-71.

jamesob · 2022-01-11T17:58:27Z

Closing in favor of #24041

JeremyRubin · 2022-01-11T18:35:16Z

if i understand it correctly, i do prefer the approach here as a stop-gap compared to #24041.

What I think makes the most sense is to do both of these (this PR first) and then over the course of a few releases:

Log Warn if the parsing will differ in a future release
Error + Shutdown if the parsing will differ in a future release
Change the parsing.

Perhaps it's more trouble than it's worth to DTRT here, but I think it is, personally, since config files breaking is a big deal.

ryanofsky · 2022-01-11T20:05:11Z

Perhaps it's more trouble than it's worth to DTRT here, but I think it is, personally, since config files breaking is a big deal.

I don't think there are any differences between this PR and #24041 with respect to config files. Config behavior unintentionally changed in #20452, and both this PR and #24041 restore the previous behavior.

JeremyRubin · 2022-01-11T22:23:13Z

ok i was reading off of #24041 (comment) but it seems that is not the case, in which case i'll concept ack #24041

laanwj · 2022-01-12T11:59:22Z

@laanwj I think there is some confusion here: nowhere in the live code do I actually use strtoll; it's just used in tests to model compatibility with the old implementation in previous versions.

@laanwj Are you still NACK on this? Otherwise I think this is rfm

No, I'm not, using it in the tests is fine sorry for misunderstanding it, my brain doesn't work anymore.

b5c9bb5 util: Restore GetIntArg saturating behavior (James O'Beirne) Pull request description: The new locale-independent atoi64 method introduced in #20452 parses large integer values higher than maximum representable value as 0 instead of the maximum value, which breaks backwards compatibility. This commit restores compatibility and adds test coverage for this case in terms of the related GetIntArg and strtoll functions. Specifically, command line or bitcoin.conf integer values greater than `9223372036854775807` (`2**63-1`) used to be parsed as `9223372036854775807` before #20452. Then #20452 caused them to be parsed as `0`. And after this PR they will be parsed as `9223372036854775807` again. This change is a stripped-down alternative version of #23841 by jamesob ACKs for top commit: jamesob: Github ACK b5c9bb5 vincenzopalazzo: ACK b5c9bb5 MarcoFalke: review ACK b5c9bb5 🌘 Tree-SHA512: 4e8abdbabf3cf4713cf5a7c5169539159f359ab4109a4e7e644cc2e9b2b0c3c532fad9f6b772daf015e1c5340ce59280cd9a41f2730afda6099cbf636b7d23ae

maflcko added this to the 23.0 milestone Dec 22, 2021

maflcko added Brainstorming Utils/log/libs labels Dec 22, 2021

jamesob mentioned this pull request Dec 22, 2021

ArgsManager::GetIntArg should error on out-of-bounds value #23843

Closed

jamesob force-pushed the 2021-12-atoi64-incompat branch from 76242dd to 6c7803d Compare December 22, 2021 19:43

DrahtBot mentioned this pull request Dec 23, 2021

refactor: Implement missing error checking for ArgsManager flags #16545

Draft

jamesob force-pushed the 2021-12-atoi64-incompat branch from 6c7803d to c0be750 Compare January 3, 2022 15:29

maflcko reviewed Jan 4, 2022

View reviewed changes

jamesob force-pushed the 2021-12-atoi64-incompat branch from c0be750 to ac1a5b1 Compare January 4, 2022 18:08

maflcko reviewed Jan 5, 2022

View reviewed changes

maflcko approved these changes Jan 5, 2022

View reviewed changes

JeremyRubin reviewed Jan 5, 2022

View reviewed changes

src/util/strencodings.cpp Show resolved Hide resolved

maflcko removed the Brainstorming label Jan 6, 2022

vincenzopalazzo approved these changes Jan 9, 2022

View reviewed changes

vincenzopalazzo mentioned this pull request Jan 9, 2022

[WIP] utils: introduce a runtime error in case of overflow in GetArgInt #24016

Closed

maflcko changed the title ~~New atoi64 behavior is not compatible with legacy behavior~~ Restore atoi64/GetIntArg saturating behavior Jan 10, 2022

shaavan approved these changes Jan 10, 2022

View reviewed changes

ryanofsky approved these changes Jan 10, 2022

View reviewed changes

ryanofsky reviewed Jan 11, 2022

View reviewed changes

maflcko suggested changes Jan 11, 2022

View reviewed changes

ryanofsky mentioned this pull request Jan 11, 2022

util: Restore GetIntArg saturating behavior #24041

Merged

jamesob closed this Jan 11, 2022

bitcoin locked and limited conversation to collaborators Jan 12, 2023

Restore atoi64/GetIntArg saturating behavior #23841

Restore atoi64/GetIntArg saturating behavior #23841

Conversation

jamesob commented Dec 22, 2021 • edited Loading

sipa commented Dec 22, 2021

jamesob commented Dec 22, 2021

maflcko commented Dec 22, 2021

jamesob commented Dec 22, 2021

DrahtBot commented Dec 23, 2021 • edited Loading

Conflicts

maflcko commented Jan 3, 2022

maflcko left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

laanwj commented Jan 5, 2022 • edited Loading

Choose a reason for hiding this comment

jamesob Jan 5, 2022 • edited Loading

Choose a reason for hiding this comment

jamesob commented Jan 5, 2022

maflcko left a comment

Choose a reason for hiding this comment

vincenzopalazzo left a comment

Choose a reason for hiding this comment

maflcko commented Jan 10, 2022

maflcko commented Jan 10, 2022

shaavan left a comment • edited Loading

Choose a reason for hiding this comment

ryanofsky left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

maflcko commented Jan 11, 2022

ryanofsky left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

maflcko commented Jan 11, 2022

maflcko left a comment

Choose a reason for hiding this comment

jamesob commented Jan 11, 2022 • edited Loading

maflcko commented Jan 11, 2022

ryanofsky commented Jan 11, 2022

maflcko commented Jan 11, 2022

jamesob commented Jan 11, 2022

JeremyRubin commented Jan 11, 2022

ryanofsky commented Jan 11, 2022

JeremyRubin commented Jan 11, 2022

laanwj commented Jan 12, 2022

jamesob commented Dec 22, 2021 •

edited

Loading

DrahtBot commented Dec 23, 2021 •

edited

Loading

laanwj commented Jan 5, 2022 •

edited

Loading

jamesob Jan 5, 2022 •

edited

Loading

shaavan left a comment •

edited

Loading

jamesob commented Jan 11, 2022 •

edited

Loading