Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore atoi64/GetIntArg saturating behavior #23841

Closed
wants to merge 1 commit into from

Conversation

jamesob
Copy link
Contributor

@jamesob jamesob commented Dec 22, 2021

The new locale-independent atoi64 method introduced in #20452 behaves
differently for values passed which are greater than the uint64_t max.
This commit is proof of that (the attached test fails), and is meant to spur discussion
on how to handle such an incompatibility.

The change as committed in #20542 may break some end-usages of
bitcoind, but more than that may have consensus implications as this
deserialization mechanism is used in CScript::ParseScript. I think this
means it's possible that #20542 could have been an accidental soft fork?

Edit: CScript::ParseScript isn't consensus-critical, but is used in e.g. script descriptor parsing.

@sipa
Copy link
Member

sipa commented Dec 22, 2021

Nice catch.

CScript::ParseScript shouldn't be consensus-critical though.

@jamesob
Copy link
Contributor Author

jamesob commented Dec 22, 2021

Some discussion from IRC:

12:21 <MarcoFalke> jamesob: Integer overflow inside atoi is UB or at least unspecified behavior, last time I checked
12:22 <MarcoFalke> That is: Parsing 999999999999 can give you any value (it doesn't have to be 0 or max or 999999999999 mod max)
12:23 <MarcoFalke> jamesob: It is also not used in consensus code, so this is not a consensus change
12:24 <jamesob> this is not consensus? https://github.com/bitcoin/bitcoin/pull/20452/files#diff-1db27ed1bfbf61ea0fe64447413ef9f24238be710e7ca4ae9c7bc7a5c994eca0L72-R72
12:26 <jamesob> so here's the tricky thing: our atoi64 wasn't actually using atoi, but strtoll - for which oveflow behavior is well defined: "The strtol() function returns the result of the conversion, unless the value would underflow or overflow. If an underflow occurs, strtol() returns LONG_MIN. If an overflow occurs, strtol() returns LONG_MAX."
12:26 <jamesob> ("The strtoll() function works just like the strtol() function but returns a long long integer value.")
12:28 <MarcoFalke> Yeah, not consensus. Could be used by bitcoin-util or some RPC function.
12:29 <jamesob> Phew

@maflcko
Copy link
Member

maflcko commented Dec 22, 2021

Indeed good catch, I wasn't aware it is safe and saturating to produce an overflow. Still, an explicit error would be better than silent saturation. See also https://github.com/bitcoin/bitcoin/pull/17385/files#diff-19427b0dd1a791adc728c82e88f267751ba4f1c751e19262cac03cccd2822216R509

@jamesob
Copy link
Contributor Author

jamesob commented Dec 22, 2021

I've pushed a change that restores compatibility by introducing a new function, LocaleIndependentAtoi64. This is specifically not a template specialization of its more generic cousin because the under-/overflow behavior differs (as was the case previously with our atoi64 implementation). A compile error is rendered if someone attempts to use the generic version with int64_t types.

I've also added tests for GetIntArg that assert the saturation behavior.

When this is merged I'll make a follow-up issue to surface InitErrors instead of saturating, which would be a good first issue I think.

@DrahtBot
Copy link
Contributor

DrahtBot commented Dec 23, 2021

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #24041 (Restore atoi compatibility with old versions of Bitcoin Core by ryanofsky)
  • #17783 (util: Fix -norpcwhitelist, -norpcallowip, and similar corner case behavior by ryanofsky)
  • #17581 (refactor: Remove settings merge reverse precedence code by ryanofsky)
  • #17580 (refactor: Add ALLOW_LIST flags and enforce usage in CheckArgFlags by ryanofsky)
  • #17493 (util: Forbid ambiguous multiple assignments in config file by ryanofsky)
  • #16545 (refactor: Implement missing error checking for ArgsManager flags by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@maflcko
Copy link
Member

maflcko commented Jan 3, 2022

If you want this merged, please make sure that each commit compiles and passes all tests. Otherwise it will break git bisect.

@jamesob jamesob force-pushed the 2021-12-atoi64-incompat branch from 6c7803d to c0be750 Compare January 3, 2022 15:29
Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can squash the remaining test commit into the feature commit. No need for a fresh commit.

Also, left a question.

(Note to myself: commit 5b78055 was removed in the latest rebase)

ResetArgs("-foo -bar");
BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-foo", 11), 0);
BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-bar", 11), 0);
BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-foo", 0), 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this different from the line preceding it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, this looks like a mistaken edit.

The new locale-independent atoi64 method introduced in bitcoin#20452 behaves
differently for values passed which are greater than the uint64_t max.
This commit is proof of that, meant to spur discussion on how to handle
such an incompatibility.

Introduce LocaleIndependentAtoi64 which behaves the same way that
previous versions of Bitcoin Core has when faced with under- and
overflow.

This behavior was implicitly changed in bitcoin#20452, but has not yet
been included in a release.

Attempts to use LocaleIndependentAtoi for int64_t return values
will result in a compilation error.
@jamesob jamesob force-pushed the 2021-12-atoi64-incompat branch from c0be750 to ac1a5b1 Compare January 4, 2022 18:08
@laanwj
Copy link
Member

laanwj commented Jan 5, 2022

Good catch. But concept NACK on reintroducing strtoll usage. I would prefer implementing the desired compatible behavior in an explicit way using new functions, instead of delegating it to the C library's potentially locale-dependent function.
(or maybe even better: explicit parse error feedback in case of out-of-range)

@@ -144,6 +145,11 @@ BOOST_AUTO_TEST_CASE(intarg)
BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-foo", 11), 0);
BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-bar", 11), 0);

// Check under-/overflow behavior.
ResetArgs("-foo=-9223372036854775809 -bar=9223372036854775808");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not overflowing. It is equal to the min/max?

Copy link
Contributor Author

@jamesob jamesob Jan 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10:29:37 james@slug james/tmp % cat int64max.cpp
#include <limits>
#include <iostream>
#include <cinttypes>

int main() {
    std::cout << std::numeric_limits<int64_t>::max() << '\n';
    std::cout << std::numeric_limits<int64_t>::min() << '\n';
}
10:29:42 james@slug james/tmp % ./a.out
9223372036854775807
-9223372036854775808

@jamesob
Copy link
Contributor Author

jamesob commented Jan 5, 2022

@laanwj I think there is some confusion here: nowhere in the live code do I actually use strtoll; it's just used in tests to model compatibility with the old implementation in previous versions.

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

Copy link

@vincenzopalazzo vincenzopalazzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK ac1a5b1

@maflcko maflcko changed the title New atoi64 behavior is not compatible with legacy behavior Restore atoi64/GetIntArg saturating behavior Jan 10, 2022
@maflcko
Copy link
Member

maflcko commented Jan 10, 2022

Edited title

@maflcko
Copy link
Member

maflcko commented Jan 10, 2022

@laanwj Are you still NACK on this? Otherwise I think this is rfm

Copy link
Contributor

@shaavan shaavan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review ACK ac1a5b1

According to Microsoft docs:

In the case of overflow with large positive integral values, _atoi64 returns I64_MAX and I64_MIN in the case of overflow with large negative, integral values.

However, this was not true for the LocaleIndependentAtoi<int64_t> function, which intended to replicate _atoi64 behavior. This function returns 0 both in case of underflow and overflow.

This PR fixes it by introducing a slightly modified LocaleIndependentAtoi function specifically for int64_t which correctly replicates _atoi64 behavior of return INT64_MAX in case of overflow, and INT64_MIN in case of the underflow.

The added tests make sense and work perfectly, which, as expected, failed on the Master branch.
Thanks for catching this and preventing a potentially major future issue.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ~0. I don't understand why this PR is forking the parse function and only removing the overflow footgun specifically in the int64_t case and letting it remain in other cases, instead of parsing sanely in all cases. I also think the commit descriptions and function descriptions are handwavy and confusing. They should just say what actual behaviors are instead of gesturing vaguely at other functions and PRs without saying what the differences are.

Code review ACK ac1a5b1, though. This does seem like the smallest possible change that will fix a few int64 parses, and it has good test coverage.

Comment on lines +122 to +133
* LocaleIndependentAtoi64 is provided for backwards compatibility reasons.
* Since its behavior differs from standard atoi functionality, it is not a
* template-specialization of the above function.
*
* New code should use ToIntegral or the ParseInt* functions which provide
* parse error feedback.
*
* This aims to replicate the exact behaviour of atoi64 in previous versions of
* Bitcoin Core. The old atoi64 utility function actually made use of `strtoll`
* in its implementation (or Microsoft's `_atoi64`), which saturates over- and
* underflows. That behavior is reproduced here.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Restore atoi64 compatibility with old versions of Bitcoin Core" (ac1a5b1)

I find this description hard to follow. What is the point of referencing all these other functions and vaguely handwaving about what this function does of simply saying "Parse a decimal integer string. On positive overflow, return the maximum integer value, on negative overflow return the minimum integer value, and on error, return 0."?

template <typename T>
// of atoi as it behaves under the "C" locale.
//
// Prevent use of LocaleIndependentAtoi for int64_t, since the function below
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Restore atoi64 compatibility with old versions of Bitcoin Core" (ac1a5b1)

I don't understand why this function exists at all. What use could there be for a parsing function that returns 0 in the case of overflow? It just seems like a footgun. Returning the biggest possible value when a bigger value is given, and the smallest possible value when a smaller value is given seems like a more useful and less dangerous way to parse values.

I guess it precedes this PR, but the naming also seems strange. Why call this one util function locale independent, when basically every util function we have should be locale independent? If we had use for a parsing function that did vary by locale it would make more sense to draw attention and mention locales in that function's name, not this function's name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that the name is long and ugly. It makes developers less likely to pick it, when there are better named and safer alternatives available that come with error reporting instead of implicitly mapping errors to random (in-range) return values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #23841 (comment)

I like that the name is long and ugly. It makes developers less likely to pick it, when there are better named and safer alternatives available that come with error reporting instead of implicitly mapping errors to random (in-range) return values.

It's just seems a weird way to make the function name long by adding a random fact about locale independence. You could call it FrogsAreGreenAtoi and that would be better name than LocaleIndependentAtoi because it wouldn't falsely suggest that other functions depend on the current locale and this function doesn't. I think an ideal name would be something more like ParseIntUnchecked or AtoiIgnoreErrors

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FrogsAreGreenAtoi

Ryan, not all frogs are green. I'd suggest to use the name MyFrogIsGreenAtoi or ParseSaturatedInt.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to MyFrogIsGreenAtoi

@@ -115,6 +118,21 @@ T LocaleIndependentAtoi(const std::string& str)
return result;
}

/**
* LocaleIndependentAtoi64 is provided for backwards compatibility reasons.
* Since its behavior differs from standard atoi functionality, it is not a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Restore atoi64 compatibility with old versions of Bitcoin Core" (ac1a5b1)

I can't figure how this function is not a valid implementation of the standard function https://en.cppreference.com/w/cpp/string/byte/atoi. If its behavior is actually nonstandard, it would be good to say how.

@maflcko
Copy link
Member

maflcko commented Jan 11, 2022

I don't understand why this PR is forking the parse function and only removing the overflow footgun specifically in the int64_t case

In the int case it previously used atoi, which runs into UB (not saturation). See #23841 (comment) . So I presume the goal was to make minimal changes?

Though, given that it is UB I can also see the argument to prefer less code and turn the UB into saturation. (In master it was turned to return 0 IIRC).

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #23841 (comment)

Though, given that it is UB I can also see the argument to prefer less code and turn the UB into saturation. (In master it was turned to return 0 IIRC).

Yes, that's what I'm trying to figure out. I don't understand why you would return 0 when the provided value is bigger than the maximum value, instead of returning the maximum value. Also why you would want two error-ignoring parsing functions with different and not clearly documented behavior instead of one error-ignoring parsing function with clearly documented behavior? And is there actually something different about the int64_t type that it needs to be treated differently than all the other integer types?

Anyway, I acked this PR since it seems OK to merge, but I think it would be nicer if the two functions were consolidated, given nicer behavior and documentation, and probably renamed.

template <typename T>
// of atoi as it behaves under the "C" locale.
//
// Prevent use of LocaleIndependentAtoi for int64_t, since the function below
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #23841 (comment)

I like that the name is long and ugly. It makes developers less likely to pick it, when there are better named and safer alternatives available that come with error reporting instead of implicitly mapping errors to random (in-range) return values.

It's just seems a weird way to make the function name long by adding a random fact about locale independence. You could call it FrogsAreGreenAtoi and that would be better name than LocaleIndependentAtoi because it wouldn't falsely suggest that other functions depend on the current locale and this function doesn't. I think an ideal name would be something more like ParseIntUnchecked or AtoiIgnoreErrors

@maflcko
Copy link
Member

maflcko commented Jan 11, 2022

is there actually something different about the int64_t type that it needs to be treated differently than all the other integer types?

No, not that I am aware of.

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Withdraw my ACK. Seems more minimalistic to have one method instead of two.

@jamesob
Copy link
Contributor Author

jamesob commented Jan 11, 2022

I think there is some navel-gazing going on here. We are in a situation where a change has inadvertently broken compatibility with a prior release (albeit in a minor way). This change proposes to remedy that and is well-tested.

If there are style concerns, I think they can be done in a follow-up. Otherwise I have other things to do and someone else can fix this in a way that reviewers find preferable.

Edit:

Seems more minimalistic to have one method instead of two.

The prior behavior of Bitcoin was to saturate only for int64s - are you proposing to special case this single method on the basis of type? I think having a single method that either saturates or doesn't on the basis of type is very misleading.

@maflcko
Copy link
Member

maflcko commented Jan 11, 2022

are you proposing to special case this single method on the basis of type?

No, simply make the existing method saturate instead of creating yet another one. See #23841 (review)

If not done here, there will be a follow-up doing it. And I think that we should avoid follow-ups that completely remove code that was added previously. (They are fine when only a subset is followed-up on).

If you don't have time or energy to fix it that is fine, just let us know so that it can be picked up.

@ryanofsky
Copy link
Contributor

I think there is some navel-gazing going on here.

It's true, and to make the navel-gazing even worse, it was mostly about the already merged PR #20452, not even this PR!

Anyway posted #24041 as an alternative

@maflcko
Copy link
Member

maflcko commented Jan 11, 2022

The code here is non-trivial to review, based on the bug that was accidentally introduced.

So I don't think it is navel-gazing to consolidate the tricky code to just one function instead of two.

Also, I don't think it is navel-gazing to prefer a +63-17 patch over a two patches (each of which is larger than the single one), first +88-13, then +42-71.

@jamesob
Copy link
Contributor Author

jamesob commented Jan 11, 2022

Closing in favor of #24041

@jamesob jamesob closed this Jan 11, 2022
@JeremyRubin
Copy link
Contributor

if i understand it correctly, i do prefer the approach here as a stop-gap compared to #24041.

What I think makes the most sense is to do both of these (this PR first) and then over the course of a few releases:

  1. Log Warn if the parsing will differ in a future release
  2. Error + Shutdown if the parsing will differ in a future release
  3. Change the parsing.

Perhaps it's more trouble than it's worth to DTRT here, but I think it is, personally, since config files breaking is a big deal.

@ryanofsky
Copy link
Contributor

Perhaps it's more trouble than it's worth to DTRT here, but I think it is, personally, since config files breaking is a big deal.

I don't think there are any differences between this PR and #24041 with respect to config files. Config behavior unintentionally changed in #20452, and both this PR and #24041 restore the previous behavior.

@JeremyRubin
Copy link
Contributor

ok i was reading off of #24041 (comment) but it seems that is not the case, in which case i'll concept ack #24041

@laanwj
Copy link
Member

laanwj commented Jan 12, 2022

@laanwj I think there is some confusion here: nowhere in the live code do I actually use strtoll; it's just used in tests to model compatibility with the old implementation in previous versions.

@laanwj Are you still NACK on this? Otherwise I think this is rfm

No, I'm not, using it in the tests is fine sorry for misunderstanding it, my brain doesn't work anymore.

maflcko pushed a commit that referenced this pull request Jan 12, 2022
b5c9bb5 util: Restore GetIntArg saturating behavior (James O'Beirne)

Pull request description:

  The new locale-independent atoi64 method introduced in #20452 parses large integer values higher than maximum representable value as 0 instead of the maximum value, which breaks backwards compatibility. This commit restores compatibility and adds test coverage for this case in terms of the related GetIntArg and strtoll functions.

  Specifically, command line or bitcoin.conf integer values greater than `9223372036854775807` (`2**63-1`) used to be parsed as `9223372036854775807` before #20452. Then #20452 caused them to be parsed as `0`. And after this PR they will be parsed as `9223372036854775807` again.

  This change is a stripped-down alternative version of #23841 by jamesob

ACKs for top commit:
  jamesob:
    Github ACK b5c9bb5
  vincenzopalazzo:
    ACK b5c9bb5
  MarcoFalke:
    review ACK b5c9bb5 🌘

Tree-SHA512: 4e8abdbabf3cf4713cf5a7c5169539159f359ab4109a4e7e644cc2e9b2b0c3c532fad9f6b772daf015e1c5340ce59280cd9a41f2730afda6099cbf636b7d23ae
@bitcoin bitcoin locked and limited conversation to collaborators Jan 12, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants