Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String conversion from decimal (#108) #122

Merged
merged 51 commits into from
Feb 8, 2023
Merged

Conversation

elee1766
Copy link
Contributor

hi, i was in need of two things

  1. scanner + valuer interfaces for postgresql Numeric
  2. base 10 string -> uint256.Int

i saw that issue #108 was a thing, and so I created a fromBase10 and incomplete SetString(s string, base int) (err error), since it can't do binary (0b prefix)

not really sure where to post this, but i thought a PR would be good at least so you could see the code changes. i don't really know what interface you are targeting, maybe setstring should return a bool to match big.Int?

big.Int's setString has an alloc in it, so I thought i would get rid of it since i saw the library was promising 0 alloc. i did a super stupid and naive implementation for base 10 parsing, since it was something i needed in another project i was working on, and I just needed something that worked. i've been using this fork and it's been working fine (at least for now)

if you are curious, the benchmarks are here. so it's better than std but it's still not great... i'll be working on improving it, will probably require a rethink lol

base16 is the existing hex implementation.

go test -run=String -bench=String

goos: linux
goarch: amd64
pkg: github.com/holiman/uint256
cpu: Intel(R) Core(TM) i9-9900KF CPU @ 3.60GHz
BenchmarkStringBase10BigInt/generic-8             154038              7792 ns/op            1024 B/op         32 allocs/op
BenchmarkStringBase10/generic-8                   587526              2013 ns/op               0 B/op          0 allocs/op
BenchmarkStringBase16/generic-8                 10830475               110.0 ns/op             0 B/op          0 allocs/op
name                          time/op
StringBase10BigInt/generic-8  7.77µs ± 0%
StringBase10/generic-8        2.01µs ± 0%
StringBase16/generic-8         110ns ± 0%

@holiman
Copy link
Owner

holiman commented Nov 26, 2022

Sorry, I missed that this PR existed. It is still marked wip, but I guess it's more or less done?

@elee1766
Copy link
Contributor Author

elee1766 commented Nov 26, 2022

@holiman no problem, thanks for taking the time.

yeah, it's marked as WIP because I am not exactly sure if this is what everyone prefers (in terms of the sql stuff), and performance isn't super optimized.

if you're fine with that for now, I can mark as done and finalize. otherwise let me know if there is anything you would like me to change

Copy link
Owner

@holiman holiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking ok. I think we should also add a fuzzer which compares the SetString in Int against big.Int SetString.

conversion.go Outdated
// Value implements the database/sql/driver Valuer interface.
// It encodes a string, because that is what postgres uses for its numeric type
func (src Int) Value() (driver.Value, error) {
return string(src.ToBig().String()) + "e0", nil
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why string(..) on a string ? And why the concatenation of e0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no clue why i cast a string to string, will fix lol.

The e0 is to ensure that a large number is properly passed and understood as as "numeric/decimal". postgres wasn't happy when i fed it very large base 10 integer strings without the e0. postgres is able to downcast integers with exponential notation that are below size int64 to bigint transparently, so this felt like the most inclusive solution.

https://www.postgresql.org/docs/current/sql-syntax-lexical.html section 4.1.2.6. Numeric Constants

that said, I don't use any SQL-style DB's that aren't postgresql compatible. Not sure how MariaDB/Spanner will treat this

conversion.go Show resolved Hide resolved
base10_test.go Outdated Show resolved Hide resolved
@elee1766
Copy link
Contributor Author

elee1766 commented Nov 28, 2022

@holiman

i've added a fuzz test for base10 using go's built in fuzzer. It runs when go test runs.

It checks that

  1. fail when SetString fails
  2. fail when number is negative
  3. fail when n umber is too large
  4. succeeds otherwise
  5. is equal in value to the big.Int when ToInt().Cmp is used
  6. the base 10 strings 10 match
  7. the valuer can Value() when the base10 string is valid
  8. the Value() is equal to the base 10 string with an "e0" suffix

I've also changed SetString to closer match the big.Int interface, and changed the errors returned by FromBase10

let me know if you want me to change / add / fix anything!

conversion.go Outdated Show resolved Hide resolved
conversion.go Outdated
Comment on lines 116 to 118
if b == nil {
return true
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to get a verdict, I checked what big does:

func TestFoobar(t *testing.T) {
	new(big.Int).Set(nil)
}

==>

--- FAIL: TestFoobar (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference

So, this is just defensive programming, let's remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool. just removed.

@holiman
Copy link
Owner

holiman commented Dec 15, 2022

Sorry for being so slow on this, and thanks for keeping it up!

Regarding the API, with this PR, it becomes

  • Public "convenience" constructors
    • FromBig(*big.Int)
    • FromHex(string)
  • Instance setters:
    • (z *Int) FromBase10(string)
    • (z *Int) SetFromBig(*big.Int)
    • (z *Int) SetBytes([]byte)
    • (z *Int) SetString(string, int)

And a few more. The odd one out is FromBase10, being an instance-method. I think it would be more consistent to have

  • Public "convenience" constructors
    • FromBig(*big.Int)
    • FromHex(string)
    • FromBase10(string)
  • Instance setters:
    • (z *Int) SetFromBase10(string) (possibly)
    • (z *Int) SetFromBig(*big.Int)
    • (z *Int) SetBytes([]byte)
    • (z *Int) SetString(string, int)

Also, isn't dec better than base10 ? As in, SetFromDecimal instead of SetFromBase10 ?

@holiman
Copy link
Owner

holiman commented Dec 15, 2022

I pushed a minor change to make the API more coherent

base10.go Outdated
Comment on lines 86 to 88
iv := 19
c := 0
if len(bs) >= (iv * 4) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a bit of description what happens inside this method -- what algorithm you're following?
Doesn't have to be a lot, but a brief description.
Also, what does iv mean? I associate the term with cryptographic input vector, but this is probably something else?
Also, is the unrolled loop really worth it? If there's no tangible gain, I'd prefer to have it in a shorter form

Copy link

@gfxlabs gfxlabs Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol. i think iv originally meant "index value" or something stupid like that.

I've cleaned up the function and added some comments - let me know if it's still unclear

base10_test.go Outdated
}
// if its too large, ignore it also
if val.Cmp(max256.ToBig()) > 0 {
return
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If >256, we should have a non-nil err. Should check that

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added this

Copy link
Owner

@holiman holiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good, I might do some finishing touches before merge

base10.go Outdated Show resolved Hide resolved
base10.go Outdated Show resolved Hide resolved
base10.go Outdated
// however, the last number will always be below 19 characters, so i=0 is dealt with as special case
for i := 4; i >= 1; i-- {
// check if the length of the string is larger than cutLength * i
if len(bs) >= (cutLength * i) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverse the check, so you can unindent the clause

if len(bs) < (cutLength * i){
    continue
}

Although, I wonder if it wouldn't make sense to start a better index. Something like this (might be off-by-one):

for i := len(bs) / 19; ...

Also, cutLength is a constant. It's ok to have it as a var, but I don't know, we could just also inline 19. I don't know which I prefer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch on the loop, my bad! switched it to use continue and start at len(bs)/cutLength

I don't think it's off by one since if it's below 19, we would want it to be 0, and ignore the loop altogether, while otherwise, we are looking for the floor. if it's 19, it will perfectly match, and the stragglers case will get short circuited

cutLength is only used in this function - to me it makes more sense to leave it inside the function, unless other things start reusing that number 19.

elee1766 and others added 2 commits December 15, 2022 12:39
Co-authored-by: Martin Holst Swende <martin@swende.se>
Co-authored-by: Martin Holst Swende <martin@swende.se>
@elee1766
Copy link
Contributor Author

i was thinking about this a little bit.

@holiman what do you feel about parsing strings with "-" prefix as the two's complement? (so remove the -, parse the number, then Neg it)

it would be convenient, and i don't feel it is out of scope for the package. (it already supports sdiv, smod, neg, abs, etc)

of course, there is no way to tell if a string should be marshaled into negative or not, so i think it should always marshal to string as positive, but being able to parse negative string input doesn't seem like a super awful idea.

@holiman
Copy link
Owner

holiman commented Dec 29, 2022

@holiman what do you feel about parsing strings with "-" prefix as the two's complement? (so remove the -, parse the number, then Neg it)

I'm not keen on it. uint256 is "natively" unsigned. Yes, there are some ways to work with "negative" numbers, but those are clearly "special cases". I don't think we should parse negative numbers, it just adds complexity.

Also, I think encoders/decoders should come in pairs, and as you say, "there is no way to tell if a string should be marshaled into negative or not, so i think it should always marshal to string as positive" -- I take that as a sign that it's not a good fit.

Also, I saw that big.Int also implements a Scanner, namely the fmt.Scanner interface:

// Scanner is implemented by any value that has a Scan method, which scans
// the input for the representation of a value and stores the result in the
// receiver, which must be a pointer to be useful. The Scan method is called
// for any argument to Scan, Scanf, or Scanln that implements it.
type Scanner interface {
	Scan(state ScanState, verb rune) error
}

So if we merge this PR, then we can never implement that interface, because they are mutually exclusive.... :/ But I think that's ok, just wanted to highlight it

@holiman holiman changed the title [wip] SetString (#108) String conversion from decimal (#108) Dec 29, 2022
@holiman
Copy link
Owner

holiman commented Dec 29, 2022

The coverage is now back at 100%, and I changed the circle-ci fuzzer to use golang native fuzzing. TODO there is to make our old fuzzer (fuzzTernaryData etc) compatible with the native format, so it's run on circle.

And then to make them all work with oss-fuzz, and we can scrap the dvyakov fuzzers. But that's not for this PR.

@chfast want to take a look before we merge this?

@holiman
Copy link
Owner

holiman commented Dec 29, 2022

@elee1766
Copy link
Contributor Author

@holiman what do you feel about parsing strings with "-" prefix as the two's complement? (so remove the -, parse the number, then Neg it)

I'm not keen on it. uint256 is "natively" unsigned. Yes, there are some ways to work with "negative" numbers, but those are clearly "special cases". I don't think we should parse negative numbers, it just adds complexity.

Also, I think encoders/decoders should come in pairs, and as you say, "there is no way to tell if a string should be marshaled into negative or not, so i think it should always marshal to string as positive" -- I take that as a sign that it's not a good fit.

good points - totally agree. I think where negatives are needed - application level can deal with it. (i went through our codebase, it was only 3 places of, so not a big deal)

Comment on lines +577 to +579
if splt[1] == "0" {
return nil
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1e0 should be 1, not 0 ... ?
Not sure what 0e0 is defined as, need to check that.

Copy link
Contributor Author

@elee1766 elee1766 Jan 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't that what this will return? assume X = splt[0], Y = splt[1]

I think XeY woudl be X * 10 ^ Y so 0e0 should be 0 * 10^0 = 0

In the previous lines, i do dst.SetFromDecimal(X)

then if Y == 0, then i should just return, since X*10^0=X

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, yes you're right

conversion.go Outdated
Comment on lines 585 to 586
exp.Exp(NewInt(10), exp)
dst.Mul(dst, exp)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Exp we have no overflow checks,

// Exp sets z = base**exponent mod 2**256, and returns z.

For Mul, we do have MulOverflow, so we could check that and return error if it does not fit. But it feels a bit wonky to just silently fail the conversion (because of the Exp) without signalling an error.

IMO, it would have been ok if the failure-mode had been:

  • The result is X mod 2^256, where X is the correct result.

But I think the possible failures are different .. (?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you are right here, they will be different.

i think the popular postgresql scanner will error anyways if the value is too large to be scanned in, so I think that it is fine to do the same.

@holiman
Copy link
Owner

holiman commented Feb 8, 2023

I am not really fond of the whole scientific notation-parsing in Scan, but OTOH it's isolated into a new method, and most callers won't use it (unless they strictly need it for database reasons). So I can live with that there.

@holiman holiman merged commit c6c6f8d into holiman:master Feb 8, 2023
@holiman
Copy link
Owner

holiman commented Feb 8, 2023

Thanks for the hard work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants