String conversion from decimal (#108) #122

elee1766 · 2022-09-25T12:08:04Z

hi, i was in need of two things

scanner + valuer interfaces for postgresql Numeric
base 10 string -> uint256.Int

i saw that issue #108 was a thing, and so I created a fromBase10 and incomplete SetString(s string, base int) (err error), since it can't do binary (0b prefix)

not really sure where to post this, but i thought a PR would be good at least so you could see the code changes. i don't really know what interface you are targeting, maybe setstring should return a bool to match big.Int?

big.Int's setString has an alloc in it, so I thought i would get rid of it since i saw the library was promising 0 alloc. i did a super stupid and naive implementation for base 10 parsing, since it was something i needed in another project i was working on, and I just needed something that worked. i've been using this fork and it's been working fine (at least for now)

if you are curious, the benchmarks are here. so it's better than std but it's still not great... i'll be working on improving it, will probably require a rethink lol

base16 is the existing hex implementation.

go test -run=String -bench=String

goos: linux
goarch: amd64
pkg: github.com/holiman/uint256
cpu: Intel(R) Core(TM) i9-9900KF CPU @ 3.60GHz
BenchmarkStringBase10BigInt/generic-8             154038              7792 ns/op            1024 B/op         32 allocs/op
BenchmarkStringBase10/generic-8                   587526              2013 ns/op               0 B/op          0 allocs/op
BenchmarkStringBase16/generic-8                 10830475               110.0 ns/op             0 B/op          0 allocs/op

name                          time/op
StringBase10BigInt/generic-8  7.77µs ± 0%
StringBase10/generic-8        2.01µs ± 0%
StringBase16/generic-8         110ns ± 0%

holiman · 2022-11-26T12:12:49Z

Sorry, I missed that this PR existed. It is still marked wip, but I guess it's more or less done?

elee1766 · 2022-11-26T12:22:12Z

@holiman no problem, thanks for taking the time.

yeah, it's marked as WIP because I am not exactly sure if this is what everyone prefers (in terms of the sql stuff), and performance isn't super optimized.

if you're fine with that for now, I can mark as done and finalize. otherwise let me know if there is anything you would like me to change

holiman

Looking ok. I think we should also add a fuzzer which compares the SetString in Int against big.Int SetString.

holiman · 2022-11-26T12:18:49Z

conversion.go

+// Value implements the database/sql/driver Valuer interface.
+// It encodes a string, because that is what postgres uses for its numeric type
+func (src Int) Value() (driver.Value, error) {
+	return string(src.ToBig().String()) + "e0", nil


Why string(..) on a string ? And why the concatenation of e0?

no clue why i cast a string to string, will fix lol.

The e0 is to ensure that a large number is properly passed and understood as as "numeric/decimal". postgres wasn't happy when i fed it very large base 10 integer strings without the e0. postgres is able to downcast integers with exponential notation that are below size int64 to bigint transparently, so this felt like the most inclusive solution.

https://www.postgresql.org/docs/current/sql-syntax-lexical.html section 4.1.2.6. Numeric Constants

that said, I don't use any SQL-style DB's that aren't postgresql compatible. Not sure how MariaDB/Spanner will treat this

conversion.go

base10_test.go

elee1766 · 2022-11-28T01:53:26Z

@holiman

i've added a fuzz test for base10 using go's built in fuzzer. It runs when go test runs.

It checks that

fail when SetString fails
fail when number is negative
fail when n umber is too large
succeeds otherwise
is equal in value to the big.Int when ToInt().Cmp is used
the base 10 strings 10 match
the valuer can Value() when the base10 string is valid
the Value() is equal to the base 10 string with an "e0" suffix

I've also changed SetString to closer match the big.Int interface, and changed the errors returned by FromBase10

let me know if you want me to change / add / fix anything!

conversion.go

holiman · 2022-12-06T11:33:27Z

conversion.go

+	if b == nil {
+		return true
+	}


In order to get a verdict, I checked what big does:

func TestFoobar(t *testing.T) { new(big.Int).Set(nil) }

==>

--- FAIL: TestFoobar (0.00s) panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference

So, this is just defensive programming, let's remove it.

cool. just removed.

fix scan

Update conversion.go

holiman · 2022-12-15T08:20:38Z

Sorry for being so slow on this, and thanks for keeping it up!

Regarding the API, with this PR, it becomes

Public "convenience" constructors
- FromBig(*big.Int)
- FromHex(string)
Instance setters:
- (z *Int) FromBase10(string)
- (z *Int) SetFromBig(*big.Int)
- (z *Int) SetBytes([]byte)
- (z *Int) SetString(string, int)

And a few more. The odd one out is FromBase10, being an instance-method. I think it would be more consistent to have

Public "convenience" constructors
- FromBig(*big.Int)
- FromHex(string)
- FromBase10(string)
Instance setters:
- (z *Int) SetFromBase10(string) (possibly)
- (z *Int) SetFromBig(*big.Int)
- (z *Int) SetBytes([]byte)
- (z *Int) SetString(string, int)

Also, isn't dec better than base10 ? As in, SetFromDecimal instead of SetFromBase10 ?

holiman · 2022-12-15T08:34:48Z

I pushed a minor change to make the API more coherent

holiman · 2022-12-15T08:37:43Z

base10.go

+	iv := 19
+	c := 0
+	if len(bs) >= (iv * 4) {


Could you add a bit of description what happens inside this method -- what algorithm you're following?
Doesn't have to be a lot, but a brief description.
Also, what does iv mean? I associate the term with cryptographic input vector, but this is probably something else?
Also, is the unrolled loop really worth it? If there's no tangible gain, I'd prefer to have it in a shorter form

lol. i think iv originally meant "index value" or something stupid like that.

I've cleaned up the function and added some comments - let me know if it's still unclear

holiman · 2022-12-15T08:40:11Z

base10_test.go

+		}
+		// if its too large, ignore it also
+		if val.Cmp(max256.ToBig()) > 0 {
+			return


If >256, we should have a non-nil err. Should check that

holiman

Looking pretty good, I might do some finishing touches before merge

base10.go

holiman · 2022-12-15T18:03:05Z

base10.go

+	// however, the last number will always be below 19 characters, so i=0 is dealt with as special case
+	for i := 4; i >= 1; i-- {
+		// check if the length of the string is larger than cutLength * i
+		if len(bs) >= (cutLength * i) {


Reverse the check, so you can unindent the clause

if len(bs) < (cutLength * i){ continue }

Although, I wonder if it wouldn't make sense to start a better index. Something like this (might be off-by-one):

for i := len(bs) / 19; ...

Also, cutLength is a constant. It's ok to have it as a var, but I don't know, we could just also inline 19. I don't know which I prefer.

good catch on the loop, my bad! switched it to use continue and start at len(bs)/cutLength

I don't think it's off by one since if it's below 19, we would want it to be 0, and ignore the loop altogether, while otherwise, we are looking for the floor. if it's 19, it will perfectly match, and the stragglers case will get short circuited

cutLength is only used in this function - to me it makes more sense to leave it inside the function, unless other things start reusing that number 19.

Co-authored-by: Martin Holst Swende <martin@swende.se>

elee1766 · 2022-12-29T03:35:16Z

i was thinking about this a little bit.

@holiman what do you feel about parsing strings with "-" prefix as the two's complement? (so remove the -, parse the number, then Neg it)

it would be convenient, and i don't feel it is out of scope for the package. (it already supports sdiv, smod, neg, abs, etc)

of course, there is no way to tell if a string should be marshaled into negative or not, so i think it should always marshal to string as positive, but being able to parse negative string input doesn't seem like a super awful idea.

holiman · 2022-12-29T07:56:04Z

@holiman what do you feel about parsing strings with "-" prefix as the two's complement? (so remove the -, parse the number, then Neg it)

I'm not keen on it. uint256 is "natively" unsigned. Yes, there are some ways to work with "negative" numbers, but those are clearly "special cases". I don't think we should parse negative numbers, it just adds complexity.

Also, I think encoders/decoders should come in pairs, and as you say, "there is no way to tell if a string should be marshaled into negative or not, so i think it should always marshal to string as positive" -- I take that as a sign that it's not a good fit.

Also, I saw that big.Int also implements a Scanner, namely the fmt.Scanner interface:

// Scanner is implemented by any value that has a Scan method, which scans
// the input for the representation of a value and stores the result in the
// receiver, which must be a pointer to be useful. The Scan method is called
// for any argument to Scan, Scanf, or Scanln that implements it.
type Scanner interface {
	Scan(state ScanState, verb rune) error
}

So if we merge this PR, then we can never implement that interface, because they are mutually exclusive.... :/ But I think that's ok, just wanted to highlight it

holiman · 2022-12-29T08:37:44Z

The coverage is now back at 100%, and I changed the circle-ci fuzzer to use golang native fuzzing. TODO there is to make our old fuzzer (fuzzTernaryData etc) compatible with the native format, so it's run on circle.

And then to make them all work with oss-fuzz, and we can scrap the dvyakov fuzzers. But that's not for this PR.

@chfast want to take a look before we merge this?

holiman · 2022-12-29T08:53:56Z

@chfast also, I've done something wrong re the fuzz corpus cache, https://app.circleci.com/pipelines/github/holiman/uint256/399/workflows/050524d7-5ac0-4342-8764-e88ef5e73dce/jobs/1196?invite=true#step-109-1 Edit: nevermind, fixed

elee1766 · 2022-12-30T03:33:27Z

@holiman what do you feel about parsing strings with "-" prefix as the two's complement? (so remove the -, parse the number, then Neg it)

I'm not keen on it. uint256 is "natively" unsigned. Yes, there are some ways to work with "negative" numbers, but those are clearly "special cases". I don't think we should parse negative numbers, it just adds complexity.

Also, I think encoders/decoders should come in pairs, and as you say, "there is no way to tell if a string should be marshaled into negative or not, so i think it should always marshal to string as positive" -- I take that as a sign that it's not a good fit.

good points - totally agree. I think where negatives are needed - application level can deal with it. (i went through our codebase, it was only 3 places of, so not a big deal)

holiman · 2023-01-20T21:16:32Z

conversion.go

+		if splt[1] == "0" {
+			return nil
+		}


1e0 should be 1, not 0 ... ?
Not sure what 0e0 is defined as, need to check that.

isn't that what this will return? assume X = splt[0], Y = splt[1]

I think XeY woudl be X * 10 ^ Y so 0e0 should be 0 * 10^0 = 0

In the previous lines, i do dst.SetFromDecimal(X)

then if Y == 0, then i should just return, since X*10^0=X

My bad, yes you're right

holiman · 2023-01-26T12:42:08Z

conversion.go

+		exp.Exp(NewInt(10), exp)
+		dst.Mul(dst, exp)


For Exp we have no overflow checks,

// Exp sets z = base**exponent mod 2**256, and returns z.

For Mul, we do have MulOverflow, so we could check that and return error if it does not fit. But it feels a bit wonky to just silently fail the conversion (because of the Exp) without signalling an error.

IMO, it would have been ok if the failure-mode had been:

The result is X mod 2^256, where X is the correct result.

But I think the possible failures are different .. (?)

i think you are right here, they will be different.

i think the popular postgresql scanner will error anyways if the value is too large to be scanned in, so I think that it is fine to do the same.

holiman · 2023-02-08T12:58:50Z

I am not really fond of the whole scientific notation-parsing in Scan, but OTOH it's isolated into a new method, and most callers won't use it (unless they strictly need it for database reasons). So I can live with that there.

holiman · 2023-02-08T13:44:22Z

Thanks for the hard work!

a added 5 commits September 25, 2022 06:32

relaxed marshaling, slow base 10 parsing

bb5f382

update version

fb12295

add control to benchmark

6866965

make the ci thing less angry

55f7ffc

make the ci even less angy

5689315

holiman reviewed Nov 26, 2022

View reviewed changes

add fuzz tests

3c7131f

elee1766 added 2 commits November 27, 2022 19:56

change map

5a106b0

magic

b296df7

holiman reviewed Nov 28, 2022

View reviewed changes

conversion.go Outdated Show resolved Hide resolved

clear z on SetFrombig, even if b is nil

cb2c910

holiman reviewed Dec 6, 2022

View reviewed changes

elee1766 and others added 3 commits December 6, 2022 06:28

Update conversion.go

df6b926

Update conversion.go

bd117d9

fix scan

Merge pull request #1 from Tjudice/patch-1

048efc0

Update conversion.go

base10: minor api change, added docs

9a76ab7

holiman reviewed Dec 15, 2022

View reviewed changes

holiman and others added 5 commits December 15, 2022 09:59

ci, go.mod: require go 1.18+

42313ad

squashme: circle fixes

7e0467a

properly cgheck for overflow in fuzz

7f61ee9

number overflow test

1d8f726

added comments to explain fromBase10Long

f4e6c19

holiman reviewed Dec 15, 2022

View reviewed changes

elee1766 and others added 2 commits December 15, 2022 12:39

Update base10.go

84510f7

Co-authored-by: Martin Holst Swende <martin@swende.se>

Update base10.go

6e4f914

Co-authored-by: Martin Holst Swende <martin@swende.se>

elee1766 and others added 6 commits December 28, 2022 21:36

remove extra struct def

b5a4906

testing: try to get 100% coverage again

fb7ceef

lintfix

4ef03d7

fuzzing: fix up fuzzer for string conversion

5587dd6

circle: try to get circleci fuzzing going

7ff0d8f

conversion: more coverage + fix in Scan

5019647

decimal_test: more coverage

2f0b0b7

holiman force-pushed the elee branch from 8114d9a to bd32dc4 Compare December 29, 2022 08:15

go.mod: fuzzing dep

dcd75fd

holiman force-pushed the elee branch from 8eee78b to dcd75fd Compare December 29, 2022 08:35

holiman changed the title ~~[wip] SetString (#108)~~ String conversion from decimal (#108) Dec 29, 2022

conversion: consistent use of ptr receiver

0472e9b

holiman force-pushed the elee branch from cc86d97 to e8593b5 Compare December 29, 2022 08:56

circle: make use of restored corpus

d55274d

holiman force-pushed the elee branch from e8593b5 to d55274d Compare December 29, 2022 08:58

properly parse scientific notation

82fd325

holiman reviewed Jan 20, 2023

View reviewed changes

holiman reviewed Jan 26, 2023

View reviewed changes

elee1766 and others added 2 commits January 26, 2023 19:47

add some overflow cases for scan, and error on overflow

35d62f2

conversion minor nitpicks

88226a1

holiman added 2 commits February 8, 2023 14:00

conversion: fix test

5ff06b0

conversion_test: bring coverage back to 100

091c3f9

holiman merged commit c6c6f8d into holiman:master Feb 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

String conversion from decimal (#108) #122

String conversion from decimal (#108) #122

elee1766 commented Sep 25, 2022

holiman commented Nov 26, 2022

elee1766 commented Nov 26, 2022 •

edited

Loading

holiman left a comment

holiman Nov 26, 2022

elee1766 Nov 26, 2022

elee1766 commented Nov 28, 2022 •

edited

Loading

holiman Dec 6, 2022

elee1766 Dec 6, 2022

holiman commented Dec 15, 2022 •

edited

Loading

holiman commented Dec 15, 2022

holiman Dec 15, 2022

gfxlabs Dec 15, 2022 •

edited

Loading

holiman Dec 15, 2022

gfxlabs Dec 15, 2022

holiman left a comment

holiman Dec 15, 2022

elee1766 Dec 15, 2022

elee1766 commented Dec 29, 2022

holiman commented Dec 29, 2022

holiman commented Dec 29, 2022

holiman commented Dec 29, 2022 •

edited

Loading

elee1766 commented Dec 30, 2022

holiman Jan 20, 2023

elee1766 Jan 20, 2023 •

edited

Loading

holiman Jan 26, 2023

holiman Jan 26, 2023

elee1766 Jan 27, 2023

holiman commented Feb 8, 2023

holiman commented Feb 8, 2023

String conversion from decimal (#108) #122

String conversion from decimal (#108) #122

Conversation

elee1766 commented Sep 25, 2022

holiman commented Nov 26, 2022

elee1766 commented Nov 26, 2022 • edited Loading

holiman left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

elee1766 commented Nov 28, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

holiman commented Dec 15, 2022 • edited Loading

holiman commented Dec 15, 2022

Choose a reason for hiding this comment

gfxlabs Dec 15, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

holiman left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

elee1766 commented Dec 29, 2022

holiman commented Dec 29, 2022

holiman commented Dec 29, 2022

holiman commented Dec 29, 2022 • edited Loading

elee1766 commented Dec 30, 2022

Choose a reason for hiding this comment

elee1766 Jan 20, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

holiman commented Feb 8, 2023

holiman commented Feb 8, 2023

elee1766 commented Nov 26, 2022 •

edited

Loading

elee1766 commented Nov 28, 2022 •

edited

Loading

holiman commented Dec 15, 2022 •

edited

Loading

gfxlabs Dec 15, 2022 •

edited

Loading

holiman commented Dec 29, 2022 •

edited

Loading

elee1766 Jan 20, 2023 •

edited

Loading