-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
jv: Add some support for 64 bit ints in a very conservative way (ALTERNATIVE) #1327
base: master
Are you sure you want to change the base?
Conversation
I'm not a fan of the JQ_OMIT_INTS ifdefs, they smell like premature optimizaition, probably worth benchmarking to see if they are justified. You should probably change the title of this PR because it's not very conservative, heh. In particular, messing with the union was something i explicitly wanted to avoid - my implementation only affected display and didn't have to deal with other parts of the code. But everything seems to go through Having signed 64 bit ints too is neat. Only skimmed, haven't reviewed much. Also haven't tested. |
Arithmetic (and math) continues to go through |
As to the |
@nicowilliams - No doubt I'm missing something, but using @dequis's test case, it seems this change entails a step backwards:
By contrast:
|
@pkoppstein Yeah, and that's not the only thing that's wrong. Try |
cdf23ea
to
cd3569d
Compare
Tests still needed. |
@wtlangford Can you review this? I think it's done. Also, I'm thinking of taking this approach a bit further. I'm thinking of adding
|
Ah, so, EDIT: No, it's done already. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing here seems too concerning, other than the overflow checking.
We appear to continue to do double math with this change. Do we want to consider being subkind-aware for adding and subtracting (and maybe multiplication?). Division should, of course, use double behavior, since we don't want integer division semantics in jq.
I'm envisioning this problem (which is a floating-point precision error):
9223372036854774784 + 1
#9223372036854774784
Of course, jq still prints that as 9223372036854775000
because of a bug somewhere in jv_dtoa
...
Is inconsistent integer semantics worse than surprising double precision errors?
Also, we ought to use in our constant folding.
Also, tests. Lots of them.
return jv_int64(i); | ||
} else if (d < UINT64_MAX) { | ||
uint64_t u = d; | ||
if (d == (double)u) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto. Here we know that d
is positive, and if it's also less than UINT64_MAX
then we can represent it as a uint64_t
.
double d = jv_number_value(input); | ||
if (d < 0 && d >= INT64_MIN) { | ||
int64_t i = d; | ||
if (i < 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's happening here? Is this a guard against the cast turning -0.5
into 0
instead of -1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's just for deciding whether to return a signed 64-bit integer representation or unsigned 64-bit integer representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if the check is false, the next code that runs is return jv_number(nearbyint(d));
, which is a double representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These checks should use the nextafter
stuff from below for INT64_MIN
and UINT64_MAX
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh? No, there's an else if
at 377.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I misunderstand.
Line 373 is checking to see if d is negative AND d fits into an int64_t
Line 374 does the cast
What does line 375 check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, i'm the one who's confused here. I need to s/else // here :)
#ifndef JQ_OMIT_INTS | ||
jv j = {JV_KIND_NUMBER, JV_SUBKIND_INT64, 0, 0, {.int64 = x}}; | ||
#else | ||
jv j = {JV_KIND_NUMBER, JV_SUBKIND_INT64, 0, 0, {.number = x}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This uses .number
, but has a JV_SUBKIND_INT64
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops! (It would have no effect, given that other code that would check it would be omitted. But it's still a bug.)
return (int64_t)j.u.uint64; | ||
return INT64_MAX; | ||
} | ||
if (j.u.number > 0 && (int64_t)j.u.number < 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned this might be processor-specific behavior and prone to breaking.
How about this instead:
if (j.u.number > nextafter(INT64_MAX, 0))
return INT64_MAX;
if (j.u.number < nextafter(INT64_MIN, 0))
return INT64_MIN;
Basically, we get the next double in the direction of zero from whatever we get after converting INT64_MAX
to double.
If we're farther from 0
than that value, then we had to be at least INT64_MAX
.
The same logic holds for INT64_MIN
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this! Thanks!
} | ||
if (j.u.number < 0) | ||
return 0; | ||
if (j.u.number > (double)UINT64_MAX) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, perhaps something like this:
if (j.u.number > nextafter(UINT64_MAX, 0))
return UINT64_MAX;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes!
double x = jv_number_value(j); | ||
/* XXX Check against actual double min/max integers */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I seem to repeat myself. 😜
Assuming that x != x
is for NaN detection...
if (isnan(j.u.number) || j.u.number > nextafter(INT_MAX, 0) || j.u.number < nextafter(INT_MIN, 0))
return 0;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough.
struct jv_refcnt; | ||
|
||
/* All of the fields of this struct are private. | ||
Really. Do not play with them. */ | ||
typedef struct { | ||
unsigned char kind_flags; | ||
unsigned char pad_; | ||
unsigned char subkind_flags; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And now our padding is gone. 😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, yes, we lose the padding. I suppose I could encode sub-kind into kind_flags. We could always do that later if ever we want to use the padding for something else.
In my version of this, that was explicitly intended behavior. The idea was to fix the common case of using jq as a pretty-printer and/or filter, so as long as the numbers aren't touched they keep their old values. |
Ah, then I have no complaints about the double-precision math. Additionally, maintaining existing behavior is easier than changing it and then deciding to change it back later; so we can always revisit in the future. |
Eventually we might add branches to basic arithmetic operators to avoid double-precision math when possible. But I'm a bit loathe to do that right now. The goal here really is to allow We could also fix the formatter to never use scientific notation for doubles with no non-zero decimal part. But we do also get calls for larger integer ranges than -2^52..2^52, which I think is fair enough. |
Thank you all for JQ. It would be great if you could address this issue. |
This is a variant of #1246 that doesn't change the size of a
jv
.@dequis What do you think?