-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implements a JSON.parse that transforms unsafe numbers to bigints #3133
Conversation
|
This stack of pull requests is managed by Graphite. Learn more about stacking. Join @lorisleiva and the rest of your teammates on Graphite |
6102a4d
to
addd8e9
Compare
Interesting. I would have presumed that use serde_json::json;
fn main() {
println!("{}", json!({"thing": 9_223_372_036_854_775_807 as i64}));
}
I convinced myself that this is the case by looking at the
I think I see what your suggestion is here. Being this the case, create this fixer-upper for the current RPC, but then make RPCv2 actually output a value object. web3.js 2.0 will work with either, from the beginning. |
JSON.parse(
"{\"hi\":9223372036854775807,\"bye\":\"content\"}",
(_, value, { source }) =>
source?.match(/^\d+$/)
? BigInt(source)
: value,
); Sorry. :) Edit: Ah shit, fuck you, Safari (and Firefox). https://caniuse.com/mdn-javascript_builtins_json_parse_reviver_parameter_context_argument |
function transformUnquotedSegments(json: string, transform: (value: string) => string): string { | ||
/** | ||
* This regex matches any part of a JSON string that isn't wrapped in double quotes. | ||
* | ||
* For instance, in the string `{"age":42,"name":"Alice \"The\" 2nd"}`, it would the | ||
* following parts: `{`, `:42,`, `:`, `}`. Notice the whole "Alice \"The\" 2nd" string | ||
* is not matched as it is wrapped in double quotes and contains escaped double quotes. | ||
* | ||
* The regex is composed of two parts: | ||
* | ||
* 1. The first part `^([^"]+)` matches any character until we reach the first double quote. | ||
* 2. The second part `("(?:\\"|[^"])+")([^"]+)` matches any double quoted string that may | ||
* and any unquoted segment that follows it. To match a double quoted string, we use the | ||
* `(?:\\"|[^"])` regex to match any character that isn't a double quote whilst allowing | ||
* escaped double quotes. | ||
*/ | ||
const unquotedSegmentsRegex = /^([^"]+)|("(?:\\"|[^"])+")([^"]+)/g; | ||
|
||
return json.replaceAll(unquotedSegmentsRegex, (_, firstGroup, secondGroup, thirdGroup) => { | ||
// If the first group is matched, it means we are at the | ||
// beginning of the JSON string and we have an unquoted segment. | ||
if (firstGroup) return transform(firstGroup); | ||
|
||
// Otherwise, we have a double quoted string followed by an unquoted segment. | ||
return `${secondGroup}${transform(thirdGroup)}`; | ||
}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be really tempted to implement this as a parser instead of a regex. Walking every character:
- Toggle
inQuote
on and off when you encounter a non-escaped"
. - If
inQuote
isfalse
and you encounter a contiguous run of numeric digits without a.
in the middle, then wrap 'em.
const str = '{"hi":9223372036854775807,"there":123.4,"now":2E32,"then":1.2E32,"bye":"content is \\"1337\\""}';
let out = [];
let inQuote = false;
for (let ii = 0; ii < str.length; ii++) {
let isEscaped = false;
if (str[ii] === '\\') {
out.push(str[ii++]);
isEscaped = !isEscaped;
}
if (str[ii] === '"') {
out.push(str[ii++]);
if (!isEscaped) {
inQuote = !inQuote;
}
}
if (!inQuote) {
let consumedNumber = '';
while (str[ii]?.match(/[\d\.Ee]/)) {
consumedNumber += str[ii++];
}
if (consumedNumber.length) {
if (consumedNumber.includes('.')) {
out.push(consumedNumber);
} else {
out.push(`{"$n":"${consumedNumber}"}`)
}
}
}
out.push(str[ii]);
}
console.log(out.join(''));
// {"hi":{"$n":"9223372036854775807"},"there":123.4,"now":{"$n":"2E32"},"then":1.2E32,"bye":"content is \"1337\""}
Still need to do some work to convert 2E32
to a bigint; it's not as straightforward as BigInt('2E32')
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't disagree. Regexes always feel a bit hacky but at the same time I was struggling to come up with examples that would not work with this implementation so I though it might be the least intrusive way to implement this. That being said, we still need a shit loads more test to make sure we're not messing up with the integrity of the response data.
I'm up for giving the parser a go though. I don't know if you've seen this library but it could be useful to get some inspiration from. Tbh, I started looking at this parsing logic first and though: hmm maybe a parser is more complicated than it looks like and moved to regexes instead haha. But I think this library does a bit too much for our need anyway.
Anyway, to summary my chaotic thoughts, we can either:
- Continue to explore regexes.
- Move towards a parsing solution that only transforms the string (what you're suggesting).
- Move towards a full parsing solution that outputs parsed data directly by forking/editing this library.
In any case, we need lots more tests.
I lean towards solution 2, it seems like a clean and robust approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String-to-string parsing will be way more resilient, I think. It's already resilient to annoying shit like newlines, spaces, and infinite \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
escape sequences, without added complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've implemented solution 2 in two different ways:
- One that uses a simple regex to consume numbers.
- One that uses plain loops to consume numbers.
The while (str[ii]?.match(/[\d\.Ee]/))
wasn't enough to correctly match numbers — e.g. it was matching true
because of the e
and returning {"result":tru{"$n":"e"}}
.
I've also left the old regex-only implementation to compare in the benchmarks but I'm happy to remove it because the parsing is much more robust.
Damn that's a shame about the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome @lorisleiva - nice work! I'm stoked to get rid of all the UnsafeBeyond2Pow53Minus1
now. 🚀
I think we should strongly consider benchmarking this parser, though, before cutting a release with it in the response transformer. Something like taking some huge JSON payloads that could come back from the RPC (think getProgramAccounts
) and comparing native JSON.parse
(or .json()
) to this modded version with the reviver. Maybe @steveluscher knows some slick ways to do these benches?
bd6725e
to
86ed7d4
Compare
86ed7d4
to
845a4a3
Compare
As discussed offline, let's revisit this after a significant RPC refactoring that returns a |
Because there has been no activity on this PR for 14 days since it was merged, it has been automatically locked. Please open a new issue if it requires a follow up. |
This PR implements and uses a custom
JSON.parse
implementation that supports large unsafe JavaScript integers by wrapping them inbigints
.This means we can start accepting RPC values as
bigint
right now without loss of precision.It does so by first transforming the JSON string such that any integer outside of quoted strings will be wrapped in a
BitIngValueObject
if and only ifNumber.isSafeInteger
returnsfalse
for that number.It then uses a
JSON.parse
reviver to identify that special object wrapper and casts that to abigint
without loss of information.There are still a lot more tests I'd like to add to this PR but wanted to gather early feedback before continuing my work.