Implements a JSON.parse that transforms unsafe numbers to bigints #3133

lorisleiva · 2024-08-21T17:33:05Z

This PR implements and uses a custom JSON.parse implementation that supports large unsafe JavaScript integers by wrapping them in bigints.

This means we can start accepting RPC values as bigint right now without loss of precision.

It does so by first transforming the JSON string such that any integer outside of quoted strings is wrapped in a BigIntValueObject if and only if Number.isSafeInteger returns false for that number.

It then uses a JSON.parse reviver to identify that special object wrapper and casts that to a bigint without loss of information.
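As a rough illustration of that second step, here is a minimal sketch of a reviver that unwraps such a value object. The `{"$n":"..."}` wrapper shape and the names `BigIntWrapper`, `isBigIntWrapper` and `parseWrappedJson` are assumptions made for this example (the `$n` key is borrowed from a later comment in this thread); the wrapper actually used in the PR may look different.

```typescript
// Hypothetical wrapper shape: {"$n":"<digits>"}. The PR's real wrapper may differ.
type BigIntWrapper = { $n: string };

function isBigIntWrapper(value: unknown): value is BigIntWrapper {
    if (typeof value !== "object" || value === null) return false;
    const keys = Object.keys(value);
    return keys.length === 1 && keys[0] === "$n" && typeof (value as BigIntWrapper).$n === "string";
}

// Parses a JSON string whose unsafe integers were already wrapped by the first step.
function parseWrappedJson(wrappedJson: string): unknown {
    return JSON.parse(wrappedJson, (_key, value) => (isBigIntWrapper(value) ? BigInt(value.$n) : value));
}

const unwrapped = parseWrappedJson('{"rentEpoch":{"$n":"18446744073709551615"},"space":165}') as {
    rentEpoch: bigint;
    space: number;
};
console.log(unwrapped.rentEpoch.toString()); // 18446744073709551615
console.log(typeof unwrapped.space); // number
```

One caveat of any wrapper like this: a payload that legitimately contains an object of the same shape would be indistinguishable from a wrapped integer.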

There are still a lot more tests I'd like to add to this PR, but I wanted to gather early feedback before continuing my work.

changeset-bot · 2024-08-21T17:33:09Z

⚠️ No Changeset found

Latest commit: 845a4a3

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.


lorisleiva · 2024-08-21T17:33:32Z

Implements a JSON.parse that transforms unsafe numbers to bigints #3133 👈
master

This stack of pull requests is managed by Graphite. Learn more about stacking.


steveluscher · 2024-08-21T20:53:01Z

Interesting. I would have presumed that serde_json would not produce invalid JSON strings, but it does.

Sandbox link.

use serde_json::json;

fn main() {
    println!("{}", json!({"thing": 9_223_372_036_854_775_807 as i64}));
}

{"thing":9223372036854775807}

I convinced myself that this is the case by looking at the rentEpoch output of the production RPC.

{"rentEpoch":18446744073709551615}
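To make the problem concrete on the JavaScript side, this small sketch parses that same `rentEpoch` payload with the native `JSON.parse` and shows that the value no longer round-trips. The variable names are only illustrative.

```typescript
const rawRentEpoch = '{"rentEpoch":18446744073709551615}';
const parsedEpoch = (JSON.parse(rawRentEpoch) as { rentEpoch: number }).rentEpoch;

// 2^64 - 1 is not representable as a double, so it is rounded to 2^64.
console.log(Number.isSafeInteger(parsedEpoch)); // false
console.log(parsedEpoch === Math.pow(2, 64)); // true

// Going through the digit string instead keeps every digit.
console.log(BigInt("18446744073709551615").toString()); // 18446744073709551615
```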

I think I see what your suggestion is here. This being the case, create this fixer-upper for the current RPC, but then make RPCv2 actually output a value object. web3.js 2.0 will work with either from the beginning.

packages/rpc-transport-http/src/json-parse-with-bigint.ts

steveluscher · 2024-08-21T21:08:49Z

JSON.parse(
    "{\"hi\":9223372036854775807,\"bye\":\"content\"}",
    (_, value, { source }) =>
        source?.match(/^\d+$/)
            ? BigInt(source)
            : value,
);

Sorry. :)

Edit: Ah shit, fuck you, Safari (and Firefox). https://caniuse.com/mdn-javascript_builtins_json_parse_reviver_parameter_context_argument
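For what it's worth, here is a sketch of how the `source` approach could be feature-detected at runtime and used only where the engine provides it. `supportsReviverSource` and `parseWithSourceIfSupported` are hypothetical names, and the fallback shown (plain `JSON.parse`) still loses precision, so this is not a replacement for the string transform discussed in this PR.

```typescript
// The third reviver argument is not available in every engine, so it is typed as optional here.
type ReviverContext = { source?: string };
type ContextReviver = (key: string, value: unknown, context?: ReviverContext) => unknown;

// Detects at runtime whether JSON.parse passes a `source` string to the reviver.
function supportsReviverSource(): boolean {
    let sawSource = false;
    const probe: ContextReviver = (_key, value, context) => {
        if (typeof context?.source === "string") sawSource = true;
        return value;
    };
    JSON.parse("1", probe);
    return sawSource;
}

// Uses the `source` text when available; otherwise falls back to plain JSON.parse,
// which loses precision for integers beyond Number.MAX_SAFE_INTEGER.
function parseWithSourceIfSupported(json: string): unknown {
    if (!supportsReviverSource()) return JSON.parse(json);
    const reviver: ContextReviver = (_key, value, context) => {
        const source = context?.source;
        // Like the snippet above, this turns every non-negative integer into a bigint.
        return typeof source === "string" && /^\d+$/.test(source) ? BigInt(source) : value;
    };
    return JSON.parse(json, reviver);
}

const sourceResult = parseWithSourceIfSupported('{"hi":9223372036854775807,"bye":"content"}') as {
    hi: bigint | number;
    bye: string;
};
console.log(typeof sourceResult.hi); // "bigint" where `source` is supported, "number" otherwise
```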

steveluscher · 2024-08-21T21:52:48Z

packages/rpc-transport-http/src/json-parse-with-bigint.ts

+function transformUnquotedSegments(json: string, transform: (value: string) => string): string {
+    /**
+     * This regex matches any part of a JSON string that isn't wrapped in double quotes.
+     *
+     * For instance, in the string `{"age":42,"name":"Alice \"The\" 2nd"}`, it would match the
+     * following parts: `{`, `:42,`, `:`, `}`. Notice the whole "Alice \"The\" 2nd" string
+     * is not matched as it is wrapped in double quotes and contains escaped double quotes.
+     *
+     * The regex is composed of two parts:
+     *
+     *   1. The first part `^([^"]+)` matches any character until we reach the first double quote.
+     *   2. The second part `("(?:\\"|[^"])+")([^"]+)` matches any double quoted string that may
+     *      contain escaped double quotes, and any unquoted segment that follows it. To match a double quoted string, we use the
+     *      `(?:\\"|[^"])` regex to match any character that isn't a double quote whilst allowing
+     *      escaped double quotes.
+     */
+    const unquotedSegmentsRegex = /^([^"]+)|("(?:\\"|[^"])+")([^"]+)/g;
+
+    return json.replaceAll(unquotedSegmentsRegex, (_, firstGroup, secondGroup, thirdGroup) => {
+        // If the first group is matched, it means we are at the
+        // beginning of the JSON string and we have an unquoted segment.
+        if (firstGroup) return transform(firstGroup);
+
+        // Otherwise, we have a double quoted string followed by an unquoted segment.
+        return `${secondGroup}${transform(thirdGroup)}`;
+    });
+}
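To see what this regex hands to `transform`, the sketch below collects the unquoted segments instead of rewriting them. `collectUnquotedSegments` is a hypothetical helper written for this illustration, and it uses `replace` with the `g` flag in place of `replaceAll`. The second call shows one input where the regex appears to fall short: because the quoted-string part requires at least one character between the quotes, an empty string `""` throws the matching out of sync and the integer that follows is never passed to the transform.

```typescript
// Same regex as in the diff above.
const unquotedSegmentsRegex = /^([^"]+)|("(?:\\"|[^"])+")([^"]+)/g;

// Returns every segment that would be passed to `transform`, in order.
function collectUnquotedSegments(json: string): string[] {
    const segments: string[] = [];
    json.replace(unquotedSegmentsRegex, (match: string, first?: string, _quoted?: string, third?: string) => {
        segments.push(first !== undefined ? first : third !== undefined ? third : "");
        return match; // Leave the input unchanged; we only want to observe the groups.
    });
    return segments;
}

console.log(collectUnquotedSegments('{"age":42,"name":"Alice \\"The\\" 2nd"}'));
// [ '{', ':42,', ':', '}' ]

console.log(collectUnquotedSegments('{"a":"","b":12345678901234567890}'));
// [ '{', ':', 'b' ]  (the segment containing the large integer is missing)
```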


I would be really tempted to implement this as a parser instead of a regex. Walking every character:

Toggle inQuote on and off when you encounter a non-escaped ".

If inQuote is false and you encounter a contiguous run of numeric digits without a . in the middle, then wrap 'em.

const str = '{"hi":9223372036854775807,"there":123.4,"now":2E32,"then":1.2E32,"bye":"content is \\"1337\\""}';
let out = [];
let inQuote = false;
for (let ii = 0; ii < str.length; ii++) {
    let isEscaped = false;
    if (str[ii] === '\\') {
        out.push(str[ii++]);
        isEscaped = !isEscaped;
    }
    if (str[ii] === '"') {
        out.push(str[ii++]);
        if (!isEscaped) {
            inQuote = !inQuote;
        }
    }
    if (!inQuote) {
        let consumedNumber = '';
        while (str[ii]?.match(/[\d\.Ee]/)) {
            consumedNumber += str[ii++];
        }
        if (consumedNumber.length) {
            if (consumedNumber.includes('.')) {
                out.push(consumedNumber);
            } else {
                out.push(`{"$n":"${consumedNumber}"}`)
            }
        }
    }
    out.push(str[ii]);
}
console.log(out.join(''));
// {"hi":{"$n":"9223372036854775807"},"there":123.4,"now":{"$n":"2E32"},"then":1.2E32,"bye":"content is \"1337\""}

Still need to do some work to convert 2E32 to a bigint; it's not as straightforward as BigInt('2E32').
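One possible way to handle the exponent case is to expand the digits by hand, since `BigInt('2E32')` throws a SyntaxError. The sketch below is only an assumption about how that could be done; `exponentStringToBigInt` is a hypothetical name, and it deliberately gives up (returns `null`) whenever the exponent is smaller than the number of fractional digits rather than trying to decide whether the value is still an integer.

```typescript
// Converts strings such as "2E32" or "1.2e32" to a bigint by appending zeros.
// Returns null for anything that is not exponent notation or may not be an integer.
function exponentStringToBigInt(source: string): bigint | null {
    const match = /^(-?)(\d+)(?:\.(\d+))?[eE]([+-]?\d+)$/.exec(source);
    if (!match) return null;
    const [, sign, integerPart, fractionPart = "", exponentPart] = match;
    // Each fractional digit uses up one power of ten from the exponent.
    const zerosToAppend = Number(exponentPart) - fractionPart.length;
    if (zerosToAppend < 0) return null;
    // Note: a very large exponent would build a very large string here.
    return BigInt(sign + integerPart + fractionPart + "0".repeat(zerosToAppend));
}

console.log(exponentStringToBigInt("2E32")?.toString()); // 200000000000000000000000000000000
console.log(exponentStringToBigInt("1.2E32")?.toString()); // 120000000000000000000000000000000
console.log(exponentStringToBigInt("1.5e0")); // null
```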

I don't disagree. Regexes always feel a bit hacky, but at the same time I was struggling to come up with examples that would not work with this implementation, so I thought it might be the least intrusive way to implement this. That being said, we still need a shitload more tests to make sure we're not messing with the integrity of the response data.

I'm up for giving the parser a go though. I don't know if you've seen this library, but it could be useful to get some inspiration from. Tbh, I started looking at this parsing logic first and thought: hmm, maybe a parser is more complicated than it looks, and moved to regexes instead haha. But I think this library does a bit too much for our needs anyway.

Anyway, to summarize my chaotic thoughts, we can either:

1. Continue to explore regexes.

2. Move towards a parsing solution that only transforms the string (what you're suggesting).

3. Move towards a full parsing solution that outputs parsed data directly by forking/editing this library.

In any case, we need lots more tests.

I lean towards solution 2, it seems like a clean and robust approach.

String-to-string parsing will be way more resilient, I think. It's already resilient to annoying shit like newlines, spaces, and infinite \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ escape sequences, without added complexity.

I've implemented solution 2 in two different ways:

One that uses a simple regex to consume numbers.

One that uses plain loops to consume numbers.

The while (str[ii]?.match(/[\d\.Ee]/)) wasn't enough to correctly match numbers — e.g. it was matching true because of the e and returning {"result":tru{"$n":"e"}}.
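To illustrate the kind of fix described here, the following is a minimal sketch of a string-to-string pass that only attempts to read a number where one can start (a digit or a minus sign), so literals like `true` are left alone. It is not the code from this PR: the function name `wrapUnsafeIntegers`, the `{"$n":"..."}` wrapper and the choice to leave exponent forms such as `2E32` untouched are all assumptions for the example.

```typescript
// Matches one JSON number; group 1 is the fraction, group 2 the exponent.
const jsonNumberRegex = /-?(?:0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?/g;

// String-to-string pass: wraps unsafe plain integers as {"$n":"..."} (hypothetical wrapper shape).
function wrapUnsafeIntegers(json: string): string {
    const out: string[] = [];
    let ii = 0;
    while (ii < json.length) {
        const char = json[ii];
        if (char === '"') {
            // Copy the whole string literal verbatim, skipping over escaped characters.
            const start = ii++;
            while (ii < json.length && json[ii] !== '"') {
                ii += json[ii] === "\\" ? 2 : 1;
            }
            ii++; // Step past the closing quote.
            out.push(json.slice(start, ii));
            continue;
        }
        if (char === "-" || (char >= "0" && char <= "9")) {
            // Only try to read a number where one can actually start, so `true` is never touched.
            jsonNumberRegex.lastIndex = ii;
            const match = jsonNumberRegex.exec(json);
            if (match && match.index === ii) {
                const [source, fraction, exponent] = match;
                const isPlainInteger = fraction === undefined && exponent === undefined;
                const isUnsafe = isPlainInteger && !Number.isSafeInteger(Number(source));
                out.push(isUnsafe ? `{"$n":"${source}"}` : source);
                ii += source.length;
                continue;
            }
        }
        out.push(char);
        ii++;
    }
    return out.join("");
}

console.log(wrapUnsafeIntegers('{"result":true,"hi":9223372036854775807,"there":123.4}'));
// {"result":true,"hi":{"$n":"9223372036854775807"},"there":123.4}
```

Safe integers, decimals and exponent forms pass through unchanged, so the output only differs from the input where precision would otherwise be lost.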

I've also left the old regex-only implementation in place for comparison in the benchmarks, but I'm happy to remove it because the parsing approach is much more robust.

lorisleiva · 2024-08-21T22:21:36Z

Damn that's a shame about the source attribute. It would have been perfect.

buffalojoec

This is awesome @lorisleiva - nice work! I'm stoked to get rid of all the UnsafeBeyond2Pow53Minus1 now. 🚀

I think we should strongly consider benchmarking this parser, though, before cutting a release with it in the response transformer. Something like taking some huge JSON payloads that could come back from the RPC (think getProgramAccounts) and comparing native JSON.parse (or .json()) to this modded version with the reviver. Maybe @steveluscher knows some slick ways to do these benches?
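A very rough sketch of what such a comparison could look like is below. Everything in it is an assumption for illustration: the payload is synthetic (its field names are made up, not a real `getProgramAccounts` response), an identity reviver stands in for the real bigint reviver, and a simple `performance.now()` loop is far less rigorous than a proper benchmarking harness.

```typescript
// Builds a synthetic payload loosely shaped like a large RPC response (hypothetical fields).
function buildPayload(accountCount: number): string {
    const accounts: { lamports: number; owner: string; space: number }[] = [];
    for (let i = 0; i < accountCount; i++) {
        accounts.push({ lamports: 1000000 + i, owner: "owner-" + i, space: 165 });
    }
    return JSON.stringify({ result: accounts });
}

// Runs `fn` repeatedly and returns the total elapsed time in milliseconds.
function timeIt(label: string, iterations: number, fn: () => unknown): number {
    const start = performance.now();
    for (let i = 0; i < iterations; i++) fn();
    const elapsedMs = performance.now() - start;
    console.log(`${label}: ${(elapsedMs / iterations).toFixed(3)} ms per parse`);
    return elapsedMs;
}

const payload = buildPayload(10000);
const nativeMs = timeIt("native JSON.parse", 20, () => JSON.parse(payload));
// The identity reviver only measures the overhead of invoking a reviver at all.
const reviverMs = timeIt("JSON.parse with reviver", 20, () => JSON.parse(payload, (_key, value) => value));
```

Swapping the second callback for the modified parser would give a first impression of the overhead before investing in a more careful benchmark.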

packages/rpc-transport-http/src/json-parse-with-bigint.ts

socket-security · 2024-08-22T20:37:35Z

New and removed dependencies detected. Learn more about Socket for GitHub ↗︎

Package	New capabilities	Transitives	Size	Publisher
npm/jest-dev-server@10.0.0	environment Transitive: filesystem, shell	`+14`	505 kB	neoziro
npm/jest-environment-jsdom@30.0.0-alpha.6	Transitive: environment, eval, filesystem, network, shell, unsafe	`+68`	7.75 MB	simenb
npm/jest-runner-eslint@2.2.0	Transitive: environment, filesystem, shell	`+30`	562 kB	simenb
npm/jest-runner-prettier@1.0.0	Transitive: environment, eval, filesystem, network, shell, unsafe	`+193`	24.9 MB	keplersj
npm/jest-watch-master@1.0.0	Transitive: environment	`+26`	2.01 MB	rickhanlonii
npm/jest-watch-select-projects@2.0.0	None	`+7`	305 kB	simenb
npm/jest-watch-typeahead@2.2.2	Transitive: environment, filesystem, unsafe	`+37`	1.33 MB	simenb
npm/jest-websocket-mock@2.5.0	Transitive: environment	`+13`	1.21 MB	romgain
npm/jest@30.0.0-alpha.6	Transitive: environment, eval, filesystem, network, shell, unsafe	`+204`	20.6 MB	simenb
npm/jscodeshift@17.0.0	Transitive: environment, filesystem, unsafe	`+93`	16.4 MB	daniel15
npm/json-stable-stringify@1.1.1	Transitive: eval	`+9`	208 kB	ljharb
npm/pino-pretty@11.2.2	environment Transitive: filesystem	`+20`	1.35 MB	matteo.collina
npm/pino@9.3.2	environment, unsafe Transitive: eval	`+16`	1.66 MB	matteo.collina
npm/react-error-boundary@4.0.13	None	`+2`	306 kB	brianvaughn

🚮 Removed packages: npm/swr@2.2.5, npm/typescript@5.5.4, npm/whatwg-fetch@3.6.20


lorisleiva · 2024-08-22T21:00:00Z

As discussed offline, let's revisit this after a significant RPC refactoring that returns a Response object instead of the parsed response data.

github-actions · 2024-09-09T08:02:24Z

Because there has been no activity on this PR for 14 days since it was merged, it has been automatically locked. Please open a new issue if it requires a follow up.

lorisleiva force-pushed the loris/json-parse-for-bigints branch from 6102a4d to addd8e9 Compare August 21, 2024 17:41

steveluscher reviewed Aug 21, 2024

steveluscher reviewed Aug 21, 2024

buffalojoec reviewed Aug 21, 2024

lorisleiva force-pushed the loris/json-parse-for-bigints branch 15 times, most recently from bd6725e to 86ed7d4 Compare August 22, 2024 14:38

Implements a JSON.parse that transforms unsafe numbers to bigints

845a4a3

lorisleiva force-pushed the loris/json-parse-for-bigints branch from 86ed7d4 to 845a4a3 Compare August 22, 2024 20:36

lorisleiva closed this Aug 22, 2024

github-actions bot locked as resolved and limited conversation to collaborators Sep 9, 2024
