feat(marshal): encode capData in 1 level of JSON #1804

Draft: wants to merge 2 commits into base: master

Contributor

@dckc dckc commented Oct 4, 2023

refs: #1558, Agoric/agoric-sdk#7999

Description

encode capData to 1 level of JSON, much like #1558, but

  • using lastIndexOf rather than a regex
  • fastcheck testing

motivation: senders pay by the byte etc.

Security Considerations

careful review for confusion vulnerability is in order

Scaling Considerations

double-backslashes cost storage space
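To make the storage cost concrete, here is a minimal sketch comparing the conventional CapData serialization (JSON text embedded in a JSON string) with a flattened form. The sample record is hypothetical and the flattening is an illustration only, not this PR's code.

```javascript
// Hypothetical smallcaps-style CapData record; the body is itself JSON text.
const capData = {
  body: '#' + JSON.stringify({ brand: '$0.Alleged: IST brand', value: '+100' }),
  slots: ['board0257'],
};

// Conventional form: JSON inside JSON, so every quote in the body is escaped.
const nested = JSON.stringify(capData);

// Flattened form (sketch): splice the body text in directly as a JSON value.
const flat = `{"$body":${capData.body.slice(1)},"slots":${JSON.stringify(capData.slots)}}`;

// Wrapping either form in one more JSON layer (for example, a message
// envelope) escapes the existing backslashes again in the nested case.
const nestedAgain = JSON.stringify(nested);
const flatAgain = JSON.stringify(flat);

console.log({ nested: nested.length, flat: flat.length });
console.log({ nestedAgain: nestedAgain.length, flatAgain: flatAgain.length });
```

Each additional layer of string embedding multiplies the escaping overhead of the nested form, while the flattened form pays for only one level.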

Documentation Considerations

This flatter format is easier to read, and so easier to document in some senses, though there's a mixing of levels that's somewhat subtle.

Testing Considerations

This has unit tests for specific examples plus fastcheck tests. Whether I stated the property exactly right is worth careful review.

Upgrade Considerations

This is a DRAFT, pending:

  • figure out the whole upgrade story

cc @erights @gibson042

Contributor

@mhofman mhofman left a comment

I would so much rather we properly split encoding from serialization for marshal, as discussed in #1478.

Also I would really prefer if we could find a way to partially parse JSON instead of relying on undocumented serialization constraints (body first, no spaces, etc.). I remember having a discussion with @gibson042 about what API we would need from JS to allow this.

assert(Array.isArray(slots));
const slotj = JSON.stringify(slots);
slotj.indexOf(':[') < 0 || Fail`expected simple slots`;
const body1 = body.replace(/^#/, '');
Contributor

Why not check body[0] === '#' and use body.slice(1)? I think that's a lot more efficient.

Contributor

This seems to assume that the argument is a CapData record whose body is a "#"-prefixed JSON serialization of SmallCaps-encoded data, which would need a lot more explanation than appears here (and a better name).

Contributor Author

body.replace(/^#/, '') handles both smallCaps and qclass, no? (I haven't tested it, though).

Why is .slice(1) significantly more efficient?

Contributor

body.replace(/^#/, '') handles both smallCaps and qclass, no? (I haven't tested it, though).

I guess that depends upon what this function is expected to return. Regardless of the answer to that, though, CapData like { body: `{"@qclass":"bigint","digits":"0"}`, slots: [] } and { body: `#"+0"`, slots: [] } represent exactly the same data (0n) but would have distinct String('{"$body":{"@qclass":"bigint","digits":"0"},"slots":[]}') and String('{"$body":"+0","slots":[]}') return values (respectively) from the current implementation—which seems like a problem because there's no remaining signal differentiating smallcaps from the legacy encoding.
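The loss of signal can be sketched as follows. `flattenCapData` is a hypothetical helper that approximates the template under review; the last case assumes that the legacy encoding emits plain strings unchanged, so a legacy string "+0" and a smallcaps bigint 0n end up with identical flattened text.

```javascript
// Hypothetical helper approximating the flattening under review.
const flattenCapData = ({ body, slots }) =>
  `{"$body":${body.replace(/^#/, '')},"slots":${JSON.stringify(slots)}}`;

// Two encodings of the same value, 0n.
const legacyBigint = { body: '{"@qclass":"bigint","digits":"0"}', slots: [] };
const smallcapsBigint = { body: '#"+0"', slots: [] };

// Same data, different output, and nothing says which encoding produced it.
const flatLegacyBigint = flattenCapData(legacyBigint);
const flatSmallcapsBigint = flattenCapData(smallcapsBigint);

// Assumed legacy encoding of the *string* "+0" (no '#' prefix on the body).
const legacyString = { body: '"+0"', slots: [] };

// Different data, identical output: a decoder cannot tell these apart.
const flatLegacyString = flattenCapData(legacyString);

console.log(flatLegacyBigint);
console.log(flatSmallcapsBigint);
console.log(flatLegacyString === flatSmallcapsBigint);
```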

Why is .slice(1) significantly more efficient?

The answer is implementation-specific, but basically comes down to being zero-copy.

$ esbench --eshost-option '-h V8,*XS*' \
'const unprefixed="a".repeat(1000), prefixed = "#" + unprefixed' '{
  "unprefixed.replace": `result = unprefixed.replace(/^#/, "")`,
  "unprefixed.slice": `result = unprefixed.startsWith("#") ? unprefixed.slice(1) : unprefixed`,
  "prefixed.replace": `result = prefixed.replace(/^#/, "")`,
  "prefixed.slice": `result = prefixed.startsWith("#") ? prefixed.slice(1) : prefixed`,
}'
#### Moddable XS
unprefixed.replace: 0.06 ops/ms
unprefixed.slice: 3.56 ops/ms
prefixed.replace: 0.07 ops/ms
prefixed.slice: 0.95 ops/ms

#### V8
unprefixed.replace: 29.41 ops/ms
unprefixed.slice: 100.00 ops/ms
prefixed.replace: 19.61 ops/ms
prefixed.slice: 62.50 ops/ms
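A minimal sketch of the slice-based alternative measured above, with a check that it agrees with the regex version. `stripHashPrefix` is a hypothetical name, not something defined in this PR.

```javascript
// Remove a single leading '#' without invoking the regex engine.
const stripHashPrefix = body => (body[0] === '#' ? body.slice(1) : body);

// The regex form used in the diff, kept here only for comparison.
const stripWithRegex = body => body.replace(/^#/, '');

const samples = ['#"+0"', '{"@qclass":"bigint","digits":"0"}', '', '#', '##x'];
const agrees = samples.every(s => stripHashPrefix(s) === stripWithRegex(s));
console.log(agrees);
```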

Contributor Author

...

Why is .slice(1) significantly more efficient?

The answer is implementation-specific, but basically comes down to being zero-copy.

I guess I trained my regex intuitions in Perl where such things are optimized out the wazoo.

Thanks for the esbench details.

slotj.indexOf(':[') < 0 || Fail`expected simple slots`;
const body1 = body.replace(/^#/, '');
assertJSON(body1);
const json = `{"$body":${body1},"slots":${slotj}}`;
Contributor

In #1478 (comment) I suggest body#


export const JSONToCapData = json => {
assert.typeof(json, 'string');
json.startsWith('{"$body":') || Fail`expected $body`;
Contributor

I guess this only works when this body is first in the serialized JSON, not second?
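The ordering concern can be illustrated with a sketch of a decoder in the style the description mentions (prefix check plus lastIndexOf). This is an assumption about the general approach, not the PR's actual implementation; `capDataFromFlatJSON` and the error messages are hypothetical.

```javascript
// Sketch: split flattened text back into CapData by position, not by parsing.
const capDataFromFlatJSON = json => {
  if (typeof json !== 'string') throw TypeError('expected string');
  const prefix = '{"$body":';
  if (!json.startsWith(prefix)) throw Error('expected $body first');
  if (!json.endsWith(']}')) throw Error('expected slots last');
  // Safe only if the encoder rejected slots whose JSON contains ':['.
  const sep = ',"slots":[';
  const at = json.lastIndexOf(sep);
  if (at < 0) throw Error('expected slots');
  const bodyText = json.slice(prefix.length, at);
  JSON.parse(bodyText); // throws if the body span is not valid JSON
  // Start at the '[' that ends the separator; drop the final '}'.
  const slots = JSON.parse(json.slice(at + sep.length - 1, -1));
  return { body: `#${bodyText}`, slots };
};

const tryDecode = json => {
  try {
    return capDataFromFlatJSON(json);
  } catch (e) {
    return e.message;
  }
};

// Equivalent as JSON documents, but only the first is accepted.
const bodyFirst = '{"$body":{"a":[1,2]},"slots":["s1"]}';
const slotsFirst = '{"slots":["s1"],"$body":{"a":[1,2]}}';
const spaced = '{ "$body": {"a":[1,2]}, "slots": ["s1"] }';

console.log(tryDecode(bodyFirst));
console.log(tryDecode(slotsFirst));
console.log(tryDecode(spaced));
```

Any producer that reorders keys or pretty-prints would yield text this style of decoder rejects, even though a general JSON parser would treat all three inputs as the same document.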

Contributor

@gibson042 gibson042 left a comment

This is just far too brittle for comfort, and doesn't feel like the right way to solve a "too much escaping" problem (assuming that is in fact what motivates it).

I would so much rather we properly split encoding from serialization for marshal, as discussed in #1478.

I agree.

Also I would really prefer if we could find a way to partially parse JSON instead of relying on undocumented serialization constraints (body first, no spaces, etc.). I remember having a discussion with @gibson042 about what API we would need from JS to allow this.

Yeah, but I don't know if we wrote it down (https://github.com/Agoric/agoric-private/issues/31#issuecomment-1494853056 is related but definitely distinct, as is Go-style hybrid decoding). At any rate, it's not difficult, although it would require going beyond the standard library.
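As a rough illustration of what partial parsing might look like without the standard library's help, the sketch below scans a JSON object and returns the raw source text of each top-level member, independent of key order and whitespace. All names are hypothetical, and the scanner is deliberately lenient: it locates spans but does not validate them, so each returned slice would still need JSON.parse or equivalent checking.

```javascript
const skipWs = (s, i) => {
  while (i < s.length && ' \t\n\r'.includes(s[i])) i += 1;
  return i;
};

// Index just past the string literal that starts at s[i] === '"'.
const skipString = (s, i) => {
  i += 1;
  while (i < s.length && s[i] !== '"') i += s[i] === '\\' ? 2 : 1;
  if (i >= s.length) throw SyntaxError('unterminated string');
  return i + 1;
};

// Index just past the value that starts at s[i], without building the value.
const skipValue = (s, i) => {
  if (s[i] === '"') return skipString(s, i);
  if (s[i] === '{' || s[i] === '[') {
    let depth = 0;
    while (i < s.length) {
      const c = s[i];
      if (c === '"') {
        i = skipString(s, i); // brackets inside strings do not count
      } else {
        if (c === '{' || c === '[') depth += 1;
        if (c === '}' || c === ']') depth -= 1;
        i += 1;
        if (depth === 0) return i;
      }
    }
    throw SyntaxError('unterminated container');
  }
  // Number, true, false, or null: runs until a delimiter.
  while (i < s.length && !',}] \t\n\r'.includes(s[i])) i += 1;
  return i;
};

// Map from top-level key to the raw source text of its value.
const rawMembers = json => {
  const out = new Map();
  let i = skipWs(json, 0);
  if (json[i] !== '{') throw SyntaxError('expected object');
  i = skipWs(json, i + 1);
  while (json[i] !== '}') {
    if (json[i] !== '"') throw SyntaxError('expected key');
    const keyEnd = skipString(json, i);
    const key = JSON.parse(json.slice(i, keyEnd));
    i = skipWs(json, keyEnd);
    if (json[i] !== ':') throw SyntaxError('expected colon');
    i = skipWs(json, i + 1);
    const valueEnd = skipValue(json, i);
    out.set(key, json.slice(i, valueEnd));
    i = skipWs(json, valueEnd);
    if (json[i] === ',') i = skipWs(json, i + 1);
    else if (json[i] !== '}') throw SyntaxError('expected , or }');
  }
  return out;
};

const members = rawMembers(' { "slots" : ["a:[","b"] , "$body" : {"k":"}\\"]"} } ');
console.log(members.get('$body'));
console.log(members.get('slots'));
```

With spans located this way, the body text could be handed to a decoder unchanged while the slots are parsed normally, and neither key order nor whitespace would matter.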

assert(Array.isArray(slots));
const slotj = JSON.stringify(slots);
slotj.indexOf(':[') < 0 || Fail`expected simple slots`;
const body1 = body.replace(/^#/, '');
Contributor

This seems to assume that the argument is a CapData record whose body is a "#"-prefixed JSON serialization of SmallCaps-encoded data, which would need a lot more explanation than appears here (and a better name).
