-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement JsonParserGenerator using smithy-json #498
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM.
Some high level things:
- A lot of functions in smithy-json should probably be
#[inline]
, it makes a shockingly big different in Rust in terms of performance. as_escaped_str()
doesn't communicate it's danger enough, IMO. Perhapsas_escaped_unchecked()
? And then a specific justification about why this usage is OK? I'm worried about silently sneaking\"
into someone's response etc.- It would be good to get a ballpark comparison in terms of wall clock time and
cargo llvm-lines
between this and serde_json for DynamoDB (I suspect, probably similar?) But just good to make sure it's not way worse.
private val smithyJson = CargoDependency.smithyJson(runtimeConfig).asType() | ||
private val codegenScope = arrayOf( | ||
"Error" to smithyJson.member("deserialize::Error"), | ||
"expect_blob_or_null" to smithyJson.member("deserialize::token::expect_blob_or_null"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
personally, I would probably have just made token
the namespace then done #{token}::expect_bool_or_null
in the code but, either way.
*codegenScope | ||
) | ||
rust("let result =") | ||
deserializeMember(member, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably create a Ctx
for this, partially just to follow the pattern of all the other deserializers & partially to make the code a little easier to read without jumping to definition to see what false
means
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may end up having to introduce context to correctly handle the non-sparse list nullability issue. Otherwise, it seemed like overkill for one context value. I should have used named arguments though to clarify the boolean value.
} | ||
if (!forceOptional && !symbol.isOptional()) { | ||
if (symbol.canUseDefault()) { | ||
rust(".unwrap_or_default()") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you shouldn't need to do this if you use the builder (but I could be wrong).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the result of having to create union variants that have zero types. For example:
"int" => Some(crate::model::Choice::Int(
smithy_json::deserialize::token::expect_number_or_null(tokens.next())?
.map(|v| v.to_i32())
.unwrap_or_default(),
)),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh right...unions don't have builders. OK.
} else { | ||
val name = escape(memberShape.memberName) | ||
rustTemplate( | ||
".ok_or_else(|| #{Error}::custom(\"value for '$name' cannot be null\"))?", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably just push this onto the builder so that it fails when you try to .build()
on the builder and it keeps this null checking in one place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually a bug: it's going to reject null values in non-sparse lists when it should ignore them (see example below).
let mut items = Vec::new();
// ...
items.push(
deser_union_choice(tokens)?
.ok_or_else(|| Error::custom("value for 'member' cannot be null"))?
);
I'll fix this and add a protocol test for it.
val stringFn = if (isEnum) "expect_string_or_null" else "expect_unescaped_string_or_null" | ||
rustTemplate("#{$stringFn}(tokens.next())?", *codegenScope) | ||
if (isEnum) { | ||
rust(".map(|s| #T::from(s.as_escaped_str()))", symbolProvider.toSymbol(target)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably worth a comment that we don't need to escape here because enum values are restricted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
although actually I've seen some crazy enum values with quotes and stuff. Might need to unescape actually (worth adding a test).
rustTemplate("#{expect_number_or_null}(tokens.next())?.map(|v| v.to_#{T}())", "T" to symbol, *codegenScope) | ||
} | ||
|
||
private fun RustWriter.deserializeTimestamp(member: MemberShape) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still deciding if how I feel about pushing this stuff into smithy-json/smithy-xml. I think it's good? I guess it simplifies codegen a little bit.
rustBlock("match tokens.next().transpose()?") { | ||
rustBlockTemplate( | ||
""" | ||
Some(#{Token}::ValueNull { .. }) => return Ok(None), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are unions nullable? I'm not positive they are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it depends:
structure SomeStruct {
unionValue: SomeUnion,
}
// Case 1: Top-level union value is null.
// Nulls shouldn't be explicit, per the spec, but we should handle
// this case gracefully and set unionValue to None in this case:
{ "unionValue": null }
// Case 2: One variant is explicitly set to null.
// This case is invalid because there has to be exactly
// one non-null variant in the union struct per the spec:
{ "unionValue": { "someVariant": null } }
// The above is basically equivalent to:
{ "unionValue": {} }
The match block in question is handling Case 1.
} | ||
} | ||
} | ||
// TODO: Handle unrecognized union variants |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TODO: Handle unrecognized union variants | |
// TODO: Handle unrecognized union variants https://github.com/awslabs/smithy-rs/issues/185 |
|
||
private fun RustWriter.objectKeyLoop(inner: RustWriter.() -> Unit) { | ||
rustBlock("loop") { | ||
rustBlock("match tokens.next().transpose()?") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it may be helpful to offer a wrapper around Peekable
inside of smithy-json that has a nicer interface to avoid all the transposition etc.
rustBlockTemplate( | ||
""" | ||
Some(#{Token}::ValueNull { .. }) => return Ok(None), | ||
Some(#{Token}::Start${StringUtils.capitalize(objectOrArray)} { .. }) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it's restricted to this private function maybe just pass "Array"
and "Object"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And seems like deserializeFunctionName
is generating an extra _
when called on lists which is causing CI to fail
It looks like this is actually because MediaPackage has names such as |
Did some crude benchmarking to get a rough idea of how these changes impact performance. Note: this isn't a completely fair comparison right now because the smithy-json version is still using serde to parse errors (that part hasn't been refactored as part of this PR). Repeating json-rpc10, json-rpc11, rest-json, and rest-json-extras protocol tests 100x in release mode (wall-clock time):
Dynamo release build time including dependencies (wall-clock time):
Running
I want to see the same comparisons after refactoring the error parsing, and also want to add a real-world Dynamo GetItem comparison. |
big +1 to both of those! I would make the integration test gradle job slightly smarter and allow us to add a |
Added a DynamoDB benchmark in #507. I went ahead and ran it against both main:
PR:
I'm not fully convinced this benchmark is giving me the information I want, but it does corroborate the earlier wall-clock time that wasn't really showing much of a difference. If we want to add benches to CI later, we probably want to use lai instead of criterion since that will measure instructions and memory accesses rather than time. |
Did some more benchmarking since simplifying the benchmarks in #507. It looks like performance is in the same ball park after this PR. main:
PR:
It looks like there may be some opportunity for improvement, but we're on the order of a 0-4 microsecond difference. The consistency of measurements of the PR version vs. what's in main is interesting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! a few comments inline
uri: "/escaped-string-values", | ||
method: "POST", | ||
protocol: "aws.protocols#restJson1", | ||
body: "{\"enum\":\"has\\\"quotes\",\"also\\\"has\\\"quotes\":\"test\"}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you prefer, smithy supports """
which might remove some \
""", | ||
*codegenScope | ||
) | ||
deserializeStructInner(errorShape.members()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this read until }
? we should make sure that extra key/value pairs are allowed/ignored
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, deserializeStructInner
consumes the final }
as well as skips values for keys it doesn't recognize.
More progress towards #161. This PR implements JSON deserialization codegen using the new smithy-json token streaming parser. Cleaning up the old Serde implementation is going to be a separate PR as it's going to require more refactoring.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.