Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[spec] Normative: Support [Serializable] for WebAssembly.Module #711

Merged
merged 7 commits into from
Jun 27, 2018

Conversation

littledan
Copy link
Collaborator

@littledan littledan commented Feb 21, 2018

This patchset adds support to WebAssembly.Module to be used in HTML serialization, including from postMessage and IndexedDB. The change is based on @flagxor 's earlier PR, with the following changes:

  • Rather than actually including the Module in the serialization, this PR includes only the binary data, as the Module cannot be referenced across agents (which each have their own separate store).
  • The same-agent-cluster limitation is removed, as it is nonsensical in the context of forStorage serialization, for example IndexedDB, which is permitted by this specification.
  • The Memory aspects are omitted in this PR; those will be pursued in a separate PR in the threads repository.
  • The bytes of the module are explicit sub-serialized/deserialized

@@ -2,7 +2,7 @@
Title: WebAssembly Web API
Shortname: wasm-web-api
Group: wasm
Status: FPWD
Status: w3c/FPWD
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was needed for me to be able to build locally

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you update your bikeshed version that should fix it (it will then know wasm is a w3c project).
The reason I changed these is that bikeshed rejects some of the normal stamping with the full string. I still owe Tab filing an issue on that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should I update it? I tried bikeshed update and it still gave this error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to actually update your Bikeshed source code, so e.g. go to the director you checked out bikeshed and run git pull.

@@ -125,6 +135,36 @@ Note: This algorithm accepts a {{Response}} object, or a
1. Return |returnValue|.
</div>

<h2 id="serialization">Serialization</h2>

On the Web, {{Module}} has the <code>[<a extended-attribute>Serializable</a>]</code> extended attribute.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this be in its IDL? This seems somewhat unclear.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the JS API specification is trying to avoid referencing anything which is Web-specific. [Serializable] is defined in HTML. I'm not sure how to put this in its IDL without violating that layering.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these objects are not defined in IDL then you need to patch HTML directly as you did with BigNum.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I said as much on the original PR.)

Copy link
Collaborator Author

@littledan littledan Feb 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are defined in IDL, it's just that that IDL is supposed to not reference Web-specific specifications.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also has Exposed=(Window,Worker,Worklet) so I think you're failing somewhat?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@annevk Well, there's a plan for that: whatwg/webidl#468 . This layering is still a goal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I think you need to make this something like "Web user agents must augment the Module interface with the [Serializable] extended attribute." so it's an actual requirement and not a statement of fact that isn't based on any established facts...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the wording; I've switched to use it.


On the Web, {{Module}} has the <code>[<a extended-attribute>Serializable</a>]</code> extended attribute.

The [=serialization steps=], given |value|, |serialized| and |forStorage|, are:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oxford comma?


Given the above engine optimizations, structured serialization provides developers
explicit control over both compiled-code caching and cross-window/worker code
sharing.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a note?

1. [=Compile a WebAssembly module=] from |bytes| and set |value|.\[[Module]] to the result.

The semantics of a structured serialization is as-if the binary source, from which the
{{Module}} was compiled, is serialized, then deserialized, and recompiled into the target realm.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a note as well.

@jfbastien
Copy link
Member

This is a spec addition, right? I think you need to follow the same process to get this addition into WebAssembly. Should be very straightfoward, but we can't commit it until the working draft ships (unless we want to re-spin a new public working draft, and all that entails).

@domenic
Copy link
Member

domenic commented Feb 21, 2018

Hmm, I sure hope this group has the ability to publish new editor's draft without going through the W3C working draft process stuff...

@domenic
Copy link
Member

domenic commented Feb 21, 2018

This PR is very confusing BTW because it says "Memory" several times (e.g. in the PR title) when I believe the actual spec text is for "Module".

@jfbastien
Copy link
Member

Hmm, I sure hope this group has the ability to publish new editor's draft without going through the W3C working draft process stuff...

Maybe I misunderstand, but IIUC this PR requires me to change code in WebKit. I agree with the change, but it's not editorial. It's a new capacity, and as such should go through the CG as documented in our process. Doesn't matter if it's small and obvious, we can't just commit non-editorial changes to the spec. All work would then iterate in a separate repo, which we'd merge into the spec when it's ready to be merged.

Separately, even if we go through the process, I'm worried that merging this change into the tip-of-tree of this repo does require going through the W3C working draft process. A way to avoid this is to work purely in a branch for each working draft. This change then wouldn't affect the draft.

@flagxor
Copy link
Member

flagxor commented Feb 21, 2018

This was content that was present in the design repo material before it got converted to our current spec. It was lost in transcription as the old verbiage was very informal (and Dan had the impression it got moved to the HTML spec).
https://github.com/WebAssembly/design/blob/master/JS.md#structured-clone-of-a-webassemblymodule
The PR was my attempt to transform it into something more formal, but I paused that once Dan took over drafting our JS spec.

We could certainly churn through the proposal process, but I would argue that's unnecessarily bureaucratic.
This certainly exceeds the "editorial" bar and is a "substantive" change as it does have bearing on implementations, however, I don't think that requires the full process as this goes back to the seed material for the spec. I.e. this would be similar to if Andreas had managed to forget an opcode in the core spec. It is something we need to track and note when publishing the next working draft, but because of its origin, it seems like overkill to run the whole proposal process.

@jfbastien
Copy link
Member

Brad's explanation makes sense. Agreed.

@littledan littledan changed the title [spec] Normative: Support [Serializable] for WebAssembly.Memory [spec] Normative: Support [Serializable] for WebAssembly.Module Feb 21, 2018
@littledan
Copy link
Collaborator Author

@domenic Oops re: Memory. Fixed up the cases I saw.

@jfbastien I think there are two discussions here and we're talking past each other. One question is whether this is a new feature. I don't think anyone is saying no to this. This definitely isn't an editorial change, it's a normative change and new feature (which might deserve to cut to the head of the line for @flagxor 's historical reasons). I'd be fine asking the Wasm CG for a repo here, but sounds you're not asking me to.

What @domenic is suggesting is that the WebAssembly CG may want to maintain two documents in parallel: A FWPD (what you've already sent out, leaving it frozen), and an "Editor's draft" (W3C speak for a document that you can just edit whenever you want). Basically, the standard that you put through the IP opt-out period and the other formal processes that follow it would be a fork of the editor's draft, which we'd occasionally backport editorial patches to or whatever else is urgent.

A nice thing about maintaining an Editor's Draft is that people working on new features don't have to block on W3C process to get their work done. At the same time, people who are focusing on W3C process have a stamp-ready, feature-filled document ready for them once the cycle opens up again--there's no need to "open the floodgates" and merge a bunch of things at a particular time.

Copy link
Member

@flagxor flagxor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, but defer to folks that know the interop with the html spec better.

@jfbastien
Copy link
Member

@jfbastien I think there are two discussions here and we're talking past each other. One question is whether this is a new feature. I don't think anyone is saying no to this. This definitely isn't an editorial change, it's a normative change and new feature (which might deserve to cut to the head of the line for @flagxor 's historical reasons). I'd be fine asking the Wasm CG for a repo here, but sounds you're not asking me to.

Correct, if this is just putting in something that was agree on, and that was dropped, then I don't think you need to go through the process. I encourage you to provide this type of context next time, so non-editorial changes are easy to categorize.

What @domenic is suggesting is that the WebAssembly CG may want to maintain two documents in parallel: A FWPD (what you've already sent out, leaving it frozen), and an "Editor's draft" (W3C speak for a document that you can just edit whenever you want). Basically, the standard that you put through the IP opt-out period and the other formal processes that follow it would be a fork of the editor's draft, which we'd occasionally backport editorial patches to or whatever else is urgent.

A nice thing about maintaining an Editor's Draft is that people working on new features don't have to block on W3C process to get their work done. At the same time, people who are focusing on W3C process have a stamp-ready, feature-filled document ready for them once the cycle opens up again--there's no need to "open the floodgates" and merge a bunch of things at a particular time.

Agreed, it's simply more maintenance to keep two branches going.

@littledan
Copy link
Collaborator Author

OK, thanks for the reviews everyone! I'll put this on the agenda for the next Wasm CG meeting so we can discuss next steps.

@littledan
Copy link
Collaborator Author

It seems like web-platform-tests already has good tests for module serialization, which verify that

Two things from here:

  • Probably there should be a test that checks that IndexedDB is not allowed to serialize Modules from an insecure context.
  • Should we re-add the same agent cluster check? Does anyone remember why it was there in the first place? If we do add it back, I think we still want it disabled in the forStorage case (since it doesn't seem to make sense there). Otherwise, I can update the test to check that postMessage does actually work from a ServiceWorker.

@flagxor
Copy link
Member

flagxor commented Mar 1, 2018

So my recollection was that there was disagreement around requiring a secure context for IDB serialization (from JSC + Firefox)? Chrome has taken the position that persisting code from a potentially insecure context is problematic.

I think the agent cluster check just fell out of the assumption that code sharing was most essential between threads. I'd be happy dropping it if others don't care?

@littledan
Copy link
Collaborator Author

Given the past disagreement on this security question about storing code from insecure contexts, how should we proceed? Should the specification permit the various positions taken by different browsers, or should we come to a common, single semantic?

@jfbastien
Copy link
Member

jfbastien commented Mar 1, 2018

The security question is specifically about caching compiled code in IDB from an insecure context, not serializing, correct?

What do other web caching things do?

@domenic
Copy link
Member

domenic commented Mar 1, 2018

We must become interoperable on this web-observable behavior.

Other modern caching things (e.g. the service worker cache API) are not accessible from insecure contexts.

@littledan
Copy link
Collaborator Author

@jfbastien "Serializing" is the name in the HTML specification that describes how things are stored in IndexedDB as well as postMessage. There is a forStorage mode switch which specifications can use to differentiate those two cases.

@jfbastien
Copy link
Member

Other modern caching things (e.g. the service worker cache API) are not accessible from insecure contexts.

If other modern developer-controlled caches are spec'd to disallow caching from insecure contexts then it seems like we should follow the same rationale unless their motivation to do so doesn't apply to WebAssembly caching. Specifically, what's the bad stuff that an attacker can do if they can get a module cached? Can they permanently inject content into a site? If they decide on the IDB key then the answer is "yes", but if the key is specified as something they can't control then "no".

@domenic
Copy link
Member

domenic commented Mar 1, 2018

Right, it's about injecting content into the site. If I understand correctly the scenario, under insecure HTTP, an attacker can control everything, including the IDB key.

@jfbastien
Copy link
Member

As we discussed in the CG call today, I'd like our decision to mention HTTP / HTTPS to be a principled one, not just "hey we could do it for IDB". It sounds like at least Chrome wouldn't allow IDB serialization under HTTP. I'd rather spec any secure context restrictions separately, following a WebAssembly overall guidance. Maybe this can be an offshoot out the secure context discussion?

@littledan
Copy link
Collaborator Author

littledan commented Mar 6, 2018

We heard in the call that Chrome is unlikely to ship the ability to store Modules from insecure contexts in IndexedDB. If it's possible, I hope we can soon come to agreement among all the browsers that what we specify here will eventually be shipped.

To ensure compatibility, I'd prefer to aim to be only adding features over time, and not shipping additional restrictions on previously shipped features unless it is to correct a previous mistake. (If there isn't compatibility risk from a future restriction, it's because the feature isn't being used; in that case, why bother specifying and shipping it?)

I think additional restrictions are a decent conservative default given disagreement. In this case, I could see logic to the restriction--persistent storage of attacker code is worse than ephemerally running attacker code on a particular page load.

@jfbastien
Copy link
Member

We heard in the call that Chrome is unlikely to ship the ability to store Modules from insecure contexts in IndexedDB. If it's possible, I hope we can soon come to agreement among all the browsers that what we specify here will eventually be shipped.

To ensure compatibility, I'd prefer to aim to be only adding features over time, and not shipping additional restrictions on previously shipped features unless it is to correct a previous mistake. (If there isn't compatibility risk from a future restriction, it's because the feature isn't being used; in that case, why bother specifying and shipping it?)

I don't really see Chrome not shipping this for HTTP as a big issue since all sites have to handle IDB storage failure already, and not all browsers currently support IDB module storage. The incompatibility is indistinguishable from the current state of the web, as well as from random failure.

If you want to aim for specifying HTTPS for module serialization, that's also fine with me. As I've outlined I think we should do so in a principled manner, not ad-hoc for this proposal. We can block serialization of modules on figuring out a precise policy for how WebAssembly should act under an insecure context. Should we discuss this with webappsec alongside CSP, or as a separate thing?

@littledan
Copy link
Collaborator Author

I don't really see Chrome not shipping this for HTTP as a big issue since all sites have to handle IDB storage failure already, and not all browsers currently support IDB module storage. The incompatibility is indistinguishable from the current state of the web, as well as from random failure.

If we're OK with unpredictable performance, I don't understand the motivation for this feature. Browsers could just transparently cache modules based on a hash of the bitcode. I thought the idea of serializing modules in IndexedDB was that you'd be able to fairly reliably control when they are cached and read out.

There's a bit of difference between a permanent state of differences between browsers and other cases where this feature won't be available: when comparing to the current state of the web, this is hopefully a one-time upgrade, and then all browsers should support IDB persistence of modules reliably. For eviction from storage, this can be controlled by the persistent storage API. For random failures, these should be pretty rare and analogous to a first page load; it should be pretty rare for a second load immediately after that to face the same issue.

@jfbastien
Copy link
Member

@littledan I'm making a very narrow argument from the current state of which browser ships what. From that current state of things, there's little difference in a world where HTTP and HTTPS don't act the same, whether normative or not, because already different browsers don't act the same and already there's random eviction.

Longer term (say, later this year) we'll figure out how to handle HTTPS in this spec. I just don't think an ad-hoc solution for this specific feature is right for the long term, because I'm not sure we'll get the big picture right (we might, I'm simply not confident that we will). Otherwise we'd already have that big picture in the spec.

@littledan
Copy link
Collaborator Author

We discussed structured clone of modules in the April 4th WebAssembly CG call. To summarize the discussion:

  • The same agent cluster restriction for Module serialization through postMessage was justified for the following reasons:
    • This restriction means that Modules are not sent cross-origin, which may be helpful as a security invariant
    • There weren't clear cross-agent-cluster use cases.
    • In some browsers, restricting to a single agent cluster makes for a simpler implementation
  • We did not address or come to a conclusion on the secure context restriction for storage of WebAssembly Modules. This will have to be discussed in the future.

I'll follow up with a patch which maintains the secure context restriction and follows the other restrictions listed above.

@annevk
Copy link
Member

annevk commented Apr 12, 2018

This restriction means that Modules are not sent cross-origin, which may be helpful as a security invariant

Not cross-site, but they can go cross-origin.

@littledan
Copy link
Collaborator Author

@annevk Are you talking about the postMessage case? Can you give an example/reference for more details about the cross-site/cross-origin differences and how it relates to agent clusters? We were having trouble in the last CG call.

@annevk
Copy link
Member

annevk commented Apr 12, 2018

@littledan if example.com embeds site.example.com through a frame they'll be in the same similar-origin window agent. That's because if they both set document.domain to example.com they can access each other directly.

Basically, an agent cluster by definition can consists of multiple origins, but those origins always have the same eTLD+1 (registrable domain).

@littledan
Copy link
Collaborator Author

Thanks for the explanation. Seems like we should discuss in the CG whether we want a same or similar origin restriction here.

@annevk
Copy link
Member

annevk commented Apr 12, 2018

@littledan that would require storing origin data on the Module object, which I would strongly recommend against (and have already done elsewhere). You want a capability-based model for objects, not tagging.

@littledan
Copy link
Collaborator Author

@annevk What do you think about this current patch, which stores the agent cluster if it's not forStorage (similar to SharedArrayBuffer, but for different reasons).

@annevk
Copy link
Member

annevk commented Apr 12, 2018

@littledan I think that's the correct approach. Though writing that down I do now recall that folks raised a concern in private with me about that approach combined with MessageChannel not working well in implementations for SharedArrayBuffer. I've asked them last year to report an issue for this, but that hasn't yet happened.

@littledan
Copy link
Collaborator Author

Apologies for my delay. Chrome is considering shipping same-agent-cluster postMessage but not IndexedDB serialization of WebAssembly modules. I've updated this PR to match that proposal. See 314099c for more information.

The [=serialization steps=], given |value|, |serialized|, and |forStorage|, are:

1. If |forStorage| is true, throw a
"<a exception>DataCloneError</a>" {{DOMException}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "{{DataCloneError}}" is more conventional.

Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good if there was a MessageChannel test where port2 was transferred cross-site. Then invoke port1's postMessage(). Then transfer port2 back same-site. Then invoke port1's start(). And some variations on that theme.

sharing.

Note: The web platform does not permit cross-origin sharing of serialized Modules:
Agent clusters in HTML do not span multiple origins.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect. It does not allow sharing when two origins's hosts are not https://url.spec.whatwg.org/#host-same-site or do not have the same scheme. Otherwise sharing is possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This is also why we call it the similar-origin window agent, not just window agent.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything linkable that I could reference for "similar origin", or is this just something that's used in the air but not documents?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just removed the note.

…serialization

This restriction was non-sensical because there is practical no way to serialize
an agent cluster forStorage.
…odule deserialization"

The same agent cluster restriction is present in the case of serializing
within memory. Also add a note about how this restricts serialization to be
within an origin.
Based on discussions in the WebAssembly CG, it sounds like IndexedDB
serialization of WebAssembly Modules has the following problems:
- On some engines, it's impractical to serialize more than just the
  binary module data
- On some engines, upgrading from one version to another would cause
  the serialized module to be invalidated, with complicated implementation
  logic and a loss of reliability
- Some engines plan to just return an error for IndexedDB module
  serialization as a design decision

The purpose of IndexedDB Module serialization is for reliable performance,
and these items provide risks to that. This patch proposes that, initially,
IndexedDB serialization would not be provided for WebAssembly modules.
@littledan
Copy link
Collaborator Author

When do we want to consider landing this PR? Should it be in the v1 spec, or are we OK with waiting for v2? It's a significant new feature, so I'm not sure if it might reset some sort of clock. @lukewagner suggested that waiting for v2 might be fine, as most of the use cases for postMessage are motivated by threads, which are only coming in v2.

@Ms2ger Would you be interested in looking into the tests which @annevk suggested?

@binji
Copy link
Member

binji commented Jun 25, 2018

Should it be in the v1 spec, or are we OK with waiting for v2?

We discussed having this for v1 already, and only decided recently not to move forward with serialization for IndexedDB part. I think it makes sense to keep for v1.

It's a significant new feature, so I'm not sure if it might reset some sort of clock.

It will, yes, but we're past the point where that matters. We need to put out a new version regardless which will necessarily reset the clock.

@littledan
Copy link
Collaborator Author

OK, let's discuss this at the next CG meeting tomorrow and merge it if it seems OK

@binji
Copy link
Member

binji commented Jun 26, 2018

We discussed this in the June 26th CG meeting and decided to include this PR in v1 of the spec.

@littledan littledan merged commit f85f6e6 into WebAssembly:master Jun 27, 2018
@littledan
Copy link
Collaborator Author

@Ms2ger Could you make sure we have good tests for this patch in web-platform-tests? In particular, we should have tests that serializing in IndexedDB throws, and that serialization in postMessage and MessageChannel works.

@Ms2ger
Copy link
Collaborator

Ms2ger commented Oct 22, 2018

Looks like web-platform-tests/wpt#12488 covers this neatly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants