Start writing the bundling spec. #98

jyasskin · 2017-12-21T00:58:41Z

This fills a similar niche as MHTML, but

* is random access,
* efficiently encodes 8-bit payloads,
* can represent multiple top-level resources,
* can represent content-negotiated resources,
* and possibly other advantages.

I believe that combined with signed exchanges, bundles fill all of the web packaging use cases except that they'll need a subsequent change to work for "Third-party security review", which will need a signature to cover groups of resources to enforce that their versions are consistent.

annevk

You state that "parsing MUST fail" in a bunch of cases, but there's also a lot of scanning and jumping going on. Are there multiple levels where things can fail? That is, if you happened to just parse this once without jumping around, you can't simply return failure for all errors but would instead have to throw away items? That's rather unclear.

annevk · 2018-03-23T12:01:52Z

draft-yasskin-dispatch-bundled-exchanges.md

+* Has exactly two keys starting with a ':' character, ':method' and ':url'.
+* *R*\[':method'] is an HTTP method defined as cacheable (Section 4.2.3 of
+  {{!RFC7231}}) and safe (Section 4.2.1 of {{!RFC7231}}), as required for
+  PUSH_PROMISEd requests (Section 8.2 of {{?RFC7540}}).


This is not really good enough for browsers I think. You need to have a safelist.

https://www.iana.org/assignments/http-methods/http-methods.xhtml is precise about the safe methods, but you have to click through to get cacheability defined, and the WebDAV methods leave it out. In an IETF draft, I'm pretty sure I can list the current methods that satisfy this (GET and HEAD), and in the browser-side loading living spec I can just list those expecting the list to get updated if any new ones get added.

I also note that Fetch uses "unsafe" without specifying it more precisely.

In the cache section, which is somewhat optional in a way.
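A minimal sketch of the safelist approach discussed above, assuming the only methods that are currently both safe and cacheable are GET and HEAD, as the reply states. The names `BUNDLE_METHOD_SAFELIST` and `is_bundleable_method` are hypothetical and do not come from the draft.

```python
# Hypothetical safelist: the two methods the thread names as safe and cacheable.
BUNDLE_METHOD_SAFELIST = frozenset({"GET", "HEAD"})


def is_bundleable_method(method: str) -> bool:
    """Return True if `method` is on the explicit safelist.

    HTTP method names are case-sensitive, so no case folding is applied.
    """
    return method in BUNDLE_METHOD_SAFELIST
```

An explicit list like this sidesteps the need to look up "safe" and "cacheable" in a registry at parse time; the list would simply be updated if a new qualifying method were registered.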

annevk · 2018-03-23T12:02:17Z

draft-yasskin-dispatch-bundled-exchanges.md

+* *R*\[':method'] is an HTTP method defined as cacheable (Section 4.2.3 of
+  {{!RFC7231}}) and safe (Section 4.2.1 of {{!RFC7231}}), as required for
+  PUSH_PROMISEd requests (Section 8.2 of {{?RFC7540}}).
+* *R*\[':url'] is an absolute URI (Section 4.3 of {{!RFC3986}}).


It's unclear how to validate this in browsers. There's no such primitive.

I believe one identifies an absolute-URL string by running the URL parser with no base URL and then checking that it doesn't have a fragment.

Are you complaining that I'm not linking to the URL spec, that this isn't phrased as a parsing algorithm, both, or something else?

Can I go with all three? E.g., http://example.com/% is not an "absolute URI", but will parse per the URL parser.
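The gap between "parses as a URL" and "is an absolute URI" can be illustrated with a rough sketch. It is only a partial check, not the full RFC 3986 grammar and not the URL Standard's parser: it requires a scheme, rejects fragments, and rejects malformed percent-encoding. The function name `looks_like_absolute_uri` is an assumption made for illustration.

```python
import re
from urllib.parse import urlsplit

# A "%" not followed by two hex digits is not a valid percent-encoded triplet.
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def looks_like_absolute_uri(candidate: str) -> bool:
    """Partial check: scheme present, no fragment, well-formed percent-encoding."""
    try:
        parts = urlsplit(candidate)
    except ValueError:
        return False
    if not parts.scheme:
        return False
    if "#" in candidate:
        return False
    return _BAD_PERCENT.search(candidate) is None
```

With this check, `http://example.com/%` is rejected even though a lenient URL parser accepts it, which is the discrepancy raised in the comment above.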

annevk · 2018-03-23T12:03:01Z

draft-yasskin-dispatch-bundled-exchanges.md

+A valid request item *R* is interpreted as an HTTP request by interpreting
+*R*\[':method'] as the request's method (Section 4 of {{!RFC7231}}),
+*R*\[':url'] as the request's effective request URI (Section 5.5 of
+{{!RFC7230}}), and the remaining key/value pairs as the request's header fields.


Those key/value pairs are suitably constrained somehow? What's the parsing for them?

They're HTTP headers, so we should do whatever https://fetch.spec.whatwg.org/#http-network-fetch does.

Or are you worried about how claimed request headers are matched against browser request headers? That's not written yet. I was hoping to use the matching rules for PUSH_PROMISEs, but those aren't written yet either.

Well they're not quite the same as ordinary HTTP headers, since they are delimited differently, right? E.g., can you have a 0x0A byte in the value?

HTTP/2 headers are also delimited in a way that lets their values include a 0x0A byte. I wonder if browsers handle that correctly...

I suspect the right thing to do here is to explicitly create a header list from this, but since that concept doesn't exist in the IETF, I'll have to give up on the idea of making this an IETF draft.

I'm going to ping some folks to see if I'll hurt any feelings by doing that.

Oooh wow, I didn't know that about H/2. That's another reason to get a web-platform-tests setup.
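Because the key/value pairs are length-delimited rather than line-delimited, a parser has to reject problematic bytes explicitly. The sketch below assumes the constraint is the one the Fetch Standard places on header values as I understand it (no leading or trailing tab or space, and no NUL, LF, or CR bytes); the function name is hypothetical.

```python
# Bytes that must not appear anywhere in a header value (NUL, LF, CR).
_FORBIDDEN_VALUE_BYTES = frozenset({0x00, 0x0A, 0x0D})
# Bytes that must not appear at either end of a header value (tab, space).
_HTTP_TAB_OR_SPACE = frozenset({0x09, 0x20})


def is_acceptable_header_value(value: bytes) -> bool:
    """Return True if `value` could be carried as an HTTP header value."""
    if value and (value[0] in _HTTP_TAB_OR_SPACE or value[-1] in _HTTP_TAB_OR_SPACE):
        return False
    return not any(byte in _FORBIDDEN_VALUE_BYTES for byte in value)
```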

annevk · 2018-03-26T07:11:51Z

(Note that I replied even in the feedback GitHub considers "outdated".)

annevk · 2018-03-26T07:14:01Z

Are you going to leave it to Fetch to drop the response body for a HEAD request? Or will that be an error to codify? It seems to me that you'd want to forbid HEAD to be encoded and only support it on the retrieval side and do the appropriate translations to GET minus response body there.

jyasskin

Sorry for skipping the "Are there multiple levels where things can fail?" question earlier. My plan is that there are 2 levels where things can fail:

1. The browser parses the metadata (https://jyasskin.github.io/webpackage/bundles/draft-yasskin-dispatch-bundled-exchanges.html#load-metadata), and if anything fails there, the whole bundle is invalid.
2. Then for any response the browser actually wants to use, it parses that particular response (https://jyasskin.github.io/webpackage/bundles/draft-yasskin-dispatch-bundled-exchanges.html#load-response). If that fails, only that particular response is invalid, but the bundle can still contain other responses.
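The two failure levels described above can be sketched with a toy in-memory model. Everything here is hypothetical (the `ToyBundle` class, the `OK:` prefix used as a stand-in validity rule, and the exception names); it only shows how a metadata failure rejects the whole bundle while a response failure leaves other responses usable.

```python
from dataclasses import dataclass, field


class BundleError(Exception):
    """Metadata failed to parse: the whole bundle is invalid."""


class ResponseError(Exception):
    """One response failed to parse: other responses remain usable."""


@dataclass
class ToyBundle:
    # Hypothetical stand-in: `index` maps a URL to that response's raw bytes.
    magic_ok: bool
    index: dict = field(default_factory=dict)


def load_metadata(bundle: ToyBundle) -> list:
    """Level 1: any failure here invalidates the entire bundle."""
    if not bundle.magic_ok:
        raise BundleError("metadata failed to parse")
    return sorted(bundle.index)


def load_response(bundle: ToyBundle, url: str) -> bytes:
    """Level 2: a failure here invalidates only this one response."""
    raw = bundle.index[url]
    if not raw.startswith(b"OK:"):  # toy validity rule, purely illustrative
        raise ResponseError(f"response for {url} failed to parse")
    return raw[3:]
```

A caller would run `load_metadata` once, then call `load_response` per request and treat a `ResponseError` as "this resource is unavailable" rather than discarding the bundle.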

Thanks for saying that's not adequately clear; I'll try to fix it.

I hadn't thought about how exactly to constrain HEAD. If you think the right approach is to treat it as an automatic transformation from GET, I'll do that.

None of this is done yet. I'll comment again when it is.

jyasskin · 2018-03-26T23:51:44Z

draft-yasskin-dispatch-bundled-exchanges.md

+A valid request item *R* is interpreted as an HTTP request by interpreting
+*R*\[':method'] as the request's method (Section 4 of {{!RFC7231}}),
+*R*\[':url'] as the request's effective request URI (Section 5.5 of
+{{!RFC7230}}), and the remaining key/value pairs as the request's header fields.


HTTP/2 headers are also delimited in a way that lets their values include a 0x0A byte. I wonder if browsers handle that correctly...

I suspect the right thing to do here is to explicitly create a header list from this, but since that concept doesn't exist in the IETF, I'll have to give up on the idea of making this an IETF draft.

I'm going to ping some folks to see if I'll hurt any feelings by doing that.

annevk · 2018-03-27T07:09:31Z

(Another thought I had was to take inspiration from the Cache API, as sort of the API surface on top of this format. The Cache API only supports GET thus far, but it does have Vary support. Haven't fully explored whether it's a complete fit, but it's interesting to think about as a way to avoid introducing too many new primitives.)

jyasskin · 2018-05-21T22:09:36Z

I believe this is finally ready for a re-review. It's missing 3 things that are going to be important, but that I'd like to do in a subsequent patch:

1. The algorithm to take the request a browser would generate, e.g. with multiple values in its Accept header, and find a matching request inside a bundle, where requests will generally only have 1 value in their Accept headers.
2. Compression: the index can probably just be brotli-compressed, but response headers should be able to take advantage of a shared dictionary.
3. Signatures covering multiple resources: this both saves on signature verification costs and ensures that resources have matching versions.

kinu

Just skimmed through, had some nit comments / questions (non blocking).

kinu · 2018-05-22T09:45:58Z

draft-yasskin-dispatch-bundled-exchanges.md

+
+## Stream attributes and operations {#stream-operations}
+
+* A sequence of **available** bytes. As the stream delivers bytes, these are


nit: should the bolding be applied to "available bytes"?

kinu · 2018-05-22T09:46:03Z

draft-yasskin-dispatch-bundled-exchanges.md

+   the 4-item array initial byte and 8-byte bytestring initial byte, followed by
+   🌐📦 in UTF-8), return an error.
+
+1. Let `sectionOffsetsLength be the result of getting the length of the CBOR


nit: sectionOffsetsLength (missing the closing `)
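For readers unfamiliar with CBOR framing, the prefix check quoted above can be sketched as follows. A CBOR initial byte carries the major type in its top 3 bits and a short count in its low 5 bits; arrays are major type 4 and byte strings are major type 2. The helper names are hypothetical, and the sketch only handles counts that fit in the initial byte.

```python
def cbor_initial_byte(major_type: int, short_count: int) -> int:
    """Initial byte for a CBOR item whose length or count fits in 0..23."""
    if not 0 <= major_type <= 7 or not 0 <= short_count <= 23:
        raise ValueError("only short counts (0..23) are handled in this sketch")
    return (major_type << 5) | short_count


ARRAY, BYTE_STRING = 4, 2
MAGIC_TEXT = "🌐📦".encode("utf-8")  # two 4-byte UTF-8 sequences, 8 bytes total

# 4-item array header, 8-byte byte string header, then the emoji bytes.
EXPECTED_PREFIX = bytes([
    cbor_initial_byte(ARRAY, 4),
    cbor_initial_byte(BYTE_STRING, len(MAGIC_TEXT)),
]) + MAGIC_TEXT


def has_bundle_magic(first_bytes: bytes) -> bool:
    """Return True if the stream starts with the expected 10-byte prefix."""
    return first_bytes.startswith(EXPECTED_PREFIX)
```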

kinu · 2018-05-22T10:01:44Z

draft-yasskin-dispatch-bundled-exchanges.md

+      + offset`. That is, offsets in the index are relative to the start of the
+      "responses" section.
+   1. If `offset + length` is greater than
+      `section-offsets\["responses"].length`, return an error.


nit: extra \
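The bounds check in the quoted step can be sketched as below. The parameter names are assumptions: `responses_offset` and `responses_length` stand for the location and size of the "responses" section, and `offset` and `length` come from one index entry.

```python
from typing import Tuple


def resolve_response_range(
    responses_offset: int, responses_length: int, offset: int, length: int
) -> Tuple[int, int]:
    """Turn a section-relative index entry into an absolute (start, length) pair.

    Raises ValueError if the entry would extend past the responses section.
    """
    if offset < 0 or length < 0:
        raise ValueError("negative offset or length")
    if offset + length > responses_length:
        raise ValueError("index entry extends past the responses section")
    return responses_offset + offset, length
```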

kinu · 2018-05-22T10:04:29Z

draft-yasskin-dispatch-bundled-exchanges.md

+
+1. Set `metadata`'s "manifest" item to `url`.
+
+### Parsing the critical section {#critical-section}


I wasn't fully sure what could come here?

If we add more sections in the future, and the publisher wants to make sure parsers don't just ignore one of those sections because they don't understand it, this section points out the ones not to skip.

I'm not certain this is the right way to do this. PNG does it with a particular capitalization in the section names: https://tools.ietf.org/html/rfc2083#page-13
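The two designs mentioned here can be compared in a short sketch. The first function models an explicit list of must-understand section names; the second models PNG's convention, where an uppercase first letter in the chunk type (bit 5 of the first byte clear) marks a chunk as critical. The function names and the section names in the usage note are illustrative only.

```python
def unknown_critical_sections(critical, known_sections):
    """Return the critical section names this parser does not understand.

    A parser following the explicit-list design would fail if this is non-empty.
    """
    return [name for name in critical if name not in known_sections]


def png_chunk_is_critical(chunk_type: bytes) -> bool:
    """PNG convention: bit 5 (0x20) of the first type byte is 0 for critical chunks."""
    return chunk_type[0] & 0x20 == 0
```

For example, `unknown_critical_sections(["index", "future"], {"index", "responses"})` reports `"future"`, so a parser that only knows "index" and "responses" would reject that bundle instead of silently skipping the section.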

kinu · 2018-05-22T10:05:37Z

draft-yasskin-dispatch-bundled-exchanges.md

+struct to fill in, the parser MUST do the following:
+
+1. Let `critical` be the result of parsing `sectionContents` as a CBOR item
+   matching the above `critical` rule ({{parse-cbor}}. If `critical` is an


nit: missing ) after {{parse-cbor}}

kinu · 2018-05-22T10:13:14Z

draft-yasskin-dispatch-bundled-exchanges.md

+   TODO: Add the rest of the details of creating a `ReadableStream` object.
+
+1. Let `response` be a new response ({{FETCH}}) whose:
+   * Url list is `request`'s url list,


"Parsing the index section" seems to say request has one url, do you mean it could have a list / redirect chain?

This is just an asymmetry in how requests and responses default their values: https://fetch.spec.whatwg.org/#concept-request-url-list defaults from the request's url, while https://fetch.spec.whatwg.org/#concept-response-url is hard-coded to point to the response's url list. I believe both URL lists should always have 1 value for exchanges in the bundle.

I suspect Fetch will actually wind up copying or extending this response when serving browser-generated requests from bundles, so that a redirect into a bundle can work correctly.

jyasskin · 2018-05-22T18:16:11Z

draft-yasskin-dispatch-bundled-exchanges.md

+and {{semantics-load-response}} each of which can return an error instead of
+their normal result.
+
+## Stream attributes and operations {#stream-operations}


@domenic, here's the use of streams I mentioned on #whatwg. I'm thinking about re-doing it in terms of ReadableStream despite your advice because https://jyasskin.github.io/webpackage/bundles/draft-yasskin-dispatch-bundled-exchanges.html#load-response needs to create a body's stream, which is defined as a ReadableStream. The stream a bundle's parsed from is likely to be a response body anyway, so it'll probably be a ReadableStream even if I pretend it isn't.

What do you think?

annevk

I still find it a little hard to digest how the whole thing fits together, but I tried to review regardless.

annevk · 2018-05-23T10:21:48Z

draft-yasskin-dispatch-bundled-exchanges.md

+
+--- abstract
+
+Bundled exchanges provide a way to bundle up groups of HTTP request+response


request-response?

I was trying to emphasize that it's both. A hyphen doesn't feel quite right because "request" doesn't modify "response". A slash feels like it might be alternation, although https://infra.spec.whatwg.org/#pair uses it the way I intend here.

annevk · 2018-05-23T10:22:25Z

draft-yasskin-dispatch-bundled-exchanges.md

+Bundled exchanges provide a way to bundle up groups of HTTP request+response
+pairs to transmit or store them together. The component exchanges can be signed
+using {{?I-D.yasskin-http-origin-signed-responses}} to establish their
+authenticity.


I'd leave this out, since the signing is controversial and not required.

I don't intend to shy away from the controversial bits, but you're right that it's not required, so maybe it doesn't belong in the abstract.

I do expect to add a section to bundles to efficiently encode multi-resource signatures with the same semantics as multiple signed-exchanges, at which point it'll come back to the abstract.

annevk · 2018-05-23T10:22:45Z

draft-yasskin-dispatch-bundled-exchanges.md

+
+Discussion of this draft takes place on the ART area mailing list
+(art@ietf.org), which is archived
+at <https://mailarchive.ietf.org/arch/search/?email_list=art>.


It seems discussion also takes place on GitHub?

I think that's covered by the "issues list" comment in the next paragraph. HTTPWG drafts have similar text, and also discuss on GitHub issues.

annevk · 2018-05-23T10:23:39Z

draft-yasskin-dispatch-bundled-exchanges.md

+## Terminology
+
+Exchange (noun)
+: An HTTP request/response pair. This can either be a request from a client and


I'd pick one separator and stick with it.

Indeed, thanks.

annevk · 2018-05-23T10:26:16Z

draft-yasskin-dispatch-bundled-exchanges.md

+  error or because the stream has a finite length.
+* A **seek** operation to change the current offset, relative to either the
+  beginning of the available bytes or to the old current offset. A seek past the
+  end of the available bytes is an *error in this specification*.


"error in this specification" doesn't appear to mean anything. Wouldn't it be better to say that seek or read can return an error and that callers of them need to account for that?

I intended the same meaning as in https://infra.spec.whatwg.org/#assertions, but looking at the uses of "wait", "seek", and "read", every seek and read except the ones to the start of the stream was immediately preceded by a wait, so I may as well incorporate the wait into the other operations.

annevk · 2018-05-23T10:26:47Z

draft-yasskin-dispatch-bundled-exchanges.md

+
+requests
+
+: A map ({{INFRA}}) whose keys are the HTTP requests ({{FETCH}}) for the


What Fetch defines as "requests" are not necessarily HTTP requests. They're more broad.

What's the best way to say that the keys here come from the subset of Fetch "requests" that are HTTP requests? I don't think it makes sense to include the other schemes mentioned in https://fetch.spec.whatwg.org/#url.

Even those are more broad as they carry fields HTTP requests don't have. I guess you could say requests whose url list contains a single URL whose scheme is an HTTP(S) scheme (unless this is restricted to secure contexts?).

I've removed the attempt to subset the kinds of Fetch requests this operation can return. If it would noticeably simplify the request-matching function I need to write, I might add the subsetting back later, but I don't really expect that.

annevk · 2018-05-23T10:28:39Z

draft-yasskin-dispatch-bundled-exchanges.md

+{{semantics-load-metadata}}, while a client will generally want to load the
+response for a request that the client generated. This specification does not
+define how a client determines the best available bundled response, if any, for
+that client-generated request.


Why not? We define this for the Cache API, I think ideally we use the same algorithm here.

Clearly we're going to have to define that algorithm somewhere. I don't have a strong opinion between here and the WICG/Fetch side, but I was leaning toward WICG/Fetch, and would definitely prefer to do it in a separate PR even if you prefer this spec.

Unfortunately, https://w3c.github.io/ServiceWorker/#query-cache-algorithm assumes that varied headers will be identical between the cached request and the queried request, which isn't plausible for a bundle intended to be used by more than one UA.

For example, imagine that Greta's UA sends Accept-Language: de-DE,de,en;q=0.9. In the Cache API, that's exactly the request header that'll be cached since the Cache API uses https://fetch.spec.whatwg.org/#concept-fetch to fill it in. Similarly, if Hilda's UA sends Accept-Language: de-AT,de;q=0.9, that's what her cache will contain. If the server wants to put a de resource in a bundle to serve both users, what Accept-Language header should they write? There isn't one that Query-Cache will find for both people. The same problem shows up for Accept across browsers instead of people. So we need a new algorithm.
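The Greta and Hilda example can be made concrete with a sketch. It contrasts exact header equality, which is what the comment above says the Cache API's query algorithm assumes, with a simplified prefix match in the style of RFC 4647 basic filtering. This is not the matching algorithm the spec will define; the function names are hypothetical and q-value ordering is ignored.

```python
def parse_accept_language(header: str) -> dict:
    """Map each language range in the header to its q-value (default 1.0)."""
    prefs = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        lang, _, params = part.partition(";")
        params = params.strip()
        q = float(params[2:]) if params.startswith("q=") else 1.0
        prefs[lang.strip().lower()] = q
    return prefs


def exact_match(bundled_header: str, client_header: str) -> bool:
    """Exact equality of the varied header between bundled and client request."""
    return bundled_header == client_header


def accepts_language(client_header: str, content_language: str) -> bool:
    """True if any acceptable range equals the tag or is a prefix ending at '-'."""
    tag = content_language.lower()
    return any(
        q > 0 and (rng == "*" or tag == rng or tag.startswith(rng + "-"))
        for rng, q in parse_accept_language(client_header).items()
    )
```

With Greta's header `de-DE,de,en;q=0.9` and Hilda's header `de-AT,de;q=0.9`, no single bundled header string equals both, but `accepts_language` reports that a `de` resource is acceptable to each of them.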

annevk · 2018-05-23T10:29:36Z

draft-yasskin-dispatch-bundled-exchanges.md

+
+The bundle is roughly a CBOR item ({{?I-D.ietf-cbor-7049bis}}) with the
+following CDDL ({{?I-D.ietf-cbor-cddl}}) schema, but bundle parsers are required
+to successfully parse some byte strings that aren't valid CBOR. For example,


Is there still use in CBOR then? Doesn't the MIME type lie with +cbor if you cannot actually use a CBOR parser?

I'm primarily using CBOR instead of a totally custom binary format because 1) ASCII/UTF-8 formats like JSON are nice because you can read them without custom parsers, and 2) we need a binary format for this, but there are likely to be generic tools to dump CBOR to a readable format, so that preserves as much of the ASCII benefit as possible.

I would actually like to say that bundles must be well-formed CBOR items, but because most parsers won't actually work that way, I expect non-CBOR bundles to wind up existing in the wild. Do you think I should say that clients MAY reject bundles that aren't a valid CBOR item? Or something else?

annevk · 2018-05-23T10:31:09Z

draft-yasskin-dispatch-bundled-exchanges.md

+   `knownSections` says not to process other sections, add those sections' names
+   to `ignoredSections`.
+
+1. Let `metadata` be a struct ({{INFRA}}) with no items.


The idea with structs is that they are immutable.

I think you mean that the set of item names is immutable. It's simple to change this to a map, so I've done that. The set of keys is only not-fixed because I wanted to let other specifications add section names that might define new metadata, but I'm not totally sure that's a good idea, so this might go back to a struct later.

annevk · 2018-05-23T10:35:41Z

draft-yasskin-dispatch-bundled-exchanges.md

+      1. Continue.
+   1. If `name` or `value` doesn't satisfy the requirements for a header in
+      {{FETCH}}, return an error.
+   1. If `headers` contains ({{FETCH}}) `name`, return an error.


You cannot have multiple headers of the same name? Might be worth calling out you cannot encode cookies.

I believe multiple Cookie headers would be encoded by joining them with semicolons (not that you'd generally put cookies in bundled requests), but yeah, Set-Cookie has no single-field form. Noted.

(This line is actually redundant with the fact that CBOR maps exclude duplicate keys, so I've replaced it with an Assert.)

annevk · 2018-05-24T05:04:59Z

Do you think I should say that clients MAY reject bundles that aren't a valid CBOR item? Or something else?

I think we want it to be fully deterministic across all conforming (browser) implementations how they would handle all possible sequences of bytes. I.e., at least as good as the HTML parser.
I don't think we should use +cbor if that isn't always the case or required to be the case.

Re: "best available bundled response" I'd prefer it as an open issue if you plan on addressing it rather than it saying it's implementation-defined.

(Note that we might get rid of pairs still: whatwg/infra#127.)


jyasskin · 2018-05-24T22:57:32Z

I haven't added the MAY.
I've removed the +cbor from the MIME type.
I marked the "best available bundled response" as a TODO.

Thanks!


jyasskin · 2018-05-29T20:56:25Z

I'm going to merge this so it's easier to refer to and so I can start sending PRs to modify it, but please feel free to keep commenting and filing issues against it.

jyasskin requested review from nyaxt, mrdewitt and KenjiBaheux December 21, 2017 00:58

jyasskin force-pushed the bundles branch 2 times, most recently from c464efb to 43462ae, December 22, 2017 16:56

jyasskin force-pushed the bundles branch from 43462ae to 2f65886, January 16, 2018 22:45

This was referenced Jan 25, 2018:

Officially rename webpackage to another name. #113 (Closed)

Web Packaging Format mozilla/standards-positions#29 (Closed)

Adding web packaging mozilla/standards-positions#53 (Merged)

jyasskin force-pushed the bundles branch 3 times, most recently from 42a3c2f to 2b6e2bb, February 27, 2018 17:55

annevk reviewed Mar 23, 2018

jyasskin force-pushed the bundles branch from 2b6e2bb to 6ce8cac, March 24, 2018 23:26

jyasskin commented Mar 26, 2018

BigBlueHat mentioned this pull request Apr 11, 2018: Internal referencing and internal only resources #177 (Open)

jyasskin force-pushed the bundles branch 6 times, most recently from 3a78b8b to f281570, May 17, 2018 18:47

jyasskin force-pushed the bundles branch 2 times, most recently from 60842c5 to ee384a4, May 21, 2018 21:47

kinu reviewed May 22, 2018

jyasskin commented May 22, 2018

jyasskin mentioned this pull request May 22, 2018: Signed Exchanges w3ctag/design-reviews#235 (Closed, 5 tasks)

annevk reviewed May 23, 2018

jyasskin force-pushed the bundles branch 2 times, most recently from a9f8f25 to 03a5f78, May 23, 2018 23:11

jyasskin added 7 commits May 24, 2018 15:47

01d642d: Start writing the bundling spec.

1e273a5: Specify the current set of safe+cacheable methods, per annevk.

9ca2c04: Make the bundles spec more precise. And tweak a couple definitions to let us parse from a stream without implementing a streaming CBOR parser.

d02b624: Fix annevk's comments.

60d7045: Fix a couple nits that kinu found.

03af55d: Fix annevk's comments.

8679275: Try to improve the introductions so it's easier to digest how the whole thing fits together.

jyasskin force-pushed the bundles branch from 80d01b9 to 8679275, May 24, 2018 22:47

jyasskin added 2 commits May 24, 2018 15:56

976d133: Clearly say a spec needs to define request matching.

07d75d4: Remove +cbor from the MIME type. Since clients parse by following byte offsets, they won't enforce that the format is CBOR, which means some instances of it probably won't be. Putting +cbor in the MIME type would be misleading.

69a0512: The Fetch spec defines requests instead of HTTP requests. Load Metadata will only return a subset of all possible requests, but it doesn't seem important to specify that subset precisely here.

jyasskin merged commit f3459fa into WICG:master May 29, 2018

jyasskin deleted the bundles branch December 1, 2018 00:34

