Manifest processing should not be a function of document URL #668
Comments
I think it would be useful to determine how common "versioned" manifests are before making this change, since this will cause problems for existing sites. One example of a site that will be affected is weather.com, which has manifests at "versioned" URLs such as https://weather.com/weather/assets/manifest.507fcb498f4e29acfeed7596fe002857.json. If the manifest URL uniquely identifies a PWA then a "new" PWA will be created every time weather's manifest is deployed or moved. More generally, I'm concerned that the realization that different manifest URLs correspond to different PWAs is not obvious. Also, no manifest validator/linter will be able to identify the problem with complete accuracy, and even heuristic based approaches are awkward (see e.g. GoogleChrome/lighthouse#2570). The failure mode is rather unfortunate too: sites unveiling a new PWA at an (incorrectly) versioned manifest URL will only discover the problem when the manifest is updated and deployed, a situation QA processes are likely to miss. |
There are two problems. The versioned manifest is not the problem that we're dealing with here.
This is already a problem (Problem 1), and my proposed change doesn't make it any worse. Right now, an app is uniquely identified by the tuple (manifest URL, document URL), so if the manifest URL changes, it's a new app. My proposal is to remove document URL from that tuple. It will still be a problem that if the manifest URL changes, it's a new app. I don't want to talk too much about Problem 1 here. I agree it's a problem but we're talking about it elsewhere (GoogleChrome/lighthouse#2570, GoogleChromeLabs/gulliver#323). For Problem 2: I noticed weather.com also doesn't have a
(For my future reference, the Chromium code which does this is in installable_manager.cc where it sets START_URL_NOT_VALID). Given that this is the case, it seems it would not be breaking any existing expectations if we made |
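As a rough illustration of the identity argument in this comment, the following Python sketch compares the two schemes. The function names and the example.com URLs are hypothetical and chosen only for illustration; they are not taken from any spec or implementation.

```python
from typing import Tuple


def identity_current(manifest_url: str, document_url: str) -> Tuple[str, str]:
    """Identity as described in the comment: the (manifest URL, document URL) tuple."""
    return (manifest_url, document_url)


def identity_proposed(manifest_url: str) -> str:
    """Identity under the proposal: the manifest URL alone."""
    return manifest_url


# Hypothetical URLs, for illustration only.
MANIFEST_V1 = "https://example.com/assets/manifest.aaa111.json"
MANIFEST_V2 = "https://example.com/assets/manifest.bbb222.json"
PAGE_A = "https://example.com/"
PAGE_B = "https://example.com/forecast"

# Same manifest linked from two different pages.
same_app_today = identity_current(MANIFEST_V1, PAGE_A) == identity_current(MANIFEST_V1, PAGE_B)
same_app_proposed = identity_proposed(MANIFEST_V1) == identity_proposed(MANIFEST_V1)

# A "versioned" manifest URL that changes on deploy is still a new app either way.
survives_redeploy = identity_proposed(MANIFEST_V1) == identity_proposed(MANIFEST_V2)
```

Under these assumptions, dropping the document URL merges installs made from different pages, while the versioned-manifest problem is left exactly as it was.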
I am OK with this change. I don't think this should be any issue for the Windows support either @boyofgreen |
I must imagine (without checking) that Windows support also ignores both of the rules relating to document URL, since they wouldn't have a canonical document URL for their Microsoft Store PWAs. So this change would only be making the spec more in line with Microsoft's (and Chrome's) implementation. @boyofgreen |
I'd be in favor of this change. @mgiuca, could you draft it up? |
Working on it! |
@mgiuca Ah, I didn't realize weather.com was no longer triggering any sort of A2HS prompt; that definitely worked in the past (they were showcased in the PWA tent at Google I/O last year) but either Chrome's behavior has changed, or In any case, I agree with this change, and I take your point that this issue is mostly separate from the question of how PWAs should be uniquely identified (#586). |
Now the steps for processing a manifest do not take a document URL. This is necessary as manifests are often processed independent of a document, and the interpretation of a manifest should not depend upon the document it was included from. Normative change: If the start_url is not same-origin as the document URL, the entire manifest is rejected (as part of obtaining a manifest from a document, rather than processing a manifest), rather than getting a default start_url. Closes w3c#668
I've put up a branch (but I can't make a PR until #670 lands; we should continue discussion on #669): This one is also a breaking change and may require some discussion. The break is that before, if the start URL did not match the document URL, the start URL (and likely scope) would be set to their default values, and manifest succeeds. Now, the entire manifest fails to load (in the "obtaining a manifest" steps). This is necessary in order to make the same-origin-as-document check not intrinsic to processing the manifest (allowing the manifest to be processed independent of the document), but rather make it just part of the process of loading a manifest from an HTML page. We can't really just alter the I wonder what Chrome for Android does in this case (since the WebAPKs are built from the manifest, independent of the document URL, they clearly don't alter them to a default start_url if the origin doesn't match --- I wonder if they fail or if they simply allow a cross-origin PWA installation). |
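The breaking change described above can be sketched roughly as follows. This is a simplified Python illustration, not the spec algorithm: the names `process_old`, `process_new` and `obtain_new` are hypothetical, only `start_url` is modelled, and the sketch assumes `start_url` is present in the new flow (how a missing `start_url` should be handled is a separate question in this thread).

```python
import json
from typing import Optional, Tuple
from urllib.parse import urljoin, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Approximate a URL's origin as (scheme, host, port)."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname or "", port)


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)


def process_old(text: str, manifest_url: str, document_url: str) -> dict:
    """Before: processing takes the document URL and falls back to it."""
    raw = json.loads(text)
    start_url = urljoin(manifest_url, raw.get("start_url", document_url))
    if not same_origin(start_url, document_url):
        start_url = document_url  # default value; the manifest still succeeds
    return {"start_url": start_url}


def process_new(text: str, manifest_url: str) -> dict:
    """After: processing depends only on the manifest text and manifest URL."""
    raw = json.loads(text)
    if "start_url" not in raw:
        raise ValueError("this sketch assumes start_url is present")
    return {"start_url": urljoin(manifest_url, raw["start_url"])}


def obtain_new(text: str, manifest_url: str, document_url: str) -> Optional[dict]:
    """After: the same-origin check lives in the 'obtaining a manifest' step."""
    manifest = process_new(text, manifest_url)
    if not same_origin(manifest["start_url"], document_url):
        return None  # the whole manifest is rejected for this document
    return manifest
```

A direct-install caller with no document would call only `process_new` in this sketch, which is the point of moving the check out of the processing step.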
This (the dependence on the document) goes up a bit higher than I previously thought. I'm trying to update the spec to basically allow a web app to be installed directly through its manifest URL (not from within a document context at all), which is how Microsoft Store PWAs work, as well as Google's WebAPKs, and any future "install app directly from a store page" type of installation. Currently, this isn't really allowed. In addition to the "steps for processing a manifest" (fixed above):
I think the above algorithms can be rewritten in the following way:
These direct-install user agents can thus use the non-document versions of these algorithms. |
This was redundant, because a) the start_url is already checked to be same-origin-as-document, and b) the start_url is checked to be within scope of scope (which implies they are same-origin). Thus the scope is guaranteed to be same-origin as the document without this explicit check. This removes the document URL parameter to the scope algorithm. No normative changes. Work towards fixing #668.
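The redundancy argument can be checked mechanically on a small set of URLs. The Python sketch below is illustrative only: `within_scope` is a simplified stand-in (same origin plus a path-prefix match), and the function names and URLs are hypothetical.

```python
from itertools import product
from typing import Optional, Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Approximate a URL's origin as (scheme, host, port)."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname or "", port)


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)


def within_scope(target: str, scope: str) -> bool:
    """Simplified: same origin, and the target path starts with the scope path."""
    return same_origin(target, scope) and urlsplit(target).path.startswith(urlsplit(scope).path)


def remaining_checks_pass(start_url: str, scope: str, document_url: str) -> bool:
    """(a) start_url is same-origin as the document, (b) start_url is within scope."""
    return same_origin(start_url, document_url) and within_scope(start_url, scope)


def dropped_check(scope: str, document_url: str) -> bool:
    """The explicit check this commit removes: scope is same-origin as the document."""
    return same_origin(scope, document_url)


# Hypothetical URLs covering same-origin, cross-host and cross-scheme cases.
URLS = [
    "https://example.com/",
    "https://example.com/app/",
    "https://other.example/app/",
    "http://example.com/app/",
]

# Any triple that passes (a) and (b) but fails the dropped check would be a counterexample.
counterexamples = [
    (start, scope, doc)
    for start, scope, doc in product(URLS, repeat=3)
    if remaining_checks_pass(start, scope, doc) and not dropped_check(scope, doc)
]
```

Because origin equality is transitive, the list of counterexamples stays empty for any URL set, which is why the explicit check adds nothing.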
I agree that the current manifest specification assumes installing a web app from a document, but that was a design choice; it didn't have to be specified that way. That design choice unfortunately means that some implementations (e.g. app stores, mobile device management, kiosks) which already use web app manifests today can never be compliant with the specification.
That's a good question. The web app manifest specification differs from other web specifications in that by its nature it can not be limited to just browsers, because it can not be implemented entirely inside a browser engine. It requires interactions between a browser engine, browser chrome, the underlying OS and in some cases external services like app stores and sync services. My understanding of Progressive Web Apps is that they are all about web apps escaping web browsers. Restricting the scope of the specification to the context of a web browser would seem to fail to meet that primary use case. I therefore agree in principle with the statement "manifest processing should not be a function of document URL", though in practice untangling the two at this stage would be challenging. Proposals like the proposed approach to unique IDs would further complicate this. |
This may be a naive question, but would it be possible to make the document optional so that the parser takes it into consideration if it's available but doesn't if it doesn't exist? Would that be too complicated? |
Also… in store/catalog contexts, I would think a |
The parsing aspect is not complicated, no. It's all the stuff that happens around it that gets complicated. That's why we presume this is coming from a document: because link relationships are from Document (and security/environment) -> linked thing (manifest), but never treated as independent.
Yes. It might make sense to do that. The stores can enforce that policy, while advising developers to also treat a manifest as a totally independent resource (not as a link relationship of a document). |
@benfrancis wrote:
Note that this effort predates that write up, and I don't think there was ever wholesale endorsement of PWAs as outlined in that blog post by the standards community (or browser vendors at large). As aspirational a guide as that blog post has been to the community, there are a lot of dissenting opinions, and in some cases, outright rejections, of what it proposes. Without digressing too much, from the very beginning, the way we designed the "progressive" nature of this particular specification was that developers could add as few or as many manifest members as they wanted in their manifest, without any strict requirement on what they must include.
Agree. And at this point in this nearly decade long journey, I'd really just like to get the unified base set of functionality standardized. It is, admittedly, a somewhat impoverished subset of the functionality web developers need - but I'm optimistic that we can continue to evolve this specification, along with implementations, in unison. But I really just want to get the core of it "done" - so we can then incrementally standardize little enhancements to make everyone's lives better. |
The reality is a document is required to install an app. Sure, you can INDEX an app (its name, icons, etc) but to me that is not INSTALLING an app. The installation of an app requires a document. It often requires a service worker, fetching of js, css, caching pages, numerous other stages during the install lifecycle of the service worker. To really separate the document from the installation process - wouldn't the manifest file need to link to all the assets needed to install the app? i.e. is this issue trying to help address packaging of an app, and once packaged not really "needing" a document any longer? It seems to me the fact remains a document is needed to practically package and install an app. You could "index" it, but what does that really give you? You've indexed a bunch of icons, name, but not really done anything practical. A document is an integral part of the manifest to describe where an app sits, how it can be accessed and installed? Maybe I'm misunderstanding the intention here though. |
I was thinking about this very thing. For digital catalogs, the So, to go back to @mgiuca’s initial recommendation (and this PR), I’m all for making |
I personally don't agree with making it required, for the reasons I already stated. But I'd be totally ok with making |
I'm revising this again, and, at the spec level, I just don't think it's possible to separate them right now. That's not to say that auxiliary crawlers, app stores, and other software can't treat the manifest as independent (they can and should, and that's fine); but at a spec level (and for the purpose of conforming "user agents") the manifest is inherently tied to its client window and document for the reasons I mentioned above. Let's continue to stick to "browser land" for the purpose of conformance, as otherwise the scope of the problem becomes too large. We can continue to revise this after we at least standardize how the three existing engines handle web manifests. |
I want to re-open this as it's come up internally (at Google). I wrote a doc explaining the rationale, the proposed change, and potential issues. It's quite messy at the moment, though, including a bunch of unrelated changes, but I wanted to restart this conversation. https://docs.google.com/document/d/1NUsoTWkWrPFFNnxR00r1NF-MXzq-_rjl2ldAhZm-TEw/edit I will follow up with a less messy proposal after a bit of informal discussion. (I'm also not 100% convinced that we should do this any more after @philloooo pointed out that from a security perspective, we don't necessarily want manifests to be able to "appify" a site on another origin without that site linking back to the manifest. But if we decide not to do this on those grounds, we can properly document that, as I don't think this objection has been raised before.) |
I'm pleased to see this reopened because this is still my biggest gripe with the Manifest specification. Not being able to install a web app directly from a manifest URL rules out lots of really interesting use cases (e.g. app stores, digital signage, mobile device management) where implementers have no choice but to either go against the specification or jump through strange hoops like generating fake documents or fetching and parsing an HTML document just to find a manifest URL. To be honest I had thought that the I hope that you can untangle this in Chromium, because being backwards compatible with Chromium desktop historically using If this can't be achieved, I wonder if it's worth considering a version 2.0 of Manifest which breaks backwards compatibility and allows the installation of web apps directly from a manifest URL (e.g. by using manifest URL as a default identifier rather than |
This was discussed in a call yesterday with @marcoscaceres and @dmurph. The main idea of being able to process a manifest without a document is generally agreed as something we'd like to do, with good use cases behind it. But it's a little messy. There was disagreement on what to do if there's no I think a reasonable compromise is to let the existing behaviour stand, but require Before we get to that, we need to refactor the algorithm so it takes URLs instead of HTML objects, so that it's at least theoretically callable without a document URL. That part is uncontroversial so I will get started on it now.
That's right. I think that, given the "
I agree that it's created a bunch of unnecessary complexity. If I was designing it from scratch, I'd have made
I don't think it's worth breaking backwards compat over this. It's a minor historical quirk that we can work around. |
Replaces the link and response parameters with document URL and manifest URL. These parameters were only used to get the document URL and manifest URL, respectively, so it doesn't make sense to accept the much bulkier HTML objects. This was limiting the ability to call the processing algorithm from outside an HTML document context, which is a future direction we wish to explore. Note that the only call to this algorithm is in the HTML spec, which needs to be updated simultaneously to use the new interface. Pre-work for w3c#668.
Another point: the most compelling reason we identified yesterday (I believe @dmurph raised) is privacy-sensitive: when performing these "automated" installs without user supervision (e.g., installing from sync, admin policy or a default app installed by the user agent), there is a privacy issue in sending a request to the server of every app being installed. If we can install from a manifest without loading a whole document, it allows us to have the manifest body - and its icons - be the thing distributed (e.g. over sync) rather than its URL, and install it offline without hitting the server. (Unfortunately, this won't allow us to have a service worker cached, without Web Bundles, but at least we can have the app icon registered with the system.) |
Spun off the first step in this process to a separate issue: #1068 (this doesn't remove the document URL, but it means we won't have to load the document before triggering install). We can continue to discuss making the document URL optional on this issue. |
- Fixed call to "process the id member" (removed the document URL argument which is not actually accepted by that algorithm).
- Removed unnecessarily complex for loops over ~2 members when it's more readable to just have a separate step for each member.
- Use Respec-style syntax instead of HTML.
- Moved default values into individual processing steps. This keeps the relevant info about the default values of members closely related to the other relevant material about that member. It's also consistent with how the rest of the members (e.g. scope) treat default values. And prevents possible errors where the incorrect default value is used by an intermediate step in between assigning the default and assigning the actual value.

Pre-work for #668.
… document (#1069)

* Processing the manifest: Simplify the interface. Replaces the link and response parameters with document URL and manifest URL. These parameters were only used to get the document URL and manifest URL, respectively, so it doesn't make sense to accept the much bulkier HTML objects. This was limiting the ability to call the processing algorithm from outside an HTML document context, which is a future direction we wish to explore. Note that the only call to this algorithm is in the HTML spec, which needs to be updated simultaneously to use the new interface. Pre-work for #668.
* Added new normative text (with a non-normative note) allowing user agents to invoke the processing steps without a document, provided that they supply a valid document URL.
* Reword non-normative note.
* Correctly link. (Actually this makes the wrong link, but it's a respec bug: https://github.com/w3c/respec/issues/4435)
* Added a SHOULD to set CORS correctly. Note: There's a reference error here because HTML doesn't export a term. I'm getting it exported.
* Move all this text to its own section; it's getting a bit much.
* Use variables to avoid repeating complex sentences.
* Rewrote processing without a document section for clarity and correctness.
  - Changed MUST into a SHOULD. We can't really expect all uses to directly verify this.
  - Removed the "or" clause that the document be same-origin as manifest; you still want a link from the document to the manifest either way.
  - Added "at least at some point in the past", to acknowledge that you don't need to verify this at install time, just whenever you did the caching.
  - Clarify that the CORS request is only needed if the manifest is not same-origin as the document.
* Apply suggestions from code review

  Co-authored-by: Marcos Cáceres <marcos@marcosc.com>
* Respond to review.

---------

Co-authored-by: Marcos Cáceres <marcos@marcosc.com>
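The commit message's clarification that the CORS request is only needed when the manifest is not same-origin as the document comes down to a single origin comparison. A minimal Python sketch, assuming a hypothetical helper name `needs_cors_request` and example URLs that do not come from the spec text:

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Approximate a URL's origin as (scheme, host, port)."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname or "", port)


def needs_cors_request(manifest_url: str, document_url: str) -> bool:
    """True when the manifest is cross-origin relative to the supplied document URL."""
    return origin(manifest_url) != origin(document_url)
```

In this sketch a caller processing a manifest without a live document would still supply a document URL and use it only for this comparison.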
@marcoscaceres @mgiuca I note that the commit message in 22a0b1e describes that change as "pre-work" for this issue. Since the algorithm for processing a manifest still takes document URL as an input, should this issue not still be open? I'm also curious what the next step would be for separating these out, since the current specification is very much dependent on a document URL. Perhaps a version 2.0? |
The steps for processing a manifest are currently defined as a function of both manifest URL and document URL:
This means that the entire identity of the app is uniquely determined by the tuple (text, manifest URL, document URL). Though practically, since text can be derived from manifest URL, it means just the pair (manifest URL, document URL).
The problem is document URL. The app can be interpreted differently depending on which document URL the manifest was referenced by. In reality, the document URL is only used in two places:
1. […] "start_url".
2. […] "scope" is same-origin as document URL.

I maintain that the processing of a manifest should not be a function of the document URL. This means the app will potentially behave differently depending on which page the user was on when they installed the app, and we cannot meaningfully index or install an app from outside of a document (as Microsoft is currently doing).
We should:
1. […] "start_url", and require "start_url".
2. […] "scope" is same-origin as document URL outside of the steps for processing the manifest (so if that check fails, it means that the manifest is not valid for that page, as opposed to being an invalid manifest).

This means that the semantics of the manifest becomes a pure function of the manifest URL itself (and of course the manifest text), and thus it will behave the same no matter where you install it from.
Making the above change 1 will technically be a breaking change, but since sites can't rely on the default "start_url" (since it's dynamic) this should not be able to break any expectations.