document.open() URL updates can cause recursive loading problems on history traversal #6556

domenic · 2021-04-05T16:59:11Z

In #3885 we noticed that, back then, document.open() had the strange behavior that it updated document's URL to the entry document's URL, but it did not update the corresponding session history entry (SHE). This meant that there could be situations where the document's URL is not equal to its corresponding SHE's URL, which is very strange and led to some bad behaviors detailed in that issue.

/cc @rniwa @TimothyGu @smaug---- who were involved at the time.

We changed the spec in #3946. At the time we believed Safari matched the updated spec, and Firefox and Chromium were happy to follow the updated spec.

Chromium only recently implemented this (/cc @natechapin). @csreis found that this causes a problem, however, in crbug.com/1189026:

Visit a page with an iframe
frames[0].document.write("foo") will update that iframe's URL to the outer page
Navigate the tab (outer frame) to a new URL
Go back.

The result is that you load the outer page, with itself in an iframe. In Chromium this is particularly bad because our recursive iframe prevention doesn't work on history traversals, and also somehow our bfcache doesn't kick in. You can thus set up scenarios where for every history traversal that visits the entry, you get yet another nested iframe.

Note that Firefox and Safari do not seem to implement this part of the spec; they restore the original URL, not the URL that was set by document.open() updating the document's URL/SHE's URL. So this means our testing of Safari back in 2018 was incomplete.

This seems like an unintended, but somewhat predictable, consequence of our efforts to align document's URL with SHE's URL. Ultimately if you set an iframe's SHE's URL to the entry document's URL, you are setting yourself up for recursion upon history traversal (or session restore, e.g. closing and reopening the tab).

So, we have a few choices on where to go from here:

Accept this problem as the cost of keeping SHE's URL and document's URL in sync. We're not yet sure if this is feasible in Chromium; in particular, the change hasn't hit stable yet, and once it does, we might be forced to mitigate (somehow) if these recursive frames break enough sites.
Accept this problem as the cost of keeping SHE's URL and document's URL in sync, but beef up recursive frame detection in Chromium. (And possibly in other browsers, if they have similar limitations.) It's not clear this would be much better; in particular recursive frame detection just refuses to load the iframe, which might be just as breaking to sites.
Revert back to a world where SHE's URL is not always equal to the corresponding document's URL. Maybe try to solve some of the issues identified in document.open() can make a document's URL go out of sync with its history entry #3885 in another way.

The text was updated successfully, but these errors were encountered:

domenic · 2021-04-07T16:59:47Z

Another idea:

Stop document.open() from running the URL and history change steps entirely. Instead, introduce some "base URL override" concept which document.open() sets, which only affects base URL computation. (Kind of an "invisible <base>".)

This is based on the guess that the reason document.open() modifies the URL is so that when you do

<iframe></iframe>
<script>
frames[0].document.write('<img src="foo.jpg">');
</script>

this resolves foo.jpg relative to the caller URL, instead of relative to about:blank. We could make this case work by only monkeying with the base URL computation for the iframe document, instead of overwriting the document's URL itself.

Not sure if this is a good idea...

csreis · 2021-04-07T22:42:36Z

I would be a big fan of not updating the document's URL at all after document.open. In this case where one frame calls document.open on another frame, it seems like the target frame is never actually expected to load the calling frame's URL, either at the time or when you leave and come back. That has always made the URL change puzzling to me.

Your suggestion to somehow override the base URL might be a nice way to solve the relative URL loading case you mention. I agree that there may be other implications we haven't considered yet, though.

Do others know if there are additional reasons the URL changes?

domenic · 2021-04-08T16:53:55Z

I agree that there may be other implications we haven't considered yet, though.

Do others know if there are additional reasons the URL changes?

So this largely comes down to "when does the spec check URL vs. base URL". The short answer is: it uses base URL to resolve relative URLs, and uses URL otherwise.

Here are some examples:

location.href uses the document's URL. So, with the current spec, after document.write()ing into the iframe, iframe.contentWindow.location.href will equal to the caller URL. With the base URL-change spec, it will stay on its original value (e.g. about:blank).
history.pushState() uses the document's URL to determine if the URL change is allowed. So, with the current spec, after document.write()-ing into the iframe, it can call history.pushState("/foo.html") and that will work. With the base URL-change spec, it would not work, since "/foo.html" is not same-origin to about:blank.
<meta refresh="5"> (with no URL=) uses the URL. So, with the current spec, inserting <meta refresh="5"> would refresh the iframe to the caller's URL after 5 seconds. With the base URL-change spec, it would refresh the iframe to its original URL after 5 seconds.
<form> with no action="" uses the URL. So, with the current spec, inserting <form> and submitting it would submit to the caller's URL. With the base URL-change spec, it would submit to the iframe's original URL.
COOP reporting uses the document's URL. So, with the current spec, a document.write()-en iframe accessing a popup that COOP prevents access to would generate a COOP report with the caller's URL as the culprit. With the base URL-change spec, it would generate a COOP report with the iframe's original URL.

My tentative conclusion is that changing to a base URL-override model would have potential compat issues, but perhaps they wouldn't be terrible, and so they might be worth the benefit of having a more reasonable model.

csreis · 2021-04-09T07:22:23Z

Thanks, and hmm. I could imagine getting bug reports for the pushState and form submission cases. It seems plausible that sites are using pushState within document.write content, and that pages might put a form in an about:blank iframe and expect to receive the form submission. Maybe it's possible to collect data on whether that's actually happening?

The other cases seem less concerning at first glance. location.href seems more sensible to keep as the original URL, at least to me (and it seems somewhat unlikely that sites would depend on it changing after document.open). I would be surprised if authors used document.write to use meta refresh to intentionally load the caller's URL. Not sure whether the COOP reporting change matters or not; it doesn't seem like there's an obvious answer for which of the two URLs should be reported, so it may not matter.

domenic · 2021-05-04T19:35:39Z

We've run into a compat problem with this change and are working on a revert. Given the lack of engagement here probably the best path is to just revert to the previous spec and let the history entry's URL and the document's URL get out of sync.

What are the consequences of this? Here's what I've found so far:

Usages of session history entry's URL:
- It's of course used in "traverse the history"
- We should add a general warning that this can get out of sync with the corresponding document's URL, and probably discourage specs from using it except in the specific context of session history traversal.
- history.pushState() and history.replaceState() with no URL use the current session history entry's URL (not the document's URL). We should test which browsers do and either update the spec to use document URL, or add a note that we're violating the above guidance.
- This may impact Should navigating to the current URL preserve history.state? #6213
Reloads
- Reloads generally seem to be specified by calling the navigate algorithm with the document's current URL, and historyHandling set to "reload".
- This means that we'll create a new document by fetching the document's current URL, but then overwrite that document's URL with the session history entry's URL. I.e. the mismatch will carry over.
- Maybe this is how browser behave, maybe not. To be tested.

domenic · 2021-05-04T19:51:35Z

OK, I wrote some tests:

history.pushState() and history.replaceState() seem to use the document's current URL in all browsers, instead of using the session history entry's URL. (New test to be uploaded.)
reload() seems to use the session history entry's URL in Firefox and Safari, per https://wpt.fyi/results/html/webappapis/dynamic-markup-insertion/opening-the-input-stream/reload.window.html?label=master&label=experimental&product=chrome&product=firefox&product=safari&aligned&q=input-stream . In Chrome it appears to use the document's current URL, i.e. Chrome is only passing that test because we aligned the two of them, and Chrome will start failing that test after the revert. This is wrong; see Allow document.open() to desync session history and document URLs #6649 (comment)

See whatwg/html#6556.

Closes #6556. In particular, reverts document.open() to only update the document's URL, and not the session history entry's URL, like it did before ae7cf0c. Now that they can mismatch, we need to audit the cases where this might be important, which leads to the following changes: * Changes location.reload() to reload the current session history entry's URL, instead of the document's URL. This ensures that post-document.open() reload behavior is aligned with WebKit and Gecko, as tested by https://wpt.fyi/results/html/webappapis/dynamic-markup-insertion/opening-the-input-stream/reload.window.html. * Changes history.pushState()/history.replaceState() with no URL argument to default to the document's URL, instead of the current session history's URL. This ensures that post-document.open() pushState()/replaceState() behavior is aligned with all engines, as tested by web-platform-tests/wpt#28826. This also modernizes and makes a bit more precise the location.reload() method steps. The user-initiated reload steps remain vague; #6600 will tackle those.

hsivonen · 2022-09-07T11:00:48Z

https://github.com/web-platform-tests/wpt/blob/master/upgrade-insecure-requests/gen/iframe-blank-inherit.meta/upgrade/fetch.https.html expects even CSP to inherit from the caller.

domenic added topic: history topic: document.open() labels Apr 5, 2021

domenic added a commit to web-platform-tests/wpt that referenced this issue May 4, 2021

Test history.push/replaceState() with no URL + document.open()

222c598

See whatwg/html#6556.

domenic mentioned this issue May 4, 2021

Test history.push/replaceState() with no URL + document.open() web-platform-tests/wpt#28826

Open

domenic linked a pull request May 4, 2021 that will close this issue

Allow document.open() to desync session history and document URLs #6649

Open

3 tasks

zcorpan mentioned this issue Mar 28, 2023

Back-forward cache meta-bug #5880

Open

domenic mentioned this issue Jun 1, 2023

browsers handle document.open'ed iframe documents differently in session history #9330

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

document.open() URL updates can cause recursive loading problems on history traversal #6556

document.open() URL updates can cause recursive loading problems on history traversal #6556

domenic commented Apr 5, 2021

domenic commented Apr 7, 2021

csreis commented Apr 7, 2021

domenic commented Apr 8, 2021

csreis commented Apr 9, 2021

domenic commented May 4, 2021

domenic commented May 4, 2021 •

edited

Loading

hsivonen commented Sep 7, 2022

document.open() URL updates can cause recursive loading problems on history traversal #6556

document.open() URL updates can cause recursive loading problems on history traversal #6556

Comments

domenic commented Apr 5, 2021

domenic commented Apr 7, 2021

csreis commented Apr 7, 2021

domenic commented Apr 8, 2021

csreis commented Apr 9, 2021

domenic commented May 4, 2021

domenic commented May 4, 2021 • edited Loading

hsivonen commented Sep 7, 2022

domenic commented May 4, 2021 •

edited

Loading