Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Origin of blob URLs doesn't match what implementations do #127

Closed
mkruisselbrink opened this issue Jun 15, 2016 · 21 comments · Fixed by #371
Closed

Origin of blob URLs doesn't match what implementations do #127

mkruisselbrink opened this issue Jun 15, 2016 · 21 comments · Fixed by #371

Comments

@mkruisselbrink
Copy link
Contributor

As mentioned in Origin of Blob URLs, the origin of a blob URL should be the origin of the incumbent settings object when the URL was created. The simple "parse the origin from the url in the first string of the path of the blob url" algorithm the URL spec uses doesn't accomplish that for opaque origins.

The way at least chrome implements this is by maintaining a separate blob url -> origin mapping, and using that to lookup the origin of a blob url. I assume firefox does someting similar (see also whatwg/html#1322 for an example where this behavior matters).

Should the URL spec be updated to actually match the File API spec about blob URLs, and to match what implementations seem to do?

@annevk
Copy link
Member

annevk commented Jun 16, 2016

I think the standard has a better approach. Otherwise looking up an aspect of the URL involves going through a global lookup table.

Note that the origin of a blob URL is determined at creation time and can therefore be stored in the blob URL (as the File API prescribes). So extracting it from the URL is fine too. And modifying it doesn't work since the entire URL is used as key.

@annevk
Copy link
Member

annevk commented Jun 16, 2016

Also, note that in https://bugzilla.mozilla.org/show_bug.cgi?id=1270451 Gecko fixed this.

@mkruisselbrink
Copy link
Contributor Author

I think the standard has a better approach

Which standard. Currently the file API spec and the URL spec disagree here. According to the url spec the origin of a blob url with an opaque origin is a new unique opaque origin, since its created from the serialized url. While according to the file API spec the origin should be the same opaque origin as that of the incumbent settings object that created the URL.

Of course both would serialize to "null", so from the javascript URL API point of view this wouldn't matter, but when other parts of the spec try to determine if URLs are same origin this does matter. In particular for things like creating workers from blob URLs. But also other situations. Should a blob URL be same origin with the setting object that created them or not? I think yes, and I believe implementations agree (at least chrome does). The File API spec certainly agrees. The current URL spec seems to be the odd one out.

@mkruisselbrink
Copy link
Contributor Author

Also, note that in https://bugzilla.mozilla.org/show_bug.cgi?id=1270451 Gecko fixed this.

Not sure what exactly gecko fixed. Are in latest gecko blob URLs no longer same origin with their creating context? I would find that very surprising and broken. And in particular "What is the problem with the model from the URL standard? It seems to work fine in Chrome." makes no sense. Chrome does not implement the model from the URL standard as it doesn't work.

@annevk
Copy link
Member

annevk commented Jun 16, 2016

I don't understand. The File API defines the origin of a blob and puts it in the URL it creates. The URL Standard extracts. I don't think there is a mismatch.

@mkruisselbrink
Copy link
Contributor Author

The file API defines not only the origin of a blob, but also the origin of a blob URL. And defines the origin of that URL to be the same as that of the context that creates it. In particular when that origin is an opaque origin the origin of the blob URL needs to still be the same origin with it. That is impossible to do through serialization and deserialization via the URL itself (separately the FileAPI spec doesn't actually define how implementations should serialize an opaque origin in a blob URL, it leaves that implementation defined). But some out-of-band data (like the already existing Blob URL store) is needed to actually have blob URLs be same origin with the context that created them.

@annevk
Copy link
Member

annevk commented Jun 17, 2016

I see, opaque origins. I think File API should not serialize those other than as "null". Seems like we might reveal more than we want otherwise.

What does this mean in practice for opaque origins though? What scenario is broken and is that a problem or a feature?

@mkruisselbrink
Copy link
Contributor Author

I think in practice browsers do indeed serialize opaque origins as "null" in blob URLs, even though the spec might allow other things. So that part is fine.

One of the features we'd lose if blob urls would no longer be same origin with their creator would be the feature discussed in whatwg/html#1322, the ability for a sandboxed iframe to start a worker using a blob URL.

Something else that would break without this same originness is using fetch to read blobs in opaque origins (but creating a blob url and then fetching that). That more or less is why blink/webkit originally implemented this lookup for the origin (although in chrome it is slightly worse since the entire FileReader API is implemented in terms of fetching blob URLs. So FileReader was broken in opaque origins without this origin mapping. But that's more an implementation detail).

@annevk
Copy link
Member

annevk commented Jun 17, 2016

@bakulf thoughts?

@annevk
Copy link
Member

annevk commented Jun 22, 2016

Okay, @bakulf seemed fine with reverting Firefox to the earlier behavior. I'll need to file a bug. So just to confirm, @mkruisselbrink, the idea is that we'd do some kind of global lookup when determining the origin of a blob URL.

Does that effectively mean the blob store is per unit of related-origin browsing contexts?

@domenic
Copy link
Member

domenic commented Jun 22, 2016

Probably per event loop, since they work in workers as well? I dunno, I just heard that phrase and am popping my head in.

@mkruisselbrink
Copy link
Contributor Author

mkruisselbrink commented Jun 22, 2016

The blob store concept as defined in the latest version of the File API spec definitely needs work, as it doesn't actually reflect what browsers do (and also doesn't seem to match what older versions of the File API spec did). Having it per execution context as in the current spec certainly doesn't seem like it would make sense, as using blob urls in other execution contexts is definitely something that works today. I believe in chrome the thing that best reflects the "blob store" concept is actually fully global. But then the blob url -> (possibly opaque) origin map chrome also maintains is more per event loop. I really need to find the time to figure out what browsers are actually doing in this space, and then figure out how to best write that down in the spec, as what is currently in the specs about blob URLs is all over the place and certainly doesn't match reality... I should be able to find some time for that in the next few weeks.

@annevk
Copy link
Member

annevk commented Jun 27, 2016

I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1282504 to get Firefox back to using a store again for the origin. Leaving this open until we figure out what the scope of that store is.

@annevk
Copy link
Member

annevk commented Jun 28, 2016

@mkruisselbrink so in Chrome new URL("blob:http://foo.com/bar").origin == "http://foo.com" which seems a little weird if you just use the blob URL store, no?

@mkruisselbrink
Copy link
Contributor Author

In Chrome it first tries the blob URL store, and if that doesn't have an entry it parses the origin out of the URL string.

@annevk
Copy link
Member

annevk commented Jun 29, 2016

@mkruisselbrink would you be open to remove the fallback? I'm not sure how having the fallback makes sense if we use the store anyway.

@annevk
Copy link
Member

annevk commented Feb 2, 2017

@mkruisselbrink ping! We should really start fixing this.

@mkruisselbrink
Copy link
Contributor Author

Not sure about not having the fallback in getting the origin for a blob URL. I guess it sort of makes sense from a spec point of view, but what would you return then if a URL doesn't (or no longer does) exists in the Blob URL Store? Presumably a new unique origin? It would then allow code to check if a blob url is (still) valid and not revoked by getting its origin? Not sure I like that. Also note that from the point of view of new URL(some blob url).origin it's indistinguishable if the origin is always extracted from the URL, or if it is first looked up in this map, and if not found it is gotten from the URL. Only in places where same origin checks are done with the unserialized origin (particularly for opaque origins) is there any difference.

In blink the store used to map url to origin is currently per-thread, which is problematic in that things that should work between workers and documents don't always work correctly. I think it makes the most sense for the Blob URL Store to conceptually just be completely global. Of course implementations can scope it smaller since blob urls shouldn't ever be resolved cross origin anyway, but no reason not to spec it in the simplest way? (hmm, the File API spec says non-normatively that all cross origin requests should be errors, but from the fetch spec it seems like a no-cors cross origin request would be allowed? I wonder what implementations do...)

@annevk
Copy link
Member

annevk commented Feb 7, 2017

Isn't the blob URL store lookup already exposed synchronously somewhere? (I don't recall offhand how though, so maybe not.)

Also, if you put a blob URL in a new browsing context, why would that not work? Can't the user open them somewhere else that might be a new thread/process?

@annevk
Copy link
Member

annevk commented Feb 13, 2017

You can already determine synchronously whether a URL is in the blob URL store:

const blob = new Blob(["test"]),
      url = URL.createObjectURL(blob),
      client = new XMLHttpRequest
URL.revokeObjectURL(url)
client.open("GET", url, false)
client.send()

I suppose you could make an argument that we only look up the origin in the blob URL store when it turns out to be opaque from parsing the blob URL. As that would give slightly better performance probably. I would be okay with such an approach, though it still seems a little worse than simply always looking it up or never looking it up (which is not an option as that would break opaque origin use cases).

@annevk
Copy link
Member

annevk commented Feb 21, 2017

@mkruisselbrink thoughts? What's your timeline for defining the blob store properly?

mkruisselbrink added a commit to mkruisselbrink/url that referenced this issue Jan 4, 2019
annevk pushed a commit that referenced this issue Jan 14, 2019
In particular some blob: URLs representing an opaque origin could still be same origin with something. That only works if the origin is stored somewhere and is not obtained from the URL.

Tests: see #371 (comment).

This fixes #127, fixes #290 , fixes w3c/FileAPI#20, and fixes w3c/FileAPI#63.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

3 participants