In the spec, the "fetch a single module script" algorithm handles fetching and parsing a specific ES module, and in step 13 it uses the encoding spec's "UTF-8 decode" to decode the response body into a string, which it then passes to the relevant parser. This algorithm decodes UTF-8 lossily (i.e. substituting invalid byte sequences with replacement characters), and if the body starts with a UTF-8 BOM, it strips it. It doesn't switch the encoding based on the MIME type, or on the presence of a UTF-16 BOM.
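The spec's "UTF-8 decode" behavior described above is what `TextDecoder` implements with its default options, so it can be demonstrated directly (a minimal sketch; the byte values are ours for illustration):

```typescript
// The encoding spec's "UTF-8 decode": lossy decoding plus BOM stripping.
// TextDecoder's defaults (fatal: false, ignoreBOM: false) match it exactly.
const decoder = new TextDecoder("utf-8");

// UTF-8 BOM (EF BB BF), then "hi", then an invalid byte (FF).
const bytes = new Uint8Array([0xEF, 0xBB, 0xBF, 0x68, 0x69, 0xFF]);

// The BOM is stripped and the invalid byte becomes U+FFFD.
console.log(decoder.decode(bytes)); // "hi\uFFFD"
```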
Deno's behavior depends on whether the module specifier has a file:// scheme. For non-file:// schemes, the MIME type's charset parameter, if present, is used to pick an encoding; otherwise UTF-8 is used. If the charset parameter doesn't correspond to a valid encoding, the load throws, and the BOM is ignored when choosing the encoding. For local files, the BOM is used to choose the encoding, defaulting to UTF-8 if there isn't one. In either case, if decoding encounters a byte sequence that is invalid in the chosen encoding, the module load throws. And the BOM is never stripped when decoding, regardless of whether it was used to choose the encoding.
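Deno's decision procedure, as described above, can be sketched roughly as follows (the function names are ours, not Deno's internals; `fatal: true` models the throw on invalid bytes, `ignoreBOM: true` models the BOM not being stripped):

```typescript
// Hypothetical sketch of Deno's encoding choice, per the behavior above.
function pickEncoding(specifier: URL, charset: string | undefined, body: Uint8Array): string {
  if (specifier.protocol !== "file:") {
    // Remote: charset parameter wins; an invalid label makes the
    // TextDecoder constructor below throw a RangeError.
    return charset ?? "utf-8";
  }
  // Local files: sniff the BOM, default to UTF-8.
  if (body[0] === 0xEF && body[1] === 0xBB && body[2] === 0xBF) return "utf-8";
  if (body[0] === 0xFE && body[1] === 0xFF) return "utf-16be";
  if (body[0] === 0xFF && body[1] === 0xFE) return "utf-16le";
  return "utf-8";
}

function decodeModule(specifier: URL, charset: string | undefined, body: Uint8Array): string {
  const enc = pickEncoding(specifier, charset, body);
  // fatal: throw on invalid byte sequences; ignoreBOM: keep the BOM in the output.
  return new TextDecoder(enc, { fatal: true, ignoreBOM: true }).decode(body);
}
```

Note how this differs from the spec sketch above on every axis: encoding selection, error handling, and BOM stripping.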
Note that Deno's behavior of preferring the MIME charset parameter over the BOM goes against not only browsers' handling of modules, but also the way browsers handle classic scripts and page loads, which always prioritizes the BOM.
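For contrast, the browser priority order for classic scripts and pages (the encoding spec's generic "decode" hook) might be sketched like this, with the function name and parameters being ours for illustration:

```typescript
// Sketch of the browser's priority order for classic scripts/pages:
// a BOM, if present, overrides any externally supplied encoding
// such as a transport-layer charset parameter.
function browserDecode(body: Uint8Array, transportCharset: string): string {
  let enc = transportCharset;
  if (body[0] === 0xEF && body[1] === 0xBB && body[2] === 0xBF) enc = "utf-8";
  else if (body[0] === 0xFE && body[1] === 0xFF) enc = "utf-16be";
  else if (body[0] === 0xFF && body[1] === 0xFE) enc = "utf-16le";
  // Default TextDecoder options: lossy, and the BOM is stripped.
  return new TextDecoder(enc).decode(body);
}
```

Here a UTF-8 BOM wins even if the server claimed `charset=utf-16le`, which is the opposite of Deno's choice for remote modules.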
This was not noticed before now because most of the WPT tests for module loading are HTML-only (they typically use <script type="module">), which our test runner doesn't support. The only JS tests for module loading deal with import assertions and JSON modules, so this was only noticed when implementing those in denoland/deno#12866.
Since the behavior with local and remote files is so different, and this doesn't seem to have been noticed until now, switching to UTF-8 only doesn't seem like it would break much in the Deno ecosystem. The one case where it might is UTF-16 files with a BOM, since servers would likely detect them and serve them with the correct encoding – but I doubt this is something to lose sleep over, since UTF-16 is very rarely used. Another potential concern is CDNs that transpile Node.js code, since the served code might never have been run locally, but the most used ones seem to always serve either with charset=utf-8 or without the charset parameter.
I've been looking at this again, and this change seems unlikely to be accepted: despite the arguments I made, there's no knowing what would break. Depending on how we feel about it, we might enable warnings for non-UTF-8 JS/TS modules and see the user response. But UTF-8 should be required for JSON modules, right?