Import with type text, bytes, and URL #9444

Open
kriskowal opened this issue Jun 19, 2023 · 12 comments
Labels
addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest topic: script

Comments

@kriskowal

kriskowal commented Jun 19, 2023

I’m working with a group of TC39 delegates on what we call Module Harmony, an effort to make proposals pertaining to the module system coherent. I am consequently looking for the right venue to propose and establish a precedent for a host integration with modules, specifically to address the portability of code that uses the module system to express a dependency upon plain text, bytes, or references to assets. Concretely, I would like to propose that:

import text from 'text.txt' with { type: 'text' };
import bytes from 'bytes.oct' with { type: 'bytes' };
import imageUrl from 'image.jpg' with { type: 'url' };

Such that:

typeof text === 'string';
bytes instanceof Uint8Array;
typeof imageUrl === 'string'; // edit: was instanceof URL

So that a module can express these kinds of dependency in a way that is portable. Specifically, I aim for a program to be run on the server side and the client side of a web application, both raw and through an optimizing translation (e.g., bundling). With import attributes, ECMA-262 is already sufficiently expressive to allow a host integration to address this problem without additional features, and would be coherent with future 262 proposals, particularly virtual module sources.
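
As a rough illustration of the values these imports would yield, here is a minimal sketch that emulates the three types with APIs that exist today. The helper `defaultExportFor` and its parameters are hypothetical and not part of any proposal, the example.com URLs are placeholders, and decoding `text` as UTF-8 is an assumption.

```javascript
// Hypothetical helper: given an asset's raw bytes and its resolved URL,
// return what the default export might look like for each proposed `type`.
function defaultExportFor(type, bytes, resolvedUrl) {
  switch (type) {
    case "text":
      // Assumption: text is decoded as UTF-8.
      return new TextDecoder("utf-8").decode(bytes);
    case "bytes":
      // Copy the input so the caller's buffer is not shared.
      return new Uint8Array(bytes);
    case "url":
      // A serialized URL string rather than a URL object.
      return new URL(resolvedUrl).href;
    default:
      throw new TypeError(`Unsupported type: ${type}`);
  }
}

const raw = new TextEncoder().encode("hello");
const text = defaultExportFor("text", raw, "https://example.com/text.txt");
const bytes = defaultExportFor("bytes", raw, "https://example.com/bytes.oct");
const imageUrl = defaultExportFor("url", raw, "https://example.com/image.jpg");
```

With these stand-ins, `typeof text === 'string'`, `bytes instanceof Uint8Array`, and `typeof imageUrl === 'string'` all hold.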

@bathos

bathos commented Jun 19, 2023

Re: the URL import, using a URL instance for this seems contradictory to guidance in WHATWG URL:

A standard that exposes URLs should expose the URL as a string (by serializing an internal URL). A standard should not expose a URL using a URL object. URL objects are meant for URL manipulation.

For a module, not using the mutable URL representation would seem particularly important, I’d think?
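
To make the concern concrete, here is a small sketch (example.com is a placeholder). Any importer that receives a shared URL object could change it for every other importer, whereas a serialized string cannot be altered in place.

```javascript
// A URL object is mutable: setting a component changes the shared instance.
const url = new URL("https://example.com/image.jpg");

// Serializing first yields an immutable string snapshot.
const serialized = url.href;

url.pathname = "/other.png";

console.log(url.href);   // "https://example.com/other.png"
console.log(serialized); // "https://example.com/image.jpg"
```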

@kriskowal
Author

For a module, not using the mutable URL representation would seem particularly important, I’d think?

A string representation of the URL would entirely satisfy the motivating use cases.

@annevk
Member

annevk commented Jun 20, 2023

Looking at https://fetch.spec.whatwg.org/#body-mixin I wonder if we want arrayBuffer instead, but I suppose that was a mistake on Fetch's part and it should have been bytes returning a view (we could still add that I suppose).

I'm not sure I understand how url works. How is it different from import.meta.resolve('image.jpg')?

Do we need to solve @domenic's #7017 about feature detection at the same time?

There's also #4321 from @Jamesernator and #7706 from @7ombie. These all look like duplicates, but I'm fine with keeping them open until we have some kind of plan. One thing that's raised in the latter that's important here is what to do about MIME types. Would we not check response MIME types for these, similar to Fetch? Or would we try to enforce something?

cc @whatwg/modules

@annevk annevk added addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest labels Jun 20, 2023
@littledan
Contributor

How is it different from import.meta.resolve('image.jpg')

Good question. The semantics would actually be the same. One piece of motivation is that this form is more "declarative"-looking and therefore statically analyzable (which should mostly help build tools, given that not enough information is available for a prefetcher to use this). See more information about motivation (for a previous iteration of this idea) at https://github.com/tc39/proposal-asset-references . Also note that some people in TC39 are considering whether we should propose some other syntax for this, besides using import attributes.
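
For relative specifiers, the shared semantics can be approximated with the URL constructor. This is only a sketch: `resolveAssetUrl` is a hypothetical name, the base URL is a placeholder standing in for the importing module's URL, and real `import.meta.resolve` also accounts for things like import maps that this ignores.

```javascript
// Hypothetical stand-in for resolving a relative specifier against the
// URL of the importing module.
function resolveAssetUrl(specifier, importerUrl) {
  return new URL(specifier, importerUrl).href;
}

const resolved = resolveAssetUrl("./image.jpg", "https://example.com/app/main.js");
console.log(resolved); // "https://example.com/app/image.jpg"
```

The difference is that the specifier in a static `import` is visible to a tool without evaluating any code, while the argument to a resolve call can be an arbitrary expression.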

@kriskowal
Author

kriskowal commented Jun 20, 2023

Yes, this would provide a statically analyzable alternate route to the same url value, analogous to static vs dynamic import. This is interesting less for what it adds to the web than for the convention it establishes, which build tooling would benefit from.

For example, a bundler that takes a whole web application directory tree and generates a new tree would be able to discern the dependency and rewrite the URL.

A bundler that takes a whole web application tree and generates a single JavaScript file would have the option of embedding the underlying data as a data URL.
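
A minimal sketch of that embedding step, assuming the bundler already has the asset's bytes and knows its MIME type. `toDataUrl` is a hypothetical helper, not the API of any particular bundler, and it relies on the global `btoa` found in browsers and recent Node.js versions.

```javascript
// Hypothetical helper: encode raw bytes as a base64 data URL that a bundler
// could substitute for the asset's original URL.
function toDataUrl(bytes, mimeType) {
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// The two bytes of the ASCII string "hi".
const dataUrl = toDataUrl(new Uint8Array([104, 105]), "text/plain");
console.log(dataUrl); // "data:text/plain;base64,aGk="
```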

That is to say, any static syntax that reveals the URL of an asset, in a way that implies a dependency a bundler needs to arrange, is an improvement on the status quo. This is one of the options we are considering.

As @annevk mentions in chat, this approach has the disadvantage of introducing a code path under the host import hook that bypasses a fetch.

For this reason, the alternative approach is to introduce another import phase, as we do with import source and import defer proposals, except the phase would occur before fetch. This has a different smell: it is not clear that such a module would advance beyond the asset phase. It is clear that it would not compose well with import with type, since the type is irrelevant unless we advance to fetch. We would presumably be obliged to allow the module system to fetch an image (for example) and fail to interpret it as JavaScript.

The implication for Module virtualization is that an asset import would have to bypass the import hook and provide an alternate lane that can be interrupted before fetch (to produce a url) and then again before parse (to produce bytes or text) before possibly proceeding to produce source, at which point it will have done all the work currently subsumed by the host import hook.

[added:]

The implication for Module virtualization if we pursue with type is simply that these are different module source types that terminate at exporting a default value when they’re evaluated. So, the proposed import hook virtualization would just return a non-JavaScript module source with the appropriate behavior.

@Jamesernator

Jamesernator commented Jun 29, 2023

There's also #4321 from @Jamesernator

The suggestions there had quite a different flavour, given that at the time JSON modules were proposed to be derived based on MIME type, rather than the current approach that uses import attributes (which the MIME type must agree with).

This new style with import attributes is strictly more useful as one can interpret essentially anything as an array buffer/text regardless of its actual MIME type.

E.g., in my previous suggestion, text would only be successfully imported if it were text/plain, but a lot of stuff might be in text/yaml, text/json5, etc.

As such that old issue can be closed in strong favour of this one.

One thing that's raised in the latter that's important here is what to do about MIME types. Would we not check response MIME types for these, similar to Fetch? Or would we try to enforce something?

For urls there's obviously nothing to do as no fetching is involved.

For array buffers, checking MIME types is undesirable as people might be loading any content for some processing (e.g. images, audio, application specific formats, are all reasonable reasons to import array buffers).

For text, checking the type/essence is similarly undesirable: any MIME type (not just text/*) might contain text. However, we do need to know the encoding, so the charset parameter should probably be respected.
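
Respecting charset could look roughly like the sketch below. It is illustrative only: `decodeWithCharset` is a hypothetical helper, the regular expression is a simplification of real MIME type parsing, and falling back to UTF-8 when no charset is given is an assumption.

```javascript
// Hypothetical helper: ignore the MIME type essence entirely, but honor the
// charset parameter of the Content-Type header when decoding.
function decodeWithCharset(bytes, contentType) {
  // Simplified parameter extraction; real MIME parsing is more involved.
  const match = /;\s*charset=("?)([^";\s]+)\1/i.exec(contentType ?? "");
  const label = match ? match[2] : "utf-8"; // assumption: default to UTF-8
  // TextDecoder throws a RangeError for labels it does not recognize.
  return new TextDecoder(label).decode(bytes);
}

// "hi" encoded as UTF-16LE.
const utf16Bytes = new Uint8Array([0x68, 0x00, 0x69, 0x00]);
const decoded = decodeWithCharset(utf16Bytes, "text/plain; charset=utf-16le");
console.log(decoded); // "hi"
```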

Alternatively for text, we could have a separate attribute that indicates what format to decode as (potentially useful if the server doesn't know what charset files are using).

import someText from "./file.txt" with { type: "text", encoding: "utf16" };
import someText from "./oldData.dat" with { type: "text", encoding: "latin2" };

// Would default to utf8 naturally so these would be equivalent
import someText2 from "./file2.ini" with { type: "text", encoding: "utf8" };
import someText2 from "./file2.ini" with { type: "text" }; 

@Jarred-Sumner

Jarred-Sumner commented Apr 23, 2024

In Bun v1.1.5, we are adding bundler & runtime support for text, json & toml. text is UTF-8 and replaces invalid UTF-8 with U+FFFD. We probably will support BOM later to handle UTF-16. Named imports (excluding default) with type: "text" throw an error at parse time.

oven-sh/bun#10456
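
For comparison, the standard TextDecoder shows the same kind of replacement in its default (non-fatal) mode. This sketch only illustrates the behavior described above using a web API; it is not Bun's implementation.

```javascript
// 0xFF can never appear in well-formed UTF-8.
const invalid = new Uint8Array([0x68, 0xff, 0x69]);

// Default mode substitutes U+FFFD for the malformed byte.
const lenient = new TextDecoder("utf-8").decode(invalid);
console.log(lenient === "h\uFFFDi"); // true

// With { fatal: true } the same input throws instead.
let strictFailed = false;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(invalid);
} catch {
  strictFailed = true;
}
console.log(strictFailed); // true
```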

@kriskowal
Author

kriskowal commented Oct 18, 2024

Kindly consider the TC39 Stage 1 immutable ArrayBuffer proposal for type: 'bytes'. https://github.com/tc39/proposal-immutable-arraybuffer

@7ombie

7ombie commented Nov 12, 2024

Sorry if this is a dumb question, but why do we care about the MIME type for raw bytes and UTF-8? I thought that was a security concern that stemmed from the fact that browsers parse the result. If we just get the bytes or characters we asked for (like a static fetch), I'm not sure why it needs to be any tighter than that.

All a static analyzer would see is a filepath that ends in some extension (say .png), and it must (at least generally) assume it's a path to a PNG file.

@kriskowal
Author

I find it reasonable to enable or even encourage import with type bytes to also specify the expected MIME type, possibly with another attribute like mimeType so that the module cannot be deceived into misinterpreting the imported content, especially for dynamic import, for example, import(location, { with: { type: 'bytes', mimeType: 'image/png' } }). I would want the mimeType assertion to be optional since not all binary data that modules can usefully interpret has a MIME type, except insofar as application/octet-stream is sufficiently abstract to apply to anything.
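
A sketch of how a host might apply such an optional assertion. The `mimeType` attribute is only a suggestion in this thread, and `checkMimeType` is a hypothetical helper that compares just the type/subtype portion of a Content-Type header.

```javascript
// Hypothetical check: when the importer supplies an expected MIME type,
// reject responses whose Content-Type essence differs. With no expectation,
// the check is skipped entirely, keeping the assertion optional.
function checkMimeType(contentType, expectedMimeType) {
  if (expectedMimeType === undefined) return;
  // Drop parameters such as "; charset=utf-8" and normalize case.
  const essence = (contentType ?? "").split(";")[0].trim().toLowerCase();
  if (essence !== expectedMimeType.toLowerCase()) {
    throw new TypeError(
      `Expected ${expectedMimeType} but received ${essence || "no MIME type"}`
    );
  }
}

checkMimeType("image/png", "image/png");      // passes
checkMimeType("application/wasm", undefined); // passes: nothing asserted

let rejected = false;
try {
  checkMimeType("text/html; charset=utf-8", "image/png");
} catch (error) {
  rejected = error instanceof TypeError;
}
console.log(rejected); // true
```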

@7ombie

7ombie commented Nov 13, 2024

@kriskowal - That makes perfect sense. There's a benefit in being able to opt into extra checks and balances, but no reason to require them.

@Jamesernator

Something that probably should be done with { type: "url" } is the ability to set a destination so the browser can preload into the right place. i.e.:

import imageUrl from "./image.png" with { type: "url", preloadAs: "image" };

import workerUrl from "./worker.js" with { type: "url", preloadmoduleAs: "worker" };
// Or with source phase imports: https://github.com/tc39/proposal-esm-phase-imports
import source workerSrc from "./worker.js" with { preloadmoduleAs: "worker" };


7 participants