Added debug id proposal #20

Merged
merged 4 commits into from
Apr 12, 2023
237 changes: 237 additions & 0 deletions proposals/debug-id.md
@@ -0,0 +1,237 @@
# Source Map Debug ID Proposal

This document presents a proposal to add globally unique build or debug IDs to
source maps and transpiled JavaScript files, making build artifacts
self-identifying.

## Background

Source maps play a crucial role in debugging minified JavaScript files by
providing a mapping between the minified code and the original source code.
However, the current source map specification lacks important properties such as
self-describing and self-identifying capabilities for both the JavaScript
artifact (the transpiled JavaScript file) as well as the source map. This
results in a subpar user experience and numerous practical problems. To address
these issues, we propose an extension to the source map format: the addition of
globally unique Debug IDs.

## Objective and Benefits

The primary objective of this proposal is to enhance the source map format by
introducing globally unique Debug IDs, enabling better identification and
organization of minified JavaScript files and their corresponding source maps.
This improvement will streamline the debugging process and reduce the likelihood
of errors arising from misidentification or misassociation of files.

Debug IDs (sometimes also called Build IDs) are already used in the native
language ecosystem and are supported by native container formats such as PE,
ELF, Mach-O, or WASM.

The proposed solution offers the following benefits:

1. Improved File Identification: The introduction of globally unique Debug IDs
will make it easier to identify and associate minified JavaScript files with
their corresponding source maps.

2. Self-Identifying Files: This specification changes source maps and minified
JavaScript files so that they become self-identifying, eliminating the need
for external information to work with the files.

3. Streamlined Debugging Process: The implementation of Debug IDs will simplify
and streamline the debugging process by reducing the likelihood of errors
resulting from misidentification or misassociation of files.

4. Standardization: The adoption of this proposal as a web standard will
encourage a consistent and unified approach to handling source maps and
minified JavaScript files across the industry.

5. Guaranteed bidirectionality: Today, source maps do not provide a reliable
way to resolve back to the transpiled file they were generated from. In
practice, however, tools often need this because they parse the transpiled
artifact to resolve scope information.

6. Symbol server support: With Debug IDs and source maps with embedded sources,
it becomes possible to look up source maps and build artifacts on symbol servers.

## Scope

This proposal sets some specific limitations on source maps to simplify the
processing in the wider ecosystem. Debug IDs are at present only specified for
source maps with embedded sources or where sources are categorically not
available. The lookup of original sources from a source map identified by a
Debug ID is not defined.

Additionally, this specification applies only to non-indexed source maps and
currently specifies references only for JavaScript.
Member

It's unclear to me what this means. Does it mean that only JS files have the //# debugId comment in them? Because you can have any arbitrary file as one of the "sources".

Contributor Author

The original spec also leaves open CSS and other formats.

Member

Ohhh absolutely, I always forget that this is not exclusive to JS. 🤔


## Terms

In the context of this document:

- **Source Map:** Refers to a non-indexed, standard source map.
- **Transpiled File:** Refers to a transpiled (potentially minified) JavaScript file.
- **Debug ID:** Refers to a UUID as described in this document.

## Debug IDs

Debug IDs are globally unique identifiers for build artifacts. They are
specified to be UUIDs. In the context of this proposal, they are represented in
hexadecimal characters. When comparing debug IDs they must be normalized. This
means that `85314830-023f-4cf1-a267-535f4e37bb17` and
`85314830023F4CF1A267535F4E37BB17` are equivalent but the former representation
is the canonical format.

The way a debug ID is generated is specific to the toolchain and no requirements
are placed on it. It is however recommended to generate deterministic debug IDs
(UUID v3 or v5) so that rebuilding the same artifacts yields stable IDs.
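
The normalization rule and the recommendation for deterministic IDs can be sketched in Python with the standard `uuid` module. The helper names and the namespace value below are hypothetical and not part of the proposal. Hashing the artifact contents into a UUID v5 is only one way a toolchain might derive a stable ID.

```python
import hashlib
import uuid

# Hypothetical namespace picked for this example; the proposal does not define one.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/debug-id")


def canonical_debug_id(value):
    """Return the canonical form (lowercase hex with dashes) of a debug ID."""
    return str(uuid.UUID(value))


def debug_ids_equal(a, b):
    """Compare two debug IDs after normalizing both representations."""
    return uuid.UUID(a) == uuid.UUID(b)


def deterministic_debug_id(artifact_bytes):
    """Derive a UUID v5 from the artifact contents (an assumed strategy)."""
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    return str(uuid.uuid5(EXAMPLE_NAMESPACE, digest))
```

Because the ID depends only on the input bytes, rebuilding an identical artifact would yield the same ID under this scheme.
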
Comment on lines +85 to +86
Member

Today I learned: https://en.wikipedia.org/wiki/Universally_unique_identifier#Versions_3_and_5_(namespace_name-based)

UUIDs have special version and variant tags, so they don’t use the full 128-bits.

But as you said, I would leave that up to the specific toolchain, as long as the identifier is formatted like a UUID and sufficiently unique, it will be fine :-)


Debug IDs are embedded in both source maps and transpiled files, allowing a
bidirectional mapping between them. The linking of source maps and transpiled
files via HTTP headers is explicitly not desired. A file identified by a Debug
ID must have that Debug ID embedded to ensure the file is self-identifying.
Comment on lines +89 to +91
Member

Not sure if this needs to be mentioned here or anywhere. Tools like webpack have an option to have "hidden" (See https://webpack.js.org/configuration/devtool/#devtool) sourcemaps, which are created but not referenced, to create some sense of "obfuscation". As the Debug ID would just be some random bytes, they do not pose any "obfuscation" risk so to say, and tools should always put them in.


### Debug IDs in Source Maps

We propose adding a `debugId` property to the source map at the top level of
the source map object. This property should be a string value representing
the Debug ID in hexadecimal characters, preferably in the canonical UUID
format:

```json
{
  "version": 3,
  "file": "app.min.js",
  "debugId": "85314830-023f-4cf1-a267-535f4e37bb17",
  "sources": [...],
  "sourcesContent": [...],
  "mappings": "..."
}
```

### Debug IDs in JavaScript Artifacts

Transpiled JavaScript files containing a Debug ID must embed the ID near the end
of the source, ideally on the last line, in the format `//# debugId=<DEBUG_ID>`:

```javascript
//# debugId=85314830-023f-4cf1-a267-535f4e37bb17
```

If the special `//# sourceMappingURL=` comment already exists in the file, it is
recommended to place the `debugId` comment in the line above to maintain
compatibility with existing tools. Because the last line already has meaning in
the existing specification for the `sourceMappingURL` comment, tools are
required to examine the last 5 lines to discover the Debug ID.
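
As an illustration of the placement rule above, a tool that writes transpiled files might insert the comment roughly as follows. This is a sketch only: `embed_debug_id` is a hypothetical helper, it assumes `\n` line endings, and it does not check whether a `debugId` comment is already present.

```python
def embed_debug_id(source, debug_id):
    """Add a debugId comment, placing it above a trailing sourceMappingURL comment."""
    comment = "//# debugId=" + debug_id
    lines = source.rstrip("\n").split("\n")
    # Scan at most the last 5 lines, from the end, for a sourceMappingURL comment.
    for index in range(len(lines) - 1, max(len(lines) - 6, -1), -1):
        if lines[index].startswith("//# sourceMappingURL="):
            lines.insert(index, comment)
            break
    else:
        # No sourceMappingURL comment found: the debugId goes on the last line.
        lines.append(comment)
    return "\n".join(lines) + "\n"
```
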

## JavaScript API for Debug ID Resolution

Today `error.stack` in most runtimes only returns the URLs of the files referenced
Member

Error.stack returns an opaque string. With some luck you are able to parse a URL out of it though.

by the stack trace. For Debug IDs to be useful, a solution would need to be added
to enable mapping of JavaScript file URLs to Debug IDs.

The strawman proposal is to add the Debug ID in two locations:

* `import.meta.debugId`: a new property that should return the debug ID of the
current module as a UUID, if it has one
* `System.getDebugIdForUrl(url)` looks up the debug ID for a given script file by
URL that has already been loaded by the browser in the current context.
Comment on lines +143 to +144
Member

I believe this might be the most controversial point of this proposal.

While a lot of tools use the System global, I believe no "official" spec is using it?

Also, "URL that has already been loaded by the browser" might need a bit more clarification, for example:

  • It should return the debug-id of scripts "that are currently loaded" into the engine.
  • The URL should match the one that was used to load the script, and should also match the ones pretty-printed in Error.stack output, import.meta.url, etc.
  • Specifically, it should have the same query-string and fragment.
  • It should not do any network IO
  • On mismatch, should it return undefined or rather throw a ReferenceError or any other kind of error?
  • On mismatch, should it give an explanation of the reason? Is the file not loaded at all? Do the querystring parameters not match? etc…


## Appendix A: Self-Description of Files

Unfortunately, neither transpiled JavaScript files nor source maps can be easily
identified without employing heuristics. Unlike formats like ELF binaries, they
lack a distinctive header for identification purposes. When batch processing
files, the ability to differentiate between various files is invaluable, but
this capability is not fully realized in the context of source maps or
transpiled JavaScript files. Although solving this issue is beyond the scope of
this document, addressing it would significantly aid in distinguishing different
files without relying on intricate heuristics.

Nevertheless, we recommend that tools utilize the following heuristics to
determine self-identifying JavaScript files and source maps:

* a JSON file containing a toplevel object with the keys `mappings`, `version`,
`debugId` and `sourcesContent` should be considered to be a self-identifying
source map.
* a UTF-8 encoded text file matching the regular expression
`(?m)^//# debugId=([a-fA-F0-9-]{12,})$` should be considered a transpiled
JavaScript file.
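
A minimal sketch of how a batch-processing tool might apply these two heuristics is shown below. The function name and return values are hypothetical. The check uses the `mappings` key as it appears in the source map example earlier in this document, and it assumes `\n` line endings for the regular expression.

```python
import json
import re

_debug_id_line_re = re.compile(r"(?m)^//# debugId=([a-fA-F0-9-]{12,})$")
_source_map_keys = ("mappings", "version", "debugId", "sourcesContent")


def classify_artifact(data):
    """Classify raw bytes as "source_map", "transpiled_javascript", or None."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    try:
        parsed = json.loads(text)
    except ValueError:
        # Not JSON, so it cannot be a source map; fall through to the JS check.
        parsed = None
    if isinstance(parsed, dict) and all(key in parsed for key in _source_map_keys):
        return "source_map"
    if _debug_id_line_re.search(text):
        return "transpiled_javascript"
    return None
```
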
Comment on lines +160 to +165
Member

👍🏻 I like this, especially also implicitly forcing JS files to be UTF-8 ;-)

Should we also propose a JSON Schema along with a "$schema" field for source maps, or would that be a bit too much?

Contributor Author

Mostly want to see initial feedback but a JSON schema for source maps would be very valuable by itself.


## Appendix B: Symbol Server Support

With Debug IDs it becomes possible to resolve source maps and minified JavaScript
files from a server. That way a tool such as a browser or a crash reporter could
be pointed to an S3 or GCS bucket, or to an HTTP server, that serves source maps
and build artifacts keyed by Debug ID.
Comment on lines +170 to +172
Member

I believe we should also plan for browsers and local development servers.
Using something like Import Maps could solve this, and point the browser to the local development server to resolve the corresponding source map.

Related to my note above, this could potentially replace the various different ways that bundlers / development servers can offer source maps today. I believe offering a local symbol server would also benefit performance a bit, as the tools wouldn’t have to embed base-64 encoded sourcemaps into the development assets, but sourcemaps would only be needed to be serialized on demand.


The structure itself is inspired by [debuginfod](https://sourceware.org/elfutils/Debuginfod.html):

* transpiled JavaScript artifact: `<DebugIdFirstTwo>/<DebugIdRest>/js`
* source map: `<DebugIdFirstTwo>/<DebugIdRest>/sourcemap`

with the following variables:

* `DebugIdFirstTwo`: the first two characters in lowercase of the hexadecimal Debug ID
* `DebugIdRest`: the remaining characters in lowercase of the hexadecimal Debug ID without dashes
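
The path layout above could be computed as in the following sketch. `symbol_server_paths` is a hypothetical helper name. The paths are relative, and how they are joined to a bucket or server base URL is left open here.

```python
import uuid


def symbol_server_paths(debug_id):
    """Return the relative lookup paths for a debug ID."""
    # UUID.hex is the 32-character lowercase hexadecimal form without dashes.
    hex_id = uuid.UUID(debug_id).hex
    first_two, rest = hex_id[:2], hex_id[2:]
    return {
        "js": f"{first_two}/{rest}/js",
        "sourcemap": f"{first_two}/{rest}/sourcemap",
    }
```

For the example ID used in this document, the source map path would be `85/314830023f4cf1a267535f4e37bb17/sourcemap`.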

## Appendix C: Emulating Debug IDs

In the absence of browser support for loading debug IDs, a transpiler can inject
some code to maintain a global dictionary of loaded JavaScript files, which allows
experimentation with this concept:

```javascript
(function() {
  try {
    throw new Error();
  } catch (err) {
    let match;
    if ((match = err.stack.match(/(?:\bat |@)(.*?):\d+:\d+$/m)) !== null) {
      let ids = (globalThis.__DEBUG_IDS__ = globalThis.__DEBUG_IDS__ || {});
      ids[match[1]] = "<DEBUG_ID>";
    }
  }
})();
```

```javascript
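function getDebugIdForUrl(url) {
  // Read through globalThis so a missing dictionary does not throw a ReferenceError.
  const ids = globalThis.__DEBUG_IDS__;
  return (ids && ids[url]) || undefined;
}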
```

## Appendix D: Parsing Debug IDs

The following Python code shows how Debug IDs are to be extracted from
transpiled JavaScript and source map files:

```python
import json
import re
import uuid


_debug_id_re = re.compile(r'^//# debugId=(.*)')


def normalize_debug_id(id):
    # Accepts dashed or undashed hex; returns None for anything that is not a UUID.
    try:
        return uuid.UUID(id)
    except (ValueError, TypeError, AttributeError):
        return None


def debug_id_from_transpiled_javascript(source):
    # Only the last 5 lines are examined, starting from the end of the file.
    for line in reversed(source.splitlines()[-5:]):
        match = _debug_id_re.match(line)
        if match is not None:
            debug_id = normalize_debug_id(match.group(1).strip())
            if debug_id is not None:
                return debug_id
    return None


def debug_id_from_source_map(source):
    source_map = json.loads(source)
    if "debugId" in source_map:
        return normalize_debug_id(source_map["debugId"])
    return None
```