From 232723ba9f3132196fd9a92beac8fcd356de8c5c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Nicol=C3=B2=20Ribaudo?= Date: Wed, 7 Aug 2024 22:56:38 +0200 Subject: [PATCH] Define decoding algorithms for source maps This PR introduces "decoded source map" data structures, which are internal spec representations of the information encoded in source maps. It also defines algorithms to decode source maps from either a JSON string or [infra](https://infra.spec.whatwg.org/) representation. The goal is: - use them to explicitly write down all the possible error cases - use them as a starting point to define new data structures, for example for the scopes proposal - eventually add algorithms such as "get the original location given a decoded source map and a generated location". This PR also explicitly defines sources/sourceRoot resolution in terms of the [WHATWG URL](https://url.spec.whatwg.org/) spec. --- source-map.bs | 317 +++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 290 insertions(+), 27 deletions(-) diff --git a/source-map.bs b/source-map.bs index 87e567e..6613f3d 100644 --- a/source-map.bs +++ b/source-map.bs @@ -185,37 +185,145 @@ following structure: } ``` -* version is the version field which must always be the number +* version is the version field which must always be the number `3` as an integer. The source map may be rejected in case of a value different from `3`. -* file is an optional name of the generated code +* file is an optional name of the generated code that this source map is associated with. It's not specified if this can be a URL, relative path name, or just a base name. As such it has a mostly informal character. -* sourceRoot is an optional source root, +* sourceRoot is an optional source root, useful for relocating source files on a server or removing repeated values in - the [=sources=] entry. This value is prepended to the individual entries in the + the [=json/sources=] entry. 
This value is prepended to the individual entries in the "source" field. -* sources is a list of original sources - used by the [=mappings=] entry. Each entry is either a string that is a +* sources is a list of original sources + used by the [=json/mappings=] entry. Each entry is either a string that is a (potentially relative) URL or `null` if the source name is not known. -* sourcesContent is an optional list +* sourcesContent is an optional list of source content (that is the [=Original Source=]), useful when the "source" - can't be hosted. The contents are listed in the same order as the [=sources=]. + can't be hosted. The contents are listed in the same order as the [=json/sources=]. `null` may be used if some original sources should be retrieved by name. -* names is an optional list of symbol names which may be used by the [=mappings=] entry. -* mappings is a string with the encoded mapping data (see [[#mappings-structure]]). -* ignoreList is an optional list of indices of files that - should be considered third party code, such as framework code or bundler-generated code. This +* names is an optional list of symbol names which may be used by the [=json/mappings=] entry. +* mappings is a string with the encoded mapping data (see [[#mappings-structure]]). +* ignoreList is an optional list of indices of files that + should be considered third party code, such as framework code or bundler-generated code. This allows developer tools to avoid code that developers likely don't want to see or step through, without requiring developers to configure this beforehand. - It refers to the [=sources=] array and lists the indices of all the known third-party sources + It refers to the [=json/sources=] array and lists the indices of all the known third-party sources in the source map. Some browsers may also use the deprecated x_google_ignoreList - field if [=ignoreList=] is not present. + field if [=json/ignoreList=] is not present. 
+ + +A decoded source map is a [=struct=] with the following fields: +
+
file
+
A [=string=] or null.
+ +
sources
+
A [=list=] of [=decoded source|decoded sources=].
+ +
mappings
+
A [=list=] of [=decoded mapping|decoded mappings=].
+
+ +A decoded source is a [=struct=] with the following fields: +
+
URL
+
A [=/URL=] or null.
+ +
content
+
A [=string=] or null.
+ +
ignored
+
A [=boolean=].
+
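For readers who find a concrete shape easier to follow, the two structs above can be pictured as plain records. This is a minimal Python sketch, not part of the patch: the class and attribute names are illustrative, and the URL is held as a plain string rather than a parsed WHATWG URL record.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DecodedSource:
    """Illustrative stand-in for the "decoded source" struct."""
    url: Optional[str] = None      # spec: a URL or null (a string here, by assumption)
    content: Optional[str] = None  # spec: a string or null
    ignored: bool = False          # spec: a boolean


@dataclass
class DecodedSourceMap:
    """Illustrative stand-in for the "decoded source map" struct."""
    file: Optional[str] = None
    sources: List[DecodedSource] = field(default_factory=list)
    mappings: list = field(default_factory=list)  # list of decoded mappings


# Usage: one known source, no mappings yet.
example = DecodedSourceMap(
    file="out.js",
    sources=[DecodedSource(url="https://example.com/src/a.js")],
)
```

The defaults mirror the initial values the decoding steps assign (null, null, false), so a freshly constructed record matches a source for which nothing is known.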
+ +To decode a source map from a JSON string |str| given a [=/URL=] |baseURL|, run the +following steps: +1. Let |jsonMap| be the result of [=parse a JSON string to an Infra value|parsing a JSON string to + an Infra value=] |str|. +1. If |jsonMap| is not a [=map=], report an error and abort these steps. +1. [=Decode a source map=] given |jsonMap| and |baseURL|, and return its result if any. + +To decode a source map given a [=string=]-keyed [=map=] |jsonMap| and a [=/URL=] +|baseURL|, run the following steps: +1. If |jsonMap|[`"version"`] does not [=map/exist=] or |jsonMap|[`"version"`] is not 3, + [=optionally report an error=]. +1. If |jsonMap|[`"mappings"`] does not [=map/exist=] or |jsonMap|[`"mappings"`] is not a + [=string=], throw an error. +1. Let |sourceMap| be a new [=decoded source map=]. +1. Set |sourceMap|'s [=decoded source map/file=] to [=optionally get a string=] `"file"` from |jsonMap|. +1. Set |sourceMap|'s [=decoded source map/sources=] to the result of [=decode source map + sources|decoding source map sources=] given |baseURL| with: + - [=decode source map sources/sourceRoot=] set to [=optionally get a string=] `"sourceRoot"` + from |jsonMap|; + - [=decode source map sources/sources=] set to [=optionally get a list of optional strings=] + `"sources"` from |jsonMap|; + - [=decode source map sources/sourcesContent=] set to [=optionally get a list of optional + strings=] `"sourcesContent"` from |jsonMap|; + - [=decode source map sources/ignoredSources=] set to [=optionally get a list of array indexes=] + `"ignoreList"` from |jsonMap|. +1. 
Set |sourceMap|'s [=decoded source map/mappings=] to the result of [=decode source map + mappings|decoding source map mappings=] with: + - [=decode source map mappings/mappings=] set to |jsonMap|[`"mappings"`]; + - [=decode source map mappings/names=] set to [=optionally get a list of strings=] `"names"` + from |jsonMap|; + - [=decode source map mappings/sources=] set to |sourceMap|'s [=decoded source map/sources=]. +1. Return |sourceMap|. + +To optionally get a string |key| from a [=string=]-keyed [=map=] |jsonMap|, run the +following steps: +1. If |jsonMap|[|key|] does not [=map/exist=], return null. +1. If |jsonMap|[|key|] is not a [=string=], [=optionally report an error=] and return null. +1. Return |jsonMap|[|key|]. + +To optionally get a list of strings |key| from a [=string=]-keyed [=map=] +|jsonMap|, run the following steps: +1. If |jsonMap|[|key|] does not [=map/exist=], return a new empty [=list=]. +1. If |jsonMap|[|key|] is not a [=list=], [=optionally report an error=] and return a new empty + [=list=]. +1. Let |list| be a new empty [=list=]. +1. [=For each=] |jsonItem| of |jsonMap|[|key|]: + 1. If |jsonItem| is a [=string=], [=list/append=] it to |list|. + + 1. Else, [=optionally report an error=] and append `""` to |list|. +1. Return |list|. + +To optionally get a list of optional strings |key| from a [=string=]-keyed [=map=] +|jsonMap|, run the following steps: +1. If |jsonMap|[|key|] does not [=map/exist=], return a new empty [=list=]. +1. If |jsonMap|[|key|] is not a [=list=], [=optionally report an error=] and return a new empty + [=list=]. +1. Let |list| be a new empty [=list=]. +1. [=For each=] |jsonItem| of |jsonMap|[|key|]: + 1. If |jsonItem| is a [=string=], [=list/append=] it to |list|. + 1. Else, + 1. If |jsonItem| is not null, [=optionally report an error=]. + 1. Append null to |list|. +1. Return |list|. + +To optionally get a list of array indexes |key| from a [=string=]-keyed [=map=] +|jsonMap|, run the following steps: +1. 
If |jsonMap|[|key|] does not [=map/exist=], return a new empty [=list=]. +1. If |jsonMap|[|key|] is not a [=list=], [=optionally report an error=] and return a new empty + [=list=]. +1. Let |list| be a new empty [=list=]. +1. [=For each=] |jsonItem| of |jsonMap|[|key|]: + 1. If |jsonItem| is a non-negative integer number, [=list/append=] it to |list|. + 1. Else, + + 1. If |jsonItem| is not null, [=optionally report an error=]. + 1. Append null to |list|. +1. Return |list|. + +To optionally report an error, implementations can choose to: +- Do nothing. +- Report an error to the user, and continue processing. +- Throw an error to abort the running algorithm. ([[Infra#algorithm-control-flow]]) Mappings Structure {#mappings-structure} ---------------------------------------- -The [=mappings=] data is broken down as follows: +The [=json/mappings=] data is broken down as follows: - each group representing a line in the generated file is separated by a semicolon (`;`) - each segment is separated by a comma (`,`) @@ -229,7 +337,7 @@ The fields in each segment are: a [=Base64 VLQ=] that is relative to the previous occurrence of this field. Note that this is different than the fields below because the previous value is reset after every generated line. -2. If present, a zero-based index into the [=sources=] list. This field is a [=Base64 VLQ=] +2. If present, a zero-based index into the [=json/sources=] list. This field is a [=Base64 VLQ=] relative to the previous occurrence of this field, unless this is the first occurrence of this field, in which case the whole value is represented. @@ -238,12 +346,12 @@ The fields in each segment are: occurrence of this field, in which case the whole value is represented. Always present if there is a source field. -4. If present, the zero-based starting [=column=] of the line in the source represented. This +4. If present, the zero-based starting [=column=] of the line in the source represented. 
This field is a [=Base64 VLQ=] relative to the previous occurrence of this field unless this is the first occurrence of this field, in which case the whole value is represented. Always present if there is a source field. -5. If present, the zero-based index into the [=names=] list associated with this segment. This +5. If present, the zero-based index into the [=json/names=] list associated with this segment. This field is a base 64 VLQ relative to the previous occurrence of this field unless this is the first occurrence of this field, in which case the whole value is represented. @@ -256,14 +364,169 @@ with four fields represent mapped code where a corresponding name does not exist fields represent mapped code that also has a mapped name. Note: Using file offsets was considered but rejected in favor of using line/column data to avoid becoming -misaligned with the original due to platform-specific line endings. +misaligned with the original due to platform-specific line endings. + +A decoded mapping is a [=struct=] with the following fields: +
+
generatedLine
+
A non-negative integer.
+ +
generatedColumn
+
A non-negative integer.
+ +
originalSource
+
A [=decoded source=] or null.
+ +
originalLine
+
A non-negative integer or null.
+ +
originalColumn
+
A non-negative integer or null.
+ +
name
+
A [=string=] or null.
+
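The segment fields described earlier are relative base64 VLQ values, and the struct above holds the absolute positions they add up to. The following Python sketch shows one way to picture both; it is an illustration under assumptions, not the patch's algorithm: `decode_vlq_fields` and `DecodedMapping` are hypothetical names, invalid characters raise `ValueError` instead of going through the optional error reporting, and no 32-bit range check is performed.

```python
from dataclasses import dataclass
from typing import List, Optional

# Standard base64 alphabet; the index of a character is its 6-bit value.
BASE64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


@dataclass
class DecodedMapping:
    """Illustrative stand-in for the "decoded mapping" struct."""
    generated_line: int
    generated_column: int
    original_source: Optional[object] = None
    original_line: Optional[int] = None
    original_column: Optional[int] = None
    name: Optional[str] = None


def decode_vlq_fields(segment: str) -> List[int]:
    """Decode every base64 VLQ value in one segment (no ',' or ';')."""
    values = []
    value, shift = 0, 0
    for char in segment:
        digit = BASE64_ALPHABET.index(char)  # ValueError on a non-base64 character
        value += (digit & 0x1F) << shift     # low 5 bits carry data
        shift += 5
        if digit & 0x20 == 0:                # continuation bit clear: value complete
            sign = -1 if value & 1 else 1    # lowest bit of the value is the sign
            values.append(sign * (value >> 1))
            value, shift = 0, 0
    if shift != 0:
        raise ValueError("segment ends in the middle of a VLQ value")
    return values


# Usage: the first segment on generated line 0, so relative values equal absolute ones.
fields = decode_vlq_fields("AACA")
mapping = DecodedMapping(generated_line=0, generated_column=fields[0],
                         original_line=fields[2], original_column=fields[3])
```

Accumulating the whole value first and then splitting off the sign bit gives the same result as reading the sign from the first character, because the sign occupies the lowest bit of the first 5-bit chunk.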
+ + +To decode source map mappings given a [=string=] +|mappings|, a [=list=] of [=strings=] +|names|, and a [=list=] of [=decoded source|decoded +sources=] |sources|, run the following steps: +1. If |mappings| is not an [=ASCII string=], throw an error. +1. If |mappings| contains any [=code unit=] other than: + - U+002C (,) or U+003B (;); + - U+0030 (0) to U+0039 (9); + - U+0041 (A) to U+005A (Z); + - U+0061 (a) to U+007A (z); + - U+002B (+), U+002F (/) + + + NOTE: These are the valid [[base64]] characters (excluding the padding character `=`), together + with `,` and `;`. + + then, throw an error. +1. Let |decodedMappings| be a new empty [=list=]. +1. Let |groups| be the result of [=strictly split|strictly splitting=] |mappings| on `;`. +1. Let |generatedLine| be 0. +1. While |generatedLine| is less than |groups|'s [=list/size=]: + 1. If |groups|[|generatedLine|] is not the empty string, then: + 1. Let |segments| be the result of [=strictly split|strictly splitting=] + |groups|[|generatedLine|] on `,`. + 1. Let |generatedColumn| be 0. + 1. Let |sourceIndex| be 0. + 1. Let |originalLine| be 0. + 1. Let |originalColumn| be 0. + 1. Let |nameIndex| be 0. + 1. [=For each=] |segment| in |segments|: + 1. Let |position| be a [=position variable=] for |segment|, initially pointing at + |segment|'s start. + 1. [=Decode a base64 VLQ=] from |segment| given |position| and let + |relativeGeneratedColumn| be the result. + 1. If |relativeGeneratedColumn| is null, [=optionally report an error=] and continue + with the next iteration. + 1. Increase |generatedColumn| by |relativeGeneratedColumn|. If the result is negative, + [=optionally report an error=] and continue with the next iteration. + 1. 
Let |decodedMapping| be a new [=decoded mapping=] whose + [=decoded mapping/generatedLine=] is |generatedLine|, + [=decoded mapping/generatedColumn=] is |generatedColumn|, + [=decoded mapping/originalSource=] is null, + [=decoded mapping/originalLine=] is null, + [=decoded mapping/originalColumn=] is null, + and [=decoded mapping/name=] is null. + 1. Append |decodedMapping| to |decodedMappings|. + 1. [=Decode a base64 VLQ=] from |segment| given |position| and let |relativeSourceIndex| + be the result. + 1. [=Decode a base64 VLQ=] from |segment| given |position| and let + |relativeOriginalLine| be the result. + 1. [=Decode a base64 VLQ=] from |segment| given |position| and let + |relativeOriginalPosition| be the result. + 1. If |relativeOriginalPosition| is null, then: + 1. If |relativeSourceIndex| is not null, [=optionally report an error=]. + 1. Continue with the next iteration. + + 1. Increase |sourceIndex| by |relativeSourceIndex|. + 1. Increase |originalLine| by |relativeOriginalLine|. + 1. Increase |originalColumn| by |relativeOriginalPosition|. + 1. If any of |sourceIndex|, |originalLine|, or |originalColumn| are less than 0, or if + |sourceIndex| is greater than or equal to |sources|'s [=list/size=], [=optionally + report an error=]. + 1. Else, + 1. Set |decodedMapping|'s [=decoded mapping/originalSource=] to + |sources|[|sourceIndex|]. + 1. Set |decodedMapping|'s [=decoded mapping/originalLine=] to |originalLine|. + 1. Set |decodedMapping|'s [=decoded mapping/originalColumn=] to |originalColumn|. + 1. [=Decode a base64 VLQ=] from |segment| given |position| and let |relativeNameIndex| + be the result. + 1. If |relativeNameIndex| is not null, then: + 1. Increase |nameIndex| by |relativeNameIndex|. + 1. If |nameIndex| is negative or greater than or equal to |names|'s [=list/size=], [=optionally + report an error=]. + 1. Else, set |decodedMapping|'s [=decoded mapping/name=] to |names|[|nameIndex|]. + 1. 
If |position| does not point to the end of |segment|, [=optionally report an + error=]. + 1. Increase |generatedLine| by 1. +1. Return |decodedMappings|. + +To decode a base64 VLQ from a [=string=] |segment| given a [=position variable=] +|position|, run the following steps: +1. Let |first| be a [=byte=] whose [=byte/value=] is the number corresponding to |segment|'s + |position|th [=code unit=], according to the [[base64]] encoding. + + NOTE: The two most significant bits of |first| are 0. +1. Let |sign| be 1 if |first| & 0x01 is 0x00, and -1 otherwise. +1. Let |value| be (|first| >> 1) & 0x0F, as a number. +1. Let |nextShift| be 4. +1. Let |currentByte| be |first|. +1. While |currentByte| & 0x20 is 0x20: + 1. Advance |position| by 1. + 1. If |position| points to the end of |segment|, throw an error. + 1. Set |currentByte| to the [=byte=] whose [=byte/value=] is the number corresponding to + |segment|'s |position|th [=code unit=], according to the [[base64]] encoding. + 1. Let |chunk| be |currentByte| & 0x1F, as a number. + 1. Add |chunk| << |nextShift| to |value|. + 1. If |value| is greater than or equal to 2<sup>31</sup>, throw an error. + 1. Increase |nextShift| by 5. +1. Advance |position| by 1. +1. Return |value| * |sign|. + + return the result of decoding a base64 VLQ from |segment| starting at position |position|, +and increase |position| by the number of bytes consumed. Return null if it's not possible to do so. + +NOTE: In addition to returning the decoded value, this algorithm updates the [=position variable=] +in the calling algorithm. Resolving Sources {#resolving-sources} -------------------------------------- -If the sources are not absolute URLs after prepending the [=sourceRoot=], the sources are +If the sources are not absolute URLs after prepending the [=json/sourceRoot=], the sources are resolved relative to the SourceMap (like resolving the script `src` attribute in an HTML document). 
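The resolution rule above can be sketched in a few lines of Python. This is a hedged illustration only: `resolve_sources` and the example URLs are hypothetical, and `urllib.parse.urljoin` follows RFC 3986 style reference resolution, which is used here as an approximation of the WHATWG URL parser and can differ from it in edge cases. The sketch treats `sourceRoot` as a base URL, so a `sourceRoot` without a trailing slash behaves differently from plain string prepending.

```python
from typing import List, Optional
from urllib.parse import urljoin


def resolve_sources(map_url: str,
                    source_root: Optional[str],
                    sources: List[Optional[str]]) -> List[Optional[str]]:
    """Resolve each source against sourceRoot, itself resolved against the map URL."""
    base = map_url
    if source_root is not None:
        # sourceRoot may be relative, so resolve it against the source map's URL first.
        base = urljoin(map_url, source_root)
    # A null source stays null; an absolute source URL is returned unchanged by urljoin.
    return [None if source is None else urljoin(base, source) for source in sources]


# Usage with an assumed map location and a relative sourceRoot.
resolved = resolve_sources(
    "https://example.com/dist/app.js.map",
    "../src/",
    ["lib/a.ts", None, "https://cdn.example.com/b.ts"],
)
```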
+To decode source map sources given a [=/URL=] |baseURL|, +a [=string=] or null |sourceRoot|, +a [=list=] of [=strings=] |sources|, +a [=list=] of [=strings=] |sourcesContent|, +and a [=list=] of numbers |ignoredSources|, +run the following steps: +1. Let |decodedSources| be a new empty [=list=]. +1. If |sourceRoot| is not null, then: + 1. Let |sourceRootURL| be the result of [=URL parser|URL parsing=] |sourceRoot| with |baseURL|. + + 1. If |sourceRootURL| is failure, [=optionally report an error=]. + 1. Else, set |baseURL| to |sourceRootURL|. +1. [=For each=] |source| of |sources| with index |index|: + 1. Let |decodedSource| be a new [=decoded source=] whose [=decoded source/URL=] is null, + [=decoded source/content=] is null, and [=decoded source/ignored=] is false. + 1. If |source| is not null: + 1. Let |sourceURL| be the result of [=URL parser|URL parsing=] |source| with |baseURL|. + 1. If |sourceURL| is failure, [=optionally report an error=]. + 1. Else, set |decodedSource|'s [=decoded source/URL=] to |sourceURL|. + 1. If |index| is in |ignoredSources|, set |decodedSource|'s [=decoded source/ignored=] to true. + 1. If |sourcesContent|'s [=list/size=] is greater than |index|, set + |decodedSource|'s [=decoded source/content=] to |sourcesContent|[|index|]. + 1. [=list/Append=] |decodedSource| to |decodedSources|. +1. Return |decodedSources|. + Extensions {#extensions} ------------------------ @@ -307,8 +570,8 @@ an alternate representation of a map is supported: ``` The index map follows the form of the standard map. Like the regular source map, -the file format is JSON with a top-level object. It shares the [=version=] and -[=file=] field from the regular source map, but gains a new [=sections=] field. +the file format is JSON with a top-level object. It shares the [=json/version=] and +[=json/file=] field from the regular source map, but gains a new [=sections=] field. sections is an array of [=Section=] objects. 
@@ -316,7 +579,7 @@ the file format is JSON with a top-level object. It shares the [=version=] and Section objects have the following fields: -* offset is an object with two fields, `line` and `column`, +* offset is an object with two fields, `line` and `column`, that represent the offset into generated code that the referenced source map represents. @@ -349,7 +612,7 @@ support in order to add an HTTP header and the second requires an annotation in Source maps are linked through URLs as defined in [[URL]]; in particular, characters outside the set permitted to appear in URIs must be percent-encoded -and it may be a data URI. Using a data URI along with [=sourcesContent=] allows +and it may be a data URI. Using a data URI along with [=json/sourcesContent=] allows for a completely self-contained source map. The HTTP `sourcemap` header has precedence over a source annotation, and if both are present, @@ -539,9 +802,9 @@ supports `/* ... */`-style comments. #### Extraction methods for WebAssembly binaries To extract a Source Map URL from a WebAssembly source given -a [=byte sequence=] |bytes|, run the following steps: +a [=byte sequence=] |segment|, run the following steps: -1. Let |module| be [=module_decode=](|bytes|). +1. Let |module| be [=module_decode=](|segment|). 1. If |module| is error, return null. 1. [=For each=] [=custom section=] |customSection| of |module|, 1. Let |name| be the `name` of |customSection|, [=UTF-8 decode without BOM or fail|decoded as UTF-8=]. @@ -592,7 +855,7 @@ JavaScript-style single-line comments. Fetching Source Maps {#fetching-source-maps} ============================================ -To fetch a source map given a [=URL=] |url|, run the following steps: +To fetch a source map given a [=/URL=] |url|, run the following steps: 1. Let |promise| be [=a new promise=]. 1. Let |request| be a new [=request=] whose [=request/URL=] is |url|.