From 67a1a23c8a09f205113a36f92390e4b787974ecd Mon Sep 17 00:00:00 2001
From: Shery <382254994@qq.com>
Date: Sat, 16 Jun 2018 13:58:53 +0800
Subject: [PATCH] =?UTF-8?q?=E6=88=96=E8=AE=B8=E4=BD=A0=E5=B9=B6=E4=B8=8D?=
 =?UTF-8?q?=E9=9C=80=E8=A6=81=20Rust=20=E5=92=8C=20WASM=20=E6=9D=A5?=
 =?UTF-8?q?=E6=8F=90=E5=8D=87=20JS=20=E7=9A=84=E6=89=A7=E8=A1=8C=E6=95=88?=
 =?UTF-8?q?=E7=8E=87=20=E2=80=94=20=E7=AC=AC=E4=B8=80=E9=83=A8=E5=88=86=20?=
 =?UTF-8?q?(#3965)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* 或许你并不需要 Rust 和 WASM 来提升 JS 的执行效率 — 第一部分

或许你并不需要 Rust 和 WASM 来提升 JS 的执行效率 — 第一部分

* 校对修改-1

校对修改-1

* 修正格式问题
---
 ...ou-dont-need-rust-to-speed-up-your-js-1.md | 185 +++++++++---------
 1 file changed, 92 insertions(+), 93 deletions(-)

diff --git a/TODO1/maybe-you-dont-need-rust-to-speed-up-your-js-1.md b/TODO1/maybe-you-dont-need-rust-to-speed-up-your-js-1.md
index 33f8f253190..05a2ae0affc 100644
--- a/TODO1/maybe-you-dont-need-rust-to-speed-up-your-js-1.md
+++ b/TODO1/maybe-you-dont-need-rust-to-speed-up-your-js-1.md
@@ -2,20 +2,20 @@
 > * 原文作者:[Vyacheslav Egorov](http://mrale.ph/)
 > * 译文出自:[掘金翻译计划](https://github.com/xitu/gold-miner)
 > * 本文永久链接:[https://github.com/xitu/gold-miner/blob/master/TODO1/maybe-you-dont-need-rust-to-speed-up-your-js-1.md](https://github.com/xitu/gold-miner/blob/master/TODO1/maybe-you-dont-need-rust-to-speed-up-your-js-1.md)
-> * 译者:
-> * 校对者:
+> * 译者:[Shery](https://github.com/shery15)
+> * 校对者:[geniusq1981](https://github.com/geniusq1981)

-# Maybe you don't need Rust and WASM to speed up your JS — Part 1
+# 或许你并不需要 Rust 和 WASM 来提升 JS 的执行效率 — 第一部分

-Few weeks ago I noticed a blog post [“Oxidizing Source Maps with Rust and WebAssembly”](https://hacks.mozilla.org/2018/01/oxidizing-source-maps-with-rust-and-webassembly/) making rounds on Twitter - talking about performance benefits of replacing plain JavaScript in the core of `source-map` library with a Rust version compiled to WebAssembly.
+几个星期前,我在 Twitter 上看到一篇广为流传、名为 [“Oxidizing Source Maps with Rust and WebAssembly”](https://hacks.mozilla.org/2018/01/oxidizing-source-maps-with-rust-and-webassembly/) 的博文,其内容主要是讨论用 Rust 编写并编译为 WebAssembly 的版本替换 `source-map` 库中纯 JavaScript 编写的核心代码所带来的性能优势。

-This post piqued my interest, not because I am a huge on either Rust or WASM, but rather because I am always curious about language features and optimizations missing in pure JavaScript to achieve similar performance characteristics.
+这篇文章激起了我的兴趣,倒不是因为我是 Rust 或 WASM 的拥趸,而是因为我一直好奇:纯 JavaScript 要达到类似的性能,还缺少哪些语言特性和优化。

-So I checked out the library from GitHub and departed on a small performance investigation, which I am documenting here almost verbatim.
+于是我从 GitHub 检出了这个库,开始了一次小型的性能研究,并在这里几乎原样地记录了研究过程。

-### Getting the Code
+### 获取代码

-For my investigations I was using an _almost_ default x64.release build of the V8 at commit [69abb960c97606df99408e6869d66e014aa0fb51](https://chromium.googlesource.com/v8/v8/+/69abb960c97606df99408e6869d66e014aa0fb51) from January 20th. My only departure from the default configuration is that I enable disassembler via GN flags to be able to dive down to generated machine code if needed.
+在这次研究中,我使用的是一个**近乎**默认配置的 V8 x64.release 构建,对应 1 月 20 日的提交 commit [69abb960c97606df99408e6869d66e014aa0fb51](https://chromium.googlesource.com/v8/v8/+/69abb960c97606df99408e6869d66e014aa0fb51)。为了能在需要时深入查看生成的机器码,我通过 GN 标志启用了反汇编器,这是我唯一偏离默认配置的地方。

 ```
 ╭─ ~/src/v8/v8 ‹master›
@@ -26,14 +26,14 @@ use_goma = true
 v8_enable_disassembler = true
 ```

-Then I got a checkouts of [`source-map`](https://github.com/mozilla/source-map) package at:
+然后我获取了两个版本的 [`source-map`](https://github.com/mozilla/source-map),版本信息如下:

-* [commit c97d38b](https://github.com/mozilla/source-map/commit/c97d38b70de088d87b051f81b95c138a74032a43), which was the last commit that updated `dist/source-map.js` before Rust/WASM started landed;
-* [commit 51cf770](https://github.com/mozilla/source-map/commit/51cf7708dd70d067dfe04ce36d546f3262b48da3) which was the most recent commit, when I did my investigation;
+* [commit c97d38b](https://github.com/mozilla/source-map/commit/c97d38b70de088d87b051f81b95c138a74032a43),这是 Rust/WASM 改动开始落地之前,最后一次更新 `dist/source-map.js` 的提交;
+* [commit 51cf770](https://github.com/mozilla/source-map/commit/51cf7708dd70d067dfe04ce36d546f3262b48da3),这是我进行这次研究时最新的一次提交;

-### Profiling the Pure-JavaScript Version
+### 分析纯 JavaScript 版本

-Running benchmark in the pure-JS version was simple:
+在纯 JavaScript 版本中进行基准测试很简单:

 ```
 ╭─ ~/src/source-map/bench ‹ c97d38b›
@@ -47,7 +47,7 @@ console.timeEnd: iteration, 4644.619000
 [Stats samples: 5, total: 23868 ms, mean: 4773.6 ms, stddev: 161.22112144505135 ms]
 ```

-The first thing that I did was to disable the serialization part of the benchmark:
+我做的第一件事是禁用基准测试的序列化部分:

 ```
 diff --git a/bench/bench-shell-bindings.js b/bench/bench-shell-bindings.js
@@ -64,7 +64,7 @@ index 811df40..c97d38b 100644
 +// print(benchmarkSerializeSourceMap());
 ```

-And then threw it into the Linux `perf` profiler:
+然后把它放到 Linux 的 `perf` 性能分析工具中:

 ```
 ╭─ ~/src/source-map/bench ‹perf-work›
@@ -75,9 +75,9 @@ console.timeEnd: iteration, 4984.464000
 [ perf record: Captured and
wrote 24.659 MB perf.data (~1077375 samples) ]
 ```

-Notice that I am passing `--perf-basic-prof` flag to the `d8` binary which instructs V8 to generate an auxiliary mappings file `/tmp/perf-$pid.map`. This file allows `perf report` to understand JIT generated machine code.
+请注意,我将 `--perf-basic-prof` 标志传递给了 `d8` 二进制文件,它会让 V8 生成一个辅助映射文件 `/tmp/perf-$pid.map`。该文件使 `perf report` 能够理解 JIT 生成的机器码。

-Here is what we get from `perf report --no-children` after zooming on the main execution thread:
+以下是聚焦到主执行线程后,我们通过 `perf report --no-children` 获得的内容:

 ```
 Overhead Symbol
@@ -106,11 +106,11 @@ Overhead Symbol
 0.56% v8::internal::IncrementalMarking::RecordWriteSlow
 ```

-Indeed, just like the [“Oxidizing Source Maps …”](https://hacks.mozilla.org/2018/01/oxidizing-source-maps-with-rust-and-webassembly/) post has stated the benchmark is rather heavy on sort: `doQuickSort` appears at the top of the profile and also several times down the list (which means that it was optimized/deoptimized few times).
+事实上,就像 [“Oxidizing Source Maps …”](https://hacks.mozilla.org/2018/01/oxidizing-source-maps-with-rust-and-webassembly/) 那篇博文所说的那样,这个基准测试的开销相当集中在排序上:`doQuickSort` 出现在性能分析结果的顶部,并且在列表下方还多次出现(这意味着它被优化/去优化了几次)。

-### Optimizing Sorting - Argument Adaptation
+### 优化排序 — 参数适配

-One thing that jumps out in the profiler are suspicious entries, namely `Builtin:ArgumentsAdaptorTrampoline` and `Builtin:CallFunction_ReceiverIsNullOrUndefined` which seem to be part of the V8 implementation. If we ask `perf report` to expand call chains leading to them then we will notice that these functions are also mostly invoked from the sorting code:
+在性能分析器中出现了一些可疑条目,分别是 `Builtin:ArgumentsAdaptorTrampoline` 和 `Builtin:CallFunction_ReceiverIsNullOrUndefined`,它们似乎是 V8 实现的一部分。如果我们让 `perf report` 展开通向它们的调用链,就会注意到这些函数大多也是从排序代码中调用的:

 ```
 - Builtin:ArgumentsAdaptorTrampoline
@@ -127,9 +127,9 @@ One thing that jumps out in the profiler are suspicious entries, namely `Builtin
 + 1.49% *SourceMapConsumer_parseMappings ../dist/source-map.js:1894
 ```

-It is time to look at the code. Quicksort implementation itself lives in [`lib/quick-sort.js`](https://github.com/mozilla/source-map/blob/c97d38b70de088d87b051f81b95c138a74032a43/lib/quick-sort.js) and it is invoked from parsing code in [`lib/source-map-consumer.js`](https://github.com/mozilla/source-map/blob/c97d38b70de088d87b051f81b95c138a74032a43/lib/source-map-consumer.js#L564-L568). Comparison functions used for sorting are [`compareByGeneratedPositionsDeflated`](https://github.com/mozilla/source-map/blob/c97d38b70de088d87b051f81b95c138a74032a43/lib/util.js#L334-L343) and [`compareByOriginalPositions`](https://github.com/mozilla/source-map/blob/c97d38b70de088d87b051f81b95c138a74032a43/lib/util.js#L296-L304).
+现在该看看代码了。快速排序的实现位于 [`lib/quick-sort.js`](https://github.com/mozilla/source-map/blob/c97d38b70de088d87b051f81b95c138a74032a43/lib/quick-sort.js) 中,并由 [`lib/source-map-consumer.js`](https://github.com/mozilla/source-map/blob/c97d38b70de088d87b051f81b95c138a74032a43/lib/source-map-consumer.js#L564-L568) 中的解析代码调用。用于排序的比较函数是 [`compareByGeneratedPositionsDeflated`](https://github.com/mozilla/source-map/blob/c97d38b70de088d87b051f81b95c138a74032a43/lib/util.js#L334-L343) 和 [`compareByOriginalPositions`](https://github.com/mozilla/source-map/blob/c97d38b70de088d87b051f81b95c138a74032a43/lib/util.js#L296-L304)。

-Looking at the definitions of these comparison functions and how they are invoked from quick-sort implementation reveals that the invocation site has mismatching arity:
+通过查看这些比较函数的定义,以及它们在快速排序实现中被调用的方式,可以发现调用处的参数数量不匹配:

 ```
 function compareByOriginalPositions(mappingA, mappingB, onlyCompareOriginal) {
@@ -149,9 +149,9 @@ function doQuickSort(ary, comparator, p, r) {
 }
 ```

-Grepping through library sources reveals that outside of tests `quickSort` is only ever called with these two functions.
+通过梳理库的源代码可以发现,除了测试之外,`quickSort` 只会以这两个函数作为比较函数来调用。

-What if we fix the invocation arity?
+如果我们修复调用时的参数数量会怎么样?

 ```
 diff --git a/dist/source-map.js b/dist/source-map.js
@@ -169,7 +169,7 @@ index ade5bb2..2d39b28 100644
 }
 ```

-> Note: I am doing edits directly in `dist/source-map.js` because I did not want to spend time figuring out the build process.
+> 注意:因为我不想花时间搞清楚构建过程,所以我直接在 `dist/source-map.js` 中进行编辑。

 ```
 ╭─ ~/src/source-map/bench ‹perf-work› [Fix comparator invocation arity]
@@ -184,17 +184,17 @@ console.timeEnd: iteration, 4140.963000
 [Stats samples: 6, total: 24737 ms, mean: 4122.833333333333 ms, stddev: 132.18789657150916 ms]
 ```

-Just by fixing the arity mismatch we improved benchmark mean on V8 by 14% from 4774 ms to 4123 ms. If we profile the benchmark again we will discover that `ArgumentsAdaptorTrampoline` has completely disappeared from it. Why was it there in the first place?
+仅仅通过修正参数数量不匹配,我们就将 V8 上的基准测试平均值从 4774 ms 缩短到了 4123 ms,提升了 14%。如果我们再次对基准测试进行性能分析,会发现 `ArgumentsAdaptorTrampoline` 已经完全消失。为什么最初它会出现呢?

-It turns out that `ArgumentsAdaptorTrampoline` is V8’s mechanism for coping with JavaScript’s variadic calling convention: you can call function that has 3 parameters with 2 arguments - in which case the third parameter will be filled with `undefined`. V8 does this by creating a new frame on the stack, copying arguments down and then invoking the target function:
+事实证明,`ArgumentsAdaptorTrampoline` 是 V8 应对 JavaScript 可变参数调用约定的机制:你可以在调用有 3 个参数的函数时只传入 2 个参数 —— 在这种情况下,第三个参数会被填充为 `undefined`。V8 通过在栈上创建一个新的帧,把参数向下复制,然后调用目标函数来完成此操作:

-![Argument adaptation](https://mrale.ph/images/2018-02-03/argument-adaptation.png)
+![参数适配](https://mrale.ph/images/2018-02-03/argument-adaptation.png)

-If you have never heard about _execution stack_, checkout out [Wikipedia](https://en.wikipedia.org/wiki/Call_stack) and Franziska Hinkelmann’s [blog post](https://fhinkel.rocks/2017/10/30/Confused-about-Stack-and-Heap/).
+> 如果您从未听说过**执行栈**,请查看[维基百科](https://en.wikipedia.org/wiki/Call_stack)和 Franziska Hinkelmann 的[博客文章](https://fhinkel.rocks/2017/10/30/Confused-about-Stack-and-Heap/)。

-While such costs might be negligible for cold code, in this code `comparator` was invoked millions of times during benchmark run which magnified overheads of arguments adaptation.
+尽管对于冷代码(很少执行的代码)来说,这类开销可以忽略不计,但在这段代码中,`comparator` 在基准测试运行期间被调用了数百万次,这放大了参数适配的开销。

-An attentive reader might also notice that we are now explicitly passing boolean value `false` where previously an implicit `undefined` was used. This does seem to contribute a bit to the performance improvement.
If we replace `false` with `void 0` we would get slightly worse numbers:
+细心的读者可能还会注意到,我们现在显式地传入了布尔值 `false`,而以前这里用的是隐式的 `undefined`。这看起来确实对性能改进有一点贡献。如果我们用 `void 0` 替换 `false`,会得到稍微差一点的数据:

 ```
 diff --git a/dist/source-map.js b/dist/source-map.js
@@ -225,7 +225,7 @@ console.timeEnd: iteration, 4209.427000
 [Stats samples: 6, total: 25610 ms, mean: 4268.333333333333 ms, stddev: 106.38947316346669 ms]
 ```

-For what it is worth argument adaptation overhead seems to be highly V8 specific. When I benchmark my change against SpiderMonkey, I don’t see any significant performance improvement from matching the arity:
+值得一提的是,参数适配的开销似乎是 V8 特有的。当我在 SpiderMonkey 上对我的修改进行基准测试时,并没有看到修正参数数量带来任何显著的性能提升:

 ```
 ╭─ ~/src/source-map/bench ‹ d052ea4› [Disabled serialization part of the benchmark]
@@ -238,19 +238,19 @@ Parsing source map
 [Stats samples: 8, total: 25397 ms, mean: 3174.625 ms, stddev: 360.4636187025859 ms]
 ```

-SpiderMonkey shell is now very easy to install thanks to Mathias Bynens’es [jsvu](https://github.com/GoogleChromeLabs/jsvu) tool.
+多亏了 Mathias Bynens 的 [jsvu](https://github.com/GoogleChromeLabs/jsvu) 工具,SpiderMonkey shell 现在非常易于安装。

-Let us get back to the sorting code. If we profile the benchmark again we will notice that `ArgumentsAdaptorTrampoline` is gone for good from the profile, but `CallFunction_ReceiverIsNullOrUndefined` is still there. This is not surprising given that we are still calling the `comparator`.
+让我们回到排序代码。如果再次分析基准测试,我们会注意到 `ArgumentsAdaptorTrampoline` 已经彻底从结果中消失,但 `CallFunction_ReceiverIsNullOrUndefined` 仍然存在。这并不奇怪,因为我们仍在调用 `comparator` 函数。

-### Optimizing Sorting - Monomorphisation
+### 优化排序 — 单态化(monomorphisation)

-What usually performs better than calling the function? Not calling it!
+什么通常比调用函数的性能更好?不调用它!

-The obvious option here is to try and get the comparator inlined into the `doQuickSort`. However the fact that `doQuickSort` is called with different `comparator` functions stands in the way of inlining.
+这里显而易见的选择是尝试将 `comparator` 内联到 `doQuickSort` 中。然而,`doQuickSort` 会以不同的 `comparator` 函数被调用,这一事实阻碍了内联。

-To work around this we can try to monomorphise `doQuickSort` by cloning it. Here is how we do it.
+要解决这个问题,我们可以尝试通过克隆 `doQuickSort` 来将它单态化。下面是我们的做法。

-We start by wrapping `doQuickSort` and other helpers into `SortTemplate` function:
+我们首先将 `doQuickSort` 和其他辅助函数包装进 `SortTemplate` 函数:

 ```
 function SortTemplate(comparator) {
@@ -270,7 +270,7 @@ function SortTemplate(comparator) {
 }
 ```

-Then we can produce clones of our sorting routines by converting `SortTemplate` into a string and then parsing it back into a function via `Function` constructor:
+然后,我们可以先将 `SortTemplate` 函数转换为字符串,再通过 `Function` 构造函数把它解析回函数,从而为我们的排序例程生成克隆:

 ```
 function cloneSort(comparator) {
@@ -280,7 +280,7 @@ function cloneSort(comparator) {
 }
 ```

-Now we can use `cloneSort` to produce a sort function for each comparator we are using:
+现在我们可以使用 `cloneSort` 为我们用到的每个 `comparator` 生成一个排序函数:

 ```
 let sortCache = new WeakMap(); // Cache for specialized sorts.
@@ -294,7 +294,7 @@ exports.quickSort = function (ary, comparator) {
 };
 ```

-Rerunning benchmark yields:
+重新运行基准测试,结果如下:

 ```
 ╭─ ~/src/source-map/bench ‹perf-work› [Clone sorting functions for each comparator]
@@ -311,9 +311,9 @@ console.timeEnd: iteration, 3036.211000
 [Stats samples: 8, total: 25423 ms, mean: 3177.875 ms, stddev: 181.87633161024556 ms]
 ```

-We can see that the mean time went from 4268 ms to 3177 ms (25% improvement).
+我们可以看到平均时间从 4268 ms 降到了 3177 ms(提升了 25%)。

-Profiling reveals the following picture:
+性能分析展示出如下图景:

 ```
 Overhead Symbol
@@ -339,11 +339,11 @@ Overhead Symbol
 0.39% Builtin:KeyedLoadIC
 ```

-Overheads related to invoking `comparator` have now completely disappeared from the profile.
+与调用 `comparator` 相关的开销现在已从性能分析结果中完全消失。

-At this point I became interested in how much time we spend _parsing_ mappings vs. _sorting_ them. I went into the parsing code and added few `Date.now()` invocations:
+此时,我开始对解析映射与排序各花了多少时间产生兴趣。我进入解析代码,添加了几处 `Date.now()` 调用来记录耗时:

-I wanted to sprinkle `performance.now()` but SpiderMonkey shell apparently does not support it.
+> 我本想用 `performance.now()`,但 SpiderMonkey shell 显然不支持它。

 ```
 diff --git a/dist/source-map.js b/dist/source-map.js
@@ -381,7 +381,7 @@ index 75ebbdf..7312058 100644
 };
 ```

-This yielded:
+结果如下:

 ```
 ╭─ ~/src/source-map/bench ‹perf-work U› [Clone sorting functions for each comparator]
@@ -396,15 +396,15 @@ sortOriginal: 896.3589999999995
 ^C
 ```

-Here is how parsing and sorting times look like in V8 and SpiderMonkey per benchmark iteration run:
+以下是 V8 和 SpiderMonkey 中每次基准测试迭代时,解析映射和排序的耗时:

-![Parse and Sort times](https://mrale.ph/images/2018-02-03/parse-sort-0.png)
+![解析和排序耗时](https://mrale.ph/images/2018-02-03/parse-sort-0.png)

-In V8 we seem to be spending roughly as much time parsing mappings as sorting them. In SpiderMonkey parsing is considerably faster - while sorting is slower. This prompted me to start looking at the parsing code.
+在 V8 中,解析映射的耗时似乎与排序大致相当。在 SpiderMonkey 中,解析要快得多,而排序较慢。这促使我开始查看解析代码。

-### Optimizing Parsing - Removing Segment Cache
+### 优化解析 — 删除分段缓存

-Lets take a look at the profile again
+让我们再看一下性能分析结果

 ```
 Overhead Symbol
@@ -430,7 +430,7 @@ Overhead Symbol
 0.41% Builtin:RecordWrite
 ```

-Removing the JavaScript code we recognize leaves us with the following:
+去掉我们能识别的 JavaScript 代码之后,剩下的内容如下:

 ```
 Overhead Symbol
@@ -451,7 +451,7 @@ Overhead Symbol
 0.41% Builtin:RecordWrite
 ```

-When I started looking at call chains for individual entries I discovered that many of them pass through `KeyedLoadIC_Megamorphic` into `SourceMapConsumer_parseMappings`.
+当我开始查看各个条目的调用链时,我发现其中很多都经过 `KeyedLoadIC_Megamorphic` 进入 `SourceMapConsumer_parseMappings`。

 ```
 - 1.92% v8::internal::StringTable::LookupStringIfExists_NoAllocate
@@ -481,14 +481,14 @@ When I started looking at call chains for individual entries I discovered that m
 + 1.66% v8::internal::StringTable::LookupString
 ```

-This sort of call stacks indicated to me that the code is performing a lot of keyed lookups of form `obj[key]` where `key` is dynamically built string. When I looked at the parsing I discovered [the following code](https://github.com/mozilla/source-map/blob/693728299cf87d1482e4c37ae90f5bce8edf899f/lib/source-map-consumer.js#L496-L529):
+这样的调用堆栈向我表明,代码正在执行大量形如 `obj[key]` 的键控查找,且 `key` 是动态构建的字符串。当我查看解析代码时,我发现了[以下代码](https://github.com/mozilla/source-map/blob/693728299cf87d1482e4c37ae90f5bce8edf899f/lib/source-map-consumer.js#L496-L529):

 ```
-// Because each offset is encoded relative to the previous one,
-// many segments often have the same encoding. We can exploit this
-// fact by caching the parsed variable length fields of each segment,
-// allowing us to avoid a second parse if we encounter the same
-// segment again.
+// 由于每个偏移量都是相对于前一个偏移量进行编码的,
+// 因此许多分段通常具有相同的编码。
+// 我们可以利用这一点,缓存每个分段解析后的可变长度字段,
+// 这样当再次遇到相同的分段时,
+// 就无需再对它进行解析。
 for (end = index; end < length; end++) {
   if (this._charIsMappingSeparator(aStr, end)) {
     break;
@@ -514,52 +514,51 @@ if (segment) {
 }
 ```

-This code is responsible for decoding Base64 VLQ encoded sequences, e.g. a string `A` would be decoded as `[0]` and `UAAAA` gets decoded as `[10,0,0,0,0]`. I suggest checking [this blog post](https://blogs.msdn.microsoft.com/davidni/2016/03/14/source-maps-under-the-hood-vlq-base64-and-yoda/) about source maps internals if you would like to understand the encoding itself better.
+该代码负责解码 Base64 VLQ 编码序列,例如,字符串 `A` 会被解码为 `[0]`,而 `UAAAA` 会被解码为 `[10,0,0,0,0]`。如果你想更好地理解这种编码本身,我建议你阅读这篇关于 source maps 内部实现细节的[博客文章](https://blogs.msdn.microsoft.com/davidni/2016/03/14/source-maps-under-the-hood-vlq-base64-and-yoda/)。

-Instead of decoding each sequence independently this code attempts to cache decoded segments: it scans forward until a separator (`,` or `;`) is found, then extracts substring from the current position to the separator and checks if we have previous decoded such segment by looking up the extracted substring in a cache - if we hit the cache we return cached segment, otherwise we parse and cache the segment in the cache.
+该代码并不对每个序列独立解码,而是试图缓存已解码的分段:它向前扫描直到找到分隔符(`,` 或 `;`),然后提取从当前位置到分隔符的子字符串,并通过在缓存中查找该子字符串来检查我们之前是否解码过这样的分段 —— 如果命中缓存,就返回缓存的分段,否则就进行解析,并将该分段存入缓存。

-Caching (aka [memoization](https://en.wikipedia.org/wiki/Memoization)) is a very powerful optimization technique - however it only makes sense when maintaining the cache itself and looking up cached results is cheaper than performing computation itself again.
+缓存(又名[记忆化](https://en.wikipedia.org/wiki/Memoization))是一种非常强大的优化技术 —— 然而,只有当维护缓存本身和查找缓存结果比再次执行计算的开销更小时,缓存才有意义。

-#### Abstract Analysis
+#### 抽象分析

-Lets try to compare these two operations abstractly.
+让我们尝试抽象地比较这两种操作。

-**On one hand is pure parsing:**
+**一种是直接解析:**

-Parsing segment looks at each character of a segment once. For each character it performs few comparisons and arithmetic operations to convert a base64 character into an integer value it represents. Then it performs few bitwise operations to incorporate this integer value into a larger integer value. Then it stores decoded value into an array and moves to the next part of the segment. Segments are limited to 5 elements.
+解析分段时,对分段中的每个字符只查看一次。对于每个字符,执行少量比较和算术运算,把 base64 字符转换为它所表示的整数值;然后执行几次按位操作,把这个整数值并入一个更大的整数值;接着把解码出的值存入数组,并移动到分段的下一部分。每个分段最多包含 5 个元素。

-**On the other hand caching:**
+**另一种是缓存:**

-1. To look up a cached value we traverse all the characters of the segment once to find its end;
-2. We extract the substring, which requires allocation and potentially copying depending on how strings are implemented in a JS VM;
-3. We use this string as a key in a dictionary, which:
-    1. first requires VM to compute hash for this string (traversing it again and performing various bitwise operations on individual characters), this might also require VM to internalize the string (depending on implementation);
-    2. then VM has to perform a hash table lookup, which requires probing and comparing key by value with other keys (which might require it again to look at individual characters in a string);
+1. 为了查找缓存的值,我们要把分段的所有字符遍历一次,以找到它的结尾;
+2. 我们提取子字符串,这需要分配内存,还可能需要复制,具体取决于 JS VM 中字符串的实现方式;
+3. 我们把这个字符串用作字典中的键,这:
+    1. 首先需要 VM 为该字符串计算散列值(再次遍历它,并对单个字符执行各种按位操作),这可能还需要 VM 将字符串内部化(取决于实现方式);
+    2. 然后 VM 必须执行一次散列表查找,这需要探测,并按值将该键与其他键进行比较(这可能又需要查看字符串中的单个字符);

-Overall it seems that direct parsing should be faster, assuming that JS VM does good job with individual arithmetic/bitwise operations, simply because it looks at each individual character only once, where caching requires traversing the segment 2-4 times just to establish whether we hit the cache or not.
+总体来看,假设 JS VM 对单个算术/按位运算处理得足够好,直接解析应该更快,原因很简单:它对每个字符只查看一次,而缓存需要遍历该分段 2-4 次,才能确定是否命中缓存。

-Profile seems to confirm this too: `KeyedLoadIC_Megamorphic` is a stub used by V8 to implement keyed lookups like `cachedSegments[str]` in the code above.
+性能分析似乎也证实了这一点:`KeyedLoadIC_Megamorphic` 是 V8 用于实现上面代码中 `cachedSegments[str]` 这类键控查找的桩代码(stub)。

-Based on these observations I set out to do few experiments. First I checked how big `cachedSegments` cache is at the end of the parsing. The smaller it is the more efficient caching would be.
+基于这些观察,我着手做了几个实验。首先,我检查了解析结束时 `cachedSegments` 缓存有多大。缓存越小,缓存效率就越高。

-Turns out that it grows quite big:
+结果发现它变得相当大:

 ```
 Object.keys(cachedSegments).length = 155478
 ```

-#### Standalone Microbenchmarks
+#### 独立微基准测试(Microbenchmarks)

-Now I decided to write a small standalone benchmark:
+现在我决定写一个小的独立基准测试:

 ```
-// Generate a string with [n] segments, segments repeat in a cycle of length
-// [v] i.e. segments number 0, v, 2*v, ... are all equal and so are
-// 1, 1 + v, 1 + 2*v, ...
-// Use [base] as a base value in a segment - this parameter allows to make
-// segments long.
+// 用 [n] 个分段生成一个字符串,分段以长度为 [v] 的周期重复,
+// 即第 0、v、2*v……个分段都相等,
+// 第 1、1+v、1+2*v……个分段也是如此。
+// 使用 [base] 作为分段中的基准值 —— 这个参数可以让分段变长。
 //
-// Note: the bigger [v], the bigger [cachedSegments] cache is.
+// 注意:[v] 越大,[cachedSegments] 缓存就越大。
 function makeString(n, v, base) {
   var arr = [];
   for (var i = 0; i < n; i++) {
@@ -568,15 +567,15 @@ function makeString(n, v, base) {
   return arr.join(';') + ';';
 }

-// Run function [f] against the string [str].
+// 对字符串 [str] 运行函数 [f]。
 function bench(f, str) {
   for (var i = 0; i < 1000; i++) {
     f(str);
   }
 }

-// Measure and report performance of [f] against [str].
-// It has [v] different segments.
+// 衡量并报告 [f] 处理 [str] 的性能。
+// [str] 中有 [v] 个互不相同的分段。
 function measure(v, str, f) {
   var start = Date.now();
   bench(f, str);
@@ -586,13 +585,13 @@ function measure(v, str, f) {
 async function measureAll() {
   for (let v = 1; v <= 256; v *= 2) {
-    // Make a string with 1000 total segments and [v] different ones
-    // so that [cachedSegments] has [v] cached segments.
+    // 生成一个总共包含 1000 个分段、其中 [v] 个互不相同的字符串,
+    // 这样 [cachedSegments] 中就会有 [v] 个缓存分段。
     let str = makeString(1000, v, 1024 * 1024);
     let arr = encoder.encode(str);

-    // Run 10 iterations for each way of decoding.
+    // 针对每种解码方式运行 10 次迭代。
     for (var j = 0; j < 10; j++) {
       measure(j, i, str, decodeCached);
       measure(j, i, str, decodeNoCaching);
     }
   }
 }

 function nextTick() { return new Promise((resolve) => setTimeout(resolve)); }
 ```

-**以上为本文的第一部分,更多内容详见 [Maybe you don't need Rust and WASM to speed up your JS — 第二部分](https://github.com/xitu/gold-miner/blob/master/TODO1/maybe-you-dont-need-rust-to-speed-up-your-js-2.md)。**
+**以上为本文的第一部分,更多内容详见 [或许你并不需要 Rust 和 WASM 来提升 JS 的执行效率 — 第二部分](https://github.com/xitu/gold-miner/blob/master/TODO1/maybe-you-dont-need-rust-to-speed-up-your-js-2.md)。**

 > 如果发现译文存在错误或其他需要改进的地方,欢迎到 [掘金翻译计划](https://github.com/xitu/gold-miner) 对译文进行修改并 PR,也可获得相应奖励积分。文章开头的 **本文永久链接** 即为本文在 GitHub 上的 MarkDown 链接。
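译者补充:下面是一段可独立运行的示意代码,用来演示正文讨论的两个优化点 —— 调用比较函数时保持参数数量一致,以及通过 `Function` 构造函数克隆排序模板、为每个 comparator 生成单态副本。注意这**不是** `source-map` 库的源码:为了让示例保持简短,这里假设性地用插入排序代替了库中的快速排序,`sortWith`、`byValue` 等名称也都是演示用的虚构名称。

```javascript
// 排序模板:comparator 通过闭包捕获,克隆后每份副本只会见到一个 comparator,
// 因此 JIT 可以将比较函数内联进排序代码。
function SortTemplate(comparator) {
  // 演示用的插入排序(文中克隆的是快速排序,原理相同)。
  function doSort(ary) {
    for (let i = 1; i < ary.length; i++) {
      const v = ary[i];
      let j = i - 1;
      // 显式传入第三个参数 false,使调用参数数量与比较函数的形参数量一致,
      // 避免 V8 走 ArgumentsAdaptorTrampoline。
      while (j >= 0 && comparator(ary[j], v, false) > 0) {
        ary[j + 1] = ary[j];
        j--;
      }
      ary[j + 1] = v;
    }
    return ary;
  }
  return doSort;
}

function cloneSort(comparator) {
  // 把模板函数转成字符串,再用 Function 构造函数解析回来,
  // 得到一份独立的函数副本(即正文所说的"单态化克隆")。
  const template = SortTemplate.toString();
  const templateFn = new Function(`return ${template};`)();
  return templateFn(comparator);
}

const sortCache = new WeakMap(); // 为每个 comparator 缓存其专用的排序函数。

function sortWith(ary, comparator) {
  let doSort = sortCache.get(comparator);
  if (doSort === undefined) {
    doSort = cloneSort(comparator);
    sortCache.set(comparator, doSort);
  }
  return doSort(ary);
}

// 与库中比较函数类似,带有第三个可选参数。
const byValue = (a, b, onlyCompareValue) => a - b;
console.log(sortWith([3, 1, 2], byValue)); // [ 1, 2, 3 ]
```

同一个 comparator 第二次排序时会直接复用 `sortCache` 中的克隆副本,不会重复走 `Function` 解析;用 `WeakMap` 做缓存键则保证 comparator 被回收后,对应的克隆也能随之释放。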