From d007396bb4cbd45413ffaa0ca216551d3ee9b74d Mon Sep 17 00:00:00 2001 From: keepmovingljzy <75712827+keepmovingljzy@users.noreply.github.com> Date: Wed, 16 Dec 2020 16:02:43 +0800 Subject: [PATCH 1/5] =?UTF-8?q?=E6=8E=A2=E7=B4=A2=20Android=20=E4=B8=AD?= =?UTF-8?q?=E7=9A=84=20ConstraintLayout=202.0=20(#7749)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * 提交翻译信息 Helper Objects翻译为辅助布局,其实是和下面的文章内容相关的。 * Update exploring-constraintlayout-2-0-in-android.md * Update exploring-constraintlayout-2-0-in-android.md * Update exploring-constraintlayout-2-0-in-android.md * Update exploring-constraintlayout-2-0-in-android.md --- ...ploring-constraintlayout-2-0-in-android.md | 78 +++++++++---------- 1 file changed, 39 insertions(+), 39 deletions(-) diff --git a/article/2020/exploring-constraintlayout-2-0-in-android.md b/article/2020/exploring-constraintlayout-2-0-in-android.md index a0b7c1678ef..6fe2a085ba3 100644 --- a/article/2020/exploring-constraintlayout-2-0-in-android.md +++ b/article/2020/exploring-constraintlayout-2-0-in-android.md @@ -1,58 +1,58 @@ -> * 原文地址:[Exploring ConstraintLayout 2.0 in Android](https://medium.com/better-programming/exploring-constraintlayout-2-0-in-android-317584003ee9) -> * 原文作者:[Siva Ganesh Kantamani](https://medium.com/@sgkantamani) -> * 译文出自:[掘金翻译计划](https://github.com/xitu/gold-miner) -> * 本文永久链接:[https://github.com/xitu/gold-miner/blob/master/article/2020/exploring-constraintlayout-2-0-in-android.md](https://github.com/xitu/gold-miner/blob/master/article/2020/exploring-constraintlayout-2-0-in-android.md) -> * 译者: -> * 校对者: +> - 原文地址:[Exploring ConstraintLayout 2.0 in Android](https://medium.com/better-programming/exploring-constraintlayout-2-0-in-android-317584003ee9) +> - 原文作者:[Siva Ganesh Kantamani](https://medium.com/@sgkantamani) +> - 译文出自:[掘金翻译计划](https://github.com/xitu/gold-miner) +> - 
本文永久链接:[https://github.com/xitu/gold-miner/blob/master/article/2020/exploring-constraintlayout-2-0-in-android.md](https://github.com/xitu/gold-miner/blob/master/article/2020/exploring-constraintlayout-2-0-in-android.md) +> - 译者:[keepmovingljzy](https://github.com/keepmovingljzy) +> - 校对者:[regon-cao](https://github.com/regon-cao)、[happySteveQi](https://github.com/happySteveQi) -# Exploring ConstraintLayout 2.0 in Android +# 探索 Android 中的 ConstraintLayout 2.0 ![Photo by [rafzin p](https://unsplash.com/@rafzin?utm_source=medium&utm_medium=referral) on [Unsplash](https://unsplash.com?utm_source=medium&utm_medium=referral)](https://cdn-images-1.medium.com/max/8942/0*goSdyD-yGtjIfUCP) -## Introduction +## 介绍 -`ConstraintLayout` is one of the powerful Jetpack libraries that allows developers to create complex and responsive UI quickly with interactive tooling built into Android Studio, in order to preview your XML. +`ConstraintLayout` 是功能强大的 Jetpack 库之一,开发人员可以使用 Android Studio 内置的交互工具结合该库快速创建复杂且响应迅速的布局,同时预览该 XML 布局。 -One of the significant advantages of `ConstraintLayout` is that we can build complex UI with a flat view hierarchy (no nested view groups). This result is drawing a lesser number of layers, which increases the performance. + `ConstraintLayout` 的一大优点是我们可以使用单层视图层次结构(无嵌套视图)构建复杂的布局。由于绘制的视图层数更少,性能自然就提高了。 -#### A few key features of ConstraintLayout +#### ConstraintLayout的一些关键功能 -1. We can position the views relatively one another. -2. We can center the views using bias or other views. -3. We can specify the aspect ratio of the view. -4. We can group and chain the views. +1. 我们可以指定视图的相对位置。 +2. 我们可以使用偏移或其他视图来居中视图。 +3. 我们可以指定视图的宽高比例。 +4. 我们可以对视图进行分组和链接。 -#### A few helper objects +#### 一些辅助布局 -Helper objects are the objects that are not visible to the user but come in handy to align developers’ views. 
+辅助布局是用户看不见的布局,但是可以方便开发人员对齐视图。 -* `Guideline` -* `Barrier` -* `Placeholder` +- `Guideline` +- `Barrier` +- `Placeholder` -To learn more about `ConstraintLayout` v1.0, read this [article](https://medium.com/better-programming/essential-components-of-constraintlayout-7f4026a1eb87). +要了解 `ConstraintLayout` v1.0 的更多信息,请阅读这篇[文章](https://medium.com/better-programming/essential-components-of-constraintlayout-7f4026a1eb87)。 ## ConstraintLayout 2.0 -Enough with the history lessons. It’s time to integrate v2.0 of `ConstraintLayout` into your project. To do so, add the following line under the dependencies tag in `build.gradle` file. +在充分了解完它的历史之后,是时候将`ConstraintLayout` v2.0 集成到您的项目中了。要引入它需要在 build.gradle 文件中添加如下依赖。 ``` implementation “androidx.constraintlayout:constraintlayout:2.0.1” ``` -This version brings several new features to `ConstraintLayout`; let’s start digging into them without any delay. +这个版本为`ConstraintLayout`带来了几个新功能。让我们马上来研究一下吧。 ## Flow -`Flow` is a new virtual layout added in v2, similar to the group in v1. It’s a combination of `Chain` and `Group`, with special powers. In simple words, `Flow` chains the views with dynamic size at runtime. +`Flow` 是 v2 版本中新增的虚拟布局方式,类似于v1版本中的 `group` 。它是 `Chain` 和 `Group` 布局的一个结合,具有特殊的功能。简而言之就是 `Flow` 在运行时根据布局的大小动态链接视图。 -Similar to `Group`, `Flow` also takes the reference view IDs and creates a `Chain` behavior. One of the vital advantages that `Flow` offers is `wrapMode` (a way to configure the views when they overflow). Out of the box, we’ve three modes to choose from: `none`, `aligned`, and `chain`. 
+与 `Group` 类似,`Flow` 同样也是通过获取视图的ID并创建 `Chain` 所具有的行为。使用 `Flow` 布局重要优势之一是 `wrapMode`(一种在视图溢出时配置视图的方法)。添加该布局属性即可使用,我们提供三种模式供您选择:`none`,`aligned` 和 `chain`。 ![Flow mode : none, chain and aligned](https://cdn-images-1.medium.com/max/2000/0*RK2f87Te_cm259Gg) -* `[wrap none](https://developer.android.com/reference/androidx/constraintlayout/helper/widget/Flow#wrap_none) `: Creates a chain out of the referenced views -* `[wrap chain](https://developer.android.com/reference/androidx/constraintlayout/helper/widget/Flow#wrap_chain)` : Creates multiple chains (one after the other) only if the referenced views do not fit -* `[wrap aligned](https://developer.android.com/reference/androidx/constraintlayout/helper/widget/Flow#wrap_aligned)` : Similar to `wrap chain`, but will align the views by creating rows and columns +- `[wrap none](https://developer.android.com/reference/androidx/constraintlayout/helper/widget/Flow#wrap_none) `: 所有引用的视图形成一条链 +- `[wrap chain](https://developer.android.com/reference/androidx/constraintlayout/helper/widget/Flow#wrap_chain)` : 仅当引用的视图不适合时才创建多个链(一个接一个) +- `[wrap aligned](https://developer.android.com/reference/androidx/constraintlayout/helper/widget/Flow#wrap_aligned)` : 与 `wrap chain` 类似,但是将通过创建行和列来对齐视图 ```XML @@ -116,17 +116,17 @@ Similar to `Group`, `Flow` also takes the reference view IDs and creates a `Chai ``` -This feature seems simple, but we can create flow layouts using `ConstraintLayout` 2.0. We no longer need to use flow layout libraries anymore. +这个功能看起来很简单,不过我们可以使用 `ConstraintLayout` 2.0 创建流式布局。不再需要使用其他流式布局库。 -Before `ConstraintLayout` 2.0, we had to calculate the remaining space after rendering each view to make sure the next view fits in there, else we’ve to align it in the next line. But now we need to use `Flow`. 
+在 `ConstraintLayout` 2.0 之前,我们必须在渲染每个视图之后计算剩余空间,来确定下一个视图是否有足够的空间适合存放,否则我们必须将其对齐到下一行。但是现在我们只要使用 `Flow` 就可以了。 -To learn more about `Flow`, [read the official docs](https://developer.android.com/reference/androidx/constraintlayout/helper/widget/Flow). +要了解有关 `Flow` 的更多信息,请[阅读官方文档](https://developer.android.com/reference/androidx/constraintlayout/helper/widget/Flow)。 ## Layer -`Layer` is the new helper in `ConstraintLayout` 2.0, similar to `Guideline`s and `Barrier`s. We can create a virtual layer like a group with multiple referenced views. Once the views are referenced, we can apply transformations on those views using `Layer`. +`Layer` 是 `ConstraintLayout` 2.0 中的新增的辅助布局方式,类似于 `Guideline` 和 `Barrier` 的辅助布局方式。我们可以通过创建一个虚拟图层,类似与一个父视图(Layer)中有多个子视图的方式。一旦子视图引用了父视图,我们就可以使用 `Layer` 对这些视图进行转换。 -It’s similar to a `Group` helper, where we can bind multiple views and perform basic actions like visibility (visible and gone). With `Layer`, we can take it to the next level. We can apply animations to `rotate`, `translate`, or `scale `multiple views together. 
+它类似于 `Group` 布局,我们可以通过它绑定多个视图并设置其可见性(可见和消失)等基本操作。一旦视图被 `Layer` 引用,我们的视图就可以使用 `Layer` 给我们带来的那些转换功能了。我们可以对多个视图做旋转,平移,缩放或者组合动画。 ```XML 如果发现译文存在错误或其他需要改进的地方,欢迎到 [掘金翻译计划](https://github.com/xitu/gold-miner) 对译文进行修改并 PR,也可获得相应奖励积分。文章开头的 **本文永久链接** 即为本文在 GitHub 上的 MarkDown 链接。 ---- +------ > [掘金翻译计划](https://github.com/xitu/gold-miner) 是一个翻译优质互联网技术文章的社区,文章来源为 [掘金](https://juejin.im) 上的英文分享文章。内容覆盖 [Android](https://github.com/xitu/gold-miner#android)、[iOS](https://github.com/xitu/gold-miner#ios)、[前端](https://github.com/xitu/gold-miner#前端)、[后端](https://github.com/xitu/gold-miner#后端)、[区块链](https://github.com/xitu/gold-miner#区块链)、[产品](https://github.com/xitu/gold-miner#产品)、[设计](https://github.com/xitu/gold-miner#设计)、[人工智能](https://github.com/xitu/gold-miner#人工智能)等领域,想要查看更多优质译文请持续关注 [掘金翻译计划](https://github.com/xitu/gold-miner)、[官方微博](http://weibo.com/juejinfanyi)、[知乎专栏](https://zhuanlan.zhihu.com/juejinfanyi)。 From a1f762ddeec15086634a98f15c58fc5e45a22a02 Mon Sep 17 00:00:00 2001 From: lsvih Date: Thu, 17 Dec 2020 11:11:20 +0800 Subject: [PATCH 2/5] Create will-webtransport-replace-webrtc-in-near-future.md (#7776) * Create will-webtransport-replace-webrtc-in-near-future.md * Update will-webtransport-replace-webrtc-in-near-future.md --- ...transport-replace-webrtc-in-near-future.md | 88 +++++++++++++++++++ 1 file changed, 88 insertions(+) create mode 100644 article/2020/will-webtransport-replace-webrtc-in-near-future.md diff --git a/article/2020/will-webtransport-replace-webrtc-in-near-future.md b/article/2020/will-webtransport-replace-webrtc-in-near-future.md new file mode 100644 index 00000000000..e2e2ffabcb7 --- /dev/null +++ b/article/2020/will-webtransport-replace-webrtc-in-near-future.md @@ -0,0 +1,88 @@ +> * 原文地址:[Will WebTransport Replace WebRTC in Near Future?](https://blog.bitsrc.io/will-webtransport-replace-webrtc-in-near-future-436c4f7f3484) +> * 原文作者:[Charuka E Bandara](https://medium.com/@charuka95) +> * 译文出自:[掘金翻译计划](https://github.com/xitu/gold-miner) +> * 
本文永久链接:[https://github.com/xitu/gold-miner/blob/master/article/2020/will-webtransport-replace-webrtc-in-near-future.md](https://github.com/xitu/gold-miner/blob/master/article/2020/will-webtransport-replace-webrtc-in-near-future.md)
+> * 译者:
+> * 校对者:
+
+# Will WebTransport Replace WebRTC in Near Future?
+
+![Photo by [Gabriel Benois](https://unsplash.com/@gabrielbenois) on [Unsplash](https://unsplash.com/)](https://cdn-images-1.medium.com/max/2000/0*4MaUNhpUTKLuBX14)
+
+Video and audio conferencing on the web have become popular in the modern era. In the early days, they required an intermediary server to transfer data between the two parties. Since that was slow and grainy, many innovations improved the underlying technology to overcome its limitations.
+
+![Diagram by the author: The basic architecture of WebSockets](https://cdn-images-1.medium.com/max/2000/1*UZMYYV48pGhgjkcEh0lPNg.png)
+
+In 2010, Google engineers introduced WebRTC to solve some of these challenges. Today we use it almost everywhere.
+
+## Introducing WebRTC
+
+WebRTC (Web Real-Time Communication) is a protocol (a collection of APIs) that allows direct communication between browsers. These APIs support exchanging files, information, or any other data. That may sound like WebSockets, but it is not.
+
+![Diagram by the author: The basic architecture of WebRTC](https://cdn-images-1.medium.com/max/2140/1*ZtTqRURkQA2nqRgrrCjwTg.png)
+
+As we discussed, communication happens between browsers without requiring the direct involvement of a server. However, a server does need to facilitate the exchange of the peers’ IP addresses at the beginning. Still, this is faster than communicating through a server.
+
+Then you might wonder: why do we need a new protocol? The reason is that as time passes and technologies evolve, some of WebRTC’s limitations have come to the surface.
+
+## So what are the limitations with WebRTC?
+
+#### Head-of-line blocking (HOL) at the TCP layer
+
+This is the same problem that we have with HTTP/2. When using HTTP/2, multiple requests are sent to the server as encapsulated streams, so at any given moment multiple requests share a single TCP connection.
+
+Suppose two GET requests have six packets each. If one packet is damaged or lost during transmission, TCP makes the entire stream wait until that packet is retransmitted and received. This is how TCP HOL blocking occurs.
+
+> Don’t confuse TCP HOL with HTTP HOL. They are two different things; only TCP HOL is an issue here.
+
+Since WebRTC is built on top of HTTP/2, this issue can occur in any scenario, such as file transmission or video conferencing.
+
+#### Clients have to initiate the connection
+
+This limitation is tricky, because the client has to initiate the connection to avoid networking or security issues. That is how things work: in WebRTC, no one can send the client any information without the client’s awareness.
+
+HTTP push tried to get rid of this by creating a new stream: the server creates a stream and then pushes content to the client. However, it was not successful, and Google has lately removed that approach from Chrome.
+
+So, to address these issues, here comes the all-new WebTransport.
+
+## What is WebTransport?
+
+WebTransport is a pluggable protocol for client-server communication, built on top of HTTP/2, HTTP/3, and QUIC. It is designed to replace WebSockets by going ‘QUIC-native.’
+
+> You can think of it as WebRTC, but optimized for the 80/20 rule.
+
+> QUIC is a bidirectional, non-HTTP transport served over UDP, similar to an independent TCP connection but with drastically reduced connection setup latency. Its main functionality is two-way communication between a web client and a QUIC server through stream APIs.
+
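The contrast between TCP-level head-of-line blocking and QUIC's independent streams can be sketched with a toy delivery model (illustrative only: real TCP eventually retransmits the lost segment, but everything queued behind it waits; the function names and packet layout below are invented for the sketch):

```python
def deliver_tcp_like(packets, lost_seq):
    """One ordered byte stream (TCP-like): data is handed to the application
    strictly in sequence, so a single lost packet stalls every request
    multiplexed behind it until that packet is retransmitted."""
    delivered, buffered, next_seq = [], {}, 0
    for seq, stream_id in packets:
        if seq == lost_seq:
            continue  # lost in transit, not yet retransmitted
        buffered[seq] = stream_id
        while next_seq in buffered:  # deliver only contiguous data
            delivered.append(buffered.pop(next_seq))
            next_seq += 1
    return delivered

def deliver_quic_like(packets, lost_seq):
    """Independent streams (QUIC-over-UDP-like): a loss only affects the
    stream the packet belongs to; the other streams keep flowing."""
    return [stream_id for seq, stream_id in packets if seq != lost_seq]

# Two GET requests, A and B, six packets each, sharing one connection.
packets = [(i, "A" if i % 2 == 0 else "B") for i in range(12)]

# Packet 0 (stream A) is lost: the TCP-like model delivers nothing until
# retransmission, even though every stream-B packet arrived intact.
print(deliver_tcp_like(packets, lost_seq=0))   # []
print(deliver_quic_like(packets, lost_seq=0))  # 11 packets, B unaffected
```

Under this toy model, stream B pays the full penalty for a loss on stream A in the ordered case, and no penalty at all with independent streams — which is the property WebTransport inherits from QUIC.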
+Besides, WebTransport supports multiple streams, unidirectional streams, out-of-order delivery, and both reliable and unreliable transport.
+
+## Overcoming the challenges with WebTransport
+
+#### WebTransport is on top of QUIC
+
+WebTransport is an interface that can talk to HTTP/2-based, HTTP/3-based, and QUIC-based protocols. So, it has the advantage that HTTP and non-HTTP traffic can share the same network port.
+
+Besides, since QUIC operates over UDP, each stream is independent. Switching to UDP reduces the impact of TCP head-of-line blocking: any lost packet only halts the stream it belongs to, while the other streams can go on.
+
+#### WebTransport supports multiple protocols
+
+WebTransport supports unidirectional streams (indefinitely long streams of bytes in one direction), bidirectional streams (full-duplex streams), and datagrams (small, out-of-order, unreliable messages). Some key usages of WebTransport are:
+
+* WebTransport can request over HTTP and receive data **pushed** **out-of-order** (reliably and unreliably) over the same network connection.
+* WebTransport can send data (reliably and unreliably) to the server using a QUIC unidirectional send stream.
+* WebTransport can receive data pushed from the server using unidirectional receive streams.
+
+So in the gaming industry, WebTransport will play a significant role because of its capability of receiving media pushed from the server with minimal latency.
+
+## Conclusion
+
+In my opinion, WebRTC is doing quite well, and people have used it for many years now. Obviously, in a changing technological world, there are situations where even milliseconds of latency matter. As we discussed, industries like online gaming would reap clear benefits from WebTransport.
+
+> WebSocket-based WebRTC is not the fastest approach anymore.
+
+In this case, the powerful WebTransport will address the issues of WebSocket-based WebRTC.
By considering all these advantages, I believe WebTransport will replace WebRTC. But it will take some time for people to adapt. + +> 如果发现译文存在错误或其他需要改进的地方,欢迎到 [掘金翻译计划](https://github.com/xitu/gold-miner) 对译文进行修改并 PR,也可获得相应奖励积分。文章开头的 **本文永久链接** 即为本文在 GitHub 上的 MarkDown 链接。 + +--- + +> [掘金翻译计划](https://github.com/xitu/gold-miner) 是一个翻译优质互联网技术文章的社区,文章来源为 [掘金](https://juejin.im) 上的英文分享文章。内容覆盖 [Android](https://github.com/xitu/gold-miner#android)、[iOS](https://github.com/xitu/gold-miner#ios)、[前端](https://github.com/xitu/gold-miner#前端)、[后端](https://github.com/xitu/gold-miner#后端)、[区块链](https://github.com/xitu/gold-miner#区块链)、[产品](https://github.com/xitu/gold-miner#产品)、[设计](https://github.com/xitu/gold-miner#设计)、[人工智能](https://github.com/xitu/gold-miner#人工智能)等领域,想要查看更多优质译文请持续关注 [掘金翻译计划](https://github.com/xitu/gold-miner)、[官方微博](http://weibo.com/juejinfanyi)、[知乎专栏](https://zhuanlan.zhihu.com/juejinfanyi)。 From dce17ec9cec04d97198879f1cf6e4729853d3e76 Mon Sep 17 00:00:00 2001 From: lsvih Date: Thu, 17 Dec 2020 11:16:19 +0800 Subject: [PATCH 3/5] Create fundamentals-of-caching-web-applications.md (#7778) * Create fundamentals-of-caching-web-applications.md * Update fundamentals-of-caching-web-applications.md * Update fundamentals-of-caching-web-applications.md --- ...undamentals-of-caching-web-applications.md | 124 ++++++++++++++++++ 1 file changed, 124 insertions(+) create mode 100644 article/2020/fundamentals-of-caching-web-applications.md diff --git a/article/2020/fundamentals-of-caching-web-applications.md b/article/2020/fundamentals-of-caching-web-applications.md new file mode 100644 index 00000000000..7dd545f9726 --- /dev/null +++ b/article/2020/fundamentals-of-caching-web-applications.md @@ -0,0 +1,124 @@ +> * 原文地址:[Fundamentals of Caching Web Applications](https://blog.bitsrc.io/fundamentals-of-caching-web-applications-a215c4333cbb) +> * 原文作者:[Mahdhi Rezvi](https://medium.com/@mahdhirezvi) +> * 译文出自:[掘金翻译计划](https://github.com/xitu/gold-miner) +> * 
本文永久链接:[https://github.com/xitu/gold-miner/blob/master/article/2020/fundamentals-of-caching-web-applications.md](https://github.com/xitu/gold-miner/blob/master/article/2020/fundamentals-of-caching-web-applications.md)
+> * 译者:
+> * 校对者:
+
+# Fundamentals of Caching Web Applications
+
+![Photo by [Yuiizaa September](https://unsplash.com/@yuiizaa?utm_source=medium&utm_medium=referral) on [Unsplash](https://unsplash.com?utm_source=medium&utm_medium=referral)](https://cdn-images-1.medium.com/max/9458/0*0OwYJoWVEwP_rPjk)
+
+Web applications have come a long way from the early days. A typical web application goes through several stages of design, development, and testing before it is ready for release. As soon as your web app is released, it will be accessed by real users on a daily basis. If it becomes popular, it may be accessed by several million users a day. Although this sounds exciting, it incurs a lot of running costs.
+
+Apart from cost, complex calculations and read/write operations can take time to complete. This means your user has to wait for the operation to finish, which makes for a bad user experience if the wait becomes too long.
+
+System designers use several strategies to rectify these issues. **Caching** is one of them. Let’s take a closer look at caching.
+
+## What is Caching in Web Applications?
+
+A web cache is simply a component that temporarily stores HTTP responses so that they can be reused for subsequent HTTP requests, as long as certain conditions are met.
+
+Web caching is a key design feature of the HTTP protocol, intended to reduce network traffic while enhancing the perceived responsiveness of the system as a whole. Caches are found at every stage of the content’s journey from the origin server to the browser.
+
+In simple terms, web caching enables you to reuse stored HTTP responses for HTTP requests of a similar nature.
Let’s think of a simple example where a user requests a certain type of product (books) from the server. Assume that this whole process takes around 670 milliseconds to complete. If the user runs the same query later in the day, rather than repeating the computation and spending another 670 milliseconds, the HTTP response stored in the cache can be returned to the user. This reduces the response time drastically; in real-life scenarios, it can come in under 50 milliseconds.
+
+## Advantages of Caching
+
+There are several advantages of caching from the perspective of both the consumer and the provider.
+
+#### Decreased Bandwidth Costs
+
+As mentioned before, content can be cached at various points in the path of the HTTP request from the consumer to the server. When content is cached closer to the user, the request travels a shorter distance, which leads to a reduction in bandwidth costs.
+
+#### Improved Responsiveness
+
+Since caches are maintained closer to the user, this removes the need for a full round trip to the server. The closer the cache, the more instantaneous the response. This directly has a positive impact on user experience.
+
+#### Increased Performance on the Same Hardware
+
+With similar requests being catered to by the cache, your server hardware can focus on the requests that actually need the processing power. Aggressive caching can increase this performance enhancement further.
+
+#### Content Availability Even During Network Failures
+
+When certain cache policies are used, content can be served to end users from the cache for a short period of time in the event of a server failure. This can be very helpful, as it allows consumers to perform basic tasks without being affected by the failure of the origin server.
+
+## Disadvantages of Caching
+
+Similar to the advantages, there are several disadvantages to caching as well.
+
+#### The Cache is Deleted During Server Restart
+
+Whenever your server is restarted, your cached data gets deleted as well. This is because the cache is held in volatile memory, whose contents do not survive a restart. However, you can maintain policies that write the cache to disk at regular intervals to persist the cached data across server restarts.
+
+#### Serving Stale Data
+
+One of the main issues with caching is serving stale data: data that has not been updated and contains a previous version of the content. If you’ve cached a query of products, but in the meantime the product manager has deleted four products, users will get listings for products that don’t exist. This can be complicated to identify and fix.
+
+## Where Can You Cache?
+
+As previously mentioned, content can be cached at various locations in the path of the request.
+
+#### Browser Cache
+
+Web browsers retain a small cache of their own. Usually, the browser sets the policy that determines the most important items to cache. This could be user-specific content or content that is costly to download and likely to be requested again. To disable caching of a resource, you can set the response header as below.
+
+```
+Cache-Control: no-store
+```
+
+#### Intermediary Caching Proxies
+
+Any server that lies between the consumer device and your server infrastructure can cache content as desired. These caches may be maintained by ISPs or other independent parties.
+
+#### Reverse Cache
+
+You can implement your own cache infrastructure in your backend services. In this approach, content can be served from the point of external contact, without proceeding through to your backend servers. You can use services like Redis or Memcached to achieve this.
+
+Read more about the `Cache-Control` header over [here](https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching#Controlling%20caching).
+
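The reverse-cache idea can be sketched as a minimal in-memory cache with a time-to-live, a stand-in for Redis or Memcached (the `TTLCache` and `handle_request` names are illustrative, and the expiry policy is just one simple way to bound how stale a served response can get):

```python
import time

class TTLCache:
    """Minimal in-memory reverse cache: entries expire after ttl seconds,
    which bounds how stale a served response can be."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # evict the stale entry
            return None
        return value

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

def handle_request(cache, key, compute):
    """Serve from the cache when fresh; otherwise recompute and repopulate."""
    cached = cache.get(key)
    if cached is not None:
        return cached, "HIT"
    value = compute()
    cache.set(key, value)
    return value, "MISS"

def fetch_books():
    return ["book-1", "book-2"]  # stand-in for the slow 670 ms query

cache = TTLCache(ttl_seconds=60)
print(handle_request(cache, "/products?type=books", fetch_books))  # MISS: computed
print(handle_request(cache, "/products?type=books", fetch_books))  # HIT: from cache
```

The second request never reaches the backend query; in production, the same get/set pattern is what a reverse cache in front of your services implements, with eviction tuned per content type.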
+![Source: [MDN Docs](https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching)](https://cdn-images-1.medium.com/max/2000/0*QaYpasQXpfIKwiTV.png)
+
+## What Can Be Cached?
+
+All types of content are cacheable, but just because content can be cached does not mean that it should be.
+
+#### Cache Friendly
+
+The content below is more cache-friendly, as it does not change frequently and can therefore be cached for longer periods.
+
+* Media Content
+* JavaScript libraries
+* Style sheets
+* Images, Logos, Icons
+
+#### Moderately Cache Friendly
+
+The content below can be cached, but extra caution should be taken, as these types of content can change regularly.
+
+* Frequently modified JS and CSS
+* HTML pages
+* Content requested with authentication cookies
+
+#### Never Cache
+
+The types of content below should never be cached, as caching them can lead to security concerns.
+
+* Highly sensitive content, such as banking information
+* User-specific content, which should most often not be cached because it is regularly updated
+
+## Why Do You Need a Caching Strategy?
+
+In a real-world situation, you cannot implement aggressive caching everywhere, as it would probably return stale data most of the time. This is why a custom-made caching policy should be in place, balancing long-term caching against the demands of a changing site through suitable cache eviction algorithms. Since each system is unique and has its own set of requirements, adequate time should be spent on creating cache policies.
+
+The key to a good caching policy is to tread a fine line: promote aggressive caching whenever possible while leaving openings to invalidate entries when changes are made.
+
+---
+
+The intention of this article is to provide you with an introduction to the fundamentals of caching in web applications.
I have skipped several topics such as control headers, caching infrastructures, guidelines for developing cache policies, etc as they are a tad too advanced for this introduction. + +> 如果发现译文存在错误或其他需要改进的地方,欢迎到 [掘金翻译计划](https://github.com/xitu/gold-miner) 对译文进行修改并 PR,也可获得相应奖励积分。文章开头的 **本文永久链接** 即为本文在 GitHub 上的 MarkDown 链接。 + +--- + +> [掘金翻译计划](https://github.com/xitu/gold-miner) 是一个翻译优质互联网技术文章的社区,文章来源为 [掘金](https://juejin.im) 上的英文分享文章。内容覆盖 [Android](https://github.com/xitu/gold-miner#android)、[iOS](https://github.com/xitu/gold-miner#ios)、[前端](https://github.com/xitu/gold-miner#前端)、[后端](https://github.com/xitu/gold-miner#后端)、[区块链](https://github.com/xitu/gold-miner#区块链)、[产品](https://github.com/xitu/gold-miner#产品)、[设计](https://github.com/xitu/gold-miner#设计)、[人工智能](https://github.com/xitu/gold-miner#人工智能)等领域,想要查看更多优质译文请持续关注 [掘金翻译计划](https://github.com/xitu/gold-miner)、[官方微博](http://weibo.com/juejinfanyi)、[知乎专栏](https://zhuanlan.zhihu.com/juejinfanyi)。 From edef2d1e588b09e22a6ff41858e0df2b8bf0fa1d Mon Sep 17 00:00:00 2001 From: lsvih Date: Thu, 17 Dec 2020 11:27:02 +0800 Subject: [PATCH 4/5] Create why-is-my-data-drifting.md (#7780) --- article/2020/why-is-my-data-drifting.md | 102 ++++++++++++++++++++++++ 1 file changed, 102 insertions(+) create mode 100644 article/2020/why-is-my-data-drifting.md diff --git a/article/2020/why-is-my-data-drifting.md b/article/2020/why-is-my-data-drifting.md new file mode 100644 index 00000000000..789aa303d1d --- /dev/null +++ b/article/2020/why-is-my-data-drifting.md @@ -0,0 +1,102 @@ +> * 原文地址:[Why Is My Data Drifting?](https://medium.com/data-from-the-trenches/why-is-my-data-drifting-a8ecc74920a5) +> * 原文作者:[Simona Maggio](https://medium.com/@maggio.simona) +> * 译文出自:[掘金翻译计划](https://github.com/xitu/gold-miner) +> * 本文永久链接:[https://github.com/xitu/gold-miner/blob/master/article/2020/why-is-my-data-drifting.md](https://github.com/xitu/gold-miner/blob/master/article/2020/why-is-my-data-drifting.md) +> * 译者: +> * 校对者: + +# Why Is My Data 
Drifting? + +Machine learning (ML) models deployed in production are usually paired with systems to monitor possible dataset drift. MLOps systems are designed to trigger alerts when drift is detected, but in order to make decisions about the strategy to follow next, we also need to understand what is actually changing in our data and what kind of abnormality the model is facing. + +This post describes how to leverage a domain-discriminative classifier to identify the most atypical features and samples and shows how to use SHapley Additive exPlanations (SHAP) to boost the analysis of the data corruption. + +![Atypical fall leaf (by [Jeremy Thomas](https://unsplash.com/@jeremythomasphoto?utm_source=medium&utm_medium=referral) on [Unsplash](https://unsplash.com?utm_source=medium&utm_medium=referral))](https://cdn-images-1.medium.com/max/10368/0*NRGXc4k1hPb5d4jw) + +## A Data Corruption Scenario + +Aberrations can appear in incoming data for many reasons: noisy data collection, poorly performing sensors, data poisoning attacks, and more. These examples of data corruptions are a type of covariate shift that can be efficiently captured by drift detectors analyzing the feature distributions. For a refresher on dataset shift, have a look at [this blog post](https://medium.com/data-from-the-trenches/a-primer-on-data-drift-18789ef252a6) [1]. + +Now, imagine being a data scientist working on the popular [adult dataset](https://www.openml.org/d/1590), trying to predict whether a person earns over $50,000 a year, given their age, education, job, etc. + +We train a predictor for this binary task on a random split of the dataset, constituting our source training set. We are happy with the trained model and deploy it in production together with a drift monitoring system. + +The remaining part of the adult dataset represents the dataset provided at production time. Unfortunately, a part of this target-domain dataset is corrupted. 
+ +![Figure 1: Constant values used to corrupt 25% of the target-domain samples.](https://cdn-images-1.medium.com/max/2000/0*-5yCxLV4zV0SIsNR) + +For illustration purposes, we poison 25% of the target-domain dataset, applying a constant replacement shift. This corrupts a random set of features, namely **race**, **marital_status**, **fnlwgt,** and **education_num**. The numeric features are corrupted by replacing their value with the median of the feature distribution, while the categorical features are corrupted by replacing their value with a fixed random category. + +In this example, 25% of the target-domain samples have constant values for the four drifted features as shown in Figure 1. The drift detectors deployed to monitor data changes correctly trigger an alert. Now what? + +## How to Find the Most Drifted Samples? + +A domain-discriminative classifier can rescue us. This secondary ML model, trained on half the source training set and half the new target-domain dataset, aims to predict whether a sample belongs to the **Old Domain** or the **New Domain**. + +A domain classifier is actually a popular drift detector, as detailed [here](https://medium.com/data-from-the-trenches/towards-reliable-ml-ops-with-drift-detectors-5da1bdb29c63) [2], thus the good news is that it’s not only good at detecting changes, but also at identifying atypical samples. If you already have a trained domain classifier in your monitoring system, you get a bonus novelty detector for free. + +As a first guess, we can use the domain classifier probability score for the class **New Domain** as a **drift score** and highlight the top-k most atypical samples. But if there are hundreds of features, it is hard to make sense of the extracted top atypical samples. We need to narrow down the search by identifying the most drifted features. + +In order to achieve this, we can for instance assume that the features most important for the domain discrimination are more correlated with the corruption. 
In this case, we can use a feature importance measure suitable for the domain classifier, such as the Mean Decrease of Impurity (MDI) of a random forest domain classifier.
+
+There are many feature importance measures in the ML space, and all have their own limitations. This is one of the reasons why Shapley values were introduced into the ML world through SHapley Additive exPlanations. For an introduction to Shapley values and SHAP, have a look at the awesome [**Interpretable Machine Learning** book](https://christophm.github.io/interpretable-ml-book/shapley.html) [3].
+
+## Explaining the Drift
+
+Using the [SHAP package](https://github.com/slundberg/shap) [4], we can explain the domain classifier outcome, specifically finding how the different features of a given sample contribute to its probability of belonging to the **New domain**. Looking at the Shapley values of the top atypical samples, we can understand what makes the domain classifier predict that a sample is indeed a novelty, thereby uncovering the drifted features.
+
+![Figure 2. Comparison of importance ranks attributed to features: the lower the rank, the more drifted the feature is considered to be. The SHAP rank is based on average absolute Shapley values per feature in the whole test set. The domain classifier rank is given by the Mean Decrease of Impurity due to a feature.](https://cdn-images-1.medium.com/max/2110/0*odM1VlEqPkGFGFMv)
+
+In Figure 2, we compare the domain classifier feature importance and the SHAP feature importance (the average absolute Shapley value per feature) on our adult dataset. We observe that they assign different ranks to the features, with SHAP correctly capturing the top-3 corrupted features. The choice of importance measure has an impact on the identification of drifted features, so it is essential to prefer techniques more reliable than the impurity criterion.
+
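The core mechanism — a corrupted feature becomes easy to tell apart between domains — can be sketched on synthetic data with standard-library tools only. This toy per-feature histogram discriminator stands in for the article's random-forest domain classifier (its best single-feature balanced accuracy is 0.5 + TV/2, where TV is the total-variation distance between the binned old/new distributions); the feature names echo the adult dataset, but the data and the constant-replacement corruption below are simulated:

```python
import random
import statistics

random.seed(0)

# Synthetic stand-in for the adult data: source (training) vs target (production).
def sample_row():
    return {"age": random.gauss(40.0, 12.0),
            "fnlwgt": random.gauss(190_000.0, 100_000.0)}

source = [sample_row() for _ in range(2000)]
target = [sample_row() for _ in range(2000)]

# Constant-replacement corruption: 25% of target rows get the median fnlwgt.
median_fnlwgt = statistics.median(r["fnlwgt"] for r in source)
for row in random.sample(target, k=len(target) // 4):
    row["fnlwgt"] = median_fnlwgt

def separability(feature, n_bins=20):
    """How well a one-feature histogram discriminator separates old from new:
    the total-variation distance between the two binned distributions.
    A well-mixed feature stays near 0; a corrupted one stands out."""
    values = [r[feature] for r in source] + [r[feature] for r in target]
    lo, hi = min(values), max(values)

    def bin_of(v):
        return min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1)

    old = [0] * n_bins
    new = [0] * n_bins
    for r in source:
        old[bin_of(r[feature])] += 1
    for r in target:
        new[bin_of(r[feature])] += 1
    return 0.5 * sum(abs(n / len(target) - o / len(source))
                     for o, n in zip(old, new))

drift = {f: separability(f) for f in ("age", "fnlwgt")}
print(sorted(drift, key=drift.get, reverse=True))  # the corrupted feature ranks first
```

Piling 25% of the target mass onto a single value gives `fnlwgt` a separability around 0.2 while the untouched `age` stays at sampling-noise level, which is the same signal the domain classifier's feature importance picks up at scale.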
+
+Instead of arbitrarily selecting the top-3 features, one way of identifying the drifted features is to compare each feature importance with the uniform importance (1/n_features) that would correspond to indistinguishable domains. Then, we would spot the features that stand out, like in Figure 3 below, where **race**, **marital_status** and **fnlwgt** clearly show up.
+
+![Figure 3. Average absolute Shapley values per feature in the target-domain dataset. Features with importance higher than the uniform importance (black line) are likely to be drifted.](https://cdn-images-1.medium.com/max/3200/0*ss3XMB38A7knxllZ)
+
+If we plot the Shapley values for the entire target-domain dataset in Figure 4, highlighting all the true drifted samples in red, we can see that the Shapley values are quite effective at finding both the atypical samples and the atypical features. In each row of the summary plot, the same target-domain samples are represented as dots at the location of their Shapley values for a specific feature shown on the left. Here, we can observe the bimodal distributions for the atypical features selected previously (**race**, **marital_status** and **fnlwgt**), as well as for **education_num**, which is the last drifted feature to catch.
+
+![Figure 4. SHAP summary plot of the feature attribution for the target-domain samples. In each row, the same target-domain samples are represented as colored dots at the location of their Shapley values for a specific feature shown on the left. The color indicates whether the sample is truly atypical (red) or normal (blue).](https://cdn-images-1.medium.com/max/2100/0*H3b0e5CYaUpv_ry7)
+
+By the efficiency property of the Shapley values, the domain classifier predicted score for a sample is the sum of its Shapley values for all the features.
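The efficiency property can be checked numerically. The sketch below computes exact Shapley values for a toy scoring function by enumerating all feature coalitions, which is only tractable for a handful of features (libraries like SHAP use much faster approximations):

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, replacing 'absent' features
    by their baseline values."""
    n = len(x)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        value = 0.0
        for size in range(n):
            for coalition in combinations(others, size):
                z = list(baseline)
                for j in coalition:
                    z[j] = x[j]
                without_i = f(z)   # coalition without feature i
                z[i] = x[i]
                with_i = f(z)      # coalition joined by feature i
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                value += weight * (with_i - without_i)
        phi.append(value)
    return phi

# Toy stand-in for a domain classifier score: a fixed linear model.
score = lambda z: 0.5 * z[0] + 2.0 * z[1] - 1.0 * z[2]
x, baseline = [1.0, 3.0, 2.0], [0.0, 0.0, 0.0]
phi = shapley_values(score, x, baseline)

# Efficiency: the Shapley values sum to score(x) - score(baseline).
assert abs(sum(phi) - (score(x) - score(baseline))) < 1e-9
```

For a linear model this recovers the intuitive attribution weight times feature deviation from the baseline, and the sum of the attributions matches the prediction exactly.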
Thus, from the plot in Figure 4 above we can infer that the uncorrupted features have little (but non-zero) impact on predicting the **New domain** class, as their Shapley values are concentrated around zero, especially for the atypical samples (red dots).
+
+## Directly Visualizing the Drifted Samples
+
+We are now ready to wrap up and use those tools to actually highlight the suspicious samples and the atypical features.
+
+First, let's have a look at the top-10 most atypical features and samples, as we might be lucky enough to understand visually what's going on.
+
+![Figure 5. Top-10 most atypical samples according to the domain classifier probability score for the class New domain. Columns are sorted by the SHAP-based feature importance.](https://cdn-images-1.medium.com/max/2000/0*sUF74nnxq9_PSXEo)
+
+In this specific situation, we would probably easily recognize (and find suspicious) that all retrieved samples have constant values for some features, but this might not be the case in general. In some drift scenarios where the shift occurs at the distribution level, such as selection bias, looking at individual samples is not very useful: they would just be regular samples from a subpopulation of the source dataset, thus technically not an aberration. However, as we cannot know beforehand what kind of shift we are facing, it's still a good idea to have a look at the individual samples!
+
+A SHAP decision plot displaying the top-100 most atypical samples, like the one in Figure 6, where each curve represents one atypical sample, can help us see what is drifting. We can also see the curves moving towards higher domain classifier drift probabilities.
+
+![Figure 6. SHAP decision plot. Each curve represents one of the top-100 most atypical samples.
The top features contribute the most to making the sample atypical, 'pushing' the domain classifier probability for New domain towards higher values.](https://cdn-images-1.medium.com/max/2350/0*TMcQpeyCp2sajxxu)
+
+In this case, all the aberrations are due to the same corrupted features, but in cases where groups of samples drift for different reasons, the SHAP decision plot would highlight these trends very efficiently.
+
+Of course, nothing can replace a standard analysis of feature distributions, especially now that we can select the most suspicious features to focus on. In Figure 8, we can look at the distributions of the drifted features for the top-100 atypical samples in red, and compare them with the baseline of samples from the source domain training set. As discriminative analysis is more intuitive for humans, this is a simple way to highlight what kind of drift is going on in the new dataset. In this example, looking at the feature distributions we can immediately spot that feature values are constant and don't respect the expected distribution.
+
+![Figure 8. Distributions of the drifted features for the top-100 atypical samples (red) against the normal baseline (blue) from the source dataset.](https://cdn-images-1.medium.com/max/2052/0*d1xcJT1QnIHlzi7s)
+
+## Takeaways
+
+When monitoring deployed models for unexpected data changes, we can take advantage of drift detectors, such as the domain classifier, to also identify atypical samples in case of a drift alert. We can streamline the analysis of a drift scenario by highlighting the most drifted features to investigate. This selection can be made using the domain classifier's feature importance measures.
+
+Beware, though, of possible inconsistencies of feature importance measures and, if you can afford more computation, consider using SHAP for a more accurate drift-related relevance measure.
Finally, combining useful SHAP visual tools with a discriminative analysis of drifted feature distributions with respect to the unshifted baseline will make your drift analysis simpler and more effective. + +**References** + +[1] [A Primer on Data Drift](https://medium.com/data-from-the-trenches/a-primer-on-data-drift-18789ef252a6) + +[2] [Domain Classifier — Towards reliable MLOps with Drift Detectors](https://medium.com/data-from-the-trenches/towards-reliable-ml-ops-with-drift-detectors-5da1bdb29c63) + +[3] [Shapley Values — Interpretable Machine Learning — C. Molnar](https://christophm.github.io/interpretable-ml-book/shapley.html) + +[4] [SHapley Additive exPlanations package](https://github.com/slundberg/shap) + +> 如果发现译文存在错误或其他需要改进的地方,欢迎到 [掘金翻译计划](https://github.com/xitu/gold-miner) 对译文进行修改并 PR,也可获得相应奖励积分。文章开头的 **本文永久链接** 即为本文在 GitHub 上的 MarkDown 链接。 + +--- + +> [掘金翻译计划](https://github.com/xitu/gold-miner) 是一个翻译优质互联网技术文章的社区,文章来源为 [掘金](https://juejin.im) 上的英文分享文章。内容覆盖 [Android](https://github.com/xitu/gold-miner#android)、[iOS](https://github.com/xitu/gold-miner#ios)、[前端](https://github.com/xitu/gold-miner#前端)、[后端](https://github.com/xitu/gold-miner#后端)、[区块链](https://github.com/xitu/gold-miner#区块链)、[产品](https://github.com/xitu/gold-miner#产品)、[设计](https://github.com/xitu/gold-miner#设计)、[人工智能](https://github.com/xitu/gold-miner#人工智能)等领域,想要查看更多优质译文请持续关注 [掘金翻译计划](https://github.com/xitu/gold-miner)、[官方微博](http://weibo.com/juejinfanyi)、[知乎专栏](https://zhuanlan.zhihu.com/juejinfanyi)。 From 924e99a5f64b049c0bd056812daf201c65aa8fe7 Mon Sep 17 00:00:00 2001 From: Cai Yundong Date: Thu, 17 Dec 2020 14:51:16 +0800 Subject: [PATCH 5/5] =?UTF-8?q?=E6=9C=BA=E5=99=A8=E5=AD=A6=E4=B9=A0?= =?UTF-8?q?=E7=B3=BB=E7=BB=9F=E8=AE=BE=E8=AE=A1=E7=9B=B8=E5=85=B3=E9=9D=A2?= =?UTF-8?q?=E8=AF=95=E9=97=AE=E9=A2=98=E7=9A=84=E5=89=96=E6=9E=90=20(#7602?= =?UTF-8?q?)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Update 
the-anatomy-of-a-machine-learning-system-design-interview-question.md first commit * Update the-anatomy-of-a-machine-learning-system-design-interview-question.md add name * Update the-anatomy-of-a-machine-learning-system-design-interview-question.md name change * Update the-anatomy-of-a-machine-learning-system-design-interview-question.md fine tunning * Update the-anatomy-of-a-machine-learning-system-design-interview-question.md last commit * Update the-anatomy-of-a-machine-learning-system-design-interview-question.md 修改完毕 * Update the-anatomy-of-a-machine-learning-system-design-interview-question.md 校对更新 * Update the-anatomy-of-a-machine-learning-system-design-interview-question.md 校对完毕 --- ...arning-system-design-interview-question.md | 232 +++++++++--------- 1 file changed, 116 insertions(+), 116 deletions(-) diff --git a/article/2020/the-anatomy-of-a-machine-learning-system-design-interview-question.md b/article/2020/the-anatomy-of-a-machine-learning-system-design-interview-question.md index 7f42f245031..90b52a1dd25 100644 --- a/article/2020/the-anatomy-of-a-machine-learning-system-design-interview-question.md +++ b/article/2020/the-anatomy-of-a-machine-learning-system-design-interview-question.md @@ -2,214 +2,214 @@ > * 原文作者:[The Educative Team](https://medium.com/@educative) > * 译文出自:[掘金翻译计划](https://github.com/xitu/gold-miner) > * 本文永久链接:[https://github.com/xitu/gold-miner/blob/master/article/2020/the-anatomy-of-a-machine-learning-system-design-interview-question.md](https://github.com/xitu/gold-miner/blob/master/article/2020/the-anatomy-of-a-machine-learning-system-design-interview-question.md) -> * 译者: -> * 校对者: +> * 译者:[caiyundong](http://github.com/caiyundong) +> * 校对者:[zenblo](http://github.com/zenblo), [NieZhuZhu](http://github.com/NieZhuZhu) -# The Anatomy of a Machine Learning System Design Interview Question +# 针对机器学习系统设计面试相关问题的剖析 ![Image credit: Author](https://cdn-images-1.medium.com/max/2048/1*Ep0crcTbOZBLZwVW8W3JDQ.png) -Machine learning system 
design interviews have become increasingly common as more industries adopt ML systems. While similar in some ways to generic system design interviews, ML interviews are different enough to trip up even the most seasoned developers. The most common problem is to get stuck or intimidated by the large scale of most ML solutions. +随着越来越多的行业采用机器学习(Machine Learning, ML)系统,机器学习系统设计相关的面试变得越来越普遍。虽然机器学习系统设计的面试在某些方面与普通系统设计面试相似,但它们的差异足以难倒最老练的开发人员。最常见的问题是,面试者会被大规模的机器学习解决方案难倒。 -Today we’ll prepare you for your next ML system design interview by breaking down how they’re unique, what you should do to prepare, and the five steps you should use to solve any ML problem. +今天我们通过学习机器学习系统设计的独特性、解决机器学习问题的五个步骤以及面试的准备工作来为你下一次机器学习系统设计的面试做好准备。 -**Today we’ll cover:** +**今天我们将介绍:** -* What’s different about the ML system design interview -* How to prepare for an ML system design interview -* Five steps to solve any ML system design problem -* Wrapping up and resources +* 机器学习系统设计的面试有什么不同 +* 如何准备机器学习系统设计面试 +* 解决任何机器学习系统设计问题的五个步骤 +* 本文总结 -## What’s Different About the ML System Design Interview +## 机器学习系统设计的面试有什么不同 -The general setup of a machine learning system design interview is similar to a generic SDI. For both, you’ll be placed with an interviewer for 45 to 60 minutes and be asked to think through the components of a program. +机器学习系统设计面试的一般设置与普通系统设计面试类似。对于这两种情况,面试官都会在 45 至 60 分钟的时间内,要求你仔细考虑系统的各个组成部分。 -ML interviews generally focus more on the macro-level (like architecture, recommendation systems, and scaling) and avoid deeper design discussions on topics like availability and reliability. +机器学习系统设计面试通常更侧重于宏观层面(如体系结构,推荐系统和可扩展性),而尽量避免比如可用性和可靠性等更深入的设计讨论。 -In ML interviews, you’ll be asked high-level questions on how you’d set up each component of a system (data gathering, ML algorithm, signals) to handle a heavy workload and still adapt quickly. 
+在机器学习系统设计面试中,你会遇到一些高层次面试问题,比如说明如何构建系统的每个组件(数据收集,机器学习算法,标志)用以处理繁重的工作量并能保持快速适应的能力。 -For the machine learning SDI, you’ll be expected to explain how your program acquires data and achieves scalability. +对于机器学习系统设计面试,你需要能够解释算法程序如何获取数据并实现可扩展性。 -## How to Prepare for an ML System Design Interview +## 如何准备机器学习系统设计面试 -An ML system design interview will test your knowledge of two things: your knowledge of the setups and design choices behind large-scale ML systems and your ability to articulate ML concepts as you apply them. +机器学习系统设计面试主要评估两方面:对大型机器学习系统背后的构建和设计选择的相关知识,以及在应用机器学习概念时清楚表达的能力。 -Let’s look at three ways to prepare both your knowledge and articulation. +接下来是关于面试前知识准备以及面试表达的三种方法。 -## Know the Common ML Interview Questions +## 熟悉常见的机器学习面试问题 -The best way to prepare for these questions is to practice ML SDI problems on your own. There are only a few types of ML interview questions asked in modern ML SDIs. +准备这些问题的最好方法是自己练习机器学习系统设计面试问题。在现代机器学习系统设计面试中,仅会询问几种类型的面试问题。 -**The most common are iterations of:** +**最常见的是以下重点话题的变形问题:** -* “Create a Twitter-style social media feed to display relevant posts to users.” -* “Build a recommendation system to suggest products/services.” -* “Design a Google-style search engine that tailors results to the user.” -* “Build an advertising system that presents personalized ads to users.” -* “Design an ML system that identifies bad actors in a social network.” +* “创建一个 Twitter 风格的媒体反馈平台(Feed),以向用户展示他们感兴趣的帖子。” +* “建立一个推荐系统来推荐产品或者服务。” +* “设计一个 Google 风格的搜索引擎为用户量身定制结果。” +* “建立一个向用户展示个性化广告的广告系统。” +* “设计一个识别社交网络中不良行为者的机器学习系统。” -Search in the target job’s description for mentions of specific systems you’d work with and study similar systems for the interview. For jobs without a clear leaning toward any question type, focus on the media feed and recommendation systems, as these are the two most asked questions. 
+在目标职位的描述中搜索将要使用的特定系统,并研究类似的系统来准备面试。对于没有明确倾向于任何问题类型的工作,请专注于媒体反馈和推荐系统,因为这两个是最常见的问题。 -## Focus on the 4 Parts of Every ML Solution +## 关注每个机器学习解决方案的四个部分 ![Image source: Author](https://cdn-images-1.medium.com/max/2000/0*Dp0BQjXD6BO-tbnO) -**Each ML solution has four major parts:** +**每个机器学习解决方案都有四个主要部分:** -* Machine learning algorithm -* Training data -* Signals (sometimes called **features**) -* Validation and metrics +* 机器学习算法 +* 训练数据 +* 标志(有时称为**特征**) +* 验证和评价指标 -For **algorithms**, what algorithm will you choose and why? Deep learning, linear regression, random forest? What are the strengths and weaknesses of each? What do they accomplish per your system’s needs? +对于**算法**,你将选择哪种算法,为什么选择?深度学习,线性回归,还是随机森林?每个算法的优点和缺点是什么?根据你的系统需求他们能完成什么工作? -For **data**, where will you get test data? What data points will you draw from? How many data points will you handle? +对于**数据**,你将在哪里获得测试数据?你将提取哪些数据点?你将处理多少数据点? -For **signals**, what metric does your program use to determine relevant data? Will you signal to focus on one aspect of the data or synthesize it from multiple? How long does it take to determine data relevancy? +对于**标志**,你的程序使用什么评价指标来确定相关数据?你会关注数据的某一方面还是将其从多个方面进行综合?确定数据相关性需要多长时间? -For **metrics**, what metrics will you track for success and program learning? How would you measure the success of your system? How will you validate your hypothesis? +对于**评价指标**,你将使用哪些指标跟踪模型是否训练成功和更新程序的学习?你将如何衡量系统的成功?你将如何验证你的假设? -## Practice Explaining Out Loud +## 进行大声讲解练习 -Many interviewees will study concepts and algorithms but fail to practice the spoken component of the interview. +许多参加面试者对概念和算法做了很深入的研究,但没有练习面试时的表述能力。 -Practice explaining your system’s architecture aloud throughout the process. Narrate any decisions you make, briefly explaining why you made that choice. This is a great opportunity to show the interviewer how you think, not just what you know. 
+练习在整个过程中大声地解释系统的体系结构。叙述你所做的任何决定,并简要解释你做出该选择的原因。这是一个向面试官展示你的想法,而不仅仅是你所知道的知识的绝好机会。

-Also, practice your answers to common probing questions. The interviewer will ask you to clarify any decision points or uncertainties in your program. Make sure you can justify the design choices you make at any point in the process.
+此外,练习常见问题的答案。面试官会要求你澄清程序中的任何决策点或不确定性。你要确保可以证明在过程中的任何时候所做的设计选择都是合理的。

-**Some common probe questions are:**
+**一些常见的探究问题:**

-* How will this program perform at scale?
-* How will you acquire your training data?
-* What will you do to keep latency low?
+* 该程序大规模执行的表现会怎么样?
+* 你将如何获取训练数据?
+* 你将如何保持较低的延迟?

-## 5 Steps to Solve Any ML System Design Problem
+## 解决任何机器学习系统设计问题的五个步骤

-An ML SDI interview will usually have a strict time limit of either 45 or 60 minutes, with five minutes at the start and end for introductions/wrap up.
+机器学习系统设计面试通常会有严格的时间限制,即 45 或 60 分钟,包含在开始和结束时用 5 分钟做介绍和总结。

-So generally, you'll be expected to cover all key areas of your ML program in 35 to 50 minutes. It's important to come with a structured plan of how you'll draft the system, to ensure you stay on track.
+因此通常来讲,你需要在 35 到 50 分钟内描述清楚你的机器学习程序中的所有关键内容。就如何介绍整个系统制定一个结构化的计划很重要,以确保你能够让整个过程有序进行。

-Next, we'll look at how to break up your time to ace any ML question. To help understand the process, we'll also demonstrate each step through an example feed-type question in a 45-minute interview.
+接下来,我们将研究如何分配时间解决任何一个机器学习问题。为了帮助你了解这一过程,我们将展示一个 45 分钟的面试示例,通过一个反馈类型的问题,来演示每个步骤。

-You can adapt these steps to a 60-minute interview if you scale up the time of each step.
+如果你增加每个步骤的时间,则可以将这些步骤调整为 60 分钟的面试时间。

-**Our question is: Create a content feed to display personalized posts to users.**
+**我们的面试问题是:创建一个内容反馈以向用户显示个性化帖子。**

![](https://cdn-images-1.medium.com/max/4830/1*089x6xZcQvuXGZY3A9AtRQ.png)

-#### Step 1. Clarify requirements (5 minutes)
+#### 步骤 1:明确需求(5 分钟)

-For the first five minutes, we'll clarify our **system goal** and **requirements** with the interviewer.
These interview questions are deliberately vague to make you directly ask for the information you’ll need. Your clarifying questions will help steer your design and decide your system’s end goal. +在最初的五分钟内,我们将向面试官明确我们的**系统目标**和**需求**。 这些面试问题往往刻意含糊不清,使得你需要直接要求必要的信息。你提出的澄清问题将有助于指导你的设计并确定系统的最终目标。 -**Some common clarifying questions would be:** +**一些常见的澄清问题:** -* How many users do we expect this program to handle? -* What metrics are we currently tracking? -* What do we want to achieve with this system? What do we want to optimize for? -* What type of input do we expect? +* 我们希望该程序可以处理多少个用户? +* 我们当前正在跟踪哪些指标? +* 我们想用这个系统实现什么?我们要优化什么? +* 我们期望什么类型的输入? -**Step 1. Example** +**步骤 1 的示例** -If we were clarifying the feed question, we’d ask: +如果我们要讲解示例的内容反馈问题,我们可以问: -* What type of feed will this be? Purely text? Text and images? -* How many users do we expect to have? How many posts does each make per day? -* What metric does our system optimize for? Do we want more engagement per post or to increase the number of posts? -* Do we have a target latency? -* How quickly will our system apply new learning? +* 这将是哪种内容反馈?纯文字?文字和图片? +* 我们期望有多少用户?每人每天发布多少个帖子? +* 我们的系统针对什么指标需要进行优化?我们要增加每个帖子的参与度还是增加帖子的数量? +* 我们有目标延迟吗? +* 我们的系统将多快应用新的学习模型? -#### Step 2. High-level design (5 minutes) +#### 步骤 2:高层次设计(5 分钟) -For the next five minutes, create a high-level design that handles data from input to use. Chart this visually and connect all components that interact. The interviewer will ask probing questions as you build, so look out for questions that suggest you’re missing a component. +在接下来的五分钟内,创建一个高层设计来处理从输入到使用的数据。直观地绘制图表并连接所有交互的组件。面试官在构建时会询问一些探索性问题,因此请注意一些面试官提示你缺少组件的问题。 -Remember to keep this abstract: Decide how many layers, how data will enter the system, how data will be parsed, and how you will decide on relevant data. +记住要保持概括:确定多少层,数据将如何进入系统,数据将如何解析以及如何决定相关数据。 -Make sure to explicitly mention any choices you make for scalability or response time. +确保明确提及你为可扩展性或响应时间所做的任何选择。 -**Step 2. 
Example**
+**步骤 2 的示例**

-We'd write that our training data is from our current social media platform. Fresh live data will enter the system each time a new post is created, based on the creator's location, the popularity of the creator's past posts, and the accounts that follow that creator.
+我们会说我们的训练数据来自我们当前的社交媒体平台。每次创建新帖子时,根据创建者的地理位置,创建者过去的帖子的受欢迎程度以及帖子关注者的帐户,新的实时数据将进入系统。

-We'll use these metrics to determine how relevant a post is to a user. Relevancy will be determined when the app is launched. Our goal is to increase engagement per post.
+我们将使用这些指标来确定帖子与用户的相关性。相关性将在启动应用程序时确定。我们的目标是增加每个帖子的参与度。

-#### Step 3: Data deep dive (10 minutes)
+#### 步骤 3:深入研究数据(10 分钟)

-For the next ten minutes, take a deep dive to explain your data. Make sure to cover both training data and live data. Think about how the data will need to transform through the process.
+在接下来的十分钟内,请深入解释你的数据。确保同时涵盖训练数据和实时数据。思考数据在整个流程中需要如何转换。

-ML interviewers are looking for candidates who understand the importance of data sampling. You'll be expected to clarify where you'd get the training data, what data points you'd use that are present within the current system, and what data you want to begin tracking.
+机器学习面试官想要找到理解数据采样重要性的面试者。你需要表述清楚是从哪里获取的训练数据,当前系统使用到了哪些数据以及需要跟踪哪些数据。

-This differs from a generic SDI, where the interviewee only considers what happens to the data after it enters the program flow.
+这与普通的系统设计不同,普通的系统设计仅需要面试者考虑数据进入程序后会发生什么情况。

-**For training data, consider:**
+**对于训练数据,需要注意:**

-* What source will I get my training data from?
-* How will I ensure it is unbiased?
+* 我的训练数据源来自哪里?
+* 我如何确保它是无偏见的?

-**For live data, consider:**
+**对于实时数据,需要注意:**

-* What signal will I use in my data?
-* Why is this signal relevant?
-* Are there any situations in which this signal would not reflect my desired outcome?
-* How reactive is my algorithm? Will changes occur within ten minutes or ten hours, etc.?
-* How much data can my program handle at once? Does it perform worse with more input?
+* 我将在数据中使用什么标志(特征)? +* 为什么和这个标志有关? +* 是否有这个标志无法反映我想要的结果的情况? +* 我的算法响应速度如何?是否会存在十分钟或者十小时的延时? +* 我的程序一次可以处理多少数据?如果输入增多,效果会变差吗? -**Step 3. Example** +**步骤 3 的示例** -We’ll expect each user to follow 300 accounts and each account to make an average of three posts per day. We’ll have three layers of data evaluation to keep latency low when the system evaluates the 1,000 posts. The first layer quickly cuts a majority of posts based on the posts’ popularity. +我们希望每个用户关注 300 个帐户,并且每个帐户平均每天发布 3 条帖子。在系统评估 1000 个帖子时我们将分三层进行数据评估使延迟保持在较低水平。第一层会根据帖子的受欢迎程度快速去除大部分帖子。 -The second layer uses locational data to cut posts based on locality. This is our second-quickest layer. The third layer will be the longest and will cut posts using cross-engagement data between the follower and followed. +第二层使用位置数据根据位置删除帖子。这是我们第二快的层。第三层将是耗时最长的,将使用关注者和被关注者之间的交叉参与数据来去除帖子。 -#### Step 4. Machine learning algorithms (10 minutes) +#### 步骤 4:机器学习算法(10 分钟) -For the next ten minutes, break down your choice of machine learning algorithm(s) to the interviewer. Each algorithm handles certain tasks differently, and the interviewer will want you to know the strengths and weaknesses of different algorithms. +在接下来的十分钟内,将你选择的机器学习算法分解给面试官听。每种算法处理某些任务的方式都不同,面试官会希望你了解不同算法的优缺点。 -If you use several algorithms to handle scale, mention how their results will factor together and your reasons for choosing multiple algorithms. +如果你使用多种算法来实现可扩展,请解释它们的结果如何被一起考虑以及你选择多种算法的原因。 -Make sure to mention how each algorithm utilizes your signals to create a cohesive solution. The same signal may not be as effective in one algorithm as it is in another. +请务必提及每种算法如何利用你的标志来创建内聚的解决方案。同一标志在一种算法中可能不如在另一种算法中有效。 -**Step 4. Example** +**步骤 4 的示例** -We’ll use the feedforward neural network algorithm to predict relevancy. This algorithm works well with our creator/user interactions signal because it forms predictions from non-circular connection webs. +我们将使用前馈神经网络算法来预测相关性。该算法与我们的创建者/用户互动标志配合得很好,因为它可以形成来自非环状连接网的预测。 -#### Step 5. 
Experimentation (5 minutes) +#### 步骤 5:实验(5 分钟) -In the final five minutes, stake a hypothesis of what your system will accomplish. This section is a sort of conclusion for your program where you can summarize how the components together achieve a certain goal. +在最后的五分钟中,提出系统能够完成的假设。本节是你的程序的一种总结,你可以在其中总结组件如何共同实现特定目标。 -Your hypothesis may be broad, like “posts ordered by relevance will get more engagement than chronological,” or it may be specific, like “the addition of a location signal will increase engagement by 0.5%.” +你的假设可能很宽泛,例如“按相关性排序的帖子比按时间顺序排列的参与度更高”,也可能很具体,例如“添加位置标志将使参与度提高 0.5%”。 -From here, explain how you’d test this hypothesis. Make sure to cover the initial offline evaluations and also how to evaluate online. +从这里说明你要如何检验该假设。确保涵盖最初的离线评估以及如何在线评估。 -* What offline evaluations would you use? -* How big an offline data set would you need? -* When online, what will you do if there is a bug? -* How will you track user satisfaction with the change? -* Do you count engagement using comments or using any form of interaction with the post? +* 你将使用哪些离线评估? +* 你需要多大的离线数据集? +* 在线时如果存在错误,你将怎么办? +* 你将如何跟踪用户对变更的满意度? +* 你是否通过评论或与帖子进行任何形式的互动来计算参与度? -ML engineers constantly test out hypotheses in their day-to-day work. A focus on experimentation will set you apart from other applicants, as it shows that you can synthesize the functionality of your program and possess the right mindset for the job. +机器学习工程师在日常工作中会不断检验假设。对于实验部分的专注将把你与其他面试者区分开,因为它表明你可以综合程序的功能并拥有正确的工作思路。 -**Step 5. Example** +**步骤 5 的示例** -Our relevancy-based feed will increase user engagement by 0.5%. We’ll first use offline models programmed to simulate users and see what types of posts come through to the feed. +我们基于相关性的内容反馈将使用户参与度提高 0.5%。 我们将首先使用离线模型来模拟用户并查看将哪些类型的帖子发送到反馈系统。 -Once we move online, we’ll track posts with the keywords “update” and “relevance” to determine effectiveness. +在上线之后,我们将通过“更新”和“相关性”这两个关键字来跟踪帖子,以确定效果。 -## 5-Step Summary +## 五步骤总结 -Step 1. 
Clarify requirements (5 minutes)
+步骤 1:明确需求(5 分钟)

-Step 2. High-level design (5 minutes)
+步骤 2:高层次设计(5 分钟)

-Step 3. Data deep dive (10 minutes)
+步骤 3:深入研究数据(10 分钟)

-Step 4. Machine learning algorithms (10 minutes)
+步骤 4:机器学习算法(10 分钟)

-Step 5. Experimentation (5 minutes)
+步骤 5:实验(5 分钟)

-## Wrapping Up and Resources
+## 本文总结

-You now have everything you need to ace your next ML interview. By preparing ML study material and a timed solution plan, you'll set yourself apart from others still unfamiliar with this rising interview type.
+现在,你已经拥有了在下一次机器学习面试中取得出色表现所需的一切。通过准备机器学习的学习资料和限时的解题方案,你将从那些还不熟悉这种日益流行的面试类型的面试者中脱颖而出。

-Happy interviewing!
+祝面试愉快!

> 如果发现译文存在错误或其他需要改进的地方,欢迎到 [掘金翻译计划](https://github.com/xitu/gold-miner) 对译文进行修改并 PR,也可获得相应奖励积分。文章开头的 **本文永久链接** 即为本文在 GitHub 上的 MarkDown 链接。