Commit af4d347 (1 parent: b5e19a2), showing 2 changed files with 74 additions and 1 deletion.

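# Designing Twitter Search

Twitter is one of the largest social networking services, where users can share photos, news, and text-based messages. This chapter designs a service that can store and search user tweets.

**Similar problems**

Tweet search.

**Difficulty level**

Medium.
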
## 1. What is Twitter Search?

Twitter users can update their status at any time. Each status (called a tweet) consists of plain text, and our goal is to design a system that can search over all user tweets.

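## 2. System Requirements and Goals

- Assume Twitter has 1.5 billion total users, 800 million of whom are daily active users.
- Twitter receives an average of 400 million tweets per day.
- The average tweet size is 300 bytes.
- Assume there are 500 million searches per day.
- A search query consists of multiple words combined with AND/OR.
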
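To make the last requirement concrete, here is a minimal sketch of how an AND/OR query could be matched against a single tweet. The helper names `parse_query` and `matches` are hypothetical, and the sketch assumes that operators are written in uppercase, that each term is a single word, and that AND binds more tightly than OR. None of these conventions are specified above.

```python
import re


def parse_query(query: str) -> list[list[str]]:
    """Parse 'a AND b OR c' into OR-groups of AND-terms: [['a', 'b'], ['c']].

    Assumption: AND binds tighter than OR, and operators are uppercase.
    """
    groups = []
    for or_part in re.split(r"\s+OR\s+", query.strip()):
        terms = [t.strip().lower() for t in re.split(r"\s+AND\s+", or_part)]
        terms = [t for t in terms if t]
        if terms:
            groups.append(terms)
    return groups


def matches(tweet_text: str, groups: list[list[str]]) -> bool:
    """Return True if the tweet satisfies at least one AND-group."""
    words = set(re.findall(r"\w+", tweet_text.lower()))
    return any(all(term in words for term in group) for group in groups)


groups = parse_query("coffee AND morning OR tea")
print(matches("Morning coffee is great", groups))
print(matches("Coffee at night", groups))
```

Scanning every tweet this way would not scale to the volumes listed above; the sketch only pins down what a query means.
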
We need to design a system that can efficiently store and query tweets.

---

## 3. Capacity Estimation and Constraints

**Storage capacity:**

- New tweets per day: 400 million.
- Size of each tweet: 300 bytes.
- Total storage needed per day:

```
400M * 300 bytes => 120GB/day
```

- New storage needed per second:

```
120GB / 24hours / 3600sec ~= 1.39MB/second
```

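The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes decimal units (1 GB = 10^9 bytes) and derives an average search rate from the 500 million searches per day stated in the requirements. The constant names are made up for illustration.

```python
TWEETS_PER_DAY = 400_000_000      # from the requirements
AVG_TWEET_BYTES = 300             # from the requirements
SEARCHES_PER_DAY = 500_000_000    # from the requirements
SECONDS_PER_DAY = 24 * 3600

bytes_per_day = TWEETS_PER_DAY * AVG_TWEET_BYTES
gb_per_day = bytes_per_day / 1e9                      # decimal gigabytes
mb_per_second = bytes_per_day / SECONDS_PER_DAY / 1e6  # decimal megabytes
searches_per_second = SEARCHES_PER_DAY / SECONDS_PER_DAY

print(f"{gb_per_day:.0f} GB/day")
print(f"{mb_per_second:.2f} MB/s")
print(f"{searches_per_second:.0f} searches/s on average")
```

The per-second figure comes out at roughly 1.39 MB when rounded to two decimals. These are averages, so peak traffic would be higher.
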
## 4. System APIs

We can expose the service's functionality through SOAP or REST APIs. The search API could be defined as follows:

```
search(api_dev_key, search_terms, maximum_results_to_return, sort, page_token)
```

**Parameters:**

- **api_dev_key (string)**: The API developer key of a registered account. It can be used to throttle users based on their allocated quota.
- **search_terms (string)**: A string containing the search terms.
- **maximum_results_to_return (number)**: The number of tweets to return.
- **sort (number)**: Optional sort mode:
  - Latest first (0, default)
  - Best matched (1)
  - Most liked (2)
- **page_token (string)**: A token that specifies which page of the result set to return.

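As an illustration, the parameters could be modelled as a typed request object. This is a sketch, not a prescribed implementation: the `SortMode` and `SearchRequest` names, the default of 20 results, and the cap of 100 are all assumptions introduced here. Only the field names and the three sort values come from the list above.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional

MAX_RESULTS_CAP = 100  # hypothetical upper bound, not given in the text


class SortMode(IntEnum):
    LATEST_FIRST = 0   # default
    BEST_MATCHED = 1
    MOST_LIKED = 2


@dataclass(frozen=True)
class SearchRequest:
    api_dev_key: str
    search_terms: str
    maximum_results_to_return: int = 20  # assumed default
    sort: SortMode = SortMode.LATEST_FIRST
    page_token: Optional[str] = None     # None requests the first page

    def __post_init__(self) -> None:
        if not self.api_dev_key:
            raise ValueError("api_dev_key is required")
        if not self.search_terms.strip():
            raise ValueError("search_terms must not be empty")
        if not 1 <= self.maximum_results_to_return <= MAX_RESULTS_CAP:
            raise ValueError("maximum_results_to_return out of range")
        # Coerce raw integers such as 2 into SortMode; unknown values raise ValueError.
        object.__setattr__(self, "sort", SortMode(self.sort))


request = SearchRequest("demo-key", "coffee AND tea", sort=2)
print(request.sort.name)
```

Validating at construction time keeps malformed requests from reaching the search backend.
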
**Returns:**

- **JSON**: Information about a list of tweets matching the search query. Each result entry can include the following fields:
  - User ID and name
  - Tweet text
  - Tweet ID
  - Creation time
  - Number of likes, etc.
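
One possible shape for the JSON response is sketched below. The key names (`results`, `next_page_token`, `like_count`, and so on), the ISO 8601 timestamp format, and the sample values are assumptions chosen for illustration. The text above only lists which pieces of information an entry may contain.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TweetResult:
    user_id: int
    user_name: str
    tweet_text: str
    tweet_id: int
    created_at: str   # ISO 8601 timestamp (assumed format)
    like_count: int


def build_response(results: list[TweetResult],
                   next_page_token: Optional[str] = None) -> str:
    """Serialize matching tweets plus a token for fetching the next page."""
    payload = {
        "results": [asdict(r) for r in results],
        "next_page_token": next_page_token,
    }
    return json.dumps(payload, ensure_ascii=False)


sample = TweetResult(
    user_id=42,
    user_name="alice",
    tweet_text="Morning coffee is great",
    tweet_id=1001,
    created_at="2024-01-01T08:00:00Z",
    like_count=7,
)
response = build_response([sample], next_page_token="page-2")
print(response)
```

Returning a `next_page_token` lets the client pass it back as `page_token` in the next call.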