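Massive Data Design Problems

Common problem-solving approaches for this topic:

Divide and conquer
Hashing & consistent hashing
Trie (prefix tree)
Bitmap
Max-heap & min-heap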
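
As one illustration of the bitmap approach, here is a minimal sketch that counts distinct non-negative integers using one bit per possible value. The class name `BitmapSketch`, the method `countDistinct`, and the `range` parameter are hypothetical names chosen for this example, and it assumes every value fits in `[0, range)` and that the bitmap itself fits in memory.

```java
import java.util.BitSet;

public class BitmapSketch {
    // Hypothetical helper: counts distinct values, assuming each value is in [0, range).
    // Memory cost is roughly range / 8 bytes, independent of how many values are seen.
    public static int countDistinct(int[] values, int range) {
        BitSet seen = new BitSet(range);
        for (int v : values) {
            seen.set(v); // marking the same value twice has no extra effect
        }
        return seen.cardinality(); // number of bits set = number of distinct values
    }

    public static void main(String[] args) {
        int[] values = {3, 7, 3, 1, 7, 7};
        System.out.println(countDistinct(values, 10)); // prints 3
    }
}
```

The same bit-per-value idea also answers existence queries: after the marking pass, `seen.get(v)` tells whether `v` appeared.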

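Common questions:

How do you find the identical URLs in a large set of URLs?
How do you find the high-frequency words in a large volume of data?
How do you find the IP address that visited Baidu the most on a given day?
How do you find the non-repeated integers in a large volume of data?
How do you determine whether a number exists in a large volume of data?
How do you find the most popular query strings?
How do you count the number of distinct phone numbers?
How do you find the median of 500 million numbers?
How do you sort queries by frequency?
How do you find the top 500 numbers?
What are the common patterns for TopK problems in big data?
How would you approach designing a sensitive-word filtering system? Link 1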
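
For the TopK questions in this list, one commonly used pattern is a size-k min-heap: keep the k largest values seen so far and evict the smallest whenever a larger value arrives. The sketch below is only an illustration under the assumption that the input can be read in a single pass; `TopKSketch` and `topK` are hypothetical names, and for data spread across many files the same routine could be applied per partition and then to the merged partial results.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class TopKSketch {
    // Hypothetical helper: returns the k largest values in descending order.
    public static List<Integer> topK(int[] nums, int k) {
        List<Integer> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        // PriorityQueue is a min-heap by default, so peek() is the smallest kept value.
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int n : nums) {
            if (heap.size() < k) {
                heap.offer(n);
            } else if (n > heap.peek()) {
                heap.poll();   // drop the current smallest of the k kept values
                heap.offer(n); // keep the larger newcomer instead
            }
        }
        result.addAll(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(topK(new int[]{5, 1, 9, 3, 7, 8}, 3)); // prints [9, 8, 7]
    }
}
```

The heap never holds more than k elements, so memory stays at O(k) and each input element costs at most O(log k).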
