Improve pattern and reduce size (#245)
omrilotan authored Feb 27, 2024
1 parent c86e260 commit b489f2c
Showing 7 changed files with 28 additions and 41 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,9 @@
# Changelog

## [5.1.1](https://github.com/omrilotan/isbot/compare/v5.1.0...v5.1.1)

- Reduce pattern size by introducing the substring ".com" and improve generic pattern

## [5.1.0](https://github.com/omrilotan/isbot/compare/v5.0.0...v5.1.0)

- Build now compatible with older JavaScript version: es2016
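The changelog entry describes folding domain-specific patterns into one generic `\\.com` entry. The minimal TypeScript sketch below shows why that shrinks the combined pattern without losing coverage for those domains. It assumes, as a simplification, that the entries of `src/patterns.json` are joined with `|` into a single case-insensitive `RegExp`. The four-entry subset and the sample strings are hypothetical; `adbeat\\.com` and `stumbleupon\\.com` are taken from the patterns diff and are presumably among the removed entries.

```typescript
// Hypothetical four-entry subset of src/patterns.json, written as JSON-style strings.
const withSpecificDomains: string[] = ["adbeat\\.com", "stumbleupon\\.com", "crawl", "spider"];

// The same subset with both domain entries folded into one generic "\\.com" entry.
const withGenericDotCom: string[] = ["\\.com", "crawl", "spider"];

// Assumption: the entries are combined into one case-insensitive alternation.
function combine(patterns: string[]): RegExp {
  return new RegExp(patterns.join("|"), "i");
}

const before: RegExp = combine(withSpecificDomains);
const after: RegExp = combine(withGenericDotCom);

// Hypothetical user agent strings that mention the two domains.
const samples: string[] = ["Adbeat.com ad checker/1.0", "StumbleUpon.com preview"];

for (const ua of samples) {
  // Anything the specific entries catch is still caught by the generic entry.
  console.log(ua, before.test(ua), after.test(ua));
}

// The combined source is shorter once the domains are folded together.
console.log(before.source.length, after.source.length); // 41 18
```

The generic entry is broader than the two it replaces: it matches any string containing a literal `.com`, not only those two domains.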
2 changes: 1 addition & 1 deletion README.md
@@ -1,6 +1,6 @@
# isbot 🤖/👨‍🦰

[![](https://img.shields.io/npm/v/isbot?style=flat-square)](https://www.npmjs.com/package/isbot) [![](https://img.shields.io/npm/dt/isbot?style=flat-square)](https://www.npmjs.com/package/isbot) [![](https://img.shields.io/circleci/build/github/omrilotan/isbot?style=flat-square)](https://circleci.com/gh/omrilotan/isbot) [![](https://img.shields.io/github/last-commit/omrilotan/isbot?style=flat-square)](https://github.com/omrilotan/isbot/graphs/commit-activity) [![](https://data.jsdelivr.com/v1/package/npm/isbot/badge)](https://www.jsdelivr.com/package/npm/isbot)
[![](https://img.shields.io/npm/v/isbot?style=flat-square)](https://www.npmjs.com/package/isbot) [![](https://img.shields.io/circleci/build/github/omrilotan/isbot?style=flat-square)](https://circleci.com/gh/omrilotan/isbot) [![](https://img.shields.io/github/last-commit/omrilotan/isbot?style=flat-square)](https://github.com/omrilotan/isbot/graphs/commit-activity) [![](https://img.shields.io/npm/dt/isbot?style=flat-square)](https://www.npmjs.com/package/isbot) [![](https://data.jsdelivr.com/v1/package/npm/isbot/badge)](https://www.jsdelivr.com/package/npm/isbot)

[![](./page/isbot.svg)](https://isbot.js.org)

8 changes: 5 additions & 3 deletions fixtures/browsers.yml
@@ -56,7 +56,6 @@ Arora:
- "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)"
- Mozilla/5.0 (X11; U; Linux; hu-HU) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.4
Avant:
- Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Avant Browser [avantbrowser.com]; Hotbar 4.4.5.0)
- Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Avant Browser; Avant Browser; .NET CLR 1.1.4322)
- Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Avant Browser; Avant Browser; InfoPath.2)
- Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; Avant Browser; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.2; .NET4.0C; Tablet PC 2.0; .NET4.0E; Avant Browser)
@@ -542,7 +541,6 @@ Sleipnir:
- Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.59.10 (KHTML, like Gecko) Version/5.1 Safari/6534.59.10 Sleipnir/4.5.1
- Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36 Sleipnir/6.2.14
SlimBrowser:
- Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SlimBrowser [flashpeak.com]; SV1)
- Mozilla/5.0 (SmartHub; Linux/SmartTV) AppleWebKit/606 (KHTML, like Gecko) SlimBrowser/11.0.8.0 Safari/606 OMI/4.8.0.129.PIXEL_UNICORN2.12
- Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; SlimBrowser/7.00; MASBJS; rv:11.0) like Gecko
Snapchat:
@@ -690,8 +688,12 @@ ZZZ Glitches and Misidentified Browsers - These browsers are legit user agent ev
- Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6.5; .NET CLR 2.0.50727; staticlogin:productcboxf09&actlogin&infoZmlsZW5hbWU9UG93ZXJ3b3JkMjAwOU94Zi4yNTI2OS40MDExLmV4ZSZtYWM9N0RDMTUwREU5MUEyNERBOTlBODYxREY3NjQ0Nzc1NDYmcGFzc3BvcnQ9JnZlcnN
- Mozilla/5.0 (compatible; Lucidworks-Anda/2.0/0.10; +; )
- Mozilla/5.0 (en-us) AppleWebKit/525.13 (KHTML, like Gecko) Version/3.1 Safari/525.13
- mWebView.getSettings().setUserAgentString(\x22Mozilla/5.0 (Amiga; U; AmigaOS 1.3; en; rv:1.8.1.19);
- ozilla/5.0 (Linux; Android 10) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/115.0.5790.166 Mobile DuckDuckGo/5 Safari/537.36
- User-Agent:Mozilla/5.0 (compatible; MSIE 11.0.190; Windows NT 10.0; .NET CLR 1.0.3705;)
- User-Agent:Mozilla/5.0 (compatible; MSIE 11.0.190; Windows Phone OS 10.0; Trident/5.0; IEMobile/9.0; NOKIA; Lumia 710)
- User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 11_3_1) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/92.0 Safari /535.7
- User-Agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.74 Safari/537.36 Edg/90.0.818.62
- User-Agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0 Safari /537.36
ZZZ Insignificat bots - These bots have very low appearance rate and are not worth blocking:
- Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; Banca Caboto s.p.a.)
- Opera/9.70 (Linux armv7l ; U; turbotabbee/TSV2.0/1.02Q; fr) Presto/2.2
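The generic `\\.com` entry also matches browser strings that merely contain a domain, such as `Avant Browser [avantbrowser.com]` and `SlimBrowser [flashpeak.com]`. Those two lines sit in the browsers.yml hunks that each remove one line, so they are presumably the fixtures dropped here. A short check of that, compiling only the `\\.com` entry in isolation (an assumption for illustration; the library itself tests against its full combined pattern):

```typescript
// The generic entry from src/patterns.json, compiled on its own.
const dotCom: RegExp = new RegExp("\\.com", "i");

// Browser fixtures shown in the fixtures/browsers.yml diff.
const avantWithDomain =
  "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Avant Browser [avantbrowser.com]; Hotbar 4.4.5.0)";
const slimWithDomain =
  "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SlimBrowser [flashpeak.com]; SV1)";
const avantPlain =
  "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Avant Browser; Avant Browser; .NET CLR 1.1.4322)";

// The dot is escaped, so "(compatible" on its own does not trigger a match.
for (const ua of [avantWithDomain, slimWithDomain, avantPlain]) {
  console.log(dotCom.test(ua), ua); // true, true, false
}
```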
14 changes: 8 additions & 6 deletions fixtures/crawlers.yml
@@ -1,3 +1,7 @@
"8":
- Mozilla/5.0 (compatible; 008/0.83; http://www.80legs.com/webcrawler.html) Gecko/2008032620
2ip.ru:
- 2ip.ru CMS Detector (https://2ip.ru/cms/)
360Spider:
- Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1; 360Spider
Aboundexbot:
@@ -126,11 +130,14 @@ Catchpoint:
CATExplorador:
- CATExplorador/1.0beta (sistemes at domini dot cat; http://domini.cat/catexplorador.html)
ccBot crawler:
- CCBot/1.0 (+https://commoncrawl.org/bot.html)
- CCBot/2.0 (http://commoncrawl.org/faq/)
Censys:
- Mozilla/5.0 (compatible; CensysInspect/1.1; +https://about.censys.io/)
CF-UC:
- CF-UC User Agent v.1d.374049
Chat-GPT:
- Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot
Chrome Headless:
- Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/100.0.4896.88 Safari/537.36
- Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/83.0.4103.61 Safari/537.36
@@ -779,12 +786,6 @@ Uptimebot:
- Mozilla/5.0 (compatible; Uptimebot/1.0; +http://www.uptime.com/uptimebot)
URLAppendBot:
- Mozilla/5.0 (compatible; URLAppendBot/1.0; +http://www.profound.net/urlappendbot.html)
User-Agent prefix error:
- User-Agent:Mozilla/5.0 (compatible; MSIE 11.0.190; Windows NT 10.0; .NET CLR 1.0.3705;)
- User-Agent:Mozilla/5.0 (compatible; MSIE 11.0.190; Windows Phone OS 10.0; Trident/5.0; IEMobile/9.0; NOKIA; Lumia 710)
- User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 11_3_1) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/92.0 Safari /535.7
- User-Agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.74 Safari/537.36 Edg/90.0.818.62
- User-Agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0 Safari /537.36
Vagabondo:
- Mozilla/4.0 (compatible; Vagabondo/4.0; http://webagent.wise-guys.nl/; http://www.wise-guys.nl/)
Var:
@@ -900,6 +901,7 @@ ZZZ Miscellaneous Glitches and Errornous User Agent Strings:
- default_user_agent
- ipad
- iphone 6 plus;afengineurl=https://intoli.com:443;traceId=63028f8e-c5fc-4846-993f-59a96268a85d
- Mozilla/5.0 (compatible; 007ac9 Crawler; http://crawler.007ac9.net/)
- Mozilla/5.0 (Linux; Android 10; FIG-AL10 Build/HUAWEIFIG-AL10; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/105.0.0.0MQQBrowser/6.2 TBS/045223 Mobile Safari/537.36 MMWEBID/1214 MicroMessenger/7.0.14.1660(0x27000E39) Process/tools NetType/4G Language/zh_CN ABI/arm64 WeChat/arm64 wechatdevtools qcloudcdn-xinan
- Mozilla/5.0 (Linux; Android 10; M6 Note Build/N2G47H; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/105.0.0.0MQQBrowser/6.2 TBS/045223 Mobile Safari/537.36 MMWEBID/9551 MicroMessenger/7.0.14.1660(0x27000E37) Process/tools NetType/4G Language/zh_CN ABI/arm64 WeChat/arm64 wechatdevtools qcloudcdn-xinan
- pisya
2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "isbot",
"version": "5.1.0",
"version": "5.1.1",
"description": "🤖/👨‍🦰 Recognise bots/crawlers/spiders using the user agent string.",
"keywords": [
"bot",
35 changes: 7 additions & 28 deletions src/patterns.json
@@ -2,41 +2,36 @@
" daum[ /]",
" deusu/",
" yadirectfetcher",
"(?:^| )site",
"(?:^|[^g])news",
"(?<! (?:channel/|google/))google(?!(app|/google| pixel))",
"(?<! cu)bot(?:[^\\w]|_|$)",
"(?<! ya(?:yandex)?)search",
"(?<!(?:lib))http",
"(?<![hg]m)score",
"@[a-z]",
"\\(at\\)[a-z]",
"\\[at\\][a-z]",
"@",
"\\(\\)",
"\\.com",
"^12345",
"^<",
"^[\\w \\.\\-\\(?:\\):]+(?:/v?\\d+(\\.\\d+)?(?:\\.\\d{1,10})?)?(?:,|$)",
"^[\\w \\.\\-\\(?:\\):]+(?:/v?\\d+(?:\\.\\d+)?(?:\\.\\d{1,10})*?)?(?:,|$)",
"^[^ ]{50,}$",
"^\\w+/[\\w\\(\\)]*$",
"^active",
"^ad muncher",
"^amaya",
"^anglesharp/",
"^avsdevicesdk/",
"^bidtellect/",
"^biglotron",
"^bot",
"^btwebclient/",
"^clamav[ /]",
"^client/",
"^cobweb/",
"^coccoc",
"^custom",
"^ddg[_-]android",
"^discourse",
"^dispatch/\\d",
"^downcast/",
"^duckduckgo",
"^facebook",
"^fdm[ /]\\d",
"^getright/",
"^gozilla/",
"^hatena",
@@ -46,19 +41,14 @@
"^jeode/",
"^jetty/",
"^jigsaw",
"^linkdex",
"^metauri",
"^microsoft bits",
"^movabletype",
"^mozilla/\\d\\.\\d \\(compatible;?\\)$",
"^mozilla/\\d\\.\\d \\w*$",
"^navermailapp",
"^netsurf",
"^nuclei",
"^offline explorer",
"^php",
"^postman",
"^postrank",
"^python",
"^rank",
"^read",
@@ -72,23 +62,19 @@
"^taringa",
"^thumbor/",
"^track",
"^tumblr/",
"^user-agent:",
"^valid",
"^venus/fedoraplanet",
"^w3c",
"^webbandit/",
"^webcopier",
"^wget",
"^whatsapp",
"^wordpress",
"^xenu link sleuth",
"^yahoo",
"^yandex",
"^zdm/\\d",
"^zoom marketplace/",
"^{{.*}}$",
"adbeat\\.com",
"appinsights",
"archive",
"ask jeeves/teoma",
"bit\\.ly/",
@@ -103,32 +89,29 @@
"classifier",
"cloud",
"crawl",
"cryptoapi",
"dareboost",
"datanyze",
"dataprovider",
"dejaclick",
"dmbrowser",
"download",
"evc-batch/",
"feed",
"firephp",
"freesafeip",
"gomezagent",
"headless",
"httrack",
"hubspot marketing grader",
"hydra",
"ibisbrowser",
"images",
"insight",
"inspect",
"iplabel",
"ips-agent",
"java(?!;)",
"library",
"mail\\.ru/",
"manager",
"monitor",
"neustar wpm",
"node",
"nutch",
@@ -143,7 +126,6 @@
"preview",
"proxy",
"ptst[ /]\\d",
"reader",
"reputation",
"resolver",
"retriever",
@@ -160,21 +142,18 @@
"spider",
"splash",
"statuscake",
"stumbleupon\\.com",
"supercleaner",
"synapse",
"synthetic",
"tools",
"torrent",
"trace",
"transcoder",
"twingly recon",
"url",
"virtuoso",
"wappalyzer",
"webglance",
"webkit2png",
"whatcms/",
"wordpress",
"zgrab"
]
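Two versions of the generic version-string pattern appear back to back in the patterns diff. Assuming the first is the removed one (GitHub lists a deletion before its replacement), the update makes the first dotted group non-capturing and lets the trailing `\\.\\d{1,10}` group repeat lazily (`*?`) instead of at most once (`?`). The sketch below compiles both strings exactly as they appear in the JSON and compares them on a few hypothetical inputs:

```typescript
// The two generic patterns from src/patterns.json, copied as JSON-escaped strings.
const firstGeneric = new RegExp(
  "^[\\w \\.\\-\\(?:\\):]+(?:/v?\\d+(\\.\\d+)?(?:\\.\\d{1,10})?)?(?:,|$)",
  "i",
);
const secondGeneric = new RegExp(
  "^[\\w \\.\\-\\(?:\\):]+(?:/v?\\d+(?:\\.\\d+)?(?:\\.\\d{1,10})*?)?(?:,|$)",
  "i",
);

// Hypothetical inputs: a bare name, a short version, a four-part version, a browser string.
const inputs: string[] = [
  "SomeClient",
  "SomeClient/1.2",
  "SomeClient/1.2.3.4",
  "Mozilla/5.0 (Windows NT 10.0)",
];

for (const ua of inputs) {
  // Only the four-part version differs: the first pattern allows at most three numeric parts.
  console.log(ua, firstGeneric.test(ua), secondGeneric.test(ua));
}
```

Neither version matches the browser-style string, because anything after the version other than a comma or end of input causes the match to fail.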
4 changes: 2 additions & 2 deletions tests/spec/test.ts
@@ -46,7 +46,7 @@ describe("isbot", () => {
});
test("isbotMatches: find all patterns in bot user agent string", () => {
expect(isbotMatches(BOT_USER_AGENT_EXAMPLE)).toContain("Google");
expect(isbotMatches(BOT_USER_AGENT_EXAMPLE)).toHaveLength(3);
expect(isbotMatches(BOT_USER_AGENT_EXAMPLE)).toHaveLength(4);
});
test("isbotPattern: find first pattern in bot user agent string", () => {
expect(isbotPattern(BOT_USER_AGENT_EXAMPLE)).toBe(
@@ -57,7 +57,7 @@ describe("isbot", () => {
expect(isbotPatterns(BOT_USER_AGENT_EXAMPLE)).toContain(
"(?<! (?:channel/|google/))google(?!(app|/google| pixel))",
);
expect(isbotPatterns(BOT_USER_AGENT_EXAMPLE)).toHaveLength(3);
expect(isbotPatterns(BOT_USER_AGENT_EXAMPLE)).toHaveLength(4);
});
test("createIsbot: create custom isbot function with custom pattern", () => {
const customIsbot = createIsbot(/bot/i);
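The test expectations rise from 3 to 4 matching patterns, which is consistent with the new `\\.com` entry matching one more substring of the bot example. The sketch below reproduces that count under stated assumptions: a Googlebot-style string stands in for `BOT_USER_AGENT_EXAMPLE` (its real value in `tests/spec/test.ts` is not shown in this diff), only a hand-picked subset of the patterns visible above is used, and `matchingPatterns` is a hypothetical helper, not the library's `isbotPatterns`.

```typescript
// Stand-in for BOT_USER_AGENT_EXAMPLE; the real constant may differ.
const ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";

// Hand-picked subset of entries visible in the src/patterns.json diff.
const basePatterns: string[] = [
  "(?<! (?:channel/|google/))google(?!(app|/google| pixel))",
  "(?<! cu)bot(?:[^\\w]|_|$)",
  "(?<!(?:lib))http",
  "(?:^| )site",
  "(?<! ya(?:yandex)?)search",
];

// Hypothetical helper: keep every pattern that matches the user agent, case-insensitively.
function matchingPatterns(userAgent: string, patterns: string[]): string[] {
  return patterns.filter((p) => new RegExp(p, "i").test(userAgent));
}

const before = matchingPatterns(ua, basePatterns);
const after = matchingPatterns(ua, [...basePatterns, "\\.com"]);

console.log(before.length, after.length); // 3 4
```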