chore: optimize performance (replace he with entities) #6497

Merged on Dec 22, 2020 (1 commit)

oppilate
Contributor

Before:

➜ ab -k -c 20 -n 250 http://127.0.0.1:1200/initium/feature/zh-hant           
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 100 requests
Completed 200 requests
Finished 250 requests


Server Software:        
Server Hostname:        127.0.0.1
Server Port:            1200

Document Path:          /initium/feature/zh-hant
Document Length:        224091 bytes

Concurrency Level:      20
Time taken for tests:   42.454 seconds
Complete requests:      250
Failed requests:        0
Keep-Alive requests:    250
Total transferred:      56123724 bytes
HTML transferred:       56022750 bytes
Requests per second:    5.89 [#/sec] (mean)
Time per request:       3396.299 [ms] (mean)
Time per request:       169.815 [ms] (mean, across all concurrent requests)
Transfer rate:          1291.01 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       1
Processing:   131 2773 580.9   2786    6833
Waiting:      131 2772 580.8   2786    6832
Total:        131 2773 580.7   2786    6833

Percentage of the requests served within a certain time (ms)
  50%   2786
  66%   2927
  75%   3030
  80%   3120
  90%   3341
  95%   3400
  98%   3540
  99%   3547
 100%   6833 (longest request)

After:

➜ ab -k -c 20 -n 250 http://127.0.0.1:1200/initium/feature/zh-hant
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 100 requests
Completed 200 requests
Finished 250 requests


Server Software:        
Server Hostname:        127.0.0.1
Server Port:            1200

Document Path:          /initium/feature/zh-hant
Document Length:        224164 bytes

Concurrency Level:      20
Time taken for tests:   14.996 seconds
Complete requests:      250
Failed requests:        0
Keep-Alive requests:    250
Total transferred:      56141974 bytes
HTML transferred:       56041000 bytes
Requests per second:    16.67 [#/sec] (mean)
Time per request:       1199.655 [ms] (mean)
Time per request:       59.983 [ms] (mean, across all concurrent requests)
Transfer rate:          3656.13 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.4      0       3
Processing:    61  704 374.9    650    6173
Waiting:       60  704 374.8    650    6170
Total:         61  704 374.9    650    6173

Percentage of the requests served within a certain time (ms)
  50%    650
  66%    686
  75%    710
  80%    734
  90%    877
  95%   1029
  98%   1101
  99%   1102
 100%   6173 (longest request)
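
To put the two runs side by side, here is a small sketch that computes the ratios from the figures reported by ab above. The variable names are only illustrative, and the numbers are copied by hand from the output, so treat it as a sanity check rather than a benchmark harness.

```javascript
// Figures copied by hand from the two ApacheBench runs above.
const before = { requestsPerSecond: 5.89, totalSeconds: 42.454, medianMs: 2786 };
const after = { requestsPerSecond: 16.67, totalSeconds: 14.996, medianMs: 650 };

// How many times larger `a` is than `b`, rounded to two decimals.
const ratio = (a, b) => Math.round((a / b) * 100) / 100;

const throughputGain = ratio(after.requestsPerSecond, before.requestsPerSecond);
const wallClockGain = ratio(before.totalSeconds, after.totalSeconds);
const medianLatencyGain = ratio(before.medianMs, after.medianMs);

console.log({ throughputGain, wallClockGain, medianLatencyGain });
// { throughputGain: 2.83, wallClockGain: 2.83, medianLatencyGain: 4.29 }
```

Throughput and total time both improve by about 2.8x, while the median latency improves by about 4.3x.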

@vercel

vercel bot commented Dec 22, 2020

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/diy/rsshub/kjxy7ezkd
✅ Preview: https://rsshub-git-fork-oppilate-speedup.diy.vercel.app

@DIYgod
Owner

DIYgod commented Dec 22, 2020

entities should be moved from devDependencies to dependencies

"entities": "2.1.0",

@oppilate
Contributor Author

Weird. So yarn add entities won't add it to dependencies if it's already listed in devDependencies. Fixed now.

@oppilate
Contributor Author

Also, parse5 (used by cheerio) accounts for another 1/3 of the processing time. Cheerio can also work with the DOM tree produced by htmlparser2, which is reportedly 5 times faster than parse5, but that requires one more line in each cheerio parsing call:

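// Usage as of htmlparser2 version 3:
const cheerio = require('cheerio');
const htmlparser2 = require('htmlparser2');

// `document` is the HTML string to parse; `options` are htmlparser2 parser options.
const dom = htmlparser2.parseDOM(document, options);

// cheerio.load() also accepts an already-parsed DOM tree.
const $ = cheerio.load(dom);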

I won't do it right now; just leaving a note here.

@HenryQW changed the title from "chore: optimize performance" to "chore: optimize performance (replace he with entities)" on Dec 22, 2020
@HenryQW
Collaborator

HenryQW commented Dec 22, 2020

htmlparser2 is the default now.

cheeriojs/cheerio#985

@HenryQW merged commit dfdb963 into DIYgod:master on Dec 22, 2020
@oppilate
Contributor Author

oppilate commented Dec 22, 2020

> htmlparser2 is the default now.

Well, actually it's htmlparser2 for XML and parse5 for HTML: cheeriojs/cheerio#866 (comment). In RSSHub only the HTML part is used (if I didn't miss anything), which matches what I see in the profiling result.

Anyway, that would be at most a 3/2 (1.5x) speedup, not as dramatic as the 3x speedup in this PR.
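
For reference, the 3/2 estimate can be worked out with an Amdahl-style calculation. This is only a rough sketch: the 1/3 share and the 5x parser speedup are the approximate figures quoted in this thread, not new measurements, and `overallSpeedup` is a hypothetical helper.

```javascript
// Amdahl-style estimate: only the fraction of time spent in the parser gets faster.
// parserShare and parserSpeedup are the rough figures quoted in this thread.
function overallSpeedup(parserShare, parserSpeedup) {
  const remainingTime = (1 - parserShare) + parserShare / parserSpeedup;
  return 1 / remainingTime;
}

// parse5 takes about 1/3 of the time; htmlparser2 is reportedly 5x faster.
const withHtmlparser2 = overallSpeedup(1 / 3, 5);

// Upper bound: the parsing cost disappears entirely.
const upperBound = overallSpeedup(1 / 3, Infinity);

console.log(withHtmlparser2.toFixed(2)); // 1.36
console.log(upperBound.toFixed(2));      // 1.50
```

So 3/2 is the ceiling if parsing were free; with a parser that is 5 times faster, the expected gain is closer to 1.36x.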
