Batch wikilinks redis requests on article parsing #1719
Conversation
Codecov Report: Base: 68.96% // Head: 69.52% // Increases project coverage by +0.56%.
Additional details and impacted files:
@@ Coverage Diff @@
## main #1719 +/- ##
==========================================
+ Coverage 68.96% 69.52% +0.56%
==========================================
Files 26 26
Lines 2436 2468 +32
Branches 477 483 +6
==========================================
+ Hits 1680 1716 +36
+ Misses 591 585 -6
- Partials 165 167 +2
☔ View full report at Codecov.
@uriesk 5% is quite good actually. Don't forget to remove the draft status once you get that far.
Force-pushed from 58d5a26 to 65509ad
@kelson42 I was hoping for more. It's hard to benchmark because the bandwidth fluctuates by more than those 5%.
Force-pushed from af568c6 to 922f20c
This looks good now.
I have made a superficial review. It seems to work, but I see no speedup at all. That said, the principle is better. Additionally, I think each function would benefit from a small comment explaining what it's for, in particular rewriteUrlNoArticleCheck().
Quite a bit of code has been moved/changed; I just hope that nothing fundamental in the link treatment has been changed, and that only the architecture has changed.
If it's OK for @pavel-karatsiuba, I would merge it and give it a try, but I don't really expect much impact anyway.
Thought so. That being said, spamming fewer requests to redis is always a good idea, and it's one potential issue out of the way.
@pavel-karatsiuba works on #1699, which should be ready very soon.
Instead of checking every wikilink one-by-one to decide whether it is going to be fetched, batch all links of an article together.
Use HEXISTS instead of HGET for checking the existence of a field.
I did some basic benchmarks with smaller wikis and it is about 5% faster in the article scraping phase. I think only the zimfarm on larger wikis can tell us the actual impact.
closes #1718
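The batching idea described above can be sketched roughly as follows. This is an illustrative JavaScript sketch, not mwoffliner's actual code: the `fakeRedis` stub and `filterExistingLinks` name are invented for the example, and a tiny in-memory hash stands in for a real redis client so the snippet is self-contained. With a real client, the batch would be a pipelined run of `HEXISTS` commands, which answer "does this field exist?" with 0/1 instead of transferring the stored value the way `HGET` does.

```javascript
// Stub standing in for a redis client: a hash of known article titles
// plus a method that answers many HEXISTS-style queries in one batch
// (i.e. one round-trip instead of one request per wikilink).
const fakeRedis = {
  hash: new Map([['Foo', '1'], ['Bar', '1']]),
  hexistsBatch(fields) {
    // Returns a boolean per field, like a pipelined series of HEXISTS.
    return Promise.resolve(fields.map((f) => this.hash.has(f)));
  },
};

// Collect every wikilink found in an article, resolve their existence
// in a single batch, and keep only the links that exist.
async function filterExistingLinks(redis, links) {
  const flags = await redis.hexistsBatch(links);
  return links.filter((_, i) => flags[i]);
}

filterExistingLinks(fakeRedis, ['Foo', 'Baz', 'Bar']).then((kept) => {
  console.log(kept); // → [ 'Foo', 'Bar' ]
});
```

The design point is that the per-link cost drops from one network round-trip each to a share of a single batched round-trip, and HEXISTS avoids shipping field values that the check never needed.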