[REGRESSION] Search ranking broken as of kiwix/kiwix-serve:3.2.0-2 #742

Closed
thavelick opened this issue Apr 3, 2022 · 5 comments

@thavelick
Contributor

thavelick commented Apr 3, 2022

Description

As of the release of kiwix-tools-3.2.0-2, search ranking seems to be completely broken and some results are missing.

Since there haven't actually been any code changes in kiwix-tools since this bug started, it must be related to a recent regression in libkiwix.

The problem happens starting with: http://mirror.download.kiwix.org/release/kiwix-tools/kiwix-tools_linux-x86_64-3.2.0-2.tar.gz and does not happen with http://mirror.download.kiwix.org/release/kiwix-tools/kiwix-tools_linux-x86_64-3.2.0-1.tar.gz. This means the bug emerged sometime after 02-Feb-2022 and before 28-Mar-2022.

Steps to reproduce

  1. Install docker
  2. Acquire wikipedia_en_all_maxi_2021-12.zim
  3. Run docker run -v /path/to/your/zimfiles:/data -p 8081:80 kiwix/kiwix-serve:3.2.0-2 wikipedia_en_all_maxi_2021-12.zim
  4. In your browser connect to localhost:8081
  5. Search for something like "eddie murphy cop" or "kiwix offline". Note that the search result quality is extremely poor.
  6. Run docker run -v /path/to/your/zimfiles:/data -p 8080:80 kiwix/kiwix-serve:3.2.0-1 wikipedia_en_all_maxi_2021-12.zim
  7. In your browser connect to localhost:8080.
  8. Search for something like "eddie murphy cop" or "kiwix offline". Note that the search result quality is high.

Alternatively: build master of kiwix-tools and libkiwix from source and you'll see similar results running kiwix-serve.
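
The two docker run invocations in the steps above differ only in the image tag and the host port, so they can be generated from one helper. This is a minimal sketch and not part of kiwix-tools: the `docker_run_args` helper is hypothetical, and mounting the ZIM directory at `/data` inside the container is an assumption about the image layout.

```python
import shlex

IMAGE = "kiwix/kiwix-serve"
ZIM_FILE = "wikipedia_en_all_maxi_2021-12.zim"


def docker_run_args(tag, host_port, zim_dir="/path/to/your/zimfiles", zim_file=ZIM_FILE):
    """Build the `docker run` argument list for one kiwix-serve release."""
    return [
        "docker", "run",
        "-v", f"{zim_dir}:/data",   # assumption: the image reads ZIM files from /data
        "-p", f"{host_port}:80",    # container port 80, as in the steps above
        f"{IMAGE}:{tag}",
        zim_file,
    ]


# Suspect release on port 8081, known-good release on port 8080.
COMMANDS = {
    "3.2.0-2": docker_run_args("3.2.0-2", 8081),
    "3.2.0-1": docker_run_args("3.2.0-1", 8080),
}

for tag, argv in COMMANDS.items():
    # Print a copy-pasteable command line; pass argv to subprocess.run to start it.
    print(shlex.join(argv))
```

Running both containers at once lets you issue the same query against localhost:8081 and localhost:8080 and compare the result pages side by side.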

Screenshots

In each of these screenshots, poor results from 3.2.0-2 are on the left, while good results from 3.2.0-1 are on the right:

20220403_09h21m29s_grim
20220403_09h23m31s_grim

@thavelick
Contributor Author

Looking at the results count in my screenshots, I'm realizing now that the quality results may not even be in the result set, which means this may not be a ranking issue at all. A result can't rank well if it doesn't exist.
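
To tell a ranking problem apart from a missing-result problem, one can first check whether the expected article appears anywhere in the returned list, and only then look at its position. A minimal sketch: `diagnose`, the `top_n` cutoff, and the placeholder titles are illustrative assumptions, not real search output.

```python
def diagnose(expected_title, results, top_n=10):
    """Say whether an expected article is missing, ranked low, or fine.

    `results` is the ordered list of titles returned by one search.
    """
    titles = [title.casefold() for title in results]
    try:
        rank = titles.index(expected_title.casefold()) + 1  # 1-based position
    except ValueError:
        return "missing"  # not in the result set, so not a ranking issue
    return "ok" if rank <= top_n else f"ranked low (#{rank})"


# Placeholder titles only; these are not real kiwix-serve results.
good = ["Expected article", "Related article"]
bad = ["Unrelated article 1", "Unrelated article 2"]

print(diagnose("Expected article", good))  # ok
print(diagnose("Expected article", bad))   # missing
```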

@thavelick
Contributor Author

Well, now I'm confused. I can no longer reproduce this when building from source with kiwix-build. Maybe there was a problem with some dependency of libkiwix, like libzim, that has been fixed in the last few days? I can still reproduce this with the Docker-based steps above, so there's still a real issue, but I suspect it may simply be fixed by cutting a new release of kiwix-tools/kiwix-serve with all its dependencies updated. What happened and what fixed it is still a mystery to me.

@mgautierfr
Member

As mentioned in #722, a fix has been in master since March 29.
Before that, we were internally searching for the given pattern prefixed with "FT:".
On current master, searching for "eddy murphy cop" on wikipedia_en_all_maxi_2020-08.zim (older content but similar) returns correct results.
But searching for "FT:eddy murphy cop" returns results pretty close to your wrong results (Adil El Arbi and Bilall Fallah as the first result, ...).

So this is probably the same issue.
kiwix-tools 3.2.0-2 is built with libkiwix 10.1.0.
kiwix-tools 3.2.0-1 is built with libkiwix 10.0.1.

The faulty commit in PR #620 was merged between 10.0.1 and 10.1.0.
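
The comparison described above (the same pattern with and without the "FT:" prefix) can be repeated against any running kiwix-serve instance. A rough sketch: the `/search` endpoint with `content` and `pattern` query parameters is an assumption about the kiwix-serve URL scheme, the base URL is a local placeholder, and `top_overlap` is a hypothetical helper for comparing two title lists that you would extract from the result pages yourself.

```python
from urllib.parse import urlencode


def search_url(base, book, pattern):
    """Build a kiwix-serve search URL.

    Assumption: the server exposes /search with `content` and `pattern`
    query parameters.
    """
    return f"{base}/search?" + urlencode({"content": book, "pattern": pattern})


def top_overlap(a, b, n=10):
    """Fraction of the top-n titles in `a` that also appear in the top-n of `b`."""
    top_a, top_b = a[:n], set(b[:n])
    if not top_a:
        return 0.0
    return sum(title in top_b for title in top_a) / len(top_a)


BASE = "http://localhost:8080"           # placeholder for a local instance
BOOK = "wikipedia_en_all_maxi_2020-08"   # ZIM name without the .zim extension

print(search_url(BASE, BOOK, "eddy murphy cop"))
print(search_url(BASE, BOOK, "FT:eddy murphy cop"))
```

If the "FT:" results on master largely overlap with the plain results on 3.2.0-2, that would support the diagnosis that the prefix was being searched as part of the pattern.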

@thavelick, can you confirm this on your side?

@thavelick
Contributor Author

Yeah, I'm good with master now. Thanks!

@holta

holta commented Apr 5, 2022

> Yeah, I'm good with master now. Thanks!

Likewise, thanks all for the fix!

(https://download.kiwix.org/nightly/2022-04-05/ appears to repair the serious deterioration of full-text (FT) search results in kiwix-tools 3.2.0-2.)
