-
-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Xapian-based catalog search #460
Conversation
CI fails because of the said dependence on kiwix/kiwix-build#477 |
Codecov Report
@@ Coverage Diff @@
## master #460 +/- ##
==========================================
+ Coverage 62.65% 63.00% +0.35%
==========================================
Files 50 50
Lines 3433 3490 +57
Branches 1736 1773 +37
==========================================
+ Hits 2151 2199 +48
- Misses 1280 1289 +9
Partials 2 2
Continue to review full report at Codecov.
|
029f5d4
to
72c566c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not using xapian to do the full query (size, language, creator, name, local, remote, ...) ?
As kiwix-lib now use xapian we must also update meson.
On libzim we use icu (https://github.com/openzim/libzim/blob/master/src/writer/xapianIndexer.cpp#L39-L40) to translate between different ISO language code.
src/library.cpp
Outdated
const auto flags = Xapian::QueryParser::FLAG_PHRASE | ||
| Xapian::QueryParser::FLAG_BOOLEAN | ||
| Xapian::QueryParser::FLAG_LOVEHATE | ||
| Xapian::QueryParser::FLAG_WILDCARD; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not using FLAG_PARTIAL ?
It would allow the user to not enter the end wildcard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It only makes sense if the search endpoint is going to be used for suggestions. @kelson42 What's your opinion on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The end point is for now not used for completion/suggestions but as a "normal" search. Not sure if we have a lot to loose to add FLAG_PARTIAL
but the default choice seems for me, for the moment, to keep it aside. @mgautierfr Maybe you have arguments to use it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used in kiwix-desktop to search in the library (local or remote). Search updated "in live" (no need to validate the search).
It seems legit that if the user search for "wiki" he gets result for all ("wikipedia", "wikispecies", ...)
If we don't, he will have no result until he enters a full word.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am reluctant to use the partial search mode in the backend since there is no indication of where the editing/cursor location in the text is. Though typically text is typed left to right (at least in the languages I am concerned with), it is also possible that the cursor is moved to another location and the text is edited in the middle. It is the front end (which is responsible for the editing of the text search string) that can add a hint about the cursor location in the form of a wildcard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am reluctant to use the partial search mode in the backend
It is the front end [...] than can add a hint [...] in form of a wildcard
I globally agree with that.
However, as the documentation of FLAG_PARTIAL said, the wildcard should be added only in some situation (not a phrase, no a binary search, not ...). It could be pretty difficult to detect in the frontend (and we don't want to have all frontend implement it).
Maybe we could add a "boolean flag" in the api telling if we want a partial search or not and let xapian add it (if it is needed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could add a "boolean flag" in the api telling if we want a partial search or not and let xapian add it (if it is needed).
Done this way.
That was my intention too, but I chose to stick to what #106 asks for. I would like to keep this PR small and enhance the Xapian based catalog search in a follow-up PR. We will have to define how to handle tags. |
Seems legit |
I believe this is rollbacking a commit made a few months ago ?! |
Reminding that this PR is based on #459. It would be better to review and merge that one first. |
72c566c
to
b020798
Compare
Can't we rely on the required indirect xapian dependency brought by https://github.com/kiwix/kiwix-lib/blob/f7608c378ee2d1250ce225c0aaefe2452299d481/meson.build#L46-L49 |
No. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a small (style) remark on the new commits. I will not block for that.
I still "Request Changes" as we need to update our dependencies to add xapian.
e3f8b36
to
d68dcfd
Compare
39842b2
to
b7b0bdb
Compare
d68dcfd
to
3d34078
Compare
@mgautierfr @veloman-yunkan Probably time now to rebase ad complete the review process of this PR. |
I am done with my part |
1. Get the subset of books matching the q (title/description) parameter of the search 2. Filter out books not matching the other parameters of the search. Stage 1. currently works in the old way, but will be replaced by Xapian based search in subsequent commits.
The search text in the catalog query is interpreted as partial by default, but partial query mode can be disabled in C++. The latter possibility is not exposed via the /catalog/search kiwix-serve endpoint, though.
Language code is converted from ISO 639-3 to ISO 639 (which is understood by Xapian) via ICU. The previous approach via an explicit map had its advantages since Xapian has more than one stemmer implementations for some languages (selectable via Xapian-specific identifiers). This commit relies on the defaults associated with the ISO 639 language codes.
3d34078
to
20b487d
Compare
Didn't kiwix/kiwix-build#479 have to be merged before this one? |
@veloman-yunkan I have merged it |
This should have been done back in PR #460
This should have been done back in PR #460
This should have been done back in PR #460
This should have been done back in PR #460
This should have been done back in PR #460
Fixes #106
Depends on kiwix/kiwix-build#477. Must be merged after #459.
Changes in functionality:
Notes:
lang
query parameter is used.