Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterative searcher #166

Closed
automactic opened this issue Aug 8, 2018 · 16 comments
Closed

Iterative searcher #166

automactic opened this issue Aug 8, 2018 · 16 comments
Assignees
Milestone

Comments

@automactic
Copy link
Member

I remember @mgautierfr told me he is planning on replace getNext styled searcher with an iterator. Do we have plan on if and when this will be finished? Thanks!

@kelson42
Copy link
Collaborator

kelson42 commented Aug 9, 2018

@automactic What is the problem you want to fix?

@automactic
Copy link
Member Author

automactic commented Aug 9, 2018

There is no problem. @mgautierfr talked about updating the searcher to an iterator. I am also trying to do some modifications of my searching async system. I just want to time those two things together.

@stale
Copy link

stale bot commented Nov 21, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be now be reviewed manually. Thank you for your contributions.

@stale stale bot added the stale label Nov 21, 2019
@kelson42 kelson42 removed the question label Apr 3, 2021
@stale stale bot removed the stale label Apr 3, 2021
@kelson42
Copy link
Collaborator

kelson42 commented Apr 3, 2021

To me, now, we clearly need that.

@automactic
Copy link
Member Author

Could you elaborate why we need it? To me, getNext works, so I'd say it is not required

@kelson42
Copy link
Collaborator

kelson42 commented Apr 4, 2021

@automactic The iterator is a C++ standard pattern to do this kind of operation and I want to use it as far as possible in place of custom API. But this is not the key point we are going to change (even if this is still open to discussion):

  • I want to avoid to buffer the list of results (this should come more or less directly from Xapian)
  • We should be able to iterate through all the results (not only the N first like currently)
  • The search object should "live" longer even if other search requests have be ran (currently it lives only until next iteration). This implies the creation of a search cache.
  • Most of this should happen in the libzim and not in the libkiwix

I hope to somehow document with Matthieu the approach we want to follow this week as a GSoC student will work on that during the summer.

@maneeshpm
Copy link
Contributor

maneeshpm commented Apr 4, 2021

@kelson42 We already have a search_iterator implementation in libzim here that is based on the Xapian::MSetIterator. In kiwix-lib this iterator is wrapped inside a _Result class internally. The getNextResult() method is just incrementing the iterator and returning it wrapped in a _Result object here. This is perhaps done for simplicity(?). You want us to be able to use this iterator directly without a wrapper in kiwix-lib right? or do you have a different implementation idea.

@kelson42
Copy link
Collaborator

kelson42 commented Apr 4, 2021

@maneeshpm I will let @mgautierfr further answer

@mgautierfr
Copy link
Member

mgautierfr commented Apr 6, 2021

This is perhaps done for simplicity(?)

Definitively not :) It was this way a long time ago (before we had a xapian database).
I've keep this API for compatibility reason but it complicate all the stack.

You want us to be able to use this iterator directly without a wrapper in kiwix-lib right?

Yes. See #430

@maneeshpm
Copy link
Contributor

Thanks @mgautierfr! It makes sense now. I agree with your comment on the linked issue. The distinction between libzim and kiwix-lib needs to be clear. This issue gets fixed once we drop the wrappers(or even without dropping them) and adapt the existing kiwix-lib to implement libzim directly.

@kelson42
Copy link
Collaborator

kelson42 commented Apr 6, 2021

@automactic @maneeshpm https://github.com/kiwix/kiwix-tools/issues/97#issuecomment-814111663 gives a good insight in how should be the new searcher architecture.

@kelson42
Copy link
Collaborator

Somehow similar to #146

@kelson42 kelson42 added this to the 10.0.0 milestone May 13, 2021
@kelson42
Copy link
Collaborator

@maneeshpm Might be the riht moment to work on this, or do we have unmet dependences for this ticket?

@maneeshpm
Copy link
Contributor

@kelson42 I feel like this will automatically come as a by-product of dropping the wrappers. When we drop the wrappers, we will shift to using libzim types, and the return type of a search in libzim is an iterator(via search_iterator). So I think it should be better to work on that ticket first(it will simplify solving a lot of other pending tickets as well).

@kelson42
Copy link
Collaborator

@maneeshpm Good to me.

@maneeshpm
Copy link
Contributor

@kelson42 With the merge of #536 we now have exposed zim::SearchIterator to libkiwix and already using it in places like search_renderer and internal_server. I believe this was the expectation from this ticket. If some projects are still using the kiwix::Searcher/kiwix::Reader, we need to go ahead and drop it in favour of libzim structures after the merge of #576.

As far as this issue is concerned, I would call this fixed. We can close this if you and @mgautierfr agree.

@kelson42 kelson42 closed this as completed Jul 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants