Differences in results compared to Lucene #53

Answered by xhluca
ignorejjj asked this question in Q&A

Yeah, that's normal; you can see the difference in NDCG@10 in the report: https://arxiv.org/abs/2407.03618

There could be a few reasons:

  1. We use a different tokenizer from the Lucene library, whose exact implementation I was not able to find
  2. The choice of stemmer and stopwords can also affect the final scores
  3. Our scoring method might differ: we base ours on the Kamphuis et al. survey, whereas Pyserini uses Lucene behind the scenes, which has its own variant of the algorithm (if you can find Lucene's exact scoring code, please feel free to share it here); a sketch of this kind of difference follows the list
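
To make point 3 concrete, here is a sketch of the kind of variant difference the Kamphuis et al. survey catalogues; it is general BM25 background, not a confirmed diagnosis of this exact issue, and the notation ($N$ documents, $\mathrm{df}$, $\mathrm{tf}$, $k_1$, $b$, $\mathrm{avgdl}$) is standard rather than copied from either codebase. Lucene's `BM25Similarity` drops the constant $(k_1 + 1)$ numerator factor and uses a smoothed log IDF:

$$
\mathrm{score}_{\text{Lucene}}(q, d) = \sum_{t \in q} \ln\!\left(1 + \frac{N - \mathrm{df}(t) + 0.5}{\mathrm{df}(t) + 0.5}\right) \cdot \frac{\mathrm{tf}(t, d)}{\mathrm{tf}(t, d) + k_1\left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)}
$$

whereas ATIRE/Robertson-style variants keep the $(k_1 + 1)$ factor in the numerator, which rescales the scores without changing the ranking:

$$
\mathrm{score}(q, d) = \sum_{t \in q} \mathrm{IDF}(t) \cdot \frac{(k_1 + 1)\,\mathrm{tf}(t, d)}{\mathrm{tf}(t, d) + k_1\left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)}
$$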

Overall, I think the best way to verify whether this gives the exact BM25 scoring you want is to implement BM25 manually (it should be less than 50 lines) and compare it against one of the suppo…
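
For that manual comparison, here is a minimal, self-contained sketch of the Lucene-style scoring shown above. It is not the exact bm25s or Lucene implementation: the whitespace tokenizer, the parameters `k1=1.5` and `b=0.75`, and the toy corpus are all illustrative assumptions, and there is no stemming or stopword removal.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive lowercase whitespace tokenizer; real libraries use more
    # elaborate analyzers, which is reason 1 for score differences.
    return text.lower().split()

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score every document in `corpus` against `query` using the
    Lucene-style BM25 variant (no (k1+1) factor, ln(1 + ...) IDF)."""
    docs = [tokenize(doc) for doc in corpus]
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        norm = k1 * (1 - b + b * len(d) / avgdl)
        score = 0.0
        for t in tokenize(query):
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            score += idf * tf[t] / (tf[t] + norm)
        scores.append(score)
    return scores

if __name__ == "__main__":
    corpus = [
        "a cat is a feline and likes to purr",
        "a dog is the human's best friend and loves to play",
        "a bird is a beautiful animal that can fly",
    ]
    print(bm25_scores("does the fish purr like a cat?", corpus))
```

Running the same corpus and queries through the library with an identical tokenizer should let you separate tokenization and stemming effects (reasons 1 and 2) from the scoring variant itself (reason 3).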
