-
This is a very good library: very lightweight and easy to install. I want to use it in my project to replace Pyserini (based on Lucene and Java). I ran BM25 on the same corpus with both and found some differences in the results (using your example code for nq). Overall, the top 10 documents are similar, but their order differs in places. Is this normal?
-
Yeah, that's normal; you can see the difference in NDCG@10 in the report: https://arxiv.org/abs/2407.03618 There could be a few reasons:
Overall, I think the best way to verify whether this is the exact BM25 scoring you want is to implement BM25 manually (it should take fewer than 50 lines) and compare it against one of the supported variants here. You can easily add your own variant here: Lines 99 to 160 in 0a49c62
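For the manual check suggested above, here is a minimal sketch of what such a reference implementation could look like. It is not the library's code: the function name `bm25_lucene_scores`, the whitespace tokenizer, the toy corpus, and the `k1`/`b` values are all assumptions for illustration, and the formula is the commonly described Lucene-style variant (IDF of `ln(1 + (N - df + 0.5) / (df + 0.5))`, no `k1 + 1` factor in the numerator). Swap in whatever tokenizer and parameters each system actually uses before comparing numbers.

```python
import math
from collections import Counter


def bm25_lucene_scores(query, corpus, k1=1.5, b=0.75):
    """Score every document in `corpus` against `query` (Lucene-style BM25 sketch)."""
    # Naive lowercase + whitespace tokenizer (assumption; real systems also
    # apply stemming and stopword removal, which changes the scores).
    docs = [doc.lower().split() for doc in corpus]
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalisation term for this document.
        norm = k1 * (1 - b + b * len(d) / avgdl)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] / (tf[term] + norm)
        scores.append(score)
    return scores


# Hypothetical toy corpus, only to exercise the function.
corpus = [
    "a cat is a feline and likes to purr",
    "a dog is the human's best friend",
    "a bird is a beautiful animal that can fly",
]
scores = bm25_lucene_scores("does the cat purr", corpus)
# Document indices sorted from highest to lowest score.
ranking = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
print(ranking, scores)
```

Feeding both systems the same pre-tokenized text and the same `k1` and `b` makes it easier to tell whether a remaining ordering difference comes from the scoring formula or from preprocessing.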