This repo contains the checkpoints and source code for our paper *Scaling Sparse and Dense Retrieval in Decoder-Only LLMs*.
To use `scaling_retriever`, first install the required packages:

```bash
pip install -r requirements.txt
conda install -c pytorch faiss-cpu=1.8.0
```
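As a quick sanity check (our suggestion, not a repo script), you can verify that the FAISS build is importable:

```python
import faiss

# Confirm the conda-installed FAISS version is visible to Python.
print(faiss.__version__)  # expect 1.8.0
```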
We provide two retrieval paradigms: sparse retrieval and dense retrieval. For sparse models:
```python
from transformers import AutoTokenizer
from scaling_retriever.modeling.llm_encoder import LlamaBiSparse

model = LlamaBiSparse.load_from_lora("hzeng/Lion-SP-1B-llama3-marco-mntp")
tokenizer = AutoTokenizer.from_pretrained("hzeng/Lion-SP-1B-llama3-marco-mntp")
```
For dense models:
```python
from transformers import AutoTokenizer
from scaling_retriever.modeling.llm_encoder import LlamaBiDense

model = LlamaBiDense.load_from_lora("hzeng/Lion-DS-1B-llama3-marco-mntp")
tokenizer = AutoTokenizer.from_pretrained("hzeng/Lion-DS-1B-llama3-marco-mntp")
```
With either model loaded, encode queries and passages, then score them with a dot product:

```python
import torch

queries = ["What is the capital of France?", "Who wrote '1984'?"]
passages = [
    "Paris is the capital of France.",
    "George Orwell wrote '1984'."
]

tokenized_queries = tokenizer(queries,
                              max_length=192,
                              truncation=True, padding="longest", return_tensors="pt")
tokenized_passages = tokenizer(passages,
                               max_length=192,
                               truncation=True, padding="longest", return_tensors="pt")

query_embeds = model.query_encode(**tokenized_queries)
doc_embeds = model.doc_encode(**tokenized_passages)
scores = torch.matmul(query_embeds, doc_embeds.T)
print(scores.tolist())
# sparse retrieval scores:
# [
#   [14.835160255432129, 0.026406031101942062],
#   [0.005473464727401733, 13.909822463989258]
# ]
# dense retrieval scores:
# [
#   [0.2877607047557831, 0.13211995363235474],
#   [0.1040663793683052, 0.29219019412994385]
# ]
```
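To turn the score matrix into per-query rankings, you can take the top-k passages with `torch.topk`; the helper below is our illustration, not part of the repo's API:

```python
import torch

def rank_passages(scores: torch.Tensor, k: int = 10):
    """Return the top-k passage indices and scores for each query row."""
    k = min(k, scores.size(1))  # never ask for more passages than we have
    topk = torch.topk(scores, k=k, dim=1)
    return topk.indices.tolist(), topk.values.tolist()

indices, _ = rank_passages(scores, k=2)
print(indices)  # [[0, 1], [1, 0]] for the toy example above
```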
Before running the evaluation scripts, download the required data from the following link:
🔗 MSMARCO Evaluation and Training Data
Once downloaded, place the files in the current directory so that the evaluation scripts can find them.
The evaluation benchmarks for MSMARCO include MSMARCO Dev, TREC DL 2019, and TREC DL 2020. To evaluate the 1B sparse model (hzeng/Lion-SP-1B-llama3-marco-mntp), run:
```bash
bash scripts/eval_sparse.sh
```
To evaluate the 8B sparse model, change the model name on line 7 of `scripts/eval_sparse.sh` to `hzeng/Lion-SP-8B-llama3-marco-mntp`, then re-run the script:

```bash
bash scripts/eval_sparse.sh
```
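If you prefer not to edit the script by hand, a one-line substitution also works (assuming the 1B checkpoint name appears verbatim on line 7):

```bash
# Swap the 1B checkpoint for the 8B one in place, then re-run.
sed -i 's#hzeng/Lion-SP-1B-llama3-marco-mntp#hzeng/Lion-SP-8B-llama3-marco-mntp#' scripts/eval_sparse.sh
bash scripts/eval_sparse.sh
```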
For efficient evaluation, use more than 32 CPU cores; fewer cores can significantly slow down retrieval. Our implementation uses multi-threading to search the inverted index, and an insufficient number of CPUs may lead to unexpected performance issues. Expected runtime: on MS MARCO Dev, retrieval typically completes in ~15 minutes given enough CPU cores.
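Before launching, you can check how many cores are available; whether the retrieval code honors thread-count environment variables such as `OMP_NUM_THREADS` is our assumption, so treat the second line as a sketch:

```bash
nproc  # number of available CPU cores
OMP_NUM_THREADS=32 bash scripts/eval_sparse.sh  # assumption: the thread pool respects OMP_NUM_THREADS
```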
To evaluate the 1B dense model (hzeng/Lion-DS-1B-llama3-marco-mntp), run:
```bash
bash scripts/eval_dense.sh
```
To evaluate the 8B dense model, change the model name on line 7 of `scripts/eval_dense.sh` to `hzeng/Lion-DS-8B-llama3-marco-mntp`, then re-run the script:

```bash
bash scripts/eval_dense.sh
```
To evaluate sparse models on BEIR, run:

```bash
bash scripts/beir/eval_beir_sparse.sh
```

The default setting evaluates `hzeng/Lion-SP-1B-llama3-marco-mntp`; for the 8B model, change the model name on line 7 to `hzeng/Lion-SP-8B-llama3-marco-mntp`.
To evaluate dense models on BEIR, run:

```bash
bash scripts/beir/eval_beir_dense.sh
```

The default setting evaluates `hzeng/Lion-DS-1B-llama3-marco-mntp`; for the 8B model, change the model name on line 7 to `hzeng/Lion-DS-8B-llama3-marco-mntp`.
TODO