Final project for the course on deep learning for NLP (IA376E/1s2020 @ Unicamp). This is an implementation of a Two Tower model for document retrieval (and passage ranking) on the MS MARCO dataset. The project also uses queries generated with the doc2query algorithm. It is implemented with PyTorch and PyTorch Lightning, deep learning frameworks for Python.
The final article and the work plan can be found in docs/.
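As a rough illustration of the Two Tower idea, the sketch below encodes queries and documents independently into fixed-size vectors and ranks documents by dot product. It is not the code in this repository: the hashed bag-of-words encoder is a toy stand-in for the neural towers, and `DIM`, `encode_query`, `encode_document`, and `rank` are hypothetical names chosen for the example.

```python
import math
import re
import zlib

DIM = 256  # toy embedding size (an assumption for this sketch)


def _hashed_bow(text):
    """Toy encoder: hashed bag-of-words, L2-normalised.

    Stands in for a learned neural tower; it is NOT the model in src/.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is deterministic across runs, unlike the built-in hash()
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def encode_query(query):
    # Query tower (shares the toy encoder here; real towers may differ).
    return _hashed_bow(query)


def encode_document(document):
    # Document tower; document vectors can be precomputed offline.
    return _hashed_bow(document)


def score(query_vec, doc_vec):
    # Relevance is the dot product of the two tower outputs.
    return sum(q * d for q, d in zip(query_vec, doc_vec))


def rank(query, documents):
    """Return document indices sorted from most to least relevant."""
    query_vec = encode_query(query)
    scored = [(score(query_vec, encode_document(d)), i)
              for i, d in enumerate(documents)]
    return [i for _, i in sorted(scored, key=lambda p: p[0], reverse=True)]


docs = ["the cat sat on the mat", "stock markets fell sharply today"]
print(rank("cat on a mat", docs))
```

Because the two towers never see each other's input, document vectors can be computed once and indexed, so only the query has to be encoded at search time.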
One can import the model in Python or use it as a script.
Example of training using the model as a module:
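# model_args and trainer_args are dictionaries of keyword arguments defined by the user
from src.model import TwoTower
from pytorch_lightning import Trainer

model = TwoTower(**model_args)
trainer = Trainer(**trainer_args)
trainer.fit(model)  # runs the training loop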
Example of training using the train script:
python -m src.train --gpus 1 --batch_size 32
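For readers wondering how flags like these could map onto the `model_args` and `trainer_args` dictionaries used in the module example, here is a minimal sketch. It is an assumption about one possible layout, not the actual contents of src/train.py: the defaults, the helper names `parse_args` and `split_args`, and the choice to route `batch_size` to the model are all hypothetical.

```python
import argparse


def parse_args(argv=None):
    # Only the two flags shown in the command above are declared here;
    # the default values are assumptions made for this sketch.
    parser = argparse.ArgumentParser(description="Hypothetical training CLI sketch")
    parser.add_argument("--gpus", type=int, default=0)
    parser.add_argument("--batch_size", type=int, default=32)
    return parser.parse_args(argv)


def split_args(args):
    # Assumed split: hardware settings go to the trainer,
    # data/model settings go to the model.
    trainer_args = {"gpus": args.gpus}
    model_args = {"batch_size": args.batch_size}
    return model_args, trainer_args


model_args, trainer_args = split_args(parse_args(["--gpus", "1", "--batch_size", "32"]))
print(model_args, trainer_args)
```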
There are also Colab notebooks showing the usage in notebooks/train.ipynb and notebooks/example.ipynb.