About the project

I built a Movie Search Engine by fine-tuning BERT models on Bing queries and Quora question triplets.

Search Engine

I implemented BERT with PyTorch and the Hugging Face Transformers library, using BERT Small (6 layers, 10 attention heads) pretrained by Google. I fine-tuned two models, one for a symmetric and one for an asymmetric search engine. The fine-tuning pipeline for both consisted of filtering the data and then fine-tuning the model with a contrastive objective, Multiple Negatives Ranking Loss with hard negatives. For the asymmetric model (short queries matched against longer passages) I used a sample of the MS MARCO triplets dataset (Bing queries), and for the symmetric model (queries matched against text of similar length) I used Quora duplicate-question triplets. Both datasets are available at Paraphrase Data.
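As a rough illustration of the training objective, here is a minimal pure-Python sketch of Multiple Negatives Ranking Loss with hard negatives. It assumes cosine similarity and a scale of 20 (the defaults of `MultipleNegativesRankingLoss` in sentence-transformers) and one hard negative per query. The function names and the 2-d "embeddings" are hypothetical stand-ins for BERT outputs; this is not the training code from the ML branch.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def mnrl_loss(anchors: List[Vector], positives: List[Vector],
              hard_negatives: List[Vector], scale: float = 20.0) -> float:
    """Multiple Negatives Ranking Loss with one hard negative per anchor.

    Anchor i is scored against every positive in the batch plus every hard
    negative; the correct class is its own positive at index i. The other
    anchors' positives act as in-batch negatives.
    """
    candidates = positives + hard_negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        # Cross-entropy for target i, using a max-shifted log-sum-exp for stability.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Toy 2-d "embeddings" (hypothetical): query i, its relevant text, its hard negative.
queries = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
hard_negatives = [[0.6, 0.8], [0.8, 0.6]]

aligned = mnrl_loss(queries, positives, hard_negatives)
mismatched = mnrl_loss(queries, positives[::-1], hard_negatives)
print(aligned, mismatched)  # the correctly paired batch has the much smaller loss
```

Because every other example in the batch doubles as a negative, larger batches give the model more negatives per step at no extra labelling cost; the hard negatives add deliberately confusing candidates on top of those.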

Movies Dataset and Indexing

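I applied the search engine to an IMDb movie dataset, building a movie recommender on top of it. I used the Annoy library to index the embeddings of the movie plots, which speeds up search by replacing an exhaustive scan with approximate nearest-neighbor lookup.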
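The sketch below shows the exact computation that the Annoy index approximates: rank movies by cosine similarity between a query embedding and each plot embedding, and reuse the same search for "more like this" recommendations. It is a linear scan for clarity, not Annoy itself; with Annoy, the vectors would instead go into an `AnnoyIndex(dim, "angular")` via `add_item`, be indexed with `build(n_trees)`, and be queried with `get_nns_by_vector`. Annoy's angular distance is sqrt(2(1 - cos)), so it ranks neighbors in the same order as cosine similarity. The helper names and the 3-d plot vectors are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def build_index(plot_vectors: Dict[str, Vector]) -> Dict[str, Vector]:
    """Store a unit-length copy of each movie's plot embedding."""
    return {title: normalize(vec) for title, vec in plot_vectors.items()}


def search(index: Dict[str, Vector], query: Vector, k: int = 3,
           exclude: str = "") -> List[str]:
    """Return the k titles whose plot embeddings are most cosine-similar to query."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, vec)), title)
        for title, vec in index.items()
        if title != exclude
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]


# Toy 3-d vectors standing in for BERT plot embeddings (hypothetical values).
plots = {
    "Alien": [0.9, 0.1, 0.0],
    "Aliens": [0.85, 0.2, 0.05],
    "Toy Story": [0.0, 0.2, 0.95],
    "Heat": [0.1, 0.95, 0.1],
}
index = build_index(plots)

# Search: a query embedding pointing along the first axis.
print(search(index, [1.0, 0.0, 0.0], k=2))
# Recommendation: the movie whose plot most resembles "Alien", excluding itself.
print(search(index, plots["Alien"], k=1, exclude="Alien"))
```

The scan costs one dot product per movie for every query; an approximate index such as Annoy trades a small chance of missing the true nearest neighbor for much faster lookups on a large catalogue.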

Autocomplete

I implemented autocomplete in my Search Engine by building a trie (prefix tree) from scratch over the movie titles. Autocomplete is case-insensitive.
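A case-insensitive trie can be sketched as follows, assuming keys are lowercased on insert and lookup while each terminal node keeps the original-case titles so suggestions display correctly. The class and method names are hypothetical and not taken from the repository.

```python
from typing import Dict, List


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Original-case titles ending here (several titles can share one lowercase key).
        self.titles: List[str] = []


class Trie:
    """Case-insensitive prefix tree over movie titles."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, title: str) -> None:
        node = self.root
        for ch in title.lower():  # lowercase keys make lookups case-insensitive
            node = node.children.setdefault(ch, TrieNode())
        node.titles.append(title)

    def complete(self, prefix: str, limit: int = 10) -> List[str]:
        """Return up to `limit` titles starting with `prefix`, ignoring case."""
        node = self.root
        for ch in prefix.lower():
            next_node = node.children.get(ch)
            if next_node is None:
                return []  # no title has this prefix
            node = next_node
        results: List[str] = []
        stack = [node]  # depth-first walk of the subtree below the prefix
        while stack and len(results) < limit:
            current = stack.pop()
            results.extend(current.titles)
            # Push children in reverse order so they are popped alphabetically.
            for ch in sorted(current.children, reverse=True):
                stack.append(current.children[ch])
        return results[:limit]


trie = Trie()
for title in ["The Matrix", "The Matrix Reloaded", "Thelma & Louise", "Titanic"]:
    trie.insert(title)

print(trie.complete("the m"))
print(trie.complete("THE"))
```

Walking down the prefix costs time proportional to its length, independent of how many titles are stored; only the matching subtree is then traversed to collect suggestions.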

Repository Structure

Collecting Data and Notebooks

The ML branch contains the fine-tuning data and an IPython notebook documenting the training process.

Django project

The master branch contains the Django project.