I built a Movie Search Engine by fine-tuning the BERT model on Bing queries and Quora question triplets.
Search Engine
I implemented BERT using PyTorch and the Hugging Face transformers library. I used BERT Small, pretrained by Google, with 6 layers and 10 attention heads.
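BERT outputs one vector per token, so a search engine needs a pooling step to turn those into a single sentence embedding. The pooling used here is not stated; the sketch below assumes masked mean pooling (the sentence-transformers default) and uses a hypothetical `mean_pool` helper in plain Python so the arithmetic is visible. In PyTorch the same idea is a masked sum over the token dimension divided by the number of real (non-padding) tokens.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1; padding tokens are ignored."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, x in enumerate(vec):
                summed[i] += x
            count += 1
    # Avoid division by zero for an all-padding input.
    count = max(count, 1)
    return [s / count for s in summed]


# Three tokens of a toy 2-dimensional encoder output; the last token is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```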
I fine-tuned two models: one for symmetric search (query and document of similar length, e.g. question to question) and one for asymmetric search (short query matched against a longer passage). The fine-tuning pipeline for both models consisted of filtering the data and then fine-tuning the model with a contrastive objective, Multiple Negatives Ranking Loss with hard negatives. For the asymmetric model I used a sample of the MS MARCO triplets
dataset (Bing queries), and for the symmetric model I used the Quora question triplets. Both datasets are available at Paraphrase Data.
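To make the training objective concrete, here is a plain-Python sketch of Multiple Negatives Ranking Loss with hard negatives for one batch of (anchor, positive, hard negative) triplets. Each anchor is scored against every positive and every hard negative in the batch, and the loss is the cross-entropy of picking its own positive, so the other positives act as in-batch negatives. Scaling cosine similarity by 20 is an assumption borrowed from the sentence-transformers defaults; the function names are hypothetical, and this is not the notebook's training code, which would compute the same quantity on PyTorch tensors.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnr_loss(anchors: List[Vector], positives: List[Vector],
             hard_negatives: List[Vector], scale: float = 20.0) -> float:
    """Multiple Negatives Ranking Loss for one batch of triplets.

    anchors[i] should rank positives[i] above every other positive in the batch
    (in-batch negatives) and above every hard negative.
    """
    candidates = list(positives) + list(hard_negatives)
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp, then cross-entropy with target index i.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
hard_negatives = [[-1.0, 0.0], [0.0, -1.0]]
print(mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]], hard_negatives))  # close to 0
print(mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]], hard_negatives))  # close to 20 (positives swapped)
```

The toy vectors show the intended behavior: the loss is near zero when every anchor points at its own positive and large when the positives are shuffled.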
Movies Dataset and Indexing
I applied the search engine to an IMDb movie dataset by building a movie recommender on top of it. I used the Annoy library to index the movie-plot embeddings for approximate nearest-neighbor search, which speeds up retrieval.
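The lookup that the Annoy index accelerates is a nearest-neighbor search over plot embeddings. As a reference point, the sketch below performs the exact (brute-force) version with cosine similarity, which Annoy approximates with a forest of random-projection trees. The function name and the toy two-dimensional vectors are hypothetical; real plot embeddings would come from the fine-tuned BERT model.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def _norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def top_k_movies(query_vec: Sequence[float],
                 plot_vecs: Dict[str, Sequence[float]],
                 k: int = 3) -> List[Tuple[str, float]]:
    """Exact cosine-similarity search: score every movie, keep the k best."""
    q_norm = _norm(query_vec)
    scored = (
        (title, sum(q * x for q, x in zip(query_vec, vec)) / (q_norm * _norm(vec)))
        for title, vec in plot_vecs.items()
    )
    # nlargest returns the k highest-scoring (title, similarity) pairs, best first.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


plots = {"Movie A": [1.0, 0.0], "Movie B": [0.6, 0.8], "Movie C": [0.0, 1.0]}
print(top_k_movies([1.0, 0.1], plots, k=2))
```

With Annoy, the same query goes through `AnnoyIndex(dim, "angular")`, `add_item`, `build(n_trees)` and `get_nns_by_vector`. Brute force scans every movie on each query, while Annoy trades a little recall for much faster lookups on large catalogs.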
Autocomplete
I implemented autocomplete for the search engine by building a trie (prefix tree) from scratch over the movie titles. The autocomplete is case-insensitive.
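A minimal sketch of such a trie, assuming titles are lowercased on insertion and lookup (which is what makes it case-insensitive) while the original spelling is stored at the node where each title ends. The class and method names (`TitleTrie`, `autocomplete`, `limit`) are hypothetical, not taken from the repository. Suggestions come back in alphabetical order of the lowercased titles because children are visited in sorted order.

```python
from typing import Dict, List


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.titles: List[str] = []  # original-case titles that end at this node


class TitleTrie:
    """Case-insensitive prefix tree over movie titles."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, title: str) -> None:
        node = self.root
        for ch in title.lower():  # keys are lowercased...
            node = node.children.setdefault(ch, TrieNode())
        node.titles.append(title)  # ...but the original spelling is kept for display

    def autocomplete(self, prefix: str, limit: int = 10) -> List[str]:
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []  # no title starts with this prefix
        # Depth-first (preorder) walk below the prefix node.
        results: List[str] = []
        stack = [node]
        while stack and len(results) < limit:
            current = stack.pop()
            results.extend(current.titles)
            # Push children in reverse order so the smallest character is popped first.
            for ch in sorted(current.children, reverse=True):
                stack.append(current.children[ch])
        return results[:limit]


trie = TitleTrie()
for title in ["The Matrix", "The Mask", "Titanic", "The Martian", "Alien", "Aliens"]:
    trie.insert(title)
print(trie.autocomplete("THE MA"))  # ['The Martian', 'The Mask', 'The Matrix']
```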
Repository Structure
Collecting Data and Notebooks
The ML branch contains the fine-tuning data and an IPython notebook documenting the training process.