antonkravchenko2001/Movie_Search

About the project

I built a Movie Search Engine by fine-tuning the BERT model on Bing queries and Quora question triplets.

Search Engine

I implemented the search engine with BERT, using PyTorch and the Hugging Face transformers library. The base model is BERT Small with 6 layers and 10 attention heads, pretrained by Google. I fine-tuned two models, one for symmetric search and one for asymmetric search. The fine-tuning pipeline for both consisted of filtering the data and then fine-tuning the model using an unsupervised approach, i.e. Multiple Negatives Ranking Loss with hard negatives. For the asymmetric model I used a sample from the MS MARCO triplets dataset (Bing queries), and for the symmetric model I used Quora question/answer triplets. Both datasets are available at Paraphrase Data.
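To make the training objective concrete, here is a minimal pure-Python sketch of Multiple Negatives Ranking Loss over precomputed embeddings. It is not the code from this repository: the function names, the toy 2-dimensional vectors, and the `scale=20.0` factor on cosine similarity are assumptions chosen for illustration. For each anchor, the positive at the same index is the correct "class", and every other positive in the batch plus every hard negative acts as a wrong candidate.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multiple_negatives_ranking_loss(anchors, positives, hard_negatives=(), scale=20.0):
    """Mean cross-entropy where anchor i must rank positive i above all other candidates.

    `scale` is an assumed temperature-like multiplier on the similarity scores.
    """
    # Candidates: in-batch positives first, then the explicit hard negatives.
    candidates = list(positives) + list(hard_negatives)
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        # Log-sum-exp with max subtraction for numerical stability.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(x - top) for x in logits))
        # Negative log-probability of the matching positive (index i).
        total += log_z - logits[i]
    return total / len(anchors)


# Toy embeddings (hypothetical): each anchor equals its positive.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
hard_negatives = [[-1.0, 0.0], [0.0, -1.0]]

aligned_loss = multiple_negatives_ranking_loss(anchors, positives, hard_negatives)
shuffled_loss = multiple_negatives_ranking_loss(anchors, positives[::-1], hard_negatives)
```

With these toy vectors, the loss is close to zero when every anchor is paired with its own positive and large when the positives are swapped, which is the signal that pushes matching query/passage pairs together during fine-tuning.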

Movies Dataset and Indexing

I implemented the search engine on an IMDB movie dataset by building a movie recommender. I used the Annoy library to index movie plots, which speeds up the search process.
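For context, the sketch below shows the exact cosine-similarity search that an approximate index is meant to speed up. It deliberately does not use Annoy: it is a standard-library linear scan, and the `PlotIndex` class, its method names, and the toy 2-dimensional "plot embeddings" are all hypothetical. Annoy replaces this linear scan with a forest of random-projection trees, trading a little accuracy for much faster lookups on large collections.

```python
import heapq
import math


def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


class PlotIndex:
    """Exact nearest-neighbour search over plot embeddings (hypothetical helper)."""

    def __init__(self):
        self._ids = []
        self._vectors = []

    def add_item(self, movie_id, embedding):
        self._ids.append(movie_id)
        self._vectors.append(normalize(embedding))

    def nearest(self, query_embedding, n=10):
        """Return the ids of the n plots most similar to the query, best first."""
        query = normalize(query_embedding)
        scored = (
            (sum(a * b for a, b in zip(query, vector)), movie_id)
            for movie_id, vector in zip(self._ids, self._vectors)
        )
        best = heapq.nlargest(n, scored, key=lambda pair: pair[0])
        return [movie_id for _, movie_id in best]


# Toy data: in the real project the vectors would come from the fine-tuned model.
index = PlotIndex()
index.add_item("heist", [1.0, 0.0])
index.add_item("romance", [0.0, 1.0])
index.add_item("thriller", [0.9, 0.3])

top_two = index.nearest([1.0, 0.1], n=2)
```

Every query here touches every stored vector, so the cost grows linearly with the number of movies; that is the cost an approximate index avoids.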

Autocomplete

I implemented autocomplete in the search engine by building a trie (prefix tree) from scratch using movie titles. The autocomplete algorithm is case-insensitive.
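A case-insensitive trie autocomplete can be sketched as follows. This is an illustrative version rather than the code in the repository: the class names, the `limit` parameter, and the sample titles are assumptions. Titles are stored under their lowercased characters, and the original casing is kept at the terminal node so suggestions display as entered.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # lowercase character -> TrieNode
        self.titles = []    # original-cased titles that end at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, title):
        """Index a title under its lowercased characters."""
        node = self.root
        for ch in title.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.titles.append(title)

    def autocomplete(self, prefix, limit=5):
        """Return up to `limit` titles starting with `prefix`, ignoring case."""
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []  # no title starts with this prefix
        results = []
        stack = [node]
        # Iterative depth-first walk of the subtree below the prefix.
        while stack and len(results) < limit:
            current = stack.pop()
            results.extend(current.titles)
            # Push in reverse order so children are visited alphabetically.
            for ch in sorted(current.children, reverse=True):
                stack.append(current.children[ch])
        return results[:limit]


trie = Trie()
for movie_title in ["The Matrix", "The Mask", "Titanic", "the matrix reloaded"]:
    trie.insert(movie_title)

suggestions = trie.autocomplete("THE MA")
```

Lookup cost depends on the length of the prefix and the number of suggestions collected, not on the total number of titles, which is what makes a trie a good fit for per-keystroke suggestions.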

Repository Structure

Collecting Data and Notebooks

The ML branch contains the fine-tuning data and an IPython notebook with the training process.

Django project

The master branch contains the Django project.
