This repository uses Jaccard similarity to eliminate near-duplicate strings, acting as a second pass of duplicate removal on a dataset.
It also compares two string similarity methods: the Fuzzywuzzy library and Jaccard similarity. When tested on a large dataset, Jaccard similarity proved faster and more efficient than Fuzzywuzzy. The notebook includes a comparison of the running time of the two methods.
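
As a rough illustration of the idea, the sketch below computes the Jaccard similarity of two strings as the size of the intersection of their word sets divided by the size of their union, then keeps a string only if it is not too similar to any string already kept. The function names (`jaccard_similarity`, `drop_near_duplicates`), the whitespace tokenization, and the `0.8` threshold are assumptions made for this example and may differ from what the notebook actually does.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the lowercase word sets of two strings."""
    set_a = set(a.lower().split())
    set_b = set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # assumption: two empty strings count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def drop_near_duplicates(strings, threshold=0.8):
    """Greedy filter: keep a string only if its similarity to every
    previously kept string is below the (hypothetical) threshold."""
    kept = []
    for s in strings:
        if all(jaccard_similarity(s, k) < threshold for k in kept):
            kept.append(s)
    return kept


if __name__ == "__main__":
    data = [
        "the quick brown fox jumps",
        "the quick brown fox jumps high",
        "hello world",
    ]
    print(drop_near_duplicates(data))
```

This greedy filter compares each string against every string already kept, so it is quadratic in the worst case. Each comparison uses only set operations, but this sketch makes no claim about how it performs relative to Fuzzywuzzy.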