This project aims to reduce string redundancy by applying string similarity concepts from NLP.

ankithasudarshan/String-Similarity

String-Similarity

This repository uses Jaccard similarity to eliminate near-duplicate strings. It acts as a second pass of duplicate removal on a dataset, catching strings that are extremely similar but not identical.
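
As a rough illustration of the idea, the sketch below computes Jaccard similarity over sets of word tokens and drops any string that is too similar to one already kept. The whitespace tokenization, the 0.8 threshold, and the function names `jaccard_similarity` and `drop_near_duplicates` are assumptions made for this example and are not taken from the repository's notebook.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Return |A & B| / |A | B| over lowercase whitespace-separated tokens."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # assumption: two empty strings count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def drop_near_duplicates(strings, threshold=0.8):
    """Keep a string only if it scores below `threshold` against every kept string."""
    kept = []
    for s in strings:
        if all(jaccard_similarity(s, k) < threshold for k in kept):
            kept.append(s)
    return kept


rows = [
    "the quick brown fox jumps",
    "the quick brown fox jumps high",   # shares 5 of 6 distinct tokens with the first row
    "a completely different sentence",
]
print(drop_near_duplicates(rows))
```

Each candidate is compared against every string kept so far, so this sketch is quadratic in the worst case. The threshold controls how aggressive the second pass is and would need tuning for a real dataset.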

The notebook also compares two string similarity methods, Fuzzywuzzy and Jaccard similarity, including a comparison of their running times. When tested on a large dataset, Jaccard similarity proved to be faster and more efficient than the Fuzzywuzzy library.
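
A timing comparison of this kind could be set up roughly as follows. This is a minimal sketch, not the notebook's benchmark: the helper names `jaccard_score`, `fuzzy_score`, and `time_scorer` are hypothetical, and the sample pairs are made up. If `fuzzywuzzy` is not installed, the sketch falls back to the standard library's `difflib.SequenceMatcher` as a stand-in, so timings in that case do not reflect Fuzzywuzzy itself.

```python
import timeit
from difflib import SequenceMatcher

try:
    from fuzzywuzzy import fuzz  # optional third-party dependency

    def fuzzy_score(a: str, b: str) -> int:
        return fuzz.ratio(a, b)  # edit-based similarity on a 0-100 scale
except ImportError:
    def fuzzy_score(a: str, b: str) -> int:
        # Stand-in (assumption): difflib ratio scaled to 0-100
        return round(100 * SequenceMatcher(None, a, b).ratio())


def jaccard_score(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word tokens, in the range 0.0-1.0."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def time_scorer(scorer, pairs, repeat=100):
    """Return total seconds taken to score every pair `repeat` times."""
    return timeit.timeit(lambda: [scorer(a, b) for a, b in pairs], number=repeat)


# Illustrative data only; a real comparison would use the actual dataset.
pairs = [
    ("the quick brown fox", "the quick brown dog"),
    ("string similarity in nlp", "similarity of strings in nlp"),
] * 50

print("jaccard seconds:", time_scorer(jaccard_score, pairs))
print("fuzzy seconds:  ", time_scorer(fuzzy_score, pairs))
```

The two methods score different things (token-set overlap versus character-level edits), so a speed comparison says nothing about which one flags the same duplicates.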
