In the well-known litigation against American tobacco companies, some 14 million documents were collected and digitized. To make these documents easier for lawyers to use, an automatic classification system is needed. We assume that this task has been entrusted to us.
We have a random sample of these documents: 3482 text files. Our main task is to extract the text from these files and train several classifiers. The methodology and the tasks performed are detailed in the notebook.
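The notebook's exact pipeline is not reproduced here. As a minimal sketch, assuming each extracted document is a plain-text file stored in a sub-folder named after its category (a hypothetical layout), a bag-of-words multinomial Naive Bayes classifier with Laplace smoothing can be trained using only the Python standard library. Naive Bayes is a swapped-in baseline chosen for illustration, not necessarily one of the classifiers used in the notebook; `load_corpus`, `NaiveBayesText` and the sample documents below are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from pathlib import Path


def tokenize(text):
    # Lowercase and keep alphabetic runs only; digits and punctuation are dropped.
    return re.findall(r"[a-z]+", text.lower())


def load_corpus(data_dir):
    """Hypothetical layout: data_dir/<category>/<document>.txt"""
    docs, labels = [], []
    for class_dir in sorted(Path(data_dir).iterdir()):
        if class_dir.is_dir():
            for f in sorted(class_dir.glob("*.txt")):
                docs.append(f.read_text(encoding="utf-8", errors="ignore"))
                labels.append(class_dir.name)
    return docs, labels


class NaiveBayesText:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)        # documents per class (for priors)
        self.word_counts = defaultdict(Counter)    # word frequencies per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        self.n_docs = len(docs)
        return self

    def predict(self, doc):
        tokens = tokenize(doc)
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            # Log prior plus log likelihood of each known token.
            score = math.log(n_c / self.n_docs)
            for t in tokens:
                if t in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][t] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


if __name__ == "__main__":
    # Tiny made-up corpus; in practice docs, labels = load_corpus("path/to/data").
    docs = [
        "memo to staff regarding nicotine research budget",
        "internal memo on research schedule",
        "new filter cigarette ad with great taste",
        "ad campaign for menthol taste",
    ]
    labels = ["Memo", "Memo", "ADVE", "ADVE"]
    model = NaiveBayesText().fit(docs, labels)
    print(model.predict("memo about research"))
    print(model.predict("menthol taste ad"))
```

Working in log space avoids numerical underflow when multiplying many small word probabilities; on the real data, a held-out split would be needed to compare this baseline against the notebook's classifiers.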
Benakrab/NLP-Tobacoo