prizma

A Feature Extraction and Selection Tool for Categorizing Text Documents

  • Read directory-structured and CSV-formatted datasets

  • Directory-to-CSV dataset conversion

  • Support for subcategories

  • Feature extraction, including n-gram terms

  • Best-term selection based on TF-IDF, Mutual Information, Information Gain, and other metrics

  • Extracted features can be saved in WEKA ARFF format.

  • More detailed documentation is coming soon...
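
Until that documentation is available, the pipeline the feature list describes (n-gram extraction, TF-IDF based term selection, ARFF export) can be illustrated with a minimal Python sketch. This is not prizma's code or API: the function names `ngrams`, `tfidf_scores`, `best_terms`, and `to_arff` are hypothetical, the whitespace tokenizer is a simplification, and summing TF-IDF over documents is only one of several plausible ways to rank terms.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all 1..n_max grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf_scores(docs, n_max=2):
    """Score each term by its TF-IDF summed over all documents (an assumed ranking rule)."""
    term_lists = [ngrams(d.lower().split(), n_max) for d in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(t for terms in term_lists for t in set(terms))
    n_docs = len(docs)
    scores = Counter()
    for terms in term_lists:
        tf = Counter(terms)
        for term, count in tf.items():
            scores[term] += (count / len(terms)) * math.log(n_docs / df[term])
    return scores


def best_terms(docs, k=5, n_max=2):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    scores = tfidf_scores(docs, n_max)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def to_arff(docs, labels, terms, relation="documents", n_max=2):
    """Render term counts per document as ARFF text.

    Attribute names are single-quoted because n-grams contain spaces;
    terms that themselves contain quotes are not handled in this sketch.
    """
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE '{t}' NUMERIC" for t in terms]
    classes = ",".join(sorted(set(labels)))
    lines.append(f"@ATTRIBUTE class {{{classes}}}")
    lines += ["", "@DATA"]
    for doc, label in zip(docs, labels):
        counts = Counter(ngrams(doc.lower().split(), n_max))
        lines.append(",".join([str(counts[t]) for t in terms] + [label]))
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy corpus, invented purely for illustration.
    docs = ["the cat sat", "the dog sat", "the cat ran"]
    labels = ["a", "a", "b"]
    terms = best_terms(docs, k=3)
    print(to_arff(docs, labels, terms))
```

A term that occurs in every document gets an inverse document frequency of zero under this formula, so it never outranks a term that is specific to a subset of the documents.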