New York Times annotated corpus

This dataset has two sets of manually labeled categories: locations and topics. The category names are listed in locations.txt and topics.txt, respectively. Each line of phrase_text.txt is one document, and the corresponding line of label.txt holds that document's ground-truth labels. The document labels are used only for classification evaluation and are not needed for topic mining.
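
The file layout described above can be read with a short Python sketch like the one below. The function names (read_lines, load_corpus, load_category_names) are hypothetical and not part of the dataset. The sketch assumes the files are UTF-8 text and that locations.txt and topics.txt hold one category name per line. Because the label format inside label.txt is not described here, each label line is kept as a raw string.

```python
from pathlib import Path


def read_lines(path):
    """Return the lines of a text file without trailing newlines (UTF-8 assumed)."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def load_corpus(data_dir):
    """Pair each document with the label line at the same position.

    The label line is returned unparsed, since its format is not specified.
    """
    data_dir = Path(data_dir)
    documents = read_lines(data_dir / "phrase_text.txt")
    labels = read_lines(data_dir / "label.txt")
    if len(documents) != len(labels):
        raise ValueError(
            f"{len(documents)} documents but {len(labels)} label lines"
        )
    return list(zip(documents, labels))


def load_category_names(data_dir):
    """Read both category name lists (one name per line is an assumption)."""
    data_dir = Path(data_dir)
    return {
        "locations": read_lines(data_dir / "locations.txt"),
        "topics": read_lines(data_dir / "topics.txt"),
    }
```

For topic mining, only the document half of each pair is needed; the label half matters only when evaluating classification.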