title: Corpus and Text Analysis
subtitle: Text Analytics, day 2
place: Karen Blixens Vej 4 (room 27.0.09)
time: November 11, 2016, 9 AM to 4 PM.
instructor: Kristoffer L. Nielbo (KLN)
contact: kln@cas.au.dk
Text analytics (roughly synonymous with text mining) is a heterogeneous research field that focuses on extracting meaningful patterns from unstructured, text-heavy data. These patterns are typically extracted by applying statistical learning (i.e., machine learning) to target data sets drawn from large non-relational databases. In this one-day introductory course, we will go through a generic text analytics pipeline, with particular focus on the available tools for data preparation and modeling/analysis.
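As a rough illustration of what a generic pipeline involves, the sketch below pairs a data preparation step with a very simple analysis step. It is not course material: the stopword list, the function names `prepare` and `term_frequencies`, and the example sentences are all assumptions made for illustration, and a plain term count stands in for the statistical learning methods covered during the day.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not a standard resource.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on"}


def prepare(text):
    """Data preparation: lowercase, tokenize on runs of letters, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def term_frequencies(documents):
    """Analysis stand-in: count terms across all prepared documents."""
    counts = Counter()
    for doc in documents:
        counts.update(prepare(doc))
    return counts


if __name__ == "__main__":
    # Hypothetical two-document corpus.
    docs = [
        "The cat sat on the mat.",
        "A cat and a dog sat in the sun.",
    ]
    # Show the three most frequent terms after preparation.
    print(term_frequencies(docs).most_common(3))
```

In practice the preparation step would usually also handle non-ASCII characters, stemming or lemmatization, and a curated stopword list, and the counting step would be replaced by a model such as a clustering or classification method.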
Time | Content | Instructor |
---|---|---|
09:00-10:00 | Text Analytics | KLN |
10:00-10:30 | Generic Tools | KLN |
10:30-11:00 | break | |
11:00-12:00 | Data Preparation | KLN |
12:00-12:30 | Concerns about Preprocessing | Munksgaard |
12:30-13:30 | lunch break | |
13:30-14:00 | Sentiments | KLN |
14:00-14:15 | break | |
14:15-15:00 | Clustering | KLN |
15:00-15:45 | Classification | KLN |
15:45-16:00 | course evaluation | |
Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84.
Brücher, H., Knolmayer, G., & Mittermayer, M.-A. (2002). Document classification methods for organizing explicit knowledge. Institut für Wirtschaftsinformatik der Universität Bern.
Radovanović, M., & Ivanović, M. (2008). Text mining: Approaches and applications. Novi Sad J. Math, 38(3), 227–234.
Reagan, A., Tivnan, B., Williams, J. R., Danforth, C. M., & Dodds, P. S. (2015). Benchmarking sentiment analysis methods for large-scale texts: A case for using continuum-scored words and word shift graphs. arXiv preprint arXiv:1512.00531.
Tangherlini, T. R., & Leonard, P. (2013). Trawling in the Sea of the Great Unread: Sub-corpus topic modeling and Humanities research. Poetics, 41(6), 725–749.
Installing R and Python is not required, but participants will benefit from having both available.