GitHub - kln-courses/corpustextanalysis

title: Corpus and Text Analysis
subtitle: Text Analytics, day 2
place: Karen Blixens Vej 4 (room 27.0.09)
time: November 11, 2016, 9 AM to 4 PM.
instructor: Kristoffer L. Nielbo (KLN)
contact: kln@cas.au.dk

Text Analytics

Text analytics (roughly synonymous with text mining) is a heterogeneous research field that focuses on extracting meaningful patterns from unstructured, text-heavy data. These patterns are typically extracted by applying statistical learning (i.e., machine learning) to target data sets drawn from large non-relational databases. In this one-day introductory course, we will go through a generic text analytics pipeline, with particular focus on the available tools for data preparation and modeling/analysis.
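To make the idea of a generic pipeline concrete, here is a minimal sketch in Python (standard library only) that moves from data preparation to a very simple analysis step. It is an illustration under stated assumptions, not the course's own code: the toy documents, the stop-word list, and the function names `preprocess`, `bag_of_words`, and `cosine` are all hypothetical, and cosine similarity over word counts stands in for the modeling step.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; not taken from the course data.
DOCS = [
    "The cat sat on the mat.",
    "A cat and a dog sat together.",
    "Stock markets fell sharply today.",
]

# Tiny illustrative stop-word list; real projects use a fuller one.
STOPWORDS = {"a", "and", "on", "the"}


def preprocess(text):
    """Data preparation: lowercase, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def bag_of_words(tokens):
    """Represent a document as a sparse vector of term counts."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Analysis: compare the first document with the other two.
vectors = [bag_of_words(preprocess(d)) for d in DOCS]
sim_01 = cosine(vectors[0], vectors[1])
sim_02 = cosine(vectors[0], vectors[2])
print(f"doc0 vs doc1: {sim_01:.2f}")
print(f"doc0 vs doc2: {sim_02:.2f}")
```

The two documents about animals share vocabulary and score above zero, while the unrelated document scores zero. Pairwise similarities like these are the kind of input that clustering and classification methods build on.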

Program

Time	Content	Instructor
09:00-10:00	`Text Analytics`	KLN
10:00-10:30	`Generic Tools`	KLN
10:30-11:00	break
11:00-12:00	`Data Preparation`	KLN
12:00-12:30	`Concerns about Preprocessing`	Munksgaard
12:30-13:30	lunch break
13:30-14:00	`Sentiments`	KLN
14:00-14:15	break
14:15-15:00	`Clustering`	KLN
15:00-15:45	`Classification`	KLN
15:45-16:00	course evaluation

Reading material

Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84.

Brücher, H., Knolmayer, G., & Mittermayer, M.-A. (2002). Document classification methods for organizing explicit knowledge. Institut für Wirtschaftsinformatik der Universität Bern.

Radovanović, M., & Ivanović, M. (2008). Text mining: Approaches and applications. Novi Sad J. Math, 38(3), 227–234.

Reagan, A., Tivnan, B., Williams, J. R., Danforth, C. M., & Dodds, P. S. (2015). Benchmarking sentiment analysis methods for large-scale texts: A case for using continuum-scored words and word shift graphs. arXiv preprint arXiv:1512.00531.

Tangherlini, T. R., & Leonard, P. (2013). Trawling in the Sea of the Great Unread: Sub-corpus topic modeling and Humanities research. Poetics, 41(6), 725–749.

Other

Although it is not required, participants will benefit from installing R and Python before the course.

code_R
code_py
data
slides
04-files_update.md
Munksgaard_abstract.md
README.md
