Computational Linguistics

Gabriel Bodard edited this page Dec 2, 2021 · 14 revisions

Sunoikisis Digital Classics: Fall 2021

Session 9: Computational Linguistics

Thursday Dec 2, 17:15–18:45 CET

Convenors: Alek Keersmaekers (KU Leuven), Martina Astrid Rodda (Oxford)

YouTube link: https://youtu.be/hPGw1yNTZUs

Slides: Combined slides (PDF)

Session outline

This session will introduce some of the questions and approaches of Computational Linguistics. We will start by discussing how these approaches build on the previous sessions on search tools, text analysis, treebanking and translation alignment. We will then give a broad overview of Computational Linguistics as applied to ancient languages: the main questions it tries to address, the work that has been done for Greek and Latin, the most important concepts, and the challenges that Greek and Latin present.

We will also include two case studies. The first will illustrate how a computational approach can be used to study literary features, specifically the behaviour of formulae in early Greek epic poetry (Homer, Hesiod, and the Homeric Hymns). Quantitative data show that recurring set phrases in early epic poetry behave measurably differently from both set phrases in non-epic material and recurring expressions in later epic. The second case study will discuss so-called ‘transformer’-based approaches to natural language processing, which use neural networks trained on a large corpus to obtain detailed mathematical representations of the usage of a given word. Through a number of examples, it will show how Electra, a transformer model tailored to languages with a smaller corpus, can considerably improve the state of the art for natural language processing in Greek.
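As a rough illustration of what a "mathematical representation of the usage of a word" can look like, the sketch below builds simple count-based co-occurrence vectors and compares them with cosine similarity. This is a deliberately simplified stand-in, not a transformer and not the method used in either case study. The toy corpus (invented sentences mixing transliterated names with English words), the function names and the window size are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Invented toy corpus, for illustration only (not real data).
toy_corpus = [
    "dios achilleus fights hektor".split(),
    "dios odusseus fights aias".split(),
    "the ship sails home".split(),
]

vectors = cooccurrence_vectors(toy_corpus)

# Words used in similar contexts get similar vectors.
sim_heroes = cosine(vectors["achilleus"], vectors["odusseus"])
sim_mixed = cosine(vectors["achilleus"], vectors["ship"])
print(f"achilleus ~ odusseus: {sim_heroes:.3f}")
print(f"achilleus ~ ship:     {sim_mixed:.3f}")
```

In this toy setting the two hero names share most of their contexts and so score higher than an unrelated pair. Transformer models replace these raw counts with representations learned by a neural network, but the intuition of comparing words by their contexts of use carries over.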

Seminar readings

  • Yannis Assael, Thea Sommerschield & Jonathan Prag. 2019. “Restoring ancient text using deep learning: a case study on Greek epigraphy.” Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pp. 6368–6375. Available: https://arxiv.org/abs/1910.06262
  • Florent Perek. “Using Distributional Semantics to Study Syntactic Productivity in Diachrony: A Case Study.” Linguistics 54 (2016): 149–88. Available: https://doi.org/10.1515/ling-2015-0043.

Further reading

  • Kristina Gulordava. 2018. Word order variation and dependency length minimisation: a cross-linguistic computational approach. Doctoral thesis, Univ. Genève. Available: https://doi.org/10.13097/archive-ouverte/unige:106855. (Esp. chapter 3, “The DLM principle and word order variability at the language level,” pp. 64–106).
  • Jenset, Gard B. 2013. “Mapping Meaning with Distributional Methods: A Diachronic Corpus-Based Study of Existential There.” Journal of Historical Linguistics 3: 272–306.
  • Alek Keersmaekers. 2020. “Automatic Semantic Role Labeling in Ancient Greek Using Distributional Semantic Modeling.” Proceedings of 1st Workshop on Language Technologies for Historical and Ancient Languages, pp. 59–67. Available: https://www.aclweb.org/anthology/2020.lt4hala-1.9
  • Alek Keersmaekers. 2020. “Creating a richly annotated corpus of papyrological Greek: The possibilities of natural language processing approaches to a highly inflected historical language.” Digital Scholarship in the Humanities 35.1, pp. 67–82. DOI: https://doi.org/10.1093/llc/fqz004 (not open access)
  • Mike Kestemont & Justin A. Stover. 2016. “The Authorship of the Historia Augusta: Two new computational studies.” Bulletin of the Institute of Classical Studies 59.2, pp. 140–157. Available: https://onlinelibrary.wiley.com/doi/epdf/10.1111/j.2041-5370.2016.12043.x
  • Lebani, Gianluca E., Marco Silvio Giuseppe Senaldi, and Alessandro Lenci. 2015. “Modeling Idiom Variability with Entropy and Distributional Semantics.” In Proceedings of the 6th Conference on Quantitative Investigations in Theoretical Linguistics. Tübingen: Universität Tübingen. Available: https://colinglab.humnet.unipi.it/wp-content/uploads/2016/01/Lebani_Senaldi_Lenci.pdf.
  • Barbara McGillivray. 2013. Methods in Latin Computational Linguistics. Brill. (Not open access.)
  • Marton Ribary & Barbara McGillivray. 2020. “A Corpus Approach to Roman Law Based on Justinian’s Digest.” Informatics 7, 44. Available: https://doi.org/10.3390/informatics7040044
  • Rodda, Martina Astrid, Marco Silvio Giuseppe Senaldi, and Alessandro Lenci. 2017. “Panta Rei: Tracking Semantic Change with Distributional Semantics in Ancient Greek.” Italian Journal of Computational Linguistics 3, no. 1: 11–24. Available: http://ceur-ws.org/Vol-1749/paper46.pdf.
  • Rodda, Martina Astrid, Philomen Probert, and Barbara McGillivray. 2019. “Vector Space Models of Ancient Greek Word Meaning, and a Case Study on Homer.” TAL – Traitement Automatique Des Langues 60, no. 3. Available: https://www.repository.cam.ac.uk/bitstream/handle/1810/301542/Rodda%20Probert%20McGillivray%20TAL_Ancient_Greek.pdf.

Other resources

Exercise

  1. Go back to the exercises for sessions 1 (Philological Search Tools) and 3 (Text Analysis with Voyant). Choose one of the texts you already looked at and discuss:
    • How would you apply (one of) the approaches discussed in this session to the analysis of your target text?
    • What resources would you need (e.g. corpora, lemmatised/treebanked texts, software)?
    • Are these resources available, and where?
    • (How) would the approaches discussed in this session allow you to develop your initial research questions from sessions 1 and 3?