Research with Treebanks

Gabriel Bodard edited this page Nov 4, 2021 · 10 revisions

Sunoikisis Digital Classics: Fall 2021

Session 5: Research with Treebanks

Thursday Nov 4, 17:15–18:45 CET

Convenors: Nicole Iu (U of London), Francesco Mambrini (Milan), Marja Vierros (Helsinki)

YouTube link: https://youtu.be/nQgHA7AYf3Y

Slides: Combined slides (PDF)

Session outline

In this session we will introduce the concept of treebanking (morpho-syntactic annotation) of Greek and Latin sentences (see the exercise below for a fuller tutorial). We will work through the treebanking of a short passage to show how annotation helps us understand the syntax and interpretation of a text. A discussion of the PapyGreek project will follow, covering the research questions about the language of Greek papyri that morphological and syntactic annotation and analysis can help to answer. We will then present the treebank querying tool DendroSearch and demonstrate some concrete research questions that can be addressed by querying a treebank corpus. We will end by presenting an exercise that students should attempt in their own time.

Seminar readings

  • Mambrini, F. (2016). "The Ancient Greek Dependency Treebank: Linguistic Annotation in a Teaching Environment." In M. Romanello & G. Bodard (Eds.), Digital Classics Outside the Echo-Chamber. London: Ubiquity Press. Available: https://doi.org/10.5334/bat.f
  • Vierros, M. (2018). "Linguistic Annotation of the Digital Papyrological Corpus: Sematia." In N. Reggiani (Ed.), Digital Papyrology II: Case Studies on the Digital Edition of Ancient Greek Papyri (pp. 105–118). Berlin, Boston: De Gruyter. Available: https://doi.org/10.1515/9783110547450-006

Further reading

Other resources

Exercise

  1. Watch the Greek and Latin Treebanking Tutorial offered by Vanessa Gorman and Polina Yordanova, and attempt to annotate either the Greek (Aesop) or Latin (Phaedrus) sentences given at the bottom of that session page using Arethusa.

    1. If you prefer to try treebanking on some English text, look instead at the Guidelines for Universal Dependencies syntax, and when entering your text in Arethusa, select "Click to toggle advanced options..." and then choose the format "UD English" from the list.
  2. Let us try some queries like the one used in Mambrini and Passarotti 2016 (see Further reading above).

    • We start by looking at coordinated subjects governed by any type of verb. In the "Perseus family" of treebanks, coordinated elements are governed by the coordinating conjunction, which is in turn governed by the verb. This yields a structure like VERB -[COORD]-> Conjunction -[SBJ_CO]-> *, where * stands for any token.
    • Using DendroSearch, look for this structure in Herodotus and the NT using the treebanks converted from the PROIEL project.
    • Bonus question: can you perform the same query using the same PROIEL treebank in UD, using PML-TQ?
    • Even more bonus questions: can you figure out how to output a table with the totals of plural and singular verbs using PML-TQ? Can you use both DendroSearch and PML-TQ to specify some order constraints (subjects before/after the verb)?
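To make the target structure concrete before building the query in DendroSearch, here is a minimal Python sketch that looks for VERB -[COORD]-> Conjunction -[SBJ_CO]-> * in a hand-made sentence. It does not use DendroSearch, PML-TQ, or any treebank file format: the `Token` class, the part-of-speech labels, the `PRED` label on the root, and the English toy sentence are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One word of a dependency tree (hypothetical, simplified layout)."""
    id: int           # 1-based position in the sentence
    form: str         # the word as written
    pos: str          # coarse part of speech (illustrative labels)
    head: int         # id of the governing token, 0 for the root
    relation: str     # label of the link from this token to its head
    number: str = ""  # "sg" or "pl" where relevant


def coordinated_subjects(sentence):
    """Return (verb, conjunction, subjects) for every coordinated subject."""
    by_id = {tok.id: tok for tok in sentence}
    grouped = {}
    for tok in sentence:
        # The conjunct must be labelled as a coordinated subject ...
        if tok.relation != "SBJ_CO":
            continue
        # ... its head must be a coordinator ...
        conj = by_id.get(tok.head)
        if conj is None or conj.relation != "COORD":
            continue
        # ... and the coordinator must depend on a verb.
        verb = by_id.get(conj.head)
        if verb is None or verb.pos != "verb":
            continue
        grouped.setdefault((verb.id, conj.id), []).append(tok)
    return [(by_id[v], by_id[c], subs) for (v, c), subs in grouped.items()]


# Invented toy sentence: "Cats and dogs sleep"
TOY_SENTENCE = [
    Token(1, "Cats", "noun", 2, "SBJ_CO", "pl"),
    Token(2, "and", "conjunction", 4, "COORD"),
    Token(3, "dogs", "noun", 2, "SBJ_CO", "pl"),
    Token(4, "sleep", "verb", 0, "PRED", "pl"),
]

for verb, conj, subjects in coordinated_subjects(TOY_SENTENCE):
    print(verb.form, verb.number, conj.form, [s.form for s in subjects])
```

Because each match keeps the verb token, tallying `verb.number` over all matches is one way to approach the singular/plural question, and comparing `id` values of the subjects and the verb gives the order constraint.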

Hint 1: coordination is treated differently in UD. You will first have to read how the structure is annotated. Read here carefully.

Hint 2: PML-TQ is powerful, but it might be a bit intimidating. The documentation is found here.
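As a rough illustration of Hint 1: in Universal Dependencies the first conjunct attaches directly to the verb (here as `nsubj`), later conjuncts attach to the first conjunct with `conj`, and the coordinating conjunction attaches with `cc`. The Python sketch below searches a hand-made sentence for that shape. It is not a PML-TQ query, and the `UDToken` class and the English toy sentence are assumptions invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UDToken:
    """One word of a UD-style tree (hypothetical, simplified layout)."""
    id: int      # 1-based position in the sentence
    form: str    # the word as written
    upos: str    # universal part-of-speech tag
    head: int    # id of the governing token, 0 for the root
    deprel: str  # dependency relation to the head


def ud_coordinated_subjects(sentence):
    """Return (verb, [first conjunct, other conjuncts...]) pairs."""
    results = []
    by_id = {tok.id: tok for tok in sentence}
    for first in sentence:
        # Accept subtypes such as nsubj:pass by comparing the base relation.
        if first.deprel.split(":")[0] != "nsubj":
            continue
        verb = by_id.get(first.head)
        if verb is None or verb.upos != "VERB":
            continue
        # Further conjuncts hang off the first conjunct, not off the verb.
        others = [t for t in sentence
                  if t.head == first.id and t.deprel == "conj"]
        if others:
            results.append((verb, [first] + others))
    return results


# Invented toy sentence: "Cats and dogs sleep"
UD_TOY_SENTENCE = [
    UDToken(1, "Cats", "NOUN", 4, "nsubj"),
    UDToken(2, "and", "CCONJ", 3, "cc"),
    UDToken(3, "dogs", "NOUN", 1, "conj"),
    UDToken(4, "sleep", "VERB", 0, "root"),
]

for verb, conjuncts in ud_coordinated_subjects(UD_TOY_SENTENCE):
    print(verb.form, [c.form for c in conjuncts])
```

The point of the sketch is only the change of shape: the query starts from the subject and looks downward for `conj` dependents, rather than starting from a conjunction that governs all the conjuncts.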