Research with Treebanks
Thursday Nov 4, 17:15–18:45 CET
Convenors: Nicole Iu (U of London), Francesco Mambrini (Milan), Marja Vierros (Helsinki)
Youtube link: https://youtu.be/nQgHA7AYf3Y
Slides: Combined slides (PDF)
In this session we will introduce the concept of treebanking (morpho-syntactic annotation) of Greek and Latin sentences (see the exercise below for a fuller tutorial). We will show an example of treebanking a short passage to better understand the syntax and interpretation of the text. A discussion of the PapyGreek project will follow, covering the research questions about the language of the Greek papyri that morphological and syntactic annotation and analysis can help answer. We will present the treebank querying tool DendroSearch and demonstrate some concrete research questions that can be addressed by querying a treebank corpus. We will end by presenting an exercise that students should attempt in their own time.
- Mambrini, F. (2016). "The Ancient Greek Dependency Treebank: Linguistic Annotation in a Teaching Environment." In Romanello M. & Bodard G, Digital Classics Outside the Echo-Chamber. London: Ubiquity Press. Available: https://doi.org/10.5334/bat.f
- Vierros, M. (2018). Linguistic Annotation of the Digital Papyrological Corpus: Sematia. In Nicola Reggiani (Editor), Digital Papyrology II: Case Studies on the Digital Edition of Ancient Greek Papyri (pp. 105–118). Berlin, Boston: De Gruyter. Available: https://doi.org/10.1515/9783110547450-006
- Bamman, D. and Crane, G. (2011). The Ancient Greek and Latin Dependency Treebanks. Language Technology for Cultural Heritage (pp. 79–98). Available: http://people.ischool.berkeley.edu/~dbamman/pubs/pdf/latech2011.pdf
- Celano, Giuseppe G. A. and Gregory Crane (2015). "Semantic Role Annotation in the Ancient Greek Dependency Treebank." In Dickinson Markus et al., Proceedings of the Fourteenth International Workshop on Treebanks and Linguistic Theories (TLT14), 26-34. Available: http://tlt14.ipipan.waw.pl/files/4614/5063/3858/TLT14_proceedings.pdf
- Celano, Giuseppe G.A. (2019). "The Dependency Treebanks for Ancient Greek and Latin." In Monica Berti (ed), Digital Classical Philology: Ancient Greek and Latin in the Digital Revolution. De Gruyter. Pp. 279–298. Available: https://doi.org/10.1515/9783110599572-016
- Francesca Dell'Oro, Helena Bermúdez Sabel & Paola Marongiu. 2020. “Implemented to Be Shared: the WoPoss Annotation of Semantic Modality in a Latin Diachronic Corpus.” Sharing the Experience: Workflows for the Digital Humanities. Proceedings of the DARIAH-CH Workshop 2019. Available: https://zenodo.org/record/3739440#.XzqoTZMzZTZ
- Gorman, Robert J. (2019). “Author Identification of Short Texts Using Dependency Treebanks without Vocabulary.” Digital Scholarship in the Humanities. Available: https://doi.org/10.1093/llc/fqz070
- Gorman, Vanessa B. & Robert J. Gorman (2016). “Approaching Questions of Text Reuse in Ancient Greek Using Computational Syntactic Stylometry.” Open Linguistics 2, 500-510. Available: https://doi.org/10.1515/opli-2016-0026
- Haug, Dag. 2015. “Treebanks in historical linguistic research.” In Carlotta Viti (ed.), Perspectives on Historical Syntax, Benjamins, pp. 188-202. http://folk.uio.no/daghaug/historical-treebanks.pdf
- Alek Keersmaekers et al. (2019). “Creating, Enriching and Valorising Treebanks of Ancient Greek: the ongoing Pedalion-project.” Paris. Available at: https://syntaxfest.github.io/syntaxfest19/proceedings/papers/paper_68.pdf
- Mambrini, Francesco, 2019. “Nominal vs Copular Clauses in a Diachronic Corpus of Ancient Greek Historians.” Journal of Greek Linguistics 19, 90-113. Available: https://doi.org/10.1163/15699846-01901003
- Mambrini, F. and Passarotti, M. (2016). "Subject-Verb Agreement with Coordinated Subjects in Ancient Greek. A Treebank-Based Study." Journal of Greek Linguistics 16 (2016:1), 87–116. Available: https://doi.org/10.1163/15699846-01601003
- Reggiani, Nicola, 2017. "New Trends in Papyrology. Quantitative analysis of textual data: past and future of computational linguistics applied to papyrology." Chapter 7.1 in Digital Papyrology I: Methods, Tools and Trends. De Gruyter. Pp. 178–189. Available: https://doi.org/10.1515/9783110547474-007
- Passarotti, Marco (2019). "The Project of the Index Thomisticus Treebank." In Monica Berti (ed), Digital Classical Philology: Ancient Greek and Latin in the Digital Revolution. De Gruyter. Pp. 299–320. Available: https://doi.org/10.1515/9783110599572-017
- Smith, Neel, 2016. "Morphological Analysis of Historical Languages." Bulletin of the Institute of Classical Studies 59.2, 89–102. Available: https://onlinelibrary.wiley.com/doi/10.1111/j.2041-5370.2016.12040.x
- PapyGreek
- DendroSearch
- Guidelines for the Syntactic Annotation of Latin Treebanks (v. 1.3) (esp p. 3-21; 24; 26)
- Guidelines for the annotation of the Ancient Greek Dependency Treebank 2.0. (esp. Chapter 3, including analysis of the hyperlinked examples)
- Guidelines for annotation of Universal Dependencies (UD)
- PML-TQ online tool to search UD
- Watch the Greek and Latin Treebanking Tutorial offered by Vanessa Gorman and Polina Yordanova, and attempt to annotate either the Greek (Aesop) or Latin (Phaedrus) sentences given at the bottom of that session page using Arethusa.
- If you prefer to try treebanking on some English text, look instead at the Guidelines for Universal Dependencies syntax, and when entering your text in Arethusa, select "Click to toggle advanced options..." and then choose the format "UD English" from the list.
- Let us try some queries like the ones used in Mambrini and Passarotti 2016 (listed in the readings above).
- We start by looking at coordinated subjects governed by any type of verb. In the "Perseus family" of treebanks coordinated elements are governed by the coordinating conjunction. This would yield a structure like:
VERB -[COORD]-> Conjunction -[SBJ_CO]-> *
- Using DendroSearch, look for this structure in Herodotus and the NT using the treebanks converted from the PROIEL project.
- Bonus question: can you perform the same query using the same PROIEL treebank in UD, using PML-TQ?
- Even more bonus questions: can you figure out how to output a table with the totals of plural and singular verbs using PML-TQ? Can you use both DendroSearch and PML-TQ to specify some order constraints (subjects before/after the verb)?
Hint 1: coordination is treated differently in UD. You will first have to read how the structure is annotated. Read here carefully.
Hint 2: PML-TQ is powerful, but it might be a bit intimidating. The documentation is found here.
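
The Perseus-style pattern described in the exercise (a verb governing a coordinating conjunction, which in turn governs the subjects labelled SBJ_CO) can also be explored outside DendroSearch. The following is a minimal Python sketch, assuming a treebank in the Perseus/AGDT XML layout in which each `word` element carries `id`, `head`, `relation` and a nine-character `postag` (first character the part of speech, third the number). The toy Latin sentence and all function names are invented for illustration; they are not taken from the PROIEL conversions or from any tool named above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Toy sentence invented for this sketch (not from a real treebank):
# "Marcus et Iulia veniunt" with Perseus-style heads and relations.
TOY_TREEBANK = """
<treebank>
  <sentence id="1">
    <word id="1" form="Marcus" lemma="Marcus" postag="n-s---mn-" relation="SBJ_CO" head="2"/>
    <word id="2" form="et" lemma="et" postag="c--------" relation="COORD" head="4"/>
    <word id="3" form="Iulia" lemma="Iulia" postag="n-s---fn-" relation="SBJ_CO" head="2"/>
    <word id="4" form="veniunt" lemma="venio" postag="v3ppia---" relation="PRED" head="0"/>
  </sentence>
</treebank>
"""


def find_coordinated_subjects(xml_text):
    """Return (verb, conjunction, subjects) triples matching
    VERB -[COORD]-> Conjunction -[SBJ_CO]-> *."""
    root = ET.fromstring(xml_text)
    matches = []
    for sentence in root.iter("sentence"):
        # Index the words of one sentence by id so heads can be looked up.
        words = {w.get("id"): w for w in sentence.iter("word")}
        for conj in words.values():
            if conj.get("relation") != "COORD":
                continue
            verb = words.get(conj.get("head"))
            # The governor of the conjunction must be a verb (postag starts with "v").
            if verb is None or not verb.get("postag", "").startswith("v"):
                continue
            subjects = [
                w for w in words.values()
                if w.get("head") == conj.get("id") and w.get("relation") == "SBJ_CO"
            ]
            if subjects:
                matches.append((verb, conj, subjects))
    return matches


def verb_number_totals(matches):
    """Count matched verbs by the number slot of the postag (s, p or d)."""
    return Counter(verb.get("postag")[2] for verb, _, _ in matches)


def subject_order(verb, subjects):
    """Return 'before', 'after' or 'mixed' for the subjects relative to the verb."""
    verb_pos = int(verb.get("id"))
    sides = {"before" if int(s.get("id")) < verb_pos else "after" for s in subjects}
    return sides.pop() if len(sides) == 1 else "mixed"


if __name__ == "__main__":
    for verb, conj, subjects in find_coordinated_subjects(TOY_TREEBANK):
        forms = [s.get("form") for s in subjects]
        print(verb.get("form"), conj.get("form"), forms, subject_order(verb, subjects))
    print(dict(verb_number_totals(find_coordinated_subjects(TOY_TREEBANK))))
```

The sketch only mirrors the Perseus-family annotation. A UD treebank encodes coordination differently, so the same functions would not find the pattern there without being rewritten.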