
Developing linguistic metrics and making linguistic decisions in annotated data Developing linguistic metrics for the analysis of Ancient Greek

aniseferreira edited this page Mar 27, 2017 · 1 revision

Date: Thursday, March 30, 2017, 17h00-18h15 (CEST)

Session coordinators: Eleni Bozia (University of Florida) and Anise D'Orange Ferreira (UNESP)

YouTube link: https://www.youtube.com/watch?v=RbereoguSd4

Slides:


Summary

This lecture introduces the possible uses of annotated data. It then presents the development of a metric system tailored to the analysis of the syntactic construction of Attic oratory, both Classical and Imperial.
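This page does not list the session's actual metrics. As a purely illustrative sketch of what a syntactic metric over treebank data can look like, the Python below computes token count and dependency depth for each sentence. It assumes AGDT-style XML, in which each `word` element carries `id` and `head` attributes and `head="0"` marks the root. The inline sample sentence, the function names and the choice of depth as a metric are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-sentence sample in AGDT-style XML (not taken from the corpus).
SAMPLE = """<treebank>
  <sentence id="1">
    <word id="1" form="ὁ" lemma="ὁ" postag="l-s---mn-" relation="ATR" head="2"/>
    <word id="2" form="λέγων" lemma="λέγω" postag="v-sppamn-" relation="SBJ" head="3"/>
    <word id="3" form="ψεύδεται" lemma="ψεύδομαι" postag="v3spie---" relation="PRED" head="0"/>
  </sentence>
</treebank>"""


def word_depths(sentence):
    """Return {word id: number of head links followed to leave the tree}."""
    heads = {w.get("id"): w.get("head") for w in sentence.iter("word")}
    depths = {}
    for wid in heads:
        depth, current, seen = 0, wid, set()
        # Follow head links upward; 'seen' guards against cyclic annotation.
        while current in heads and current not in seen:
            seen.add(current)
            current = heads[current]
            depth += 1
        depths[wid] = depth
    return depths


def sentence_metrics(sentence):
    """Illustrative metrics: token count, maximum and mean dependency depth."""
    depths = list(word_depths(sentence).values())
    if not depths:
        return {"tokens": 0, "max_depth": 0, "mean_depth": 0.0}
    return {
        "tokens": len(depths),
        "max_depth": max(depths),
        "mean_depth": sum(depths) / len(depths),
    }


root = ET.fromstring(SAMPLE)
for sentence in root.iter("sentence"):
    print(sentence.get("id"), sentence_metrics(sentence))
```

To run this on a real file, replace `ET.fromstring(SAMPLE)` with `ET.parse(path).getroot()` for a locally downloaded treebank file. Comparing the resulting values across authors or periods is one simple way to turn annotation into a quantitative profile of style.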

Another part of the session will show examples of how we have been searching and querying the Perseus Ancient Greek Dependency Treebank (AGDT) with Tündra @ Weblicht in undergraduate classes at UNESP in Araraquara, Brazil. We use the treebank-annotated corpus to answer questions and solve problems in learning Greek syntax and treebank annotation. Manual treebank annotation is a way to record our readings syntactically, a process that is not always clear to annotators and readers when the language is Ancient Greek, even when guidelines are available. Being able to search, query, and compare similar cases of annotation in a database creates opportunities for discussing the language and for making decisions in the annotation process.

Outline

  1. Theoretical background (10 minutes)
  2. Metric System (10 minutes)
  3. Case Studies (5 minutes)
  4. Online tool (10 minutes)
  5. Getting access to Tündra @ Weblicht (5 minutes)
  6. Perseus Ancient Greek Dependency Treebank on Github (corpus) (5 minutes)
  7. Understanding Celano's guide for Tündra (10 minutes)
  8. Looking for: combinations of particles, word repetitions, ἄν, participles and multiword expressions (MWE) (15 minutes)

Required readings

Bamman, D., Crane, G. (2006) “The Design and Use of a Latin Dependency Treebank.” In Proceedings of the Fifth Workshop on Treebanks and Linguistic Theories (TLT 2006). Prague: ÚFAL MFF UK, 67-78.

Bamman, D. & Crane, G. (2010). Corpus Linguistics, Treebanks and the Reinvention of Philology. Informatik, 1, p. 542-551. Available: http://subs.emis.de/LNI/Proceedings/Proceedings176/558.pdf (Portuguese translation: Linguística de corpus, treebanks e a reinvenção da filologia. In Ferreira (org.), Introdução aos textos clássicos na era digital do terceiro milênio, p. 19-32. http://www.letraria.net/site/introducao-aos-textos-classicos-na-era-digital-do-terceiro-milenio/)

Bamman, D. & Crane, G. (2008). Guidelines for the Syntactic Annotation of the Ancient Greek Dependency Treebank (1). Available: http://nlp.perseus.tufts.edu/syntax/treebank/agdt/1.1/docs/guidelines.pdf (Portuguese translation and appendix with complementary examples: https://drive.google.com/open?id=0BzWgyyc96J7LaVd2R201SE5fTjA)

Bamman D., Passarotti M., Crane G., Raynaud S. (2007) Guidelines for the Syntactic Annotation of Latin Treebanks, «Tufts University Digital Library», 2007.

Bozia, E. (2016). Atticism: the language of 5th-century oratory or a quantifiable stylistic phenomenon? In Celano, G. (ed.) Special Issue on Treebanks. Open Linguistics 2.1. https://www.degruyter.com/view/j/opli.2016.2.issue-1/opli-2016-0029/opli-2016-0029.xml?format=INT

Celano, G. (2014) Guidelines for the Ancient Greek Dependency Treebank 2.0. Available: https://github.com/PerseusDL/treebank_data/blob/master/AGDT2/guidelines/Greek_guidelines.md

Celano, G. (2016). Querying the Ancient Greek and Latin Dependency Treebank using Tündra. Google Drive doc: https://docs.google.com/document/d18dABOV0Y2w6Ax8_oYNwC4CNTazOoybn9uB1wPUjqpE/edt

Further readings

Bamman, D., Passarotti, M., Busa, R., Crane, G. (2008) “The Annotation Guidelines of the Latin Dependency Treebank and Index Thomisticus Treebank.” In Proceedings of the 5th SaLTMiL Workshop. Morocco 2008, 71-76.

Celano, G., Crane, G. & Majidi, S. (2016) “Part of Speech Tagging for Ancient Greek.” Open Linguistics, Volume 2, Issue 1, ISSN (Online) 2300-9969. DOI: https://doi.org/10.1515/opli-2016-0020.

Hajič, J. (1998) “Building a syntactically annotated corpus: The Prague Dependency Treebank.” In E. Hajičová, ed., Issues of Valency and Meaning. Studies in Honor of Jarmila Panevová, 12-19. Prague Karolinum, Charles University Press.

Mambrini, F. (2016) “The Ancient Greek Dependency Treebank: Linguistic Annotation in a Teaching Environment.” In G. Bodard & M. Romanello, eds., Digital Classics Outside the Echo-Chamber: Teaching, Knowledge Exchange & Public Engagement, 83-99. Ubiquity Press: London.

Martens, S. (2013) “TüNDRA: A Web Application for Treebank Search and Visualization.” In Proceedings of The Twelfth Workshop on Treebanks and Linguistic Theories (TLT12), Sofia 2013, 133-144. Available:

Pajas, P., Štěpánek, J. (2009) “System for Querying Syntactically Annotated Corpora.” In Proceedings of the ACL-IJCNLP 2009 Software Demonstrations, Suntec, Singapore, 3 August 2009, 33-36.

Passarotti M. (2015) “What you can do with linguistically annotated data. From the Index Thomisticus to the Index Thomisticus Treebank.” In V. Roszak Piotr, eds., Reading Sacred Scripture with Thomas Aquinas. Hermeneutical Tools, Theological Questions and New Perspectives, 3-14. Belgium: Brepols Press.

Essay title

Discuss the possibilities of data querying against the backdrop of the development of metrics particular to certain authors, eras, or literary genres.

Practical exercises

  • Annotate at least two passages that, from a philological viewpoint, differ with regard to their syntactic construction, and use the set metrics to quantify their style. You may also develop additional metrics that better describe your selections.
  • Search the AGDT on Tündra for nominalized participles used in the same case and in different cases, then compare and discuss their annotations.
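
For readers who want to explore the corpus offline before or after working in Tündra, the second exercise can be approximated with a short script. The sketch below is not the Tündra workflow from the session. It assumes the AGDT XML layout, in which each `word` has a nine-character `postag` (position 1 is part of speech, position 5 is mood with `p` for participle, position 8 is case). The two sample sentences and the function name `participles_by_case` are invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample in AGDT-style XML (not taken from the corpus).
SAMPLE = """<treebank>
  <sentence id="1">
    <word id="1" form="ὁ" lemma="ὁ" postag="l-s---mn-" relation="ATR" head="2"/>
    <word id="2" form="λέγων" lemma="λέγω" postag="v-sppamn-" relation="SBJ" head="3"/>
    <word id="3" form="ψεύδεται" lemma="ψεύδομαι" postag="v3spie---" relation="PRED" head="0"/>
  </sentence>
  <sentence id="2">
    <word id="1" form="ἀκούω" lemma="ἀκούω" postag="v1spia---" relation="PRED" head="0"/>
    <word id="2" form="τοῦ" lemma="ὁ" postag="l-s---mg-" relation="ATR" head="3"/>
    <word id="3" form="λέγοντος" lemma="λέγω" postag="v-sppamg-" relation="OBJ" head="1"/>
  </sentence>
</treebank>"""

# Case codes at position 8 of the postag.
CASES = {"n": "nominative", "g": "genitive", "d": "dative",
         "a": "accusative", "v": "vocative"}


def participles_by_case(root):
    """Group participles by case as (sentence id, form, relation) tuples."""
    found = defaultdict(list)
    for sentence in root.iter("sentence"):
        for word in sentence.iter("word"):
            tag = word.get("postag", "")
            # A participle is a verb (position 1 = 'v') with mood 'p' (position 5).
            if len(tag) == 9 and tag[0] == "v" and tag[4] == "p":
                case = CASES.get(tag[7], "unknown")
                found[case].append(
                    (sentence.get("id"), word.get("form"), word.get("relation"))
                )
    return dict(found)


root = ET.fromstring(SAMPLE)
for case, hits in participles_by_case(root).items():
    print(case, hits)
```

The script finds all participles, not only nominalized ones. One possible heuristic is to keep only hits whose `relation` is an argument label such as `SBJ` or `OBJ`. Whether that captures nominalization correctly is exactly the kind of annotation decision the exercise asks you to discuss.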