cesine edited this page May 2, 2011 · 10 revisions

The Concordia LSA will be hosting our first:

Tools For (Field) Linguists Workshop

Facebook Event: http://www.facebook.com/event.php?eid=180073868706498

Saturday, April 30, 2011

10:30-3:30

Have you ever lost a version of your LaTeX source or your data and been unable to get it back? Do you do a lot of repetitive tasks that you would like to automate? Tired of writing the glosses for your examples when they could be generated automatically? Want to try some programming? Looking for a summer project? The Tools for (Field) Linguists workshop will give you some recipes and tools that save you time, so you can focus on your research.

Sample tasks:

  • Finding minimal pairs when creating a phoneme inventory
  • Finding key semantic contexts to distinguish readings
  • Finding morphemes in context
  • Finding function words vs content words in a new language
  • Finding "constructions" like verb+mal in German
  • Finding stimuli for a psycholinguistics experiment
  • Automating the transformation of psycholinguistic results into another format
  • Transliterating alphabets (e.g., turning Cyrillic into ASCII)
  • Transliterating orthography into phonological representation
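As an illustration of the first sample task, here is a minimal sketch in Python (the workshop itself uses GATE and Groovy) that finds minimal pairs in a word list. It assumes each word is already transcribed so that one character stands for one phoneme; the function name `minimal_pairs` and the toy word list are hypothetical.

```python
from itertools import combinations


def minimal_pairs(words):
    """Return (word_a, word_b, (seg_a, seg_b)) for every pair of words of
    equal length that differ in exactly one segment.

    Assumption: one character = one phoneme. Digraphs or diacritics
    would need to be tokenized into segments first.
    """
    pairs = []
    # Sort and de-duplicate so the output order is predictable.
    for a, b in combinations(sorted(set(words)), 2):
        if len(a) != len(b):
            continue
        # Collect every position where the two words disagree.
        diffs = [(x, y) for x, y in zip(a, b) if x != y]
        if len(diffs) == 1:
            pairs.append((a, b, diffs[0]))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy transcriptions.
    lexicon = ["pat", "bat", "pit", "bit", "spat", "tap"]
    for a, b, (x, y) in minimal_pairs(lexicon):
        print(f"{a} / {b}: {x} ~ {y}")
```

Each printed line names the contrasting segments (for example `b ~ p`), which is the evidence you need when deciding whether two sounds are separate phonemes.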
The Watchmes for Part 1 are here for now; we might move them to a server later so they're easier for you to watch.

The Watchmes for Parts 3 and 4 are in a YouTube playlist.

In this workshop you will learn how to:

  1. use GitHub to view other people's source code, share your own code and, most importantly, go back in time in your data in case you make a mistake or lose something.
  2. use a wiki to keep track of the ideas in your project, collaborate on the same document with someone else, edit at the same time as someone else, and see the changes made since you last visited the document (P.S.: you can turn a wiki into LaTeX for the final version).
  3. use GATE, the General Architecture for Text Engineering, an open-source toolkit used by corpus linguists. It has some surprisingly useful functions. We will look at the stock English discourse analysis pipeline and teach you how to code your own morpheme finder for whatever language you want (we will practice on Spanish and Quechua).
  4. adapt some Groovy recipes/scripts to find sentences relevant to your work. Working on the latest and greatest analysis of "en mal" in German, or of "tu" in Quebec French? We will practice using existing corpora, Wikipedia, blogs, and subtitles to find relevant sentences, so that you get a variety of data and see the patterns early in your research.
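The workshop recipes for item 4 are written in Groovy; as a rough Python stand-in, here is a sketch that splits raw text into sentences and keeps the ones containing a target word such as German "mal". The naive sentence splitter and the function name `find_sentences` are assumptions for illustration, not the workshop's actual scripts.

```python
import re

# Naive splitter: break after ., !, or ? followed by whitespace.
# Real corpora (abbreviations, quotes, ellipses) need a proper tokenizer.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def find_sentences(text, word):
    """Return the sentences in `text` that contain `word` as a whole
    word, ignoring case."""
    # \b word boundaries keep "mal" from matching inside "einmal".
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    sentences = SENTENCE_END.split(text.strip())
    return [s for s in sentences if pattern.search(s)]


if __name__ == "__main__":
    # Hypothetical sample text.
    sample = ("Komm mal her! Das ist ein Mal. "
              "Ich war einmal in Berlin. Warte mal kurz.")
    for s in find_sentences(sample, "mal"):
        print(s)
```

On the sample this keeps "Komm mal her!", "Das ist ein Mal." and "Warte mal kurz." but skips "einmal". Note that it also catches the noun "Mal": a string search can only narrow the data down, and you still have to read the hits to separate the particle from other uses.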
What to bring:
  1. A lunch
  2. A laptop (with GATE installed, see below for installers)
  3. Optional: some sentences, data, or morphemes that you're working on or want to work on, so you can get started right away in Part 4
Windows: https://sourceforge.net/projects/gate/files/gate/6.0/gate-6.0-build3764-installer-win.exe/download

Mac: https://sourceforge.net/projects/gate/files/gate/6.0/gate-6.0-build3764-installer-mac.dmg/download

Linux: https://sourceforge.net/projects/gate/files/gate/6.0/gate-6.0-build3764-installer-other.jar/download

If you have trouble installing GATE, just come early, at 10:00, and we will install it on your laptop for you.
