LingSync and the Online Linguistic Database (OLD) are new models for the collection and management of data in endangered language settings. The LingSync and OLD projects seek to close a feedback loop between field linguists, language communities, software developers, and computational linguists by creating web services and user interfaces (UIs) which facilitate collaborative and inclusive language documentation. This paper presents the architectures of these tools and the resources generated thus far. We also briefly discuss some of the features of the systems which are particularly helpful to endangered languages fieldwork and which should also be of interest to computational linguists, these being a service that automates the identification of utterances within audio/video, another that automates the alignment of audio recordings and transcriptions, and a number of services that automate the morphological parsing task. The paper discusses the requirements of software used for endangered language documentation, and presents novel data which demonstrates that users are actively seeking alternatives despite existing software.
Hisako and Elise went to Carleton University this week to present at FEL 2013, the 17th Conference of the Foundation for Endangered Languages. The theme of this year’s conference was Endangered Languages Beyond Boundaries: Community Connections, Collaborative Approaches, and Cross-Disciplinary Research.
Elise McClay (BA ’12), Erin Olson (BA ’12), Carol Little (BA ’12), Hisako Noguchi (Concordia), Alan Bale (Concordia), Jessica Coon (McGill) and Gina (iLanguage Lab) presented an electronic poster titled “LingSync: Using Technology to Bridge Gaps between Speakers, Learners, and Linguists.”
On May 27th the Mi’gmaq Partnership (Listuguj, McGill, iLanguage) will be hosting its first Computational Field Workshop at McGill. Lab members Hisako, Gina, Josh and Tobin, along with Louisa and Carol, will present some of the recent scripts and tools they developed as part of the partnership.
The workshop will focus on computational tools for transcribing, storing and searching linguistic data. There is a special focus on fieldwork, but it should be of broader interest as well; no background is required.
In addition to work by the Montreal-based iLanguage Lab, a key partner in the Mi’gmaq Partnership, the workshop will feature a talk and a hands-on session by keynote speaker Alexis Palmer.
What is FieldDB?
FieldDB is a free, open source project developed collectively by field linguists and software developers to make a modular, user-friendly app which can be used to collect, search and share your data.
Who can I use FieldDB with?
FieldDB is a Chrome app, which means it works on Windows, Mac, Linux, Android, iPad, and also offline.
Multiple collaborators can add to the same corpus, and you can encrypt any piece of data, keep it private within your corpus, or make it public to share with the community and other researchers.
How can FieldDB save me time?
FieldDB uses machine learning and computational linguistics to adapt to the way you already organize the data you import, and to predict how to gloss it. FieldDB already supports import and export of many common formats, including ELAN, Praat, Toolbox, FLEx, FileMaker Pro, LaTeX, XML, CSV and more. If you have another format you’d like to import or export, contact us.
What are the principles behind FieldDB?
If you have a spell checker, you want it to suggest a number of words that are close to the misspelt word. For humans, it’s easy to look at ‘teh’ and know that it is close to ‘the’, but how does the computer know that? A really simple, language-independent way to do it, if you don’t have any gold standard data, is to assign costs to the various edits, substitution (2), deletion (1) and insertion (1), and pick the cheapest sequence of edits.
The table below applies Levenshtein’s algorithm (in the variant where substitution costs 2) letter by letter. The total distance between the two words, 4, is in the top right corner, because it costs 2 to substitute ‘u’ for ‘i’ and 2 to substitute ‘t’ for ‘k’.
And if you really like it, you can download it from GitHub. Click here to read more about edit distance.
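As a rough illustration of the cost scheme described above (substitution 2, deletion 1, insertion 1), here is a minimal Python sketch. It is not the code published on GitHub; the function names `edit_distance` and `suggest` and the tiny word list are hypothetical, chosen only for this example.

```python
def edit_distance(source, target, sub_cost=2, ins_cost=1, del_cost=1):
    """Weighted edit distance between two strings (dynamic programming).

    Default costs follow the scheme in the post: substitution 2,
    insertion 1, deletion 1. The function name is hypothetical.
    """
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + del_cost  # delete every source letter
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + ins_cost  # insert every target letter
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                         # deletion
                dist[i][j - 1] + ins_cost,                         # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),    # match or substitution
            )
    return dist[-1][-1]


def suggest(word, vocabulary, limit=3):
    """Return up to `limit` vocabulary words, cheapest edits first.

    Ties are broken alphabetically. Hypothetical helper for illustration.
    """
    return sorted(vocabulary, key=lambda w: (edit_distance(word, w), w))[:limit]


if __name__ == "__main__":
    print(edit_distance("teh", "the"))
    print(suggest("teh", ["the", "ten", "tea", "apple"]))
```

Because a substitution costs the same as one deletion plus one insertion, two words of equal length that differ in exactly two letters come out at a distance of 4, which matches the total reported for the table.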
Languages differ in how they overtly mark functions and their arguments, if they overtly mark them at all… This month’s iLanguage game shows an example of the “Ergative-Absolutive” system, present in Hindi-Urdu, Warlpiri and Inuktitut, among others. In reality, it’s not as complicated as its name might indicate; in fact, we argue it’s quite logical.
In 9.10.a and 9.10.c we see that the Experiencer of travel is rather consistently marked with –aq, as is the Experiencer of greet in 9.10.b and 9.10.d. What might surprise you if you speak English, French or any other “Nominative-Accusative” language is that the –aq is consistently on the Experiencer, regardless of whether it’s the subject or the object.
In 9.10.e and 9.10.g we see that 1st person Experiencers appear on the verb, not as pronouns. This might sound familiar if you have studied or speak Spanish.
9.10.f is particularly exciting since we don’t have enough data to say what is going on. We recommend stopping the next Yup’ik Eskimo speaker you run into and asking them to give you a verb that ends in a consonant; they might put –aq on the end when you give them a context that leads them to say “he xe-ed yesterday”…
Hope everyone enjoyed the first instalment of the language games. Our second instalment is right around the corner! So, without further ado, here are the long-awaited answers!
The eLanguage is Basque!
Basque is a language isolate (surrounded by Indo-European languages). As you can hear from the first video, Basque borrows a great deal of vocabulary from Spanish.
In this video you can hear Basque spoken in a semi-natural context.
The poster was based on an analysis of data culled from iLanguage’s very own Android app, AuBlog. Have an Android? Head on over to the market and check it out! Want to see the code? Check us out on GitHub!