Category Archives: FieldLinguistics

Presentation at ComputEL workshop @ ACL 2014

This week Joel and Gina presented some of the work that lab members Josh, Theresa, Tobin and Gina and interns ME, Louisa, Elise, Yuliya and Hisako have done on the LingSync project. Their 20-minute presentation, “LingSync & the Online Linguistic Database: New models for the collection and management of data for language communities, linguists and language learners”, was given at the Computational Approaches to Endangered Languages (ComputEL) workshop at the 52nd Annual Meeting of the Association for Computational Linguistics (ACL).

Abstract:

LingSync and the Online Linguistic Database (OLD) are new models for the collection and management of data in endangered language settings. The LingSync and OLD projects seek to close a feedback loop between field linguists, language communities, software developers, and computational linguists by creating web services and user interfaces (UIs) which facilitate collaborative and inclusive language documentation. This paper presents the architectures of these tools and the resources generated thus far. We also briefly discuss some of the features of the systems which are particularly helpful to endangered languages fieldwork and which should also be of interest to computational linguists, these being a service that automates the identification of utterances within audio/video, another that automates the alignment of audio recordings and transcriptions, and a number of services that automate the morphological parsing task. The paper discusses the requirements of software used for endangered language documentation, and presents novel data which demonstrates that users are actively seeking alternatives despite existing software.

Download full paper as .pdf or .tex



Using Technology to Bridge Gaps @Carleton University

Hisako and Elise went to Carleton University this week to present at FEL 2013, the 17th Conference of the Foundation for Endangered Languages. The theme of this year’s conference was Endangered Languages Beyond Boundaries: Community Connections, Collaborative Approaches, and Cross-Disciplinary Research.

Elise McClay (BA ’12), Erin Olson (BA ’12), Carol Little (BA ’12), Hisako Noguchi (Concordia), Alan Bale (Concordia), Jessica Coon (McGill) and Gina  (iLanguage Lab) presented an electronic poster titled “LingSync: Using Technology to Bridge Gaps between Speakers, Learners, and Linguists.”

Hisako and Elise demo Tobin’s app at FEL 2013

Source code available on GitHub.

Computational Field Workshop @McGill

On May 27th the Mi’gmaq Partnership (Listuguj, McGill, iLanguage) will be hosting its first Computational Field Workshop at McGill. Lab members Hisako, Gina, Josh and Tobin, along with Louisa and Carol, will present some of their recent scripts and tools developed as part of the partnership.

The workshop will focus on computational tools for transcribing, storing and searching linguistic data. There is a special focus on fieldwork, but it should be of broader interest as well; no background is required.

In addition to work by the Montreal-based iLanguage Lab, a key partner in the Mi’gmaq Partnership, the workshop will feature a talk and a workshop session by keynote speaker Alexis Palmer.

More details can be found in the workshop program.

The workshop will be held at Thomson House, McGill.

Dyslexia and dying languages? There’s an app for that.

The Lab’s recent projects are featured on Concordia University’s website.

What do dyslexia in children and endangered languages have in common? Concordia graduate combines her expertise in linguistics and computer programming to tackle both challenges — and more…

http://www.concordia.ca/content/shared/en/news/offices/vpdersg/aar/2012/08/05/dyslexia-and-dying-languages-theres-an-app-for-that.html?rootnav=alumni-friends/news

FieldDB: An on/offline cloud data entry app which adapts to its user’s I-Language.

iLanguage Lab is getting ready to launch FieldDB, a cloud-based data entry app created for researchers at McGill, Concordia and the University of California, Santa Cruz. FieldDB is written in 100% JavaScript and uses CouchDB, a NoSQL data store which scales to accommodate large amounts of unstructured data. CouchDB uses MapReduce to search efficiently across data, a win-win for our clients. FieldDB uses field linguistics and machine learning to adapt automatically to its user’s data. Most importantly, even though FieldDB is a web app that runs in your browser, it can run 100% offline. FieldDB will go into beta testing the first week of July and will be officially launched in English and Spanish on August 1st, 2012 in Patzún, Guatemala.
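For readers curious what MapReduce means here, below is a toy, in-memory Python imitation of a CouchDB view. It is a sketch of the general idea, not CouchDB's or FieldDB's actual code: a map function emits (key, value) rows for each document, the rows are sorted by key, and an optional reduce function collapses the rows for each key. Real CouchDB views are usually written as JavaScript functions stored in design documents and are indexed incrementally. The `morphemes` field and the `run_view` and `by_morpheme` helpers are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator, Optional, Tuple

Row = Tuple[str, str]  # (key, value) pair emitted by a map function


def by_morpheme(doc: dict) -> Iterator[Row]:
    """Hypothetical map function: emit one row per morpheme in a datum."""
    for morpheme in doc.get("morphemes", "").split("-"):
        if morpheme:
            yield (morpheme, doc["_id"])


def run_view(docs: Iterable[dict],
             map_fn: Callable[[dict], Iterator[Row]],
             reduce_fn: Optional[Callable[[list], object]] = None) -> list:
    """Map every document, sort rows by key, then optionally reduce per key."""
    # Python's default string ordering stands in for CouchDB's own collation.
    rows = sorted((row for doc in docs for row in map_fn(doc)),
                  key=lambda row: row[0])
    if reduce_fn is None:
        return rows
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)  # keys arrive in sorted order
    return [(key, reduce_fn(values)) for key, values in grouped.items()]


DOCS = [
    {"_id": "d1", "morphemes": "cat-s"},
    {"_id": "d2", "morphemes": "dog-s"},
    {"_id": "d3", "morphemes": "walk-ed"},
]

# "Search": every datum containing the morpheme "s".
print([doc_id for key, doc_id in run_view(DOCS, by_morpheme) if key == "s"])
# ['d1', 'd2']

# "Reduce": how many times each morpheme occurs.
print(run_view(DOCS, by_morpheme, len))
# [('cat', 1), ('dog', 1), ('ed', 1), ('s', 2), ('walk', 1)]
```

Filtering the sorted rows with a list comprehension is only for illustration; CouchDB answers such queries from a prebuilt index (for example by requesting a single key) instead of rescanning every document.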

FieldDB launch in Patzún Guatemala at CAML.

What is FieldDB?

FieldDB is a free, open source project developed collectively by field linguists and software developers to make a modular, user-friendly app which can be used to collect, search and share your data.

Who can I use FieldDB with?

  • FieldDB is a Chrome app, which means it works on Windows, Mac, Linux, Android, iPad, and also offline.
  • Multiple collaborators can add to the same corpus, and you can encrypt any piece of data, keep it private within your corpus, or make it public to share with the community and other researchers.

How can FieldDB save me time?

FieldDB uses machine learning and computational linguistics to adapt to the way your imported data is already organized and to predict how to gloss it. FieldDB already supports import and export of many common formats, including ELAN, Praat, Toolbox, FLEx, FileMaker Pro, LaTeX, XML, CSV and more, but if you have another format you’d like to import or export, Contact Us.
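The post doesn't say which machine-learning method FieldDB uses. As a hedged illustration of the general idea only, here is about the simplest possible “learn from your existing glosses” baseline in Python: count how often each morpheme has received each gloss in already-glossed data, then gloss new data with the most frequent choice. The function names, the hyphen-segmentation convention and the English toy data are all hypothetical stand-ins.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def train(glossed: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count how often each morpheme has been given each gloss."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for segmented, gloss in glossed:
        morphemes, glosses = segmented.split("-"), gloss.split("-")
        if len(morphemes) != len(glosses):
            continue  # skip misaligned entries rather than guess an alignment
        for morpheme, label in zip(morphemes, glosses):
            counts[morpheme][label] += 1
    return counts


def predict_gloss(segmented: str, counts: Dict[str, Counter],
                  unknown: str = "???") -> str:
    """Give each morpheme its most frequent gloss; flag unseen morphemes."""
    labels = []
    for morpheme in segmented.split("-"):
        seen = counts.get(morpheme)  # .get() does not create empty entries
        labels.append(seen.most_common(1)[0][0] if seen else unknown)
    return "-".join(labels)


# Hypothetical, English-like toy data: (segmented form, gloss line).
TRAINING = [("cat-s", "cat-PL"), ("dog-s", "dog-PL"),
            ("walk-s", "walk-3SG"), ("jump-ed", "jump-PST")]
COUNTS = train(TRAINING)

print(predict_gloss("walk-ed", COUNTS))  # walk-PST
print(predict_gloss("bird-s", COUNTS))   # ???-PL
```

Because it ignores context, this baseline glosses `walk-s` as `walk-PL`: “-s” was glossed PL more often than 3SG in the training data. Errors like that are the reason to move beyond per-morpheme counts.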

What are the principles behind FieldDB?

We designed FieldDB from the ground up to be user-friendly, but also to conform to E-MELD and DataONE best practices on formatting, archiving, open access, and security. For more information, see Section 6 of our white paper. We vow never to use your private data; you can find out more in our privacy policy.

Curious how it works? FieldDB is open source on GitHub.

Word Edit Distance Web Widget

If you have a spell checker, you want it to suggest a number of words that are close to the misspelt word. It’s easy for humans to look at ‘teh’ and know that it is close to ‘the’, but how does the computer know that? A really simple, language-independent way to do it, if you don’t have any gold standard data, is to assign a cost to each kind of edit (substitution 2, deletion 1, insertion 1) and pick the cheapest sequence of edits that turns one word into the other.

The table below applies Levenshtein’s algorithm (in the variant where a substitution costs 2) letter by letter. The total distance between the two words, 4, is in the top right corner, because it costs 2 to substitute ‘u’ for ‘i’ and 2 to substitute ‘t’ for ‘k’.
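Here is a minimal Python sketch of the weighted edit distance described above, using the standard dynamic-programming table with insertion and deletion costing 1 and substitution costing 2. It is an independent illustration, not the Lab's widget code, and the function and parameter names are our own.

```python
def edit_distance(source: str, target: str,
                  ins_cost: int = 1, del_cost: int = 1, sub_cost: int = 2) -> int:
    """Cheapest total cost of edits turning `source` into `target`."""
    n_rows, n_cols = len(source) + 1, len(target) + 1
    # table[i][j] holds the distance between source[:i] and target[:j].
    table = [[0] * n_cols for _ in range(n_rows)]
    for i in range(1, n_rows):
        table[i][0] = i * del_cost  # delete every letter of source[:i]
    for j in range(1, n_cols):
        table[0][j] = j * ins_cost  # insert every letter of target[:j]
    for i in range(1, n_rows):
        for j in range(1, n_cols):
            same = source[i - 1] == target[j - 1]
            table[i][j] = min(
                table[i - 1][j] + del_cost,                       # deletion
                table[i][j - 1] + ins_cost,                       # insertion
                table[i - 1][j - 1] + (0 if same else sub_cost),  # substitution or match
            )
    return table[-1][-1]


print(edit_distance("teh", "the"))              # 2
print(edit_distance("intention", "execution"))  # 8
```

In this layout the final distance sits in the last cell of the last row; a table drawn with one word running bottom-to-top along the side puts that same cell in the top right corner. Because a substitution costs exactly as much as a deletion plus an insertion, ‘teh’ and ‘the’ come out at distance 2 (delete ‘e’, then insert it after ‘h’) rather than 4.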

At the Lab, we put together an interactive JavaScript widget so that you can input whatever words you like and find out their edit distance. Just enter the words you want to compare!

And if you really like it, you can download it from GitHub.
Click here to read more about edit distance.

Ergative-Absolutive 101

Languages differ in how they overtly mark functions and their arguments, if they mark them overtly at all. This month’s iLanguage game shows an example of the “Ergative-Absolutive” system, found in Hindi-Urdu, Warlpiri and Inuktitut, among others. In reality, it’s not as complicated as its name might suggest; in fact, we argue it’s quite logical.

In 9.10.a and 9.10.c we see that the Experiencer of travel is rather consistently marked with -aq, as is the Experiencer of greet in 9.10.b and 9.10.d. What might surprise you if you speak English, French or any other “Nominative-Accusative” language is that -aq is consistently on the Experiencer, regardless of whether it’s the subject or the object.
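In typological terms, a nominative-accusative language marks the sole argument of an intransitive verb (S) the same way as the transitive subject (A), while an ergative-absolutive language marks S the same way as the transitive object (O). The S, A and O labels are standard terms added here, not the post's own terminology. That grouping is why the same marking can show up on a participant whether it is a subject or an object. The toy Python function below simply encodes the two groupings; it is a hypothetical illustration and does not model the Yup’ik data or the form of -aq.

```python
from enum import Enum


class Role(Enum):
    S = "sole argument of an intransitive verb"
    A = "agent-like argument of a transitive verb"
    O = "object of a transitive verb"


def case_for(role: Role, alignment: str) -> str:
    """Case label a core argument receives under a given alignment type."""
    if alignment == "nominative-accusative":
        # S groups with A; only O is marked differently.
        return "accusative" if role is Role.O else "nominative"
    if alignment == "ergative-absolutive":
        # S groups with O; only A is marked differently.
        return "ergative" if role is Role.A else "absolutive"
    raise ValueError(f"unknown alignment: {alignment}")


for role in Role:
    print(role.name, case_for(role, "nominative-accusative"),
          case_for(role, "ergative-absolutive"))
```

Running the loop prints one line per role: S gets nominative vs. absolutive, A gets nominative vs. ergative, and O gets accusative vs. absolutive.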

In 9.10.e and 9.10.g we see that first-person Experiencers are marked on the verb rather than expressed as separate pronouns. This might sound familiar if you have studied or speak Spanish.

9.10.f is particularly exciting since we don’t have enough data to say what is going on. We recommend stopping the next Yup’ik Eskimo speaker you run into and asking them for a verb that ends in a consonant; if you give them a context that leads them to say “he X-ed yesterday”, they might put -aq on the end.

Want to see more language data? Examples are taken from I-Language: An Introduction to Linguistics as Cognitive Science by Daniela Isac and Charles Reiss. http://linguistics.concordia.ca/I-language/

Don't let the vocabulary fool you: despite the article talking about Thor (the god of thunder), this language is Basque, a minority language spoken in Spain and France. http://eu.wikipedia.org/wiki/Thor

Hope everyone enjoyed the first instalment of the language games. Our second instalment is right around the corner! So without further ado, here are the long-awaited answers!

The eLanguage is Basque!

Basque is a language isolate surrounded by Indo-European languages. As you can hear from the first video, Basque borrows a great deal of vocabulary from Spanish.

In this video you can hear Basque spoken in a semi-natural context.

Eliciting Evolving Information Structure and Prosody

iLanguage Lab presented the following poster at the Experimental and Theoretical Advances in Prosody Conference in Montreal, Canada.  

Eliciting evolving information structure and audienceless vs. audience oriented prosodies: experimentation on Android tablets

The poster was based on an analysis of data culled from iLanguage’s very own Android app, AuBlog. Have an Android device? Head on over to the market and check it out! Want to see the code? Check us out on GitHub!