Category Archives: iLanguage

iLanguageCloud on Google Play

This week Josh released the iLanguage Cloud alpha on Google Play. iLanguage Cloud is a fun Android application that lets users generate word clouds using an Android share intent. Clouds can be exported as SVG for use in high-res graphics applications such as Inkscape, or as PNG for sharing with friends on Facebook, Google+, Twitter, Flickr or any other social application a user chooses.

Create and save word clouds from text on any website or in any other application, then share them with friends and colleagues.

iLanguage Cloud is now available on Google Play

To vote for features, check out the GitHub milestones page.

iLanguage Cloud uses Jason Davies’ D3 word cloud layout engine; you can find his source code in his GitHub repository.

FieldDB: An on/offline cloud data entry app which adapts to its user’s I-Language.

iLanguage Lab is getting ready to launch FieldDB, a cloud-based data entry app created for researchers at McGill, Concordia and the University of California, Santa Cruz. FieldDB is written in 100% JavaScript and uses CouchDB, a NoSQL data store which scales to accommodate large amounts of unstructured data. CouchDB uses MapReduce to efficiently search across data, a win-win for our clients. FieldDB uses field linguistics and machine learning to automatically adapt to its user’s data. Most importantly, even though FieldDB is a web app that runs in your browser, it can run 100% offline. FieldDB will go into beta testing the first week of July, and will be officially launched in English and Spanish on August 1st, 2012 in Patzún, Guatemala.

FieldDB launch in Patzún Guatemala at CAML.

What is FieldDB?

FieldDB is a free, open source project developed collectively by field linguists and software developers to make a modular, user-friendly app which can be used to collect, search and share your data.

Who can I use FieldDB with?

  • FieldDB is a Chrome app, which means it works on Windows, Mac, Linux, Android, iPad, and also offline.
  • Multiple collaborators can add to the same corpus, and you can encrypt any piece of data, keep it private within your corpus, or make it public to share with the community and other researchers.

How can FieldDB save me time?

FieldDB uses machine learning and computational linguistics to adapt to the existing organization of the data you import and to predict how to gloss it. FieldDB already supports import and export of many common formats, including ELAN, Praat, Toolbox, FLEx, FileMaker Pro, LaTeX, XML, CSV and more. If you have another format you’d like to import or export, contact us.

What are the principles behind FieldDB?

We designed FieldDB from the ground up to be user-friendly, but also to conform to EMELD and DataOne best practices on formatting, archiving, open access, and security. For more information, see Section 6 of our white paper. We vow never to use your private data; you can find out more in our privacy policy.

Curious how it works? FieldDB is open source on GitHub.

Spy or Not?

So you think you would make a good spy?

Emmy and Hisako are proud to present the release of Spy or Not, a gamified psycholinguistics experiment made in collaboration with the Accents Research Lab at Concordia University headed by Dr. Spinu.

It is commonly observed that some people are “Good with Accents.” Some people can easily imitate various accents of their native language, while others appear to struggle with imitation. This research is dedicated to building free, open source phonetics scripts to extract the acoustic components of speech from native speakers and “Good with Accents” speakers, and to transfer the technical details in a visualizable format to applied linguists on the ground who are working with accented (clinical and non-native) speakers.

In order to collect unbiased judgements from native speakers, a pilot study was designed and run by Dr. Spinu and her students. Images and supporting sound effects were created, and the perceptual side of the pilot was disguised as the game “Spy or Not?” The game has since gathered over 8,000 data points by crowdsourcing the judgements that determine the degree (on an 11-point scale) to which participants were “Good with Accents.” This is a novel approach to the coding problems that experimenters frequently encounter.

Participation in this project furthers research in phonetics and phonology, as well as experimental methodology in the age of the social web. Our hope is that our readers will tweet their “Good with Accents” scores and help us get more participants, especially native speakers of Russian English accents, Sussex English accents and South African accents, which we could never access at the scale we need in a lab setting. Visit the free online game, or play offline by downloading the game from the Chrome Store or from Google Play as an Android app.

Word Edit Distance Web Widget

If you have a spell checker, you want it to suggest a number of words that are close to the misspelt word. For humans, it’s easy to look at ‘teh’ and know that it is close to ‘the’, but how does the computer know that? A really simple, language-independent way to do it, if you don’t have any gold standard data, is to assign costs to the various edits, substitution (2), deletion (1) and insertion (1), and pick the cheapest sequence of edits.

The table below applies Levenshtein’s algorithm (here with substitution costing 2) letter by letter. The total distance between the two words, 4, is in the top right corner, because it costs 2 to substitute ‘u’ for ‘i’ and 2 to substitute ‘t’ for ‘k’.
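To make the costs concrete, here is a minimal Python sketch of the same idea. It is not the Lab's widget code: the function name `edit_distance` and its parameter names are our own, and the default costs are simply the ones quoted above (substitution 2, deletion 1, insertion 1).

```python
def edit_distance(source, target, sub_cost=2, del_cost=1, ins_cost=1):
    """Return the cheapest total cost of edits that turn `source` into `target`.

    Hypothetical helper for illustration; the costs default to the ones
    used in this post (substitution 2, deletion 1, insertion 1).
    """
    rows, cols = len(source) + 1, len(target) + 1
    # table[i][j] holds the cheapest cost of turning source[:i] into target[:j].
    table = [[0] * cols for _ in range(rows)]

    # Turning a prefix of `source` into the empty string: delete every letter.
    for i in range(1, rows):
        table[i][0] = table[i - 1][0] + del_cost
    # Turning the empty string into a prefix of `target`: insert every letter.
    for j in range(1, cols):
        table[0][j] = table[0][j - 1] + ins_cost

    for i in range(1, rows):
        for j in range(1, cols):
            same_letter = source[i - 1] == target[j - 1]
            table[i][j] = min(
                table[i - 1][j] + del_cost,      # delete a letter of source
                table[i][j - 1] + ins_cost,      # insert a letter of target
                table[i - 1][j - 1] + (0 if same_letter else sub_cost),
            )
    return table[-1][-1]


print(edit_distance("teh", "the"))  # prints 2
```

With these costs, ‘teh’ is 2 away from ‘the’: deleting the ‘e’ and re-inserting it after the ‘h’ is cheaper than two substitutions, which would cost 4.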

At the Lab, we put together an interactive JavaScript widget so that you can input whatever words you like and find out their edit distance. Just enter the words you want to compare!

And if you really like it, you can download it from GitHub.
Click here to read more about edit distance.

Ergative-Absolutive 101

Languages differ in how they overtly mark functions and their arguments, if they overtly mark them at all… This month’s iLanguage game shows an example of the “Ergative-Absolutive” system, present in Hindi-Urdu, Warlpiri and Inuktitut, among others. In reality, it’s not as complicated as its name might indicate; in fact, we argue it’s quite logical.

In 9.10.a and 9.10.c we see that the Experiencer of travel is rather consistently marked with –aq, as is the Experiencer of greet in 9.10.b and 9.10.d. What might surprise you if you speak English, French or any other “Nominative-Accusative” language is that the –aq is consistently on the Experiencer, regardless of whether it’s the subject or the object.

In 9.10.e and 9.10.g we see that 1st person Experiencers appear on the verb, not as pronouns. This might sound familiar if you have studied or speak Spanish.

9.10.f is particularly exciting since we don’t have enough data to say what is going on. We recommend stopping the next Yup’ik Eskimo speaker you run into and asking them for a verb that ends in a consonant. They might put –aq on the end when you give them a context that leads them to say “he xe-ed yesterday.”

Want to see more language data? Examples are taken from I-Language: An Introduction to Linguistics as Cognitive Science by Daniela Isac and Charles Reiss.

Functional Application in LaTeX

This month’s Guess the code shows an example of Functional Application, which is an operation in compositional semantics. As you all know, in mathematics and in programming a function takes an input argument from some specified domain and yields an output value. Applying a function f to an argument x yields the value for that argument, which can be written as f(x). In beginner semantics, this same procedure is what happens when a verb takes its object.

verb(object) ~ function(argument)

The only mystery in this example is the denotation double brackets which indicate that it is the denotation, not the orthographic word, which is being operated upon.

The code also gives us two trees to show that in English functional application applies to the right, and in Turkish it applies to the left.

In (41) we simplify things and pretend that “hug” is a function which takes “Mary” as its object. In reality, in most languages “hug” is a complex predicate, itself the return value of a Functional Application between a function (we call it “little v”) and its object, a root. Sound like JavaScript, anyone?
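For readers who think in code, the verb(object) ~ function(argument) idea can be sketched in a few lines of Python. Everything here is a toy assumption for illustration: the `HUG_FACTS` set, the names `hug`, `apply_right` and `apply_left`, and the speakers “John” and “Sue” are ours, not part of the LaTeX example.

```python
# Hypothetical toy world: the (hugger, hugged) pairs that are true in it.
HUG_FACTS = {("John", "Mary")}


def hug(obj):
    """Toy denotation of 'hug': takes its object, returns a predicate over subjects."""
    def predicate(subj):
        return (subj, obj) in HUG_FACTS
    return predicate


def apply_right(function, argument):
    """English-style order (V NP): the function finds its argument on its right."""
    return function(argument)


def apply_left(argument, function):
    """Turkish-style order (NP V): the function finds its argument on its left."""
    return function(argument)


# Both word orders build the same VP meaning; only the linear order differs.
vp_english = apply_right(hug, "Mary")
vp_turkish = apply_left("Mary", hug)

print(vp_english("John"), vp_turkish("John"))  # prints True True
print(vp_english("Sue"))                       # prints False
```

The two helpers do exactly the same thing under the hood, which is the point: Functional Application is the same operation in both languages, and only the side on which the argument sits changes.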

\begin{example}Typical example of Functional Application (FA)\\
\K{(a) English}\\
& VP_{}\Below{$_\textsc{fa}$}\B{dl}\B{dr} && =\denote{hug}(\denote{NP})\\
{V}_{>}\Below{\denote{hug}} && NP_e\Below{\denote{Mary}}\\
\K{(b) Turkish} \\
& VP_{}\Below{$_\textsc{fa}$}\B{dl}\B{dr} && \hspace{-.2in} =(\denote{NP e})\denote{sardil-di}\\
NP_e\Below{\denote{Mary-e}} && {V}_{>}\Below{\denote{sarid-di}}\\

You can get the code here:

Don't let the vocabulary fool you: despite the article talking about Thor (the god of thunder), this language is Basque, a minority language of Spain.

Hope everyone enjoyed the first instalment of the language games. Our second instalment is right around the corner! So, without further ado, here are the long-awaited answers!

The eLanguage is Basque!

Basque is a language isolate (surrounded by Indo-European languages). As you can hear from the first video, Basque borrows a great deal of vocabulary from Spanish.

In this video you can hear Basque spoken in a semi-natural context.

Language Games

iLanguage is proud to present the first installment of its monthly Language games! Your first task is to identify the language in the first image. Then, in the next image, you have several more tasks: guess the spectrogram, decipher the programming language and analyze data from a natural language. Post your answers in the comment section. Enjoy!


The stuff people say

iLanguage is all around us. Each of us has a unique background, and we use language in unique settings that determine how we speak. This is exemplified in the latest internet meme*, referred to under the technical term “Shit _____ say”.

*A meme acts as a unit for carrying cultural ideas, symbols or practices, which can be transmitted from one mind to another through writing, speech, gestures, rituals or other imitable phenomena. [Wikipedia]

In these memes, we see phrases associated with specific groups of people. The obvious candidates show up, such as gender, ethnicity, and location. However, perhaps more revealing is how specific some of these memes are. There is pretty much one for every subculture: gamers, hipsters, yogis, republicans, atheists, etc.

In addition, people take context into account by making memes about what people say to specific groups of people, such as twins, tall girls, and pharmacists. How we speak is influenced not only by who we are, but also by who we are speaking to and what we are speaking about. This showcases the role of context in any Natural Language Processing task. Maybe your reaction to these videos is something like this.

What are some phrases, expressions or idioms that are unique to you? What would be included in your “Shit I say” meme?

BrainTripping at NodeJS Hackathon

iLanguageLab members Gina, Emmy and Hisako helped with the data for BrainTripping at last week’s NodeJS Hackathon.

BrainTripping is an artistic interpretation of famous people’s words, with the user invited to mix the words that Jesus, Charles Darwin or Paul Graham said and wrote. Matthew Huebert (@geoshift) had a first glimpse of the idea back at the Disrupt Hackathon and, amazingly, found a team of linguists at the hackathon who worked on the corpus and data analysis.

Braintripping is a collaboration between your ideas and the words of famous people.