All posts by mdotedot

Spy or Not?

So you think you would make a good spy?

Emmy and Hisako are proud to present the release of Spy or Not, a gamified psycholinguistics experiment made in collaboration with the Accents Research Lab at Concordia University headed by Dr. Spinu.

It is commonly observed that some people are “Good with Accents.” Some people can easily imitate various accents of their native language, while others appear to struggle with imitation. This research is dedicated to building free, open-source phonetics scripts that extract the acoustic components of the speech of native speakers and “Good with Accents” speakers. The goal is to deliver these technical details in a visual format to applied linguists on the ground who work with accented (clinical and non-native) speakers.

To collect unbiased judgements from native speakers, Dr. Spinu and her students designed and ran a pilot study. Images and supporting sound effects were created, and the perceptual side of the pilot was disguised as the game “Spy or Not?” The game has since gathered over 8,000 data points by crowdsourcing judgements of the degree (on an 11-point scale) to which participants were “Good with Accents.” This is a novel approach to the coding problems that experimenters frequently encounter.

Participation in this project furthers research in phonetics and phonology, as well as experimental methodology in the age of the social web. Our hope is that our readers will tweet their “Good with Accents” scores and help us get more participants, especially native speakers of Russian English accents, Sussex English accents and South African accents, which we could never access at the scale we need in a lab setting. Visit the free online game, or play offline by downloading the game from the Chrome Store or from Google Play as an Android app.

Word Edit Distance Web Widget

If you have a spell checker, you want it to suggest a number of words that are close to the misspelt word. For humans, it’s easy to look at ‘teh’ and know that it is close to ‘the’, but how does the computer know that? A really simple, language-independent way to do it, if you don’t have any gold-standard data, is to assign costs to the various edits, substitution (2), deletion (1) and insertion (1), and to pick the candidate with the lowest total cost.

The table below applies, letter by letter, a variant of Levenshtein’s algorithm in which a substitution costs 2 (rather than 1, as in the classic formulation). The total distance between the two words, 4, is in the top right corner, because it costs 2 to substitute ‘u’ for ‘i’ and 2 to substitute ‘t’ for ‘k’.
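
The cost scheme described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code behind the Lab's widget: the function names `edit_distance_table`, `edit_distance` and `suggest` are made up for this example, and the word pair 'lute'/'like' is a hypothetical stand-in chosen because it needs the same two substitutions ('u'/'i' and 't'/'k').

```python
def edit_distance_table(source, target, ins_cost=1, del_cost=1, sub_cost=2):
    """Fill the dynamic-programming table of edit costs, letter by letter.

    table[i][j] holds the cheapest cost of turning the first i letters of
    `source` into the first j letters of `target`.
    """
    rows, cols = len(source) + 1, len(target) + 1
    table = [[0] * cols for _ in range(rows)]

    # First column: delete every letter of the source prefix.
    for i in range(1, rows):
        table[i][0] = table[i - 1][0] + del_cost
    # First row: insert every letter of the target prefix.
    for j in range(1, cols):
        table[0][j] = table[0][j - 1] + ins_cost

    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            table[i][j] = min(
                table[i - 1][j] + del_cost,                        # deletion
                table[i][j - 1] + ins_cost,                        # insertion
                table[i - 1][j - 1] + (0 if same else sub_cost),   # match or substitution
            )
    return table


def edit_distance(source, target):
    """Total distance: the last cell of the table."""
    return edit_distance_table(source, target)[-1][-1]


def suggest(misspelt, vocabulary, limit=3):
    """Rank candidate words by distance (ties broken alphabetically)."""
    ranked = sorted(vocabulary, key=lambda word: (edit_distance(misspelt, word), word))
    return ranked[:limit]


if __name__ == "__main__":
    print(edit_distance("teh", "the"))
    print(edit_distance("lute", "like"))
    print(suggest("teh", ["the", "them", "banana"], limit=2))
```

With these costs, `edit_distance("teh", "the")` comes out as 2 (delete the ‘e’, then insert it again after the ‘h’), and the two substitutions in ‘lute’/‘like’ cost 4. Note that this sketch stores the total in the last cell of a list of lists; where that cell appears visually depends on how the table is drawn.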

At the Lab, we put together an interactive JavaScript widget so that you can input whatever words you like and find out their edit distance. Just enter the words you want to compare!


And if you really like it, you can download it from GitHub.
Click here to read more about edit distance.

Don't let the vocabulary fool you: although the article is about Thor (the god of thunder), the language is Basque, a minority language of Spain. http://eu.wikipedia.org/wiki/Thor

Hope everyone enjoyed the first instalment of the language games. Our second instalment is right around the corner! So, without further ado, here are the long-awaited answers!

The eLanguage is Basque!

Basque is a language isolate (surrounded by Indo-European languages). As you can hear from the first video, Basque borrows a great deal of vocabulary from Spanish.

In this video you can hear Basque spoken in a semi-natural context.

Language Games

iLanguage is proud to present the first instalment of its monthly Language Games! Your first task is to identify the language in the first image. Then, in the next image, you have several more tasks: guess the spectrogram, decipher the programming language and analyze data from a natural language. Post your answers in the comment section. Enjoy!

 

The stuff people say

iLanguage is all around us. Each of us has a unique background, and we use language in unique settings that determine how we speak. This is exemplified in the latest internet meme* referred to under the technical term: “Shit _____ say”

*A meme acts as a unit for carrying cultural ideas, symbols or practices, which can be transmitted from one mind to another through writing, speech, gestures, rituals or other imitable phenomena. [Wikipedia]

In these memes, we see phrases associated with specific groups of people. The obvious candidates show up, such as gender, ethnicity, and location. However, perhaps more revealing is how specific some of these memes are. There is pretty much one for every subculture: gamers, hipsters, yogis, Republicans, atheists, etc.

In addition, people take the context into account by making memes about what people say to specific groups of people, such as twins, tall girls, and pharmacists. How we speak is influenced not only by who we are, but also by who we are speaking to and what we are speaking about. This showcases the role of context in any Natural Language Processing task. Maybe your reaction to these videos is something like this.

What are some phrases, expressions or idioms that are unique to you? What would be included in your “Shit I say” meme?

BrainTripping at NodeJS Hackathon

iLanguageLab members Gina, Emmy and Hisako helped with the data for BrainTripping at last week’s NodeJS Hackathon.

BrainTripping is an artistic interpretation of famous people’s words, with the user invited to mix the words of what Jesus, Charles Darwin or Paul Graham said and wrote. Matthew Huebert (@geoshift) had a first glimpse of the idea back at the Disrupt Hackathon and, amazingly, found a team of linguists at the hackathon who worked on the corpus and data analysis.

BrainTripping is a collaboration between your ideas and the words of famous people.

The Bacteria Detecto Droid wins at Montreal RHOK

    iLanguageLab member Gina was part of the “Bacteria Detecto Droid” team, which won best use of technology at Random Hacks of Kindness Montreal. Check it out in 24 Heures!

    …one of the two winning projects, a portable bacteriological laboratory. “It is a smartphone application that detects the bacteria present in water,” he explained. “It can tell whether the water is dangerous or not. The system accumulates the data to produce a map.”

    “We showed that it could be done with a phone,” said a delighted Mr. Grassick. “It is inexpensive, and it can be used by locals in developing countries.”

    The project was one of over 30 winning RHOK projects around the world. We were also mentioned in The World’s article “Geeks without Borders”!

    Other projects were equally ambitious. In Portland, developers created an application to allow medical workers to track disease outbreaks in real-time. In Bangalore, hackers built a job database for unskilled workers. In Montreal, developers created an app that can scan a microscopic photo of bacteria taken from water to test for drinking safety—a key tool for poorer countries.

    The code uses OpenCV, a computer vision library, to process the images on the Android client. Check out the code on GitHub.

Aphasia Assessment on Android

To get a better picture of the patient’s linguistic profile, the Touch BAT app records and analyzes the patient’s voice, eye movements and touch location. Analyzing the touch, video and audio during the test reduces the data entry that must be done after the assessment, and also permits review of the information later during treatment. Click on each of the thumbnails to see our results!

 

To learn more, check out the poster presented at the Academy of Aphasia 49th Annual Meeting!
Aphasia Assessment on Android: recording voice, eye-gaze and touch for the BAT

To see the code, check it out on GitHub!

 

 

 

iLanguage Wine and Cheese

Thanks everyone for coming to the Wine and Cheese 0.5! We had more people than expected (counting by wine glasses, over 30). I had a ton of fun, and it was great to see you again, some of whom I haven’t seen in nearly 10 years! I know you were a diverse group, but I’m really proud of you all for mingling and trying the games after the talks, even walking around collaborating with members of the other Set. My goal was to bring you back to the initial love you felt for your respective fields, to remember the spark that got you started.

The games seemed hard, but not so hard when you realized the tools you needed were either in the experts in the room, or in the data itself…

The linguistic party games bottle-taking-home-winners:
The winner: Olivier
-Bonus points for identifying three extraneous words in his English Wordle
-Full credit for correctly identifying his spectrogram!!! We suspect he consulted with Nadya? If so she gets a bottle too next time I see her :)
-Full credit for asking a linguist to teach him (rather than giving him the answer) how to correctly identify the disambiguation point in “The pit bull attacked by the cat was annoyed” (after ‘by’ it is clear that attacked modifies pit bull, rather than pit bull is the subject of attack).
-Full credit for identifying Groovy as an agglutinative verb-final language in the code example. Although the example does not demonstrate that Groovy is agglutinative, this could be claimed to be true (ironically, the only Groovy expert in the room got a Groovy example…). Full credit would also have been given for saying Groovy is a creole with lexical items from Java and syntax from its substrates of Ruby/Python/Perl/Java speakers.

Gina’s pick for Creativity: Julien
-Full credit for guessing LaTeX is an isolating language, and that he had the code to draw a vowel chart.
-Full credit for correctly guessing the ‘unlockable’ tree for the reading “The door is locked and you have the key to open it, thus the door is unlockable.”
-Full credit for 5 min of research and yet incorrectly concluding that Indonesian was Malay. For those who don’t know, Indonesian and Malay are arguably the same E-Language, proof that E-Language might be a useful socio-political concept but maybe not a linguistic/NLP concept. They have the same stop/function words and the same morphemes; to really distinguish them, he needed to look for Named Entities, aka Proper Nouns.
-Partial credit for trying the spectrograms and using the number of syllables as a heuristic to guess (it was a pretty good try at justifying his answer: his spectrogram had around 14 syllables, his answer had 18)

Hisako’s pick for Creativity: Dr. Witte
-Full credit for guessing that Croatian was Slovakian (they are both Slavic, and have the letter ž, although a quick Ctrl-F in Wikipedia for the suffix -ija might have helped rule out Slovakian)
-Full credit for reading the comments in his source code and guessing that Java was a creole language, although I kind of doubt that creole is the best way to describe it.
-Partial credit for excessive creativity for saying the exact wrong answer: Linguists examine various linguistic systems and develop a) prescriptive grammars. I think Hisako’s judgment has been warped by TA-ing too many LING courses…

Honorable mention: Dr. Hale
-Full credit for correctly drawing the tree described in his JavaScript example

Honorable mention: Emmy (MdotEdot)
-Partial credit for insisting that her Finnish wordle wasn’t Finnish due to the rampant mentions of Newcastle, but realizing it was indeed Finnish due to the productive morphology on Newcastle, Newcastlessa, Newcastlen..

Honorable mention: Dr. Bergler
-I remember Hisako was very impressed but I think some of the cards got thrown away in the clean up so I can’t remember… :)

Honorable mention: Peter
-Hisako was also impressed with your answers but I can’t remember which one it was :)

Hisako and I didn’t have a chance to go around to everyone to give them their answers, but email me a picture of your game card and we will email you the answers for your card and explain anything you didn’t get or anything that isn’t from the domain of your Set (i.e., linguistics for non-linguists, code for non-programmers :)

Eliciting Evolving Information Structure and Prosody

iLanguage Lab presented the following poster at the Experimental and Theoretical Advances in Prosody Conference in Montreal, Canada.  

Eliciting evolving information structure and audienceless vs. audience oriented prosodies: experimentation on Android tablets

The poster was based on an analysis of data culled from iLanguage’s very own Android app, AuBlog. Have an Android? Head on over to the market and check it out! Want to see the code? Check us out on GitHub!