This week Josh released iLanguage Cloud alpha on Google Play. iLanguageCloud is a fun Android application that lets users generate word clouds through an Android share intent. Clouds can be exported as SVG for use in high-res graphics applications such as Inkscape, or as PNG for sharing with friends on Facebook, Google+, Twitter, Flickr or any other social application a user chooses.
Create and save word clouds from text on any website or any other application, then share them with friends and colleagues.
What is FieldDB?
FieldDB is a free, open source project developed collectively by field linguists and software developers to make a modular, user-friendly app which can be used to collect, search and share your data.
Who can I use FieldDB with?
FieldDB is a Chrome app, which means it works on Windows, Mac, Linux, Android, iPad, and also offline.
Multiple collaborators can add to the same corpus, and you can encrypt any piece of data, keep it private within your corpus, or make it public to share with the community and other researchers.
How can FieldDB save me time?
FieldDB uses machine learning and computational linguistics to adapt to the way you already organize the data you import, and to predict how to gloss it. FieldDB already supports import and export of many common formats, including ELAN, Praat, Toolbox, FLEx, Filemaker Pro, LaTeX, XML, CSV and more. If you have another format you’d like to import or export, contact us.
What are the principles behind FieldDB?
Emmy and Hisako are proud to present the release of Spy or Not, a gamified psycholinguistics experiment made in collaboration with the Accents Research Lab at Concordia University headed by Dr. Spinu.
It is commonly observed that some people are “Good with Accents.” Some people can easily imitate various accents of their native language, while others appear to struggle with imitation. This research is dedicated to building free, open source phonetics scripts that extract the acoustic components of the speech of native speakers and “Good with Accents” speakers, so that the technical details can be passed on in a visualizable format to applied linguists on the ground who work with accented (clinical and non-native) speakers.
In order to collect non-biased judgements from native speakers, a pilot study was designed and run by Dr. Spinu and her students. Images and supporting sound effects were created, and the perceptual side of the pilot was disguised as the game “Spy or Not?” The game has since gathered over 8,000 data points by crowdsourcing the judgements that determine the degree (on an 11 point scale) to which participants were “Good with Accents.” This is a novel approach to the coding problems that experimenters frequently encounter.
Participation in this project furthers research in phonetics and phonology, as well as experimental methodology in the age of the social web. Our hope is that our readers will Tweet their “Good with Accents” scores and help us get more participants, especially native speakers of Russian English accents, Sussex English accents and South African accents, which we could never access at the scale we need in a lab setting. Visit the free online game, or play offline by downloading the game from the Chrome Store or from Google Play as an Android app.
If you have a spell checker, you want it to suggest a number of words that are close to the misspelt word. For us humans, it’s easy to look at ‘teh’ and know that it is close to ‘the’, but how does the computer know that? A really simple, language-independent way to do it if you don’t have any gold standard data is to assign costs to the various edits, substitution (2), deletion (1) and insertion (1), and pick the cheapest sequence of edits.
The table below applies Levenshtein’s algorithm (in the version where substitution costs 2) letter by letter. The total distance between the two words, 4, is in the top right corner, because it costs 2 to substitute ‘u’ for ‘i’ and 2 to substitute ‘t’ for ‘k’.
And if you really like it, you can download it from GitHub. Click here to read more about edit distance.
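For readers who want to try the idea, here is a minimal Python sketch of the cost scheme described above (substitution 2, deletion 1, insertion 1). It is not the code from the download; the function names edit_distance and suggest, and the candidate word list, are made up for illustration.

```python
def edit_distance(source, target, sub_cost=2, del_cost=1, ins_cost=1):
    """Minimum total cost of edits that turn source into target."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = cost of turning the first i letters of source
    # into the first j letters of target.
    dist = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + del_cost  # delete every source letter
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + ins_cost  # insert every target letter
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                       # deletion
                dist[i][j - 1] + ins_cost,                       # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # substitution or match
            )
    return dist[-1][-1]


def suggest(word, candidates, limit=3):
    """Return the candidates closest to word, cheapest first (hypothetical helper)."""
    return sorted(candidates, key=lambda c: edit_distance(word, c))[:limit]


print(edit_distance("teh", "the"))  # 2: delete one letter and insert it again
print(suggest("teh", ["elephant", "the"]))
```

Note that with these costs a substitution is never cheaper than a deletion followed by an insertion, which is why swapping two letters in ‘teh’ costs 2 rather than 4.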
Languages differ in how they overtly mark functions and their arguments, if they overtly mark them at all… This month’s iLanguage game shows an example of the “Ergative-Absolutive” system, present in Hindi-Urdu, Warlpiri and Inuktitut, among others. In reality, it’s not as complicated as its name might indicate. In fact, we argue it’s quite logical.
In 9.10.a and 9.10.c we see that the Experiencer of travel is rather consistently marked with –aq, as is the Experiencer of greet in 9.10.b and 9.10.d. What might surprise you if you speak English, French or any other “Nominative-Accusative” language is that the –aq is consistently on the Experiencer, regardless of whether it’s the subject or the object.
In 9.10.e and 9.10.g we see that 1st person Experiencers appear on the verb, not as pronouns. This might sound familiar if you have studied or speak Spanish.
9.10.f is particularly exciting since we don’t have enough data to say what is going on. We recommend stopping the next Yup’ik Eskimo speaker you run into and asking them to give you a verb that ends in a consonant. They might put –aq on the end when you give them a context that leads them to say he xe-ed yesterday…
This month’s Guess the code shows an example of Functional Application, which is an operation in compositional semantics. As you all know, in mathematics and in programming a function takes an input argument from some specified domain and yields an output value. Applying a function f to an argument x yields the value for that argument, which can be written as f(x). In beginner semantics, this same procedure is what happens when a verb takes its object.
verb(object) ~ function(argument)
The only mystery in this example is the double brackets, which indicate that it is the denotation, not the orthographic word, which is being operated upon.
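To make the parallel concrete, here is a small Python sketch of Functional Application. It is not the code from the game image; the toy world, the lexicon and the function names are invented for illustration, and denotation() plays the role of the double brackets.

```python
# A toy model of the world: the (subject, object) pairs for which
# "greets" holds. These names and facts are invented for illustration.
GREETS = {("Anna", "Ben")}


def denotation(word):
    """Return what a word means, i.e. what the double brackets pick out."""
    lexicon = {
        "Anna": "Anna",
        "Ben": "Ben",
        # A transitive verb takes its object first, then its subject.
        "greets": lambda obj: lambda subj: (subj, obj) in GREETS,
    }
    return lexicon[word]


def functional_application(function, argument):
    """Apply a function-type meaning to an argument-type meaning."""
    return function(argument)


# verb(object): the verb takes its object and yields a new function.
verb_phrase = functional_application(denotation("greets"), denotation("Ben"))
# That function then takes the subject and yields a truth value.
sentence = functional_application(verb_phrase, denotation("Anna"))
print(sentence)  # True
```

The verb is written as a function that returns another function so that each step applies one function to exactly one argument, just like verb(object) above.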
iLanguage is proud to present the first installment of its monthly Language games! Your first task is to identify the language in the first image. Then in the next image you have several more tasks: guess the spectrogram, decipher the programming language and analyze data from a natural language. Post your answers in the comment section. Enjoy!
iLanguage is all around us. Each of us has a unique background, and we use language in unique settings that determine how we speak. This is exemplified in the latest internet meme*, referred to by the technical term “Shit _____ say”.
*A meme acts as a unit for carrying cultural ideas, symbols or practices, which can be transmitted from one mind to another through writing, speech, gestures, rituals or other imitable phenomena. [Wikipedia]
In these memes, we see phrases associated with specific groups of people. The obvious candidates show up, such as gender, ethnicity, and location. However, perhaps more revealing is how specific some of these memes are. There is pretty much one for every subculture: gamers, hipsters, yogis, republicans, atheists etc.
In addition, people take the context into account by making memes about what people say to specific groups of people, such as twins, tall girls, and pharmacists. How we speak is influenced not only by who we are, but also by who we are speaking to and what we are speaking about. This showcases the role of context in any Natural Language Processing task. Maybe your reaction to these videos is something like this.
What are some phrases, expressions or idioms that are unique to you? What would be included in your “Shit I say” meme?
iLanguageLab members Gina, Emmy and Hisako helped with the data for BrainTripping at last week’s NodeJS Hackathon.
BrainTripping is an artistic interpretation of famous people’s words, in which the user is invited to mix the words that Jesus, Charles Darwin or Paul Graham said and wrote. Matthew Huebert (@geoshift) had a first glimpse of the idea back at the Disrupt Hackathon and, amazingly, found a team of linguists at the hackathon who worked on the corpus and data analysis.