Last week Gina presented LevelDB for Android at Android Montreal, hosted at Notman House.
LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
Get the code on GitHub.
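To get an intuition for LevelDB's core abstraction, an ordered map from string keys to string values, here is a toy in-memory stand-in in Python. To be clear, this is not the LevelDB API and the class name is made up; it only mimics the behaviour that makes an ordered map useful, namely cheap range scans over sorted keys:

```python
import bisect

class OrderedKV:
    """Toy in-memory stand-in for an ordered key-value store:
    string keys kept sorted, so prefix/range scans are cheap."""

    def __init__(self):
        self._keys = []      # sorted list of keys
        self._values = {}    # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep keys in sorted order
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan(self, prefix):
        """Yield (key, value) pairs whose key starts with `prefix`,
        in key order -- the kind of query an ordered map makes easy."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._values[self._keys[i]]
            i += 1

db = OrderedKV()
db.put("user:2", "bob")
db.put("user:1", "alice")
db.put("cfg:x", "1")
print(list(db.scan("user:")))  # [('user:1', 'alice'), ('user:2', 'bob')]
```

Because keys come back in sorted order regardless of insertion order, you can group related records under a shared key prefix, which is a common pattern with stores like LevelDB.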
In one week the Gamify project has gotten roughly 700 participants from around the world, including Kazakhstan!
The new visitors (whom we assume are coming to play for the first time) are averaging 3.4 pages per visit, and most are completing the experiment, which takes an average of 5 minutes. We won’t know for a few weeks how many of the participants have usable data.
Surprisingly, we also had a few installs from the Android Market, and many of those players went through all three stages of the game.
Thanks, everyone, for playing and sharing our game. Our goal is 500 participants from Russia, the UK, and South Africa. It takes only 5 minutes to spread the word, so challenge your friends to beat your score!
If you have a spell checker, you want it to suggest a number of words that are close to the misspelt word. It’s easy for us humans to look at ‘teh’ and know that it is close to ‘the’, but how does the computer know that? A really simple, language-independent way to do it, if you don’t have any gold-standard data, is to assign costs to the various edits (substitution 2, deletion 1, insertion 1) and pick the cheapest sequence of edits.
The table below applies Levenshtein’s algorithm (in which a substitution costs 2) letter by letter. The total distance between the two words, 4, is in the top-right corner, because it costs 2 to substitute ‘u’ for ‘i’ and 2 to substitute ‘t’ for ‘k’.
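The same table the algorithm fills in letter by letter can be sketched as a small dynamic-programming function. This is just one way to write it, with the costs from above (substitution 2, insertion and deletion 1 each) as parameters:

```python
def edit_distance(source, target, sub_cost=2, ins_cost=1, del_cost=1):
    """Weighted Levenshtein distance: substitution costs 2,
    insertion and deletion cost 1 each."""
    m, n = len(source), len(target)
    # dp[i][j] = cheapest cost to turn source[:i] into target[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * del_cost           # delete everything
    for j in range(1, n + 1):
        dp[0][j] = j * ins_cost           # insert everything
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if source[i - 1] == target[j - 1] else sub_cost
            dp[i][j] = min(
                dp[i - 1][j] + del_cost,      # delete source[i-1]
                dp[i][j - 1] + ins_cost,      # insert target[j-1]
                dp[i - 1][j - 1] + sub,       # substitute (free on a match)
            )
    return dp[m][n]

# 'teh' -> 'the': delete the 'e' and re-insert it after the 'h'
# (1 + 1 = 2), which is cheaper than two substitutions (2 + 2 = 4).
print(edit_distance("teh", "the"))  # 2
```

Note how the cost weights steer the answer: because a substitution (2) costs the same as a deletion plus an insertion (1 + 1), the algorithm never prefers swapping letters over moving them.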
iLanguage is all around us. Each of us has a unique background, and we use language in unique settings that determine how we speak. This is exemplified in the latest internet meme*, referred to under the technical term “Shit _____ say”.
*A meme acts as a unit for carrying cultural ideas, symbols or practices, which can be transmitted from one mind to another through writing, speech, gestures, rituals or other imitable phenomena. [Wikipedia]
In these memes, we see phrases associated with specific groups of people. The obvious candidates show up, such as gender, ethnicity, and location. However, perhaps more revealing is how specific some of these memes are: there is pretty much one for every subculture, be it gamers, hipsters, yogis, Republicans, or atheists.
In addition, people take the context into account by making memes about what people say to specific groups of people, such as twins, tall girls, and pharmacists. Not only does who we are influence how we speak, but so does who we are speaking to and what we are speaking about. This showcases the role of context in any Natural Language Processing task. Maybe your reaction to these videos is something like this.
What are some phrases, expressions or idioms that are unique to you? What would be included in your “Shit I say” meme?
An obvious place where natural language must get filtered through technology is texting. However, with the advent of autocorrect, texting has become a rather perilous endeavor with humorous results.
What makes damnyouautocorrect.com so funny?
1. Although “sex-ting” has made headlines, I am pretty sure that the primary purpose of texting for most people is not to send dirty words, and especially not to one’s parents. This is something Apple appears not to have taken into account when creating its autocorrect algorithm. (Or did they?) For most people, dirty words are less frequent than other types of words, particularly in the domain of texting.
2. On the other hand, one thing the autocorrect algorithm does appear to take into account is part of speech. If the algorithm just returned the closest word, it might not be the same part of speech as the word intended, resulting in gibberish. But the mistaken texts are funny because they do make sense and do have semantic meaning, just not the intended meaning.
If autocorrect had provided a verb (“broil”) or an adjective (“broiled”) instead of a noun, the result would merely have been weird rather than funny.
It is precisely this mixture of getting some things right and some things wrong that permits these texts to occur. While autocorrect does take the immediate linguistic context into account, it does not consider the context of the text itself, which is not something easily determined by an algorithm.
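The frequency argument from point 1 can be sketched in a few lines: a toy corrector that weighs edit cost against how common a candidate word is. The frequency counts and the weight are made up for illustration, and a real autocorrect would also filter by part of speech and surrounding words, which this sketch ignores:

```python
import math

# Hypothetical frequency counts for a tiny texting lexicon;
# a real system would learn these from a large corpus.
FREQUENCIES = {"the": 5000, "then": 800, "ten": 300, "tech": 150}

def edit_distance(a, b, sub_cost=2):
    """Weighted Levenshtein: substitution 2, insert/delete 1 each."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            sub = 0 if a[i - 1] == b[j - 1] else sub_cost
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1,
                           dp[i - 1][j - 1] + sub)
    return dp[-1][-1]

def best_correction(typo, lexicon, weight=0.5):
    """Pick the candidate with the lowest edit cost minus a
    log-frequency bonus: cheap edits AND common words both help."""
    def score(word):
        return edit_distance(typo, word) - weight * math.log(lexicon[word])
    return min(lexicon, key=score)

# 'tech' is the cheapest edit from 'teh' (one insertion, cost 1),
# but 'the' is so much more frequent that it wins overall.
print(best_correction("teh", FREQUENCIES))  # the
```

Drop the frequency bonus (weight=0) and the same code suggests “tech”, which is exactly the kind of plausible-but-wrong correction the damnyouautocorrect texts are made of.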
What do you think? Do you see any patterns in the autocorrect mistakes? Do you have any observations about texting that you think might improve the algorithm?