- How to use the default system recognizer’s results in your own Android projects,
- How to use the NDK in your projects,
- How to use PocketSphinx (a lightweight recognizer library written in C) on Android
Since Kartuli is an agglutinative language with very rich verb morphology, searching for appropriate results is very difficult. After a few weeks of observation, it seems that most Kartuli speakers prefer to search using Russian search engines and Russian vocabulary. Mari (who is a lawyer) and Gina decided to create a corpus of law cases in Kartuli and see if the FieldDB glosser can help build a stemmer that might be used for searching in Georgian.
While Mari was teaching Gina and Esma how to use the Georgian court websites, she showed them how she modifies her search terms to get any results for supreme court cases. The constitutional court search page, by contrast, lets you search for an empty string and see all results. This was an illuminating experience of searching as a minority language speaker, so we decided to share it as an unlisted YouTube video despite the poor image quality.
* Requires search to find documents
* Need to use very general search terms to get any results, and the results you get are not always relevant to the case you are working on
* Documents are .html, which is excellent for machines, but Mari didn’t seem too excited about it; we will ask her more later
* Requires no search to find documents
* Documents are in .doc format which users are used to
* Easy to download documents so you can read them offline when you are in the village, or put them on a USB key if you are using someone else’s computer for the internet
Since FieldDB began, we have dreamed of being able to import long video/audio elicitation sessions and semi-automatically create datum entries that correspond roughly to utterances. After a few weeks of coding and testing, long audio import is ready for users to try.
What is FieldDB?
FieldDB is a free, open source project developed collectively by field linguists and software developers to make a modular, user-friendly app which can be used to collect, search and share your data.
Who can I use FieldDB with?
- FieldDB is a Chrome app, which means it works on Windows, Mac, Linux, Android, iPad, and also offline.
- Multiple collaborators can add to the same corpus, and you can encrypt any piece of data, keep it private within your corpus, or make it public to share with the community and other researchers.
How can FieldDB save me time?
FieldDB uses machine learning and computational linguistics to adapt to your existing organization of the data which you import and predict how to gloss it. FieldDB already supports import and export of many common formats, including ELAN, Praat, Toolbox, FLEx, Filemaker Pro, LaTeX, xml, csv and more, but if you have another format you’d like to import or export, Contact Us.
What are the principles behind FieldDB?
Curious how it works? FieldDB is open source on GitHub.
If you have a spell checker, you want it to suggest a number of words that are close to the misspelt word. For humans, it’s easy to look at ‘teh’ and know that it is close to ‘the’, but how does the computer know that? A really simple language-independent way to do it, if you don’t have any gold standard data, is to assign costs to the various edits, substitution (2), deletion (1) and insertion (1), and pick the cheapest sequence of edits.
The table below applies Levenshtein’s algorithm (with substitution costing 2) letter by letter. The total distance between the two words, 4, is in the top right corner: it costs 2 to substitute ‘u’ for ‘i’ and 2 to substitute ‘t’ for ‘k’.
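The edit costs described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the original table: the function names `edit_distance` and `suggest` are made up for this example, and the word pair ‘lute’/‘like’ is a hypothetical stand-in chosen only because it needs the same two substitutions (‘u’ for ‘i’, ‘t’ for ‘k’).

```python
def edit_distance(source, target, sub_cost=2, indel_cost=1):
    """Minimum edit cost between two words (substitution 2, insert/delete 1)."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = cost of turning source[:i] into target[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * indel_cost  # delete every letter of source[:i]
    for j in range(1, cols):
        dist[0][j] = j * indel_cost  # insert every letter of target[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,                      # deletion
                dist[i][j - 1] + indel_cost,                      # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),   # match or substitution
            )
    return dist[-1][-1]


def suggest(word, vocabulary, limit=3):
    """Hypothetical helper: return the cheapest candidates from a word list."""
    return sorted(vocabulary, key=lambda cand: edit_distance(word, cand))[:limit]


print(edit_distance("lute", "like"))         # two substitutions
print(edit_distance("teh", "the"))           # one deletion plus one insertion
print(suggest("teh", ["dog", "the", "cat"], limit=2))
```

Note that with these costs a substitution is never cheaper than a deletion followed by an insertion, so swapped letters such as ‘teh’ come out at a distance of 2 rather than 4.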
If you’re interested in Natural Language Processing, or you have been scraping and have lots of text data, the Stanford NLP class is a great opportunity to brush up on your regular expressions and learn some tricks. The professors are Dan Jurafsky and Chris Manning. Dan Jurafsky is a leading researcher on the connection between prosody and written text, and the co-author of Speech and Language Processing.
Natural language processing is the technology for dealing with our most ubiquitous product: human language, as it appears in emails, web pages, tweets, product descriptions, newspaper stories, social media, and scientific articles, in thousands of languages and varieties. In the past decade, successful natural language processing applications have become part of our everyday experience, from spelling and grammar correction in word processors to machine translation on the web, from email spam detection to automatic question answering, from detecting people’s opinions about products or services to extracting appointments from your email. In this class, you’ll learn the fundamental algorithms and mathematical models for human language processing and how you can use them to solve practical problems in dealing with language data wherever you encounter it.
We are hosting a small bi-monthly NLP get-together to discuss the Stanford NLP class and apply it to some local Montreal data. If you’re interested in joining us, leave us a comment below and we will tell you about our meeting times.
iLanguage Lab members Theresa and Gina formed part of the Roogle Team at the Cloud Robotics Hackathon 2012. At the hackathon the team worked on two robots: a Darwin-OP, a humanoid robot running Ubuntu Linux, and a Rover robot with an Arduino controlling movement and an Android device as “eyes.”
You can get the code and the Android installer at http://code.google.com/p/roogle-darwin/
The Bacteria Detecto Droid team was recently featured in the Montreal TechWatch. The project was built for researcher John Feighery’s Portable Microbiology Kit by a team of five, including iLanguage Lab members, at Random Hacks of Kindness Montreal back in December. For more updates, check out the Montreal TechWatch article “Portable Microbiology Lab – There’s an App for That!”
Android phones capable of taking these pictures cost less than $120. Combine this with the affordability of ‘Portable Microbiology kits’ that can be incubated using body heat, and we may end up with a sustainable solution to help fight water problems that plague many parts of the world.
iLanguage Lab member Gina was part of the “Bacteria Detecto Droid” team which won best use of technology at Random Hacks of Kindness Montreal. Check it out in 24 Heures!
…one of the two winning projects, a portable bacteriological laboratory. “It is a smartphone application that detects the bacteria present in water,” he explained. “It can tell whether the water is dangerous or not. The system accumulates the data to produce a map.”
“We showed that it could be done by a phone,” said a delighted Mr. Grassick. “It’s not expensive and it can be used by locals in developing countries.”
The project was one of over 30 winning RHOK projects around the world. We were also mentioned in The World’s article “Geeks without Borders”!
Other projects were equally ambitious. In Portland, developers created an application to allow medical workers to track disease outbreaks in real-time. In Bangalore, hackers built a job database for unskilled workers. In Montreal, developers created an app that can scan a microscopic photo of bacteria taken from water to test for drinking safety—a key tool for poorer countries.
The code uses OpenCV, a computer vision library, to process the images on the Android client. Check out the code on GitHub.