- How to use the default system recognizer’s results in your own Android projects,
- How to use the NDK in your projects,
- How to use PocketSphinx (a lightweight recognizer library written in C) on Android
This week Joel and Gina presented some of the work lab members Josh, Theresa, Tobin and Gina and interns ME, Louisa, Elise, Yuliya and Hisako have done on the LingSync project as part of their 20 minute presentation “LingSync & the Online Linguistic Database: New models for the collection and management of data for language communities, linguists and language learners” at the Computational Approaches to Endangered Languages workshop at the 52nd Annual Meeting of the Association for Computational Linguistics (ACL).
LingSync and the Online Linguistic
Database (OLD) are new models for the
collection and management of data in
endangered language settings. The LingSync
and OLD projects seek to close a
feedback loop between field linguists, language
communities, software developers,
and computational linguists by creating
web services and user interfaces (UIs)
which facilitate collaborative and inclusive
language documentation. This paper
presents the architectures of these tools
and the resources generated thus far. We
also briefly discuss some of the features
of the systems which are particularly helpful
to endangered languages fieldwork and
which should also be of interest to computational
linguists, these being a service that
automates the identification of utterances
within audio/video, another that automates
the alignment of audio recordings and
transcriptions, and a number of services
that automate the morphological parsing
task. The paper discusses the requirements
of software used for endangered language
documentation, and presents novel data
which demonstrates that users are actively
seeking alternatives despite existing software.
Since Kartuli is an agglutinative language with very rich verb morphology searching for appropriate results is very difficult. Over the past few weeks of observing it seems like most Kartuli speakers prefer to search using Russian search engines, using Russian vocabulary. Mari (who is a lawyer) and Gina decided to create a corpus of law cases in Kartuli, and see if the FieldDB glosser can help build a stemmer that might be used for searching in Georgian.
While Mari was teaching Gina and Esma how to use the Georgian court websites, in the middle she showed them how she modifies her search terms to get some results in supreme court cases, unlike the constitutional court search page which lets you search for an empty string and see all results… This was an illuminating experience of searching as a minority language speaker, so we decided to share it as an unlisted YouTube video despite the poor image quality.
* Requires search to find documents
* Need to use very general search terms to get any results, and results you get are not always relevant to your case you are working on
* Documents are .html which is excellent for machines but Mari didn’t seem to excited about it, we will ask her more later
* Requires no search to find documents
* Documents are in .doc format which users are used to
* Easy to download documents so you can read them offline when you are in the village, or put on a usb key if you are using someone else’s computer for the internet.
This week we documented our findings about what popular apps and operating systems are available in Kartuli, and to what extent. The result was pretty good, but we identified two ways we could help, by showing Kartuli speakers how they can contribute to Chrome and Android localization.
We found out that because of how Google localizes Android, contributing translations for minority languages is extremely time consuming for the Android team, which means they wont be able to accept our help, not for Kartuli, not for Migmaq.
On the other hand, Chromium translations are managed using Launchpad and it is entirely possible to help out. Esma began contributing reviews and novel translations, we are waiting news to find out if she was successful!
After meeting some local software developers we found that
- Many technical words are simply transliterations of English into Kartuli, and
- Many iPhone users don’t have a Georgian keyboard, as a consequence roughly 5% of comments on Facebook are in romanized Kartuli.
- The most popular browser in Georgia (in Batumi, and the villages which are who we are able to ask) is actually Chrome!
- Georgians go to school 100% in Kartuli, even during the USSR times. They have a very very high fluency in their native alphabet and reading in general.
This meant if we built a Chrome Extension which can transform all English letters into their Kartuli equivalent, then Georgians who aren’t entirely fluent with the English alphabet can read more content on the web. So far it seems to work great for Facebook, and for Google plus, but it can also be used on any web page!
Since FieldDB began we have dreamed of being able to import long video/audio elicitation sessions and semi-automatically creating datum which correspond roughly to utterances. After a few weeks of coding and testing, long audio import is ready for users to try.
After talking with members of the TLG volunteers (Teach Learn Georgia) when they come down from the mountains for the weekend, it looks like older volunteers (August 2013) could share what they have learned in the field with newer volunteers (March 2014) using our open source code base called “Learn X” which makes it possible to create an Android App that one or many users can use to create their own language learning lessons together using their Androids to take video, picture or record audio, backed by the FieldDB infrastructure for offline sync. Like heritage learners, TLG volunteers spend their time surrounded with the language and can understand more than they can speak, and what they speak about is highly dependent on their families and what their family speaks about most.
This week Josh released iLanguage Cloud alpha on Google Play. iLanguageCloud is a fun Android application that allows users to generate word clouds using an Android share intent. Clouds can be exported as SVG for use in high-res graphics applications such as InkScape or as png for sharing with friends on Facebook, Google+, Twitter, Flickr or any other social applications a user choses.
Create, save, and share word clouds from text on any website or any another application, share your word cloud with friends and colleagues.
To vote for features, check out the GitHub milestones page.
iLanguage Cloud uses Jason Davies’ D3 word cloud layout engine, you can find his source code on his GitHub repository.
Hisako and Elise went to Carlton University this week to present at FEL 2013, the 17th Conference of the Foundation for Endangered Language. The theme of this year’s workshop was Endangered Languages Beyond Boundaries: Community Connections, Collaborative Approaches, and Cross-Disciplinary Research.
Elise McClay (BA ’12), Erin Olson (BA ’12), Carol Little (BA ’12), Hisako Noguchi (Concordia), Alan Bale (Concordia), Jessica Coon (McGill) and Gina (iLanguage Lab) presented an electronic poster titled “LingSync: Using Technology to Bridge Gaps between Speakers, Learners, and Linguists.”
Source code available on GitHub.