Update on FieldDB

It’s been 3 years since the FieldDB project was launched at CAML in Patzun, Guatemala. Since then the project has graduated into its own GitHub organization with 50+ collaborators, and 50+ universities that we know of have been using it. In March we made sure that all the clients and libraries had Google Analytics integration so that we could better understand how users work with the apps.

This week Veronica has been learning Google Analytics to see how the app has grown. Hisako had been asking where in the world the users are and how much time they spend in the app on average.

[Figures: Average User Time; Returning Visitor Time Comparison; New Visitor Time Comparison]


Taking a look at iLanguageCloud user reviews

It’s been a few years since Josh originally released the iLanguageCloud project. The iLanguageCloud project uses Jason Davies’ D3.js cloud library and some statistics to tokenize text and identify stopwords, so that it can support Unicode text in any language.
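To give a feel for how stopwords can be found with statistics alone, here is a minimal sketch in Python. It is not the iLanguageCloud code: the function names and the `min_share` threshold are assumptions made for illustration. It treats any token that makes up a large share of the text as a stopword, which needs no per-language word list.

```python
import re
from collections import Counter


def tokenize(text):
    # \w is Unicode-aware in Python 3, so this works for any script.
    return re.findall(r"\w+", text.lower())


def guess_stopwords(tokens, min_share=0.3):
    # Hypothetical rule: a token is a stopword if it accounts for
    # at least min_share of all tokens in the text.
    counts = Counter(tokens)
    total = len(tokens)
    return {word for word, count in counts.items() if count / total >= min_share}


def cloud_weights(text, min_share=0.3):
    # Frequency of every token that was not guessed to be a stopword.
    tokens = tokenize(text)
    stopwords = guess_stopwords(tokens, min_share)
    return Counter(token for token in tokens if token not in stopwords)


weights = cloud_weights("the cat saw the dog and the bird saw the moon")
```

In this sketch the remaining counts would drive the text size of each word in the cloud. A real implementation would likely need a smarter statistic than a single threshold, especially for short texts.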

Since the app was released a surprising number of users have found the app and have been using it. Users have been requesting features and providing feedback on the Play Store and Chrome Store.

Using @iLanguageLab word cloud to collect & display words to describe the moon. One S uses Word Central for help!

Some teachers have even tweeted about the app!

This summer Veronica will be looking over iLanguageCloud user reviews in order to document what needs to be done in the next releases. First she found that most of the reviews indicate that there are different user groups who have different goals when they open the iLanguageCloud project. Some users want to paste a full text and see a cloud, but most users want to see all the words they paste.

She started by identifying the user types with a CouchDB map reduce and learning how to do statistical analysis in LibreOffice. Once she had identified stats to categorize user types, she added tests for these user types in the codebase using Jasmine.

Users are often creating tag clouds, not full-text clouds. We attribute this to users being accustomed to tools where they have to pre-filter their input down to only the words they want to show, and where text sizes are random rather than dependent on word frequency or other factors.
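One way to tell the two user groups apart can be sketched in Python as follows. This is an illustration only, not Veronica’s actual analysis: the `user_type` function, the category names and the `min_unique_ratio` threshold are all assumptions. The idea is that a hand-picked list of tags rarely repeats a word, while running text repeats words often; the tally at the end plays the role of the reduce step.

```python
import re
from collections import Counter


def user_type(pasted_text, min_unique_ratio=0.9):
    # Hypothetical heuristic: share of distinct tokens among all tokens.
    tokens = re.findall(r"\w+", pasted_text.lower())
    if not tokens:
        return "empty"
    unique_ratio = len(set(tokens)) / len(tokens)
    if unique_ratio >= min_unique_ratio:
        return "tag_cloud"
    return "full_text_cloud"


def count_user_types(inputs):
    # Map each input to a user type, then reduce to a count per type.
    return Counter(user_type(text) for text in inputs)


counts = count_user_types([
    "moon crater orbit tide",
    "the moon orbits the earth and the tide follows the moon",
    "bright round distant",
])
```

Very short full texts would be misclassified by this heuristic, so any real categorization would need to look at more than one statistic.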

 

While she is learning the tools (Angular.js, Travis) to make the modifications so that her user types tests pass, Veronica created a video tutorial showing how you can use the Chrome app so that users can have some instructions.

 

To help decide which features get done first, visit our GitHub feature list.

Welcome 2015 Summer Interns

Welcome 2015 summer interns Louisa Bielig, who just graduated with a BA Honours from McGill, and Veronica Cook-Vilbrin, who will be entering Norwich University as a student in the fall.

Louisa recently presented her honours thesis “Resumptive classifiers in Chuj high topic constructions” at GLEEFUL and Harvard Undergraduate Linguistics Colloquia. Louisa was previously an intern on the FieldDB project, where she helped build a tool to use the Inuktitut Bible as a corpus to supplement fieldwork. She has been using the FieldDB project for a few years to collect data for her thesis and for her research advisor’s projects. This summer she will be using Git, Sublime, Regular Expressions, Yeoman, Angular.js, Jasmine, and CouchDB to improve the tools which users use to clean their data.

You can follow Louisa’s work on GitHub: https://github.com/louisa-bielig

 

Veronica is a former mechanical engineering student who turned to psychology. In preparation for running experiments and automating statistical analyses, this summer she will be learning Git, Sublime, CouchDB, LibreOffice, Google Analytics, Yeoman, Angular.js, Jasmine and Travis to give iLanguageCloud users an update based both on what the users are requesting and on a behavioural analysis of what users have tried to do.

You can follow Veronica’s work on GitHub: https://github.com/vronvali

 

Testing Android with Computer Vision

This week lab members Farah and Gina will be talking about how to set up and tweak Sikuli tests for Android at GDG Android Montreal. In this talk they show how you can test image-heavy and/or legacy/hybrid Android apps using OpenCV (computer vision) and Sikuli.

Sikuli is a framework which automates anything you see on the screen. It uses image recognition to identify and control GUI components. It is useful when there is no easy access to a GUI’s internals or source code, or when writing tests crosses layers of technologies, as in a Cordova/HTML5 app running in a webview.
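The core idea behind image-based testing can be shown with a toy sketch in Python. This is not Sikuli’s API and not OpenCV: the function names are made up for illustration, the "screen" is a tiny grid of numbers, and the match must be exact, whereas real tools score how similar each region is and accept matches above a threshold. The sketch slides a small template over the screen and returns the point a test would click.

```python
def find_template(screen, template):
    # Return (row, col) of the top-left corner of the first exact match,
    # or None if the template does not appear on the screen.
    screen_h, screen_w = len(screen), len(screen[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    for row in range(screen_h - tmpl_h + 1):
        for col in range(screen_w - tmpl_w + 1):
            if all(
                screen[row + dr][col:col + tmpl_w] == template[dr]
                for dr in range(tmpl_h)
            ):
                return (row, col)
    return None


def click_point(screen, template):
    # Centre of the matched region: where an automated click would land.
    position = find_template(screen, template)
    if position is None:
        return None
    row, col = position
    return (row + len(template) // 2, col + len(template[0]) // 2)


screen = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 6, 0],
    [0, 0, 0, 0, 0],
]
button = [[9, 8], [7, 6]]
```

Because the search only looks at pixels, it works the same whether the button was drawn by native code or by HTML inside a webview.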

Sikuli is an open source project started at MIT which has grown to be used by developers for diverse types of clicker testing.

 

Here is a video showing how Farah used Sikuli to test a Cordova/HTML5 app running in an Android webview.

 

Slides

 

Curious about the code? You can take a look at Farah’s Sikuli tests on GitHub.
https://github.com/ProjetDeRechercheSurLecriture/dyslex-disorth-game-sikuli
https://github.com/FieldDB/fielddb-spreadsheet-sikuli

 

Recognizing Speech on Android

Tonight Gina and Esma will be presenting their Kartuli Speech Recognition trainer at Android Montreal.
The talk will show how to use speech recognition in your own Android apps. It will start with a demo of the Kartuli trainer app to set the context, and then dig into the code and Android concepts behind the demo. The talk has something for both beginner and advanced Android devs, namely two ways to do speech recognition: the easy way (using the built-in RecognizerIntent for the user’s language) and the hard way (building a recognizer which wraps existing open source libraries if the built-in RecognizerIntent can’t handle the user’s language). While Gina was in Batumi she and some friends built an app (code) (slides) (installer) so that Kartuli users could train their Androids to recognize SMS messages and web searches. Recognizing Kartuli is one of the cases where you can’t use the built-in recognizer.

The talk covers:

  • How to use the default system recognizer’s results in your own Android projects
  • How to use the NDK in your projects
  • How to use PocketSphinx (a lightweight recognizer library written in C) on Android

Installer
Live broadcast on YouTube
Code is open sourced on GitHub

Presentation at ComputEL workshop @ ACL 2014

This week Joel and Gina presented some of the work that lab members Josh, Theresa, Tobin and Gina and interns ME, Louisa, Elise, Yuliya and Hisako have done on the LingSync project, as part of their 20-minute presentation “LingSync & the Online Linguistic Database: New models for the collection and management of data for language communities, linguists and language learners” at the Computational Approaches to Endangered Languages workshop at the 52nd Annual Meeting of the Association for Computational Linguistics (ACL).

 

 

Abstract:

LingSync and the Online Linguistic Database (OLD) are new models for the collection and management of data in endangered language settings. The LingSync and OLD projects seek to close a feedback loop between field linguists, language communities, software developers, and computational linguists by creating web services and user interfaces (UIs) which facilitate collaborative and inclusive language documentation. This paper presents the architectures of these tools and the resources generated thus far. We also briefly discuss some of the features of the systems which are particularly helpful to endangered languages fieldwork and which should also be of interest to computational linguists, these being a service that automates the identification of utterances within audio/video, another that automates the alignment of audio recordings and transcriptions, and a number of services that automate the morphological parsing task. The paper discusses the requirements of software used for endangered language documentation, and presents novel data which demonstrates that users are actively seeking alternatives despite existing software.

Download full paper as .pdf or .tex



 

Week 7: Searching for court cases in Kartuli

Since Kartuli is an agglutinative language with very rich verb morphology, searching for appropriate results is very difficult. After a few weeks of observation, it seems that most Kartuli speakers prefer to search using Russian search engines and Russian vocabulary. Mari (who is a lawyer) and Gina decided to create a corpus of law cases in Kartuli, and see if the FieldDB glosser can help build a stemmer that might be used for searching in Georgian.
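To show why a stemmer helps search, here is a minimal sketch in Python. It is not the FieldDB glosser: the suffix list is a small illustrative subset of Kartuli noun case endings plus the plural marker, the two documents are made up, and the rich verb morphology mentioned above is not handled at all. The sketch indexes documents by stem, so a query in one inflected form finds documents that contain another.

```python
from collections import defaultdict

# Illustrative subset of noun case endings, longest first (assumption).
CASE_SUFFIXES = ["ის", "ით", "ად", "მა", "ი", "ს"]
PLURAL_MARKER = "ებ"


def stem(word, min_stem_len=3):
    # Strip at most one case ending, then the plural marker if present.
    for suffix in CASE_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            word = word[: -len(suffix)]
            break
    if word.endswith(PLURAL_MARKER) and len(word) - len(PLURAL_MARKER) >= min_stem_len:
        word = word[: -len(PLURAL_MARKER)]
    return word


def build_index(documents):
    # Map each stem to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.split():
            index[stem(token)].add(doc_id)
    return index


def search(index, query):
    return index.get(stem(query), set())


# Hypothetical two-document corpus.
index = build_index({"case1": "კანონის ტექსტი", "case2": "ახალი კანონები"})
results = search(index, "კანონი")
```

Without stemming, the query would match neither document, because each one contains a different inflected form of the word.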

While Mari was teaching Gina and Esma how to use the Georgian court websites, she showed them how she modifies her search terms to get any results for supreme court cases, unlike the constitutional court search page, which lets you search for an empty string and see all results. This was an illuminating experience of searching as a minority language speaker, so we decided to share it as an unlisted YouTube video despite the poor image quality.

Supreme Court

* Requires search to find documents
* Need to use very general search terms to get any results, and the results you get are not always relevant to the case you are working on
* Documents are .html, which is excellent for machines, but Mari didn’t seem too excited about it; we will ask her more later

vs

Constitutional Court
* Requires no search to find documents
* Documents are in .doc format which users are used to
* Easy to download documents so you can read them offline when you are in the village, or put them on a USB key if you are using someone else’s computer for the internet.

 

Week 6: How much of a Kartuli speaker’s virtual life is actually provided in their native language?

This week we documented our findings about which popular apps and operating systems are available in Kartuli, and to what extent. The result was pretty good, but we identified two ways we could help: showing Kartuli speakers how they can contribute to Chrome localization and to Android localization.

We found out that because of how Google localizes Android, contributing translations for minority languages is extremely time consuming for the Android team, which means they won’t be able to accept our help, neither for Kartuli nor for Migmaq.

On the other hand, Chromium translations are managed using Launchpad and it is entirely possible to help out. Esma began contributing reviews and novel translations, and we are waiting for news to find out if she was successful!

Week 5: Viewing the web through Kartuli Glasses

After meeting some local software developers we found that:

  • Many technical words are simply transliterations of English into Kartuli.
  • Many iPhone users don’t have a Georgian keyboard; as a consequence, roughly 5% of comments on Facebook are in romanized Kartuli.
  • The most popular browser in Georgia (at least in Batumi and the villages, which is where we were able to ask) is actually Chrome!
  • Georgians go to school 100% in Kartuli, and did so even during USSR times. They have very high fluency in their native alphabet and in reading in general.

This meant that if we built a Chrome Extension which transforms all English letters into their Kartuli equivalents, then Georgians who aren’t entirely fluent with the English alphabet could read more content on the web. So far it seems to work great for Facebook and for Google+, but it can also be used on any web page!
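The letter-by-letter transformation can be sketched in Python as follows. This is not the extension’s code (a Chrome Extension would do this in JavaScript on the page’s text), and the mapping tables are an illustrative assumption: several English letters have no single Kartuli equivalent, so the choices here are arbitrary. Two-letter combinations such as "sh" are tried first, and anything without a mapping is passed through unchanged.

```python
# Illustrative mappings only (assumption), not the extension's actual tables.
DIGRAPHS = {"sh": "შ", "ch": "ჩ", "zh": "ჟ", "gh": "ღ", "kh": "ხ", "dz": "ძ", "ts": "ც"}
LETTERS = {
    "a": "ა", "b": "ბ", "c": "კ", "d": "დ", "e": "ე", "f": "ფ", "g": "გ",
    "h": "ჰ", "i": "ი", "j": "ჯ", "k": "კ", "l": "ლ", "m": "მ", "n": "ნ",
    "o": "ო", "p": "პ", "q": "ქ", "r": "რ", "s": "ს", "t": "თ", "u": "უ",
    "v": "ვ", "w": "ვ", "x": "ხ", "y": "ი", "z": "ზ",
}


def to_kartuli(text):
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2].lower()
        single = text[i].lower()
        if pair in DIGRAPHS:
            # Two English letters become one Kartuli letter.
            out.append(DIGRAPHS[pair])
            i += 2
        elif single in LETTERS:
            out.append(LETTERS[single])
            i += 1
        else:
            # Digits, punctuation and non-English text are left as they are.
            out.append(text[i])
            i += 1
    return "".join(out)


example = to_kartuli("Batumi")
```

Since Kartuli has no upper and lower case distinction in everyday writing, the sketch lowercases each letter before looking it up.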
