Masakhane Project Seeks ML Translation of 2,000 Tribal Dialects

The name “Masakhane” means “We build together” in isiZulu, a language spoken in South Africa. The phrase, coined by Nelson Mandela, represents the hope and inspiration of Africans working towards common goals. With so many tribes unable to communicate unless they speak English, another European language or Arabic, researchers view translation between tribal tongues as a way of bringing the continent into the 21st century.

According to a story on venturebeat.com, the Masakhane open source project will rely on the efforts of 60 researchers, developers and programmers to create the Natural Language Processing (NLP) basis for translation between tribal languages.

Kathleen Siminyu, a member of the Luhya tribe in Kenya, speaks English and her tribal language. She joined Masakhane earlier this year as co-organizer of the Women in Machine Learning and Data Science chapter in Nairobi and a coordinator for AI for Development.

“Right now, I’m thinking a lot about how research networks can work on this continent,” Siminyu said. “I see language as a barrier which, if eliminated, allows a lot of Africans to just be able to engage in the digital economy and eventually in the AI economy. As people who are sitting here building for local languages, I feel like it’s our responsibility to … bring the people who are not in a digital age into the age of AI.”

In addition to the project, which Jade Abbott and Laura Martinus from South Africa launched following lectures and conversations at Deep Learning Indaba and the Sauti Yetu NLP Unconference, Mozilla and a German government ministry launched an open source project to collect voice data from local African languages. Such efforts suggest that translation technology could open up new access to education, resources and trade between countries throughout Africa.

A page on github.com outlines how the Masakhane contributors will work on the project. The majority of participants are from South Africa, Kenya and Nigeria.

The project aims to help Africans with everything from improving agriculture to learning each other’s songs.

Masakhane’s work will develop in phases, starting with translation from English into African languages using publicly available data, such as government documents or newspapers. The group then plans to create baseline translation models and submit its work for publication at top NLP conferences around the world.

According to thenextweb.com, Germany-based OBTranslate intends to achieve the same result as the Masakhane Project. Emmanuel Gabriel, founder of Germany-based OpenBinacle, launched the messaging app six months ago. The project is still in its early stages. Under a different name, it initially translated 26 languages but was plagued by problems.