AERO: AI Enhancements Responsive to Orality
From AI for OBT to AERO
Oral Bible Translation helps meet the needs of oral language communities and of oral learners in any spoken language. AERO aims to be responsive: listening to the needs of language communities and determining whether we have solutions that help. The challenges in oral recording range from the frustrating and time-wasting to issues of safety and security that can prevent a project from moving forward at all. We’re looking to deliver a suite of tools (collectively called AERO) that meets oral language users in their context. As developers, we’re broadly grouping these communities into three different expressions of orality (no ranking or progression implied):
Pure Oral Language Communities prefer to use their language only orally, and preserving their tradition of oral language use is important to them.
Oral Language Communities who also read/write in other languages use the language they care about most orally, but also make use of written languages in some contexts (e.g. the language they are taught in school).
Multimodal Language Communities make use of orality along with written language. For example, the Spoken English Bible project meets an oral language need in a multimodal language community. We include in this group any language community that aspires to use its language in written form, even if it is not currently doing so.
We aim to address the needs of each of these groups, and of others that aren’t so easily categorized. When we address the needs of pure oral language communities, it generally benefits all oral language communities. While much of our focus is still on process improvements in Oral Bible Translation, we expect many of these tools to apply more generally to orality.
Meeting the expressed needs of oral language communities
While engaging with oral language communities, we’ve heard a few needs and desires expressed, and we think these are areas where AI can help. Using the groupings from above, here are some of those needs and how we’re working to address them.
Pure Oral Language Communities:
- We’ve heard: “I don’t have the best recording environment, and sometimes there’s noise in the background. I don’t like the way the recording sounds with all of the background noise. Do I have to keep recording these over again to eliminate the noise?”
Noise removal can effectively reduce or eliminate background noises, whether it’s a rooster crowing, an air conditioner running, or other people talking in the background of an OBT workshop.
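To make this concrete, here is a minimal sketch of one way to do it in Python with the open-source noisereduce library (a spectral-gating approach). The file names are placeholders, and the actual AERO tooling may use different models and parameters.

```python
import noisereduce as nr
import soundfile as sf

# Load a recording made in a noisy environment (placeholder file name).
audio, rate = sf.read("obt_session_take3.wav")

# Estimate the noise profile from the recording itself and subtract it.
# stationary=False handles noise that changes over time (voices, traffic).
cleaned = nr.reduce_noise(y=audio, sr=rate, stationary=False)

sf.write("obt_session_take3_clean.wav", cleaned, rate)
```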
- We’ve heard: “We have translators that have been working on recording a translation, but they don’t want us to release it for fear of retaliation against them.” We’ve also heard: “Our team has only men (or only women). We’ve tried to change the pitch with audio tools or speak in a pretend voice, but it doesn’t sound convincing.”
Voice conversion allows us to disguise a speaker’s voice while preserving the clarity of the language, even making it sound like the speaker is a different gender.
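As an illustration, here is a minimal sketch of zero-shot voice conversion using Coqui TTS’s FreeVC model; whether AERO uses this particular model is an assumption on our part, and the file paths are placeholders.

```python
from TTS.api import TTS

# Load a pretrained any-to-any voice conversion model.
vc = TTS(model_name="voice_conversion_models/multilingual/vctk/freevc24")

# Re-voice the translator's recording to sound like the target voice sample,
# keeping the words and timing of the original speech intact.
vc.voice_conversion_to_file(
    source_wav="translator_recording.wav",  # the voice to disguise
    target_wav="target_voice_sample.wav",   # a sample of the replacement voice
    file_path="disguised_recording.wav",
)
```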
- We’ve heard: “It’s really hard to check an oral translation. I have to re-listen to make sure the correct key terms are used.” We’ve also heard: “We’ve made a decision as a translation team that we have used the wrong key term. Now we have to go back and correct it in all of our recordings!”
Key term detection and key term find-and-replace will allow you to mark all of the key terms in your audio and use those marks to navigate the audio more easily. Find-and-replace will even let you insert a replacement key term from an audio recording. We’re researching better ways to do audio infilling in any language, so that inserted key terms will fit right in.
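To sketch the detection half: word-level timestamps from an ASR model can locate every occurrence of a key term in a recording. The example below uses OpenAI’s Whisper, which covers only around 100 mostly high-resource languages, so treat it as an illustration of the technique rather than AERO’s implementation; the key-term list is hypothetical.

```python
import whisper

model = whisper.load_model("medium")
result = model.transcribe("chapter_recording.wav", word_timestamps=True)

KEY_TERMS = {"messiah", "covenant"}  # hypothetical key-term list

# Print each key term with its position, so a reviewer can jump straight
# to it instead of re-listening to the whole recording.
for segment in result["segments"]:
    for word in segment["words"]:
        token = word["word"].strip().lower().strip(".,!?;:")
        if token in KEY_TERMS:
            print(f'{word["word"].strip()}: {word["start"]:.2f}s-{word["end"]:.2f}s')
```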
- We’ve heard: “We can’t do a transcription in our language, so we do an oral backtranslation to another language and then it is transcribed to allow collaborators to review the work more easily.”
Automated speech recognition provides transcription in over 1,000 languages, likely including every language a reviewer would need. By automating this step, pure oral language communities can focus on the oral parts of the work, without having to bring in someone who can do transcription.
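One openly available system in this range is Meta’s Massively Multilingual Speech (MMS) model, which covers roughly 1,100 languages. A minimal sketch of using it through Hugging Face transformers follows; the language code (“yor”, Yoruba) and file name are illustrative only.

```python
import torch
import soundfile as sf
from transformers import AutoProcessor, Wav2Vec2ForCTC

processor = AutoProcessor.from_pretrained("facebook/mms-1b-all")
model = Wav2Vec2ForCTC.from_pretrained("facebook/mms-1b-all")

# Switch the tokenizer vocabulary and adapter weights to the target language.
processor.tokenizer.set_target_lang("yor")
model.load_adapter("yor")

audio, rate = sf.read("back_translation.wav")  # MMS expects 16 kHz mono audio
inputs = processor(audio, sampling_rate=rate, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

ids = torch.argmax(logits, dim=-1)[0]
print(processor.decode(ids))
```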
Oral Language Communities who also read/write in other languages:
All of the needs above apply here as well, but beyond those we can also meet the needs below.
- We’ve heard: “We’re working with a translation we’ve already recorded and having a workshop to review it, but it’s hard to quickly review the audio. There’s a lot of listening and re-listening to make sure we understand exactly what it says.”
Phonemic transcription and what we are calling sister-language transcription can help to review oral content more quickly. The transcript is tied to specific points in time, so you can find the part you’d like to review and jump straight to it. Sister-language transcription uses the writing system of a related language, one that isn’t completely foreign to the language community. Even speakers of languages that aren’t traditionally written can often read and recognize their language when it is transcribed in a sister language.
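For the phonemic half, one open-source option is the Allosaurus universal phone recognizer, which outputs IPA phones for arbitrary languages with optional timestamps; whether AERO uses Allosaurus specifically is an assumption. Sister-language transcription can be approximated with the MMS pattern shown earlier, by loading the adapter for a related language rather than the language being recorded.

```python
from allosaurus.app import read_recognizer

# Downloads the default universal phone model on first use.
model = read_recognizer()

# A plain IPA phone sequence for the whole recording (placeholder file name).
print(model.recognize("review_passage.wav"))

# One phone per line as "start duration phone", for jumping to a point in time.
print(model.recognize("review_passage.wav", timestamp=True))
```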
Multimodal Language Communities:
Again, we find almost all of the above are also useful to multimodal language communities, plus the following.
- We’ve heard: “Our language doesn’t have a formal orthography, but we’re working on one as part of our plan to have the Bible written in our language. It takes a lot of time to transcribe the audio we have, and then to review it and make decisions about how our language should be written.”
Here the phonemic and sister-language transcriptions are the same as above, but they gain an additional use for multimodal communities. Removing the task of transcription allows the team to focus on the words and how to represent them in an orthography. Developing an orthography becomes a post-editing task on a transcription that isn’t a perfect fit, but is close enough to be useful.
- We’ve heard: “With all that AI can do, will we be able to work on a written draft concurrently with our oral translation project?”
With the above process of correcting transcriptions to develop an orthography, language communities can build up a collection of audio paired with text, enabling the training of an automated speech recognition system in their own language. Language communities have told us they hope such a process will let them produce a written translation more quickly, possibly even checking the oral and written translations jointly, with the help of AQuA.
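As a sketch of the data side, assuming a Hugging Face datasets workflow: each corrected transcript can be paired with its audio clip so the growing corpus is ready for ASR fine-tuning later. The file names and column names are hypothetical.

```python
from datasets import Audio, Dataset

# Pair each audio clip with its post-edited transcript (placeholder data).
records = {
    "audio": ["clips/gen_001.wav", "clips/gen_002.wav"],
    "text": [
        "corrected transcript of the first clip",
        "corrected transcript of the second clip",
    ],
}

ds = Dataset.from_dict(records).cast_column("audio", Audio(sampling_rate=16_000))

# Hold out a slice for evaluation, then save for a later fine-tuning run.
ds = ds.train_test_split(test_size=0.1)
ds.save_to_disk("obt_asr_corpus")
```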
- We’ve heard: “We can get Bible-related material translated into our language, but that doesn’t make it accessible to people in our language community who prefer listening to reading.”
As we collect even more audio, text-to-speech models can be trained for a new language. Those models can be used in Acts2 for many purposes beyond Bible Translation, including producing materials for increased Bible engagement, trauma healing, and literacy (to name a few). Additionally, text-to-speech models can enable audio infilling from text, not just from speech.
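To sketch what preparing such training data can look like, many open TTS frameworks (Coqui TTS among them) accept the simple LJSpeech layout: a wavs/ folder plus a pipe-separated metadata.csv. The paths, clip IDs, and text here are hypothetical.

```python
import csv
from pathlib import Path

# Placeholder (clip_id, transcript) pairs drawn from the corrected corpus.
pairs = [
    ("gen_001", "First verse of the recorded translation."),
    ("gen_002", "Second verse of the recorded translation."),
]

out = Path("tts_dataset")
(out / "wavs").mkdir(parents=True, exist_ok=True)  # audio clips go here

# LJSpeech format: one line per clip, pipe-separated: id|raw text|normalized text
with open(out / "metadata.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="|")
    for clip_id, text in pairs:
        writer.writerow([clip_id, text, text.lower()])
```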