Google, in collaboration with leading African universities and research organisations, has officially launched WAXAL, a large-scale, open-access speech dataset intended to extend artificial intelligence technologies across Africa by enabling AI systems to understand and generate African languages.
The initiative targets one of the biggest challenges in artificial intelligence today: the lack of high‑quality speech data for African languages, which has historically limited the development of voice recognition, text‑to‑speech and other voice‑enabled AI tools for local users.
What Is WAXAL and Why Does It Matter
WAXAL is a comprehensive dataset covering 21 Sub-Saharan African languages, including Hausa, Igbo, Yoruba, Luganda, Acholi, Swahili and several others drawn from diverse linguistic regions across the continent. The collection comprises over 1,250 hours of transcribed natural speech and more than 20 hours of high-quality studio recordings, foundational data that developers, researchers, and startups can use to train voice-AI models.
The project was developed over several years with funding and technical support from Google Research Africa and reflects a broader push to make AI technology more inclusive and locally relevant for African communities.
To date, most major AI speech technologies have been trained primarily on data from European and Asian languages, leaving Africa’s more than 2,000 languages severely underrepresented in modern voice systems. This gap has made it difficult for people who speak African languages — particularly those outside major urban centres — to benefit from speech‑based digital services.
Who Built WAXAL
Unlike many similar projects where international organisations collect and retain data, WAXAL was created in true partnership with African institutions, with core contributions led by:
- Makerere University in Uganda
- University of Ghana
- Digital Umuganda in Rwanda
These partner institutions played a central role in collecting, curating and validating the speech data, often working closely with local communities to ensure linguistic authenticity and cultural context.
A key principle of the WAXAL initiative is that these partner organisations retain ownership of the data they helped gather — a model designed to promote equitable access and long‑term local benefit rather than external control of critical language resources.
Scope and Language Coverage
WAXAL’s extensive language coverage spans widely spoken and regionally significant tongues, including but not limited to:
- Hausa, Igbo, Yoruba
- Luganda, Acholi, Dholuo
- Swahili, Lingala, Shona
- Fulani (Fula), Ewe, Akan family languages
- Dagbani and others
Together, these languages are spoken by more than 100 million people across Sub‑Saharan Africa.
The dataset’s volume — including over 1,250 hours of transcribed speech and more than 20 hours of studio‑quality recordings — provides a rich foundation for training advanced speech recognition and generation models.
Voices from the Partnership
Speaking on the launch, Aisha Walcott‑Bryant, Head of Google Research Africa, emphasised the importance of building AI infrastructure that reflects local needs rather than importing solutions developed elsewhere.
“The ultimate impact of WAXAL is the empowerment of people in Africa. This dataset provides the critical foundation for students, researchers, and entrepreneurs to build technology on their own terms, in their own languages,” she said.
Academics involved in the project also highlighted the role the dataset will play in strengthening research capability and local innovation. At the University of Ghana, for example, contributions from thousands of volunteers helped create one of the largest coordinated speech collections for West African languages, opening new opportunities in education, agriculture, healthcare, and other sectors.
Similarly, researchers in Uganda noted how the initiative is already catalysing student‑led and faculty‑driven projects to build real‑world applications using speech data that reflects local dialects and cultural contexts.
What WAXAL Enables
With WAXAL publicly accessible, developers anywhere — in Africa or beyond — can use the data to build and improve:
- Speech recognition systems for African languages
- Voice assistants and conversational AI tailored to local users
- Text-to-speech tools that speak naturally in indigenous languages
- Voice-based applications in education, healthcare, agriculture and government services
Experts say this foundational dataset will also help reduce the digital language divide, enabling tools that previously only worked well in English or a few global languages to function in contexts where most people speak indigenous languages as their first language.
A Foundational Step for African AI
The launch of WAXAL marks a major milestone in the push for language‑inclusive AI — especially in a world where voice technologies are rapidly becoming part of everyday digital life. By empowering African researchers, developers and communities with the data they need, the initiative aims to change not only who builds AI, but also who benefits from it.
As AI continues to expand across sectors such as education, health, agriculture, and business, initiatives like WAXAL could help ensure this transformation is accessible to all, regardless of language or region — a step toward bridging the digital divide in one of the world’s most linguistically diverse regions.

