The release notes that, because of inherent bias in the development process, AI-driven technology sometimes generates incorrect results when Black users speak commands to it. As a result, Black users have had to change their voice patterns inauthentically, shifting away from their natural accents, to be understood by voice products.
“African American English has been at the forefront of United States culture since almost the beginning of the country,” said Dr. Gloria Washington, Howard University researcher and co-principal investigator of Project Elevate Black Voices, in a statement. “Voice assistant technology should understand different dialects of all African American English to truly serve not just African Americans, but other persons who speak these unique dialects. It’s about time that we provide the best experience for all users of these technologies,” Washington added.
As part of Project Elevate Black Voices, researchers collected 600 hours of data from speakers of different African American English dialects in an effort to address implicit barriers to improving automatic speech recognition (ASR) performance. Thirty-two states are represented in the dataset.
“It’s our mission at Google to make technology that’s useful and accessible, and I truly believe that our work here will allow more users to express themselves authentically when using smart devices,” said Courtney Heldreth, co-principal investigator at Google Research, in a statement.
According to the release, Howard University will retain ownership of the dataset and its licensing and will serve as steward for its responsible use, ensuring the data benefits Black communities. Google can also use the dataset to improve its own products, ensuring that its tools work for more people.
“As a community-based researcher, I wanted to carefully curate the community activations to be a safe and trusted space for members of the community to share their experiences about tech and AI and to also ask those uncomfortable questions regarding data privacy,” said Dr. Lucretia Williams, project lead and Howard University researcher, in a statement.
According to the release, the Howard African American English Dataset 1.0 will initially be made available exclusively to researchers and institutions within historically Black colleges and universities. The aim is to ensure that the data is employed in ways that reflect the interests and needs of marginalized communities, specifically African American communities whose linguistic practices have often been excluded or misrepresented in computational systems.