UNIVERSITY PARK, Pa. — While artificial intelligence (AI) systems, such as home assistants, search engines or large language models like ChatGPT, may seem nearly omniscient, their outputs are only as good as the data on which they are trained. However, ease of use often leads users to adopt AI systems without understanding what training data was used or who prepared the data, including potential biases in the data or held by trainers. A new study by Penn State researchers suggests that making this information available could shape appropriate expectations of AI systems and further help users make more informed decisions about whether and how to use these systems.
The work investigated whether displaying racial diversity cues — the visual signals on AI interfaces that communicate the racial composition of the training data and the backgrounds of the typically crowd-sourced workers who labeled it — can enhance users’ expectations of algorithmic fairness and increase their trust in the system. The findings were recently published in the journal Human-Computer Interaction.
AI training data is often systematically biased in terms of race, gender and other characteristics, according to S. Shyam Sundar, Evan Pugh University Professor and director of the Center for Socially Responsible Artificial Intelligence at Penn State.
“Users may not realize that they could be perpetuating biased human decision-making by using certain AI systems,” he said.
Lead author Cheng “Chris” Chen, assistant professor of communication design at Elon University, who earned her doctorate in mass communications from Penn State, explained that users are often unable to evaluate biases embedded in AI systems because they don’t have information about the training data or the trainers.
"This bias presents itself after the user has completed their task, meaning the harm has already been inflicted, so users don’t have enough information to decide if they trust the AI before they use it,” Chen said
Sundar said that one solution would be to communicate the nature of the training data, especially its racial composition.
“This is what we did in this experimental study, with the goal of finding out if it would make any difference to their perceptions of the system,” Sundar said.
To understand how diversity cues affect trust in AI systems, the researchers created two experimental conditions, one diverse and one non-diverse. In the diverse condition, participants viewed a short description of the machine learning model and its data labeling practice, along with a bar chart showing an equal distribution of facial images in the training data across three racial groups — white, Black and Asian — each making up about one-third of the dataset. In the non-diverse condition, the bar chart showed that 92% of the images belonged to a single dominant racial group. Labelers’ backgrounds were presented the same way: the diverse condition showed roughly one-third each of white, Black and Asian labelers, while the non-diverse condition showed a bar chart conveying that 92% of labelers came from a single racial group.
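To make the two conditions concrete, here is a minimal sketch, not the researchers’ actual stimuli, of how such diversity-cue charts could be generated. The one-third and 92% figures come from the study description above; the even 4%/4% split of the remaining images in the non-diverse condition is an assumption made purely for illustration.

```python
# Illustrative sketch only: rendering hypothetical diversity cues as bar charts.
import matplotlib.pyplot as plt

groups = ["White", "Black", "Asian"]

# Shares described in the article: ~1/3 each in the diverse condition,
# 92% from a single dominant group in the non-diverse condition.
# The 4%/4% remainder split is an assumption for this sketch.
conditions = {
    "Diverse training data": [33.3, 33.3, 33.3],
    "Non-diverse training data": [92.0, 4.0, 4.0],
}

fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
for ax, (title, shares) in zip(axes, conditions.items()):
    ax.bar(groups, shares)          # one bar per racial group
    ax.set_title(title)
    ax.set_ylim(0, 100)
axes[0].set_ylabel("Share of facial images (%)")

fig.tight_layout()
plt.show()
```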