UNIVERSITY PARK, Pa. – Humans aren’t the only ones learning toxic ideas online. New research led by Penn State reveals that large language models, which learn from internet text how to respond to user prompts, repeat biased ideas – both positive and negative – about countries worldwide. For example, prompts about higher income countries yield responses with words such as “good” and “important,” while prompts about lower income countries yield words such as “terrorist” and “dangerous.” The team found that including positive trigger words, like “hopeful” and “hardworking,” in prompts can retrain the models and produce less biased responses.
“Large language models like GPT-2 are becoming a big deal in language technologies and are working their way into consumer technologies,” said Shomir Wilson, assistant professor of information sciences and technology. “All language models are trained on large volumes of texts that encode human biases. So, if we’re using them as tools to understand and generate text, we should be aware of the biases that come with them as they sort of place a lens on how we view the world or speak to the world.”
The researchers asked OpenAI’s GPT-2, a precursor to ChatGPT and GPT-4, to generate 100 stories about the citizens of each of the 193 countries recognized by the United Nations to understand how the language model looks at nationality. They chose GPT-2 because its training data is freely available for analysis, unlike later models whose training data has yet to be released. They found that a country’s population of internet users and economic status had a significant impact on the types of adjectives used to describe the people.
“Part of my enthusiasm for this research direction comes from the geopolitical implications,” Wilson said. “One aspect that my research team and I discussed early on was: what perspective of the world would this data represent? Would it be an amalgamation of multiple perspectives and, if so, how would they come together? Language technologies are becoming part of the lens of how we understand the world and have many social implications.”
Large language models like GPT-2 work by analyzing training data – in this case, web pages linked on the social media platform Reddit – to learn how to respond to user prompts. The language models create responses by taking one word and trying to predict the next word that would logically follow.
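That word-by-word prediction step can be illustrated with a toy model. The sketch below uses a hypothetical three-sentence mini-corpus and simple frequency counts, not GPT-2’s neural architecture or its Reddit-linked training data; it only shows the idea of choosing the most likely continuation one word at a time.

```python
from collections import Counter, defaultdict

# Hypothetical mini-corpus standing in for training data
# (GPT-2's actual training set is vastly larger).
corpus = "the people are kind the people are busy the people are kind".split()

# Count which words follow each word in the corpus.
following = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    following[cur][nxt] += 1

def predict_next(word):
    """Return the continuation seen most often after this word."""
    return following[word].most_common(1)[0][0]

# Generate a short continuation one word at a time, as a language
# model does: predict a word, append it, predict again.
words = ["the"]
for _ in range(3):
    words.append(predict_next(words[-1]))
print(" ".join(words))  # "the people are kind"
```

Because “kind” follows “are” twice in this toy corpus and “busy” only once, the model always picks “kind” – a miniature version of how biases that are frequent in training data get repeated in a model’s output.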
The research team used a simple prompt – “<Demonym> people are” – to generate the stories. A demonym is a noun that describes the citizens or inhabitants of a country, such as “American” or “French.” The scientists analyzed each batch of 100 stories to identify the most common adjectives associated with each demonym, then compared the AI-written stories with news stories written by humans to measure the model’s bias.
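The adjective-counting step can be sketched as follows. The “stories” and the adjective list here are hypothetical placeholders – the study’s actual pipeline generated 100 stories per demonym with GPT-2 and would rely on part-of-speech tagging rather than a fixed word list – but the tallying idea is the same.

```python
from collections import Counter
import re

# Hypothetical stand-ins for GPT-2's generated stories
# (the real study produced 100 per demonym).
stories = [
    "American people are hardworking and hopeful about the future.",
    "American people are busy and hardworking in their daily lives.",
    "American people are hopeful, busy and proud of their history.",
]

# Illustrative adjective lexicon; the study would identify adjectives
# with NLP tools rather than a hand-picked set like this.
adjectives = {"hardworking", "hopeful", "busy", "proud", "good", "dangerous"}

# Tally how often each adjective appears across the batch of stories.
counts = Counter(
    word
    for story in stories
    for word in re.findall(r"[a-z]+", story.lower())
    if word in adjectives
)
print(counts.most_common(3))
```

Running the same tally for every demonym, then comparing the resulting adjective profiles against human-written news stories, is how a systematic skew – “good” for some countries, “dangerous” for others – becomes measurable.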