UNIVERSITY PARK, Pa. — What a person posts on Facebook could predict their risk for substance use, according to new research led by the Penn State College of Information Sciences and Technology.
In their work, researchers built novel detection systems, using machine learning and natural language processing techniques, that can identify certain kinds of substance use based on an individual’s Facebook posts. They focused their efforts on predicting substance use among homeless youth — a high-risk population with elevated rates of hard drug use.
“Because of their transience, young people experiencing homelessness are extremely hard to reach for research and intervention purposes,” said Anamika Barman-Adhikari, associate professor of social work at the University of Denver. “Social networking sites such as Facebook present an important and accessible data source for understanding the social context of these youths’ substance use behaviors.”
Barman-Adhikari explained that artificial intelligence is far more sophisticated than the rudimentary modeling techniques social science researchers often rely on, which makes it possible to develop more accurate, more predictive models that capture the complexity of this behavior.
“This could potentially be helpful to a nonprofit agency that is trying to triage homeless youth into substance users and nonsubstance users in order to direct their limited resources toward people who are likely to engage in substance use,” said Amulya Yadav, PNC Technologies Career Development Assistant Professor at the College of IST and principal investigator.
To build and train their models, the researchers collected more than 135,000 Facebook posts from homeless youth in the last year. In addition, these participants were asked to complete a survey providing their demographic information and insight into how they became homeless and how often they feel that they lack companionship. The latter questions are “not directly about demographic characteristics or substance use, yet they can be utilized for substance use predictions,” the researchers wrote in their paper. Most importantly, participants were asked to note which drugs, if any, they had used in the last 30 days.
The researchers used machine learning techniques to pre-process the social media posts (for example, identifying hashtags, emojis, slang and misspelled words) so that the data was in a form machine learning models could learn from. They then used the model to analyze the posts.
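As a rough illustration of what this kind of pre-processing can look like, the short Python sketch below pulls hashtags and emojis out of a post and maps a few slang or misspelled tokens to standard words. It is not the researchers' code: the `preprocess` function, the tiny `SLANG` lookup table and the emoji character ranges are all assumptions made up for this example, and a real system would need far more complete resources.

```python
import re

# Hypothetical slang/misspelling lookup, invented for this sketch.
# A real system would rely on a much larger, curated resource.
SLANG = {"u": "you", "luv": "love", "2nite": "tonight"}

HASHTAG_RE = re.compile(r"#(\w+)")
# Rough match for common Unicode emoji blocks; deliberately not exhaustive.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def preprocess(post):
    """Split a raw post into normalized tokens, hashtags and emojis."""
    hashtags = [tag.lower() for tag in HASHTAG_RE.findall(post)]
    emojis = EMOJI_RE.findall(post)

    # Remove emojis, keep the hashtag word without its '#', then lowercase.
    text = EMOJI_RE.sub(" ", post)
    text = HASHTAG_RE.sub(r"\1", text).lower()

    # Tokenize and replace known slang or misspellings with standard words.
    tokens = [SLANG.get(tok, tok) for tok in TOKEN_RE.findall(text)]
    return {"tokens": tokens, "hashtags": hashtags, "emojis": emojis}


if __name__ == "__main__":
    example = preprocess("Luv u 2nite \U0001F600 #Blessed")
    print(example["tokens"])    # ['love', 'you', 'tonight', 'blessed']
    print(example["hashtags"])  # ['blessed']
```

Keeping hashtags and emojis as separate fields, rather than discarding them, lets a downstream model treat them as signals in their own right.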
They found that posts containing words such as “love” or “sincerely” were associated with authors who were not substance users. By contrast, authors whose posts included swear words were more likely to engage in substance use. The researchers also used sentiment analysis tools to classify pieces of text as happy or sad.
“What we found, which is fairly intuitive to expect, is that people who posted more happy posts are less likely to engage in substance abuse, and people who post more angry or sad quotes are more likely to engage in substance abuse,” said Yadav.
While the model has not yet been deployed, Yadav envisions creating a Google Chrome plugin that could be installed in the computer rooms of homeless shelters or drop-in centers. Users could then agree to provide access to their Facebook data, and the information could be provided to case workers.
“Our tool could provide homeless shelters with information about whether an individual is likely or not to engage in substance abuse,” said Yadav. “Then, each individual’s case management plan can be modified to fit their needs based on this information.”
Also collaborating on the project were Zi-Yi Dou, a master’s student at Carnegie Mellon University (CMU), and Fei Fang, assistant professor at CMU. Their work will be presented this week at the virtual AAAI Conference on Artificial Intelligence, where another paper co-authored by Yadav, exploring the use of artificial intelligence algorithms in preventing the spread of HIV among homeless youth, is also being presented.