Geisel Researchers Employ Machine Learning on Instagram Data to Identify Substance Use Risk

Findings from an innovative study conducted by a team of researchers at Dartmouth’s Geisel School of Medicine and published in the journal Neuropsychopharmacology point to a promising new approach for identifying substance use risk through the use of machine learning and social media data.

Substance use continues to be a pervasive and burdensome public health issue. In the U.S., the use or misuse of alcohol, tobacco, and drugs is among the top 10 causes of preventable death. The misuse of illicit and prescription drugs alone, driven in large part by the opioid crisis, leads to more than 100 overdose deaths daily.

Saeed Hassanpour, PhD (Photo by Rob Strong)

“For the study, our team developed a machine learning model known as deep neural networks, a form of artificial intelligence that uses statistical techniques to give computers the ability to ‘learn’ from data without having to be explicitly programmed,” explains Saeed Hassanpour, PhD, an assistant professor of biomedical data science and of epidemiology at Geisel, and an adjunct assistant professor of computer science at Dartmouth College.

The approach allowed researchers to automatically classify individuals’ risk for substance use based on content from their posts on Instagram—a very popular social networking app, especially among young adults, he says. The Hassanpour Lab develops computational methods and tools for extracting and organizing clinically meaningful information from a wide range of biomedical data.

Nearly 2,300 consenting adult participants (over age 18), recruited primarily through online advertising on an incentive-based crowdsourcing platform, took part in the research project. As part of the study, participants completed a web-based survey about their substance use, based on the National Institute on Drug Abuse Modified Alcohol, Smoking, and Substance Involvement Screening Tool (known as NIDA-Modified ASSIST).

To train and evaluate their deep learning model, the researchers were given permission to extract a large amount of anonymized data—which included a total of 466,227 images, as well as accompanying captions and comments, from the Instagram posts—for analysis. When compared to the “gold standard” (NIDA-Modified ASSIST), their model was able to detect risk for alcohol use with high accuracy. In addition, the researchers found that white participants who were younger and posted fewer captions or comments, but shared more facial images, had an elevated risk for alcohol use compared to their counterparts.
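For readers curious what comparing a model against a "gold standard" involves, the sketch below shows one common way to score binary risk predictions against survey-derived labels. It is only an illustration: the function name `evaluate_against_gold_standard` and the sample labels are hypothetical, the numbers are made up, and nothing here reproduces the study's actual model, data, or reported metrics.

```python
from typing import Dict, Sequence


def evaluate_against_gold_standard(
    predicted: Sequence[int], gold: Sequence[int]
) -> Dict[str, float]:
    """Score binary predictions (1 = at risk, 0 = not at risk) against
    reference labels, e.g. labels derived from a screening survey.

    Hypothetical helper for illustration; not the study's evaluation code.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("at least one labeled participant is required")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for p, g in zip(predicted, gold) if p == 1 and g == 1)
    tn = sum(1 for p, g in zip(predicted, gold) if p == 0 and g == 0)
    fp = sum(1 for p, g in zip(predicted, gold) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(predicted, gold) if p == 0 and g == 1)

    return {
        # Share of all participants classified correctly.
        "accuracy": (tp + tn) / len(gold),
        # Share of truly at-risk participants the model flagged.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of not-at-risk participants the model correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Made-up labels for eight imaginary participants (not study data).
gold_labels = [1, 0, 1, 1, 0, 0, 1, 0]
model_labels = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = evaluate_against_gold_standard(model_labels, gold_labels)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when one group is much smaller than the other, since a model can score high accuracy simply by predicting the majority class.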

“To the best of our knowledge, our results are the first to indicate that machine learning approaches can be used to identify potential substance use risk behavior, such as alcohol use, among social media users,” says Hassanpour.

While the study’s approach was successful in predicting alcohol risk, it was unable to reliably detect tobacco, prescription drug, or illicit drug use. “We found there wasn’t enough data for us to analyze and learn from in the other risk categories,” he says. “We think the relative acceptability of alcohol consumption, and people’s willingness to share information about it, may have played an important role in the results.”

In the future, Hassanpour and his colleagues will work to improve and build on their deep learning model by targeting recruitment of populations with higher rates of substance use and by developing a similar risk assessment approach for other behavioral health disorders, such as depression. They also hope to extend their research to other social network platforms like Facebook and Twitter.

“We believe that the widespread use of social media and the recent adoption of machine learning methods in the biomedical community provides us with a unique opportunity—to enhance our understanding of substance use and addiction, and to develop novel avenues for clinical research and treatment that can help more people,” he says.

This research was supported in part by a National Institute on Drug Abuse grant (P30DA029926), as well as a pilot grant from the Office of the Provost at Dartmouth College.