UNIVERSITY PARK, Pa. — Health care professionals have long theorized that medical conditions, like asthma or cancer, are a result of not only genetics but also environmental and lifestyle factors. These theories, however, have been difficult to prove conclusively without the ability to analyze large-scale population health data.
Now, thanks to a seed grant supporting Penn State strategic priorities, University researchers are hoping to realize the promise and potential of big data to advance biomedical research by creating a Digital Collaboratory for Precision Health Research.
In the project, collaborators are aiming to establish a secure infrastructure for studying health conditions through shared data. Ultimately, a piece of software could be used to analyze disparate data sets, and researchers could use computation and data to better understand why different health problems occur in different demographics.
“We are a product of our genes, behavior and environment,” said Vasant Honavar, professor and Edward Frymoyer Chair in the College of Information Sciences and Technology (IST) and the project lead. “In order to improve health, we need to look beyond treating individuals who are sick and need to understand the underlying genetic, environmental and behavioral factors so we can develop effective interventions.”
Rather than one-size-fits-all medical treatments, the group hopes to further advance personalized health by contributing new methods and tools that enhance individualized care.
“In this project, we want to focus on the environmental aspect to make progress towards realizing the grand vision of personalizing health.”
The project is a massive undertaking, and the team will bring expertise from across the University, including the Center for Big Data Analytics and Discovery Informatics, Penn State College of Medicine, the College of Information Sciences and Technology, the Institute for CyberScience, the Clinical and Translational Sciences Institute, the College of Engineering, the Eberly College of Science, and the Social Science Research Institute, among many others. The team will also leverage existing collaborations with colleagues around the nation.
Safeguarding patient information
While health care providers hold immense repositories of clinical and personally identifiable patient information, that information is rarely used by the research community because of logistical and privacy concerns.
“The key challenge in working with that data is that electronic health records contain sensitive information,” Honavar said. “There are many barriers to sharing health data.”
Penn State’s solution aims to allow investigators to analyze health data to answer specific research questions while adhering to all applicable data access and use policies that safeguard sensitive information.
The infrastructure would allow researchers with approved projects to integrate and analyze the relevant data sets on the platform using reproducible and shareable analytic workflows.
“The personal health data never leaves the secure platform,” Honavar explained.
Since safeguarding patient information is of the utmost importance, Honavar hopes that other institutions will be able to implement their own systems and that cross-institutional analyses can be conducted.
“We plan to build this infrastructure and share it with other institutions,” he said.
Having a network of similar platforms could accelerate discovery in the biomedical and health sciences. For example, Honavar described how a group of researchers studying risk factors that contribute to breast cancer could examine data from multiple sources for a more comprehensive understanding of potential causes.
“They could receive similar data from different sites,” he explained. “While Institution A doesn’t have to access Institution B’s data, the researchers can conduct similar analyses across multiple sites.”
Harnessing the power of big data
Traditionally, medical research is conducted through clinical trials with results extrapolated to the population at large. But this new collaborative effort has the potential to upend that method.
Access to large data sets, combined with rapid advances in computing and analytics, allows researchers in diverse disciplines to gain new insights. The group aims to leverage recent advances in data sciences to derive actionable findings that could dramatically improve health care.