UNIVERSITY PARK, Pa. — A $1 million grant from the National Science Foundation (NSF) will fund a Big Data Spoke, a consortium of researchers from Penn State, Harvard, Columbia and the University of Pittsburgh, to integrate and analyze health data from disparate sources to discover environmental, ecological, and socio-economic factors that impact the health of individuals and populations.
The spoke, one of 10 being funded by the NSF Big Data Spokes Program, will work in close coordination with the Northeast Big Data Hub, which is one of four such regional Big Data (BD) Hubs funded by NSF to foster collaborative efforts among academia, industry, and government aimed at applying big data technologies to address major societal challenges.
“There is growing evidence that our health depends not only on our genetic makeup but also on our exposure to environmental, socio-economic, and other contextual factors,” said Vasant Honavar, the Penn State principal investigator for this project. “The Robert Wood Johnson Foundation report on neighborhoods and health suggests that one's zip code has a greater impact on health outcomes than one's genetic code.”
Honavar is a professor in the College of Information Sciences and Technology (IST) and in computer science, associate director of the Institute for Cyberscience, and director of the Center for Big Data Analytics and Discovery Informatics at Penn State.
“Information held in hospital, government, and industry databases could provide extraordinary insights into the influence of poverty, air pollution, climate, and other factors on human health,” said Honavar. “However, much of that data is locked in silos, and the ability of researchers and clinicians to harness the power of big data is limited by the lack of means to seamlessly access, analyze, and act upon such data. The project aims to address this gap.”
“The research team will integrate data from the Observational Health Data Sciences and Informatics (OHDSI) network, a virtual data repository that contains millions of longitudinal patient health records, with weather, air pollution, income, occupation, and other demographic data,” Honavar added. “The team will use machine learning and causal inference algorithms to hunt for links between environmental and socio-demographic factors and health outcomes. The project will also develop training resources (e.g., interactive how-to guides), coordinate cross-institution student internships, and lead a hands-on workshop for researchers interested in using the data. The ultimate goal of the project is to facilitate community-led and collaborative causal discovery through dissemination of integrated and open big data and analytics tools.”
“The BD Spokes advance the goals and regional priorities of each BD Hub, fusing the strengths of a range of institutions and investigators and applying them to problems that affect the communities and populations within their regions,” said Jim Kurose, NSF assistant director for Computer and Information Science and Engineering. “We are pleased to be making this substantial investment today to accelerate the nation’s big data R&D innovation ecosystem.”
According to Honavar, the project will benefit Penn State’s research and training initiatives in the data sciences, in particular the Center for Big Data Analytics and Discovery Informatics, the NIH-funded Biomedical Data Sciences Doctoral Training Program, and the Clinical and Translational Sciences Institute.