Institute for Computational and Data Sciences

NCEMS aims to ‘build a nationwide community’ and address research barriers

The U.S. National Science Foundation will support the establishment and operation of the National Synthesis Center for Emergence in the Molecular and Cellular Sciences at Penn State. Credit: U.S. National Science Foundation. All Rights Reserved.

UNIVERSITY PARK, Pa. — The U.S. National Science Foundation (NSF) National Synthesis Center for Emergence in the Molecular and Cellular Sciences (NCEMS) at Penn State will “bring scientists together from different disciplines to integrate diverse data sets to answer transformative scientific questions,” according to Justin Petucci, associate director of NCEMS and Research Innovations with Scientists and Engineers (RISE) artificial intelligence and machine learning team lead.

The center, which was announced in April, is supported by a $20 million grant from NSF, and will be housed at University Park and managed by Penn State.

The core mission of a synthesis center like NCEMS is to reuse existing publicly available data, rather than generate new experimental data. And with over 100 petabytes of data publicly available across the globe, the opportunities for discovery are many. NCEMS was inaugurated in May and has 11 members on its leadership team.

“There’s a massive amount of molecular and cellular data available and open questions that have the opportunity to be addressed applying computational and data science techniques to large integrated datasets,” Petucci said. “The center will build a nationwide community, including the formation of working groups consisting of scientists from different disciplines that will be supported by NCEMS.”

To support community-scale synthesis research, meaning research projects beyond the capabilities of individual labs to carry out, the team will address four main barriers to progress, according to NCEMS Director Ed O’Brien.

“The leadership team, in combination with the RISE team of engineers at the Institute for Computational and Data Sciences (ICDS), will address the data, methods, team and collaborative challenges that hold back progress in synthesizing these diverse data sets,” said O’Brien.

According to O’Brien, professor of chemistry and ICDS co-hire, a key challenge in integrating this diverse data is the highly specialized working knowledge needed to process, analyze and interpret the data generated from different experimental techniques.

“Combining, for example, mass spectrometry data and next-generation sequencing data requires specialized workflows to go from raw to processed data,” O’Brien said. “You need trained experts to do that right. Individual labs often do not have the resources to handle such a wide diversity of data. NCEMS will centralize this expertise as a national resource to overcome this challenge.”

With so much diverse data to be integrated, analysis requires statistical methods that correctly interpret the data and minimize the false positives that can easily arise when the data is handled incorrectly, according to O’Brien. Further, bringing methods such as machine learning and theoretical modeling to bear can be another barrier for individual labs.

“We are going to provide resources and training to graduate students and post-docs to use these diverse methods to carry out the research on these big data sets,” O’Brien said.

The center will also support catalyst meetings to explore and identify potential synthesis questions that could form the basis of a working group.

“We want to make transformative discoveries,” O’Brien said. “The best way to do this is to bring diverse scientific perspectives together to drive new ideas. We are providing resources to help teams form and go after these questions, and we are giving them the tools to research effectively. In an individual lab, scientists may not have access to a comparable network of scientists NCEMS will create.”

To address these challenges in both networking and research, the center will need collaborative infrastructure.

“The way we are addressing this is through CyVerse, an open science infrastructure platform funded by NSF that allows teams of scientists to share data and information in a uniform environment,” O’Brien said.

The University of Arizona’s CyVerse initiative is considered the “world’s largest publicly funded open-source cyber infrastructure for life sciences,” according to an article from Penn State News.

The ICDS RISE team will be involved in these groups to ensure needs are met. Post-docs, while independent, can also define their own projects and work within groups, get feedback and have access to resources, according to Petucci.

Other key components of NCEMS’s mission include developing innovative research and analytical strategies, testing novel organizational models with open science principles and training the future workforce.

“This is super exciting to be a part of… to be supporting this type of research,” Petucci said. “NSF only funded one of these centers and Penn State got it. It really is a privilege to grow this national resource that can answer foundational questions.”

Over the first five years, the center will support 34 working groups that have an average of 10 scientists from across the country. The scientists will collaborate remotely on research projects.

The center is focused on studying emergent properties in molecular and cellular systems. Emergence is the unexpected appearance of new system properties as variables of the system change.

“Among many topics the center is interested in, is how protein interaction networks arise, leading to the networks of biochemical transformations within cells,” O’Brien said. “We want to explore how those networks interplay with environmental conditions as well as regulate other cellular processes such as gene expression. There’s an explosion of emergent properties as one moves from individual molecules to collections of molecules and beyond. I am excited about this center because there are major opportunities for transformative discoveries using the data effectively.

“We anticipate interacting with over 1,600 scientists in the first five years, providing workforce training, post-doctoral fellowships, interactions with scientists at meetings and supporting undergraduate research opportunities across the country,” O’Brien said.

NCEMS is also supported by the Huck Institutes of the Life Sciences.

“This center is housed at Penn State University Park because we have fantastic institutes like the Institute for Computational and Data Sciences and The Huck Institutes of the Life Sciences that this center perfectly overlaps with,” O’Brien said. “There is a lot of opportunity for advancing the missions of those institutes and the center.”

To learn more about the center, visit the NCEMS website.

Last Updated August 14, 2024