Data sciences program blends interdisciplinary training for a growing industry

Intercollege program is among the first nationally for undergraduates

Penn State’s data sciences program was established to position graduates to address some of the world’s most challenging problems. As an increasing number of companies begin to produce larger data sets, there is a growing demand for skilled data scientists to analyze the information and present solutions.  Credit: iStockphoto. All Rights Reserved.

UNIVERSITY PARK, Pa. — While working as a stock clerk at his hometown grocery store near Pittsburgh, Vince Trost was constantly tasked by his manager with putting mayonnaise on the shelves.

“I started to ask myself, ‘Why do we need so much mayonnaise?’” said Trost. “The greater Cranberry area didn’t need so much mayo.”

After learning that his manager indiscriminately ordered the stock based on estimated demand for upcoming sales, Trost turned to Google. He wanted to know if there was a company that could optimize orders based on historical sales data. That’s when he discovered the up-and-coming field of data sciences.

“I found all these cool articles about companies using data to solve really big problems,” he said.

A short time later, Trost saw a news story with the headline “Penn State announces three new majors in data sciences,” which further piqued his interest.

He put the wheels in motion to request a change of campus from Penn State Behrend to University Park, and in a matter of two days, Trost was among the first data sciences students at Penn State.

Now, as the first graduate of Penn State’s intercollege data sciences degree program, which launched in 2016, he is one of many who will carry what they learned in the new program into their careers.

Pioneering a program

Penn State is among the first U.S. institutions to offer a comprehensive data sciences degree program at the undergraduate level. The intercollege initiative between the College of Information Sciences and Technology, College of Engineering, and Eberly College of Science trains students to analyze large-scale data sets to address an expanding range of problems in industry, government and research.

“This major is the only one so far around the country, and maybe around the world, that aims to provide students with knowledge and skills from three related disciplines: statistics, computer science and informatics,” said John Yen, professor of information sciences and technology and coordinator of the applied data sciences program in the College of IST. “[Students] learn not only methods and tools from these three [areas], but also the synergy of combining them.”

First- and second-year students take common core classes and focus on one of three options: applied data sciences in the College of IST; computational data sciences in the College of Engineering; or statistical modeling data sciences in the Eberly College of Science. Then, students in all three options come together in their junior and senior years for shared capstone experiences.

“It’s not three different options; it’s one program,” explained Yen. “The three options give students the opportunity to dig into one type of method of their choice.”

Daniel Kifer, associate professor of computer science in the College of Engineering’s School of Electrical Engineering and Computer Science, noted that the more data a business or organization has, the more it can help solve problems.

“However, one needs the right set of skills to analyze it properly and efficiently,” he said. “It is very easy to be misled by data, to make false discoveries, and to reach harmful conclusions. Our goal is to teach students [to work] with massive data sets and produce analyses that can be trusted.”

“Data science has a future,” said James Wang, professor of information sciences and technology. “There will be larger and larger data sets for businesses to take advantage of. The future is there.”

Drawing on a foundation to solve real-world problems

This past spring, Wang taught the program’s first course in which students from all three options came together to apply what they had learned in their different fields. In the class, DS 340W Applied Data Sciences, students were required to draw on their statistics and computing skills to solve real-world problems.

Ryan Jaeger, a second-year student majoring in mathematics with a minor in IST, was one of the students in the course.

“In my mind, a good data scientist is able to manipulate data and perform the machine learning modeling necessary to answer a given question,” he said. “However, a great data scientist is also able to generate meaningful research questions, place their research in context, and clearly communicate the significance and novelty of their work to many audiences. I believe the emphasis on communication in DS 340W will help me to become a great data scientist.”

He hopes to apply what he’s learned in the program to his future career.

“My interest lies in the intersection of my mathematical skills with the ability to make a difference in the world, and for me, the answer is data,” he said. “Data is increasingly abundant and of higher quality. If I can combine my mathematical reasoning, statistical knowledge, and computational skills to make a meaningful insight from data, I can impact people’s lives through math.”

Dylan Shoemaker, who is pursuing dual majors in applied statistics and computational data sciences, also benefited from the course. He learned to formulate a real-world problem that could be solved with data science, research potential solutions, and develop a novel approach for a solution.

Additionally, the class confirmed his decision to explore a career in the field.

“I knew that the topics of statistics and computer science interested me, and after reading about the career opportunities with knowledge in these domains, I was drawn to pursuing data science,” he said. “Along with the career potential, I was also drawn in by the types of difficult and complex problems data scientists have the chance to solve in practice.”

Jia Li, a professor of statistics who co-taught DS 340W with Wang this past spring, noted that the course teaches students to design and present practical mathematical systems assembled from statistical and computational components. This, she said, helps students learn not only about the building blocks of the data sciences but also how to put them together.

“This aspect of design makes the course and more generally the major a lot of fun,” Li said. “When everything is put in the context of conquering a common problem, there is an effect of mutual boosting. Everything seems more meaningful.”

Filling a growing demand

“In general, almost everything that we do [in society] is being impacted by our ability to gather and analyze data,” said Vasant Honavar, professor and Edward Frymoyer Chair of Information Sciences and Technology.

He cited issues like public policy decisions, traffic patterns, scientific discoveries, and prevention of financial crises as examples of how data scientists could make an impact.

“All of these are fundamentally data science questions,” said Honavar. “They are going to be answered using massive amounts of data.”

“We’re at the early stage of a new field emerging,” he added. “In some sense you could argue that data science has been around all along because we have always had some data. What has changed is the amount and the diversity and the speed at which this data is coming at us. And that requires new methods and new tools and a new set of skills.”

These new methods and tools are in need of trained practitioners to apply them, which is where Penn State’s program comes in.

“There is a huge demand in the industry for people with this kind of skill set,” said Wang. “And that demand is growing. There are a lot of unfilled positions. People are paid high salaries, especially the highly skilled personnel in this field.”

Shoemaker looks forward to the opportunities that await him when he graduates, and advises incoming students to consider data sciences as a degree option.

“With the current trends in technology in the modern world, both the supply of large amounts of data and the demand for those who can decipher it have grown dramatically,” he said. “As a result, the field of data science has become one of the biggest trends for recruiters from all types of industries.”

He added, “I would tell prospective students that pursuing a data sciences degree at Penn State would give them tremendous opportunity for career growth, and that it would be one of the best ways to gain entry into perhaps the hottest job on the market right now.”

Creating a new education culture at Penn State

The intercollege administrators of the data sciences program are continuously shaping the curriculum to meet industry demand and to give students the opportunity to become experts in specific focus areas.

Yen explained that the focus areas of statistics, computation and informatics provide students with the knowledge and skills, but a concentration area could help them apply their data science methodology to a particular domain.

“When a data scientist needs to solve a real-world problem involving data, oftentimes they need to understand the related theory in that problem so they can better articulate opportunities and better identify possible initiatives, and formulate possible hypotheses,” he said.

“This is what makes data science different from the traditional methodology training,” he added. “It is really impossible to do this without some understanding of what the problem is.”

Yen is working with administrators in other colleges at University Park to enhance and expand domain knowledge opportunities for data science students in disciplines such as the life sciences, agricultural sciences, health sciences, engineering, and security.

“It will create an interesting two-way fertilization between the data sciences programs and other programs around the University,” he said. “From an education viewpoint, it is reflecting what’s happening in the world. I’m very pleased that we can start to see this revolution happen at Penn State. We are creating a new education culture around the University.”

Last Updated July 13, 2018