MALVERN, Pa. — Five Penn State Great Valley graduate students recently worked nonstop to analyze a complex, 14-million-row dataset from Indeed.com. Little did they know that just 48 hours later they would be recognized for their findings and creativity after two rounds of presentations.
Ashish Chauhan, Jackie Markle, Harsha Polisetty, Karpagalakshmi Rajagopalan, and Dhara Thakkar — all students in the campus’ Master of Professional Studies in Data Analytics program — recently competed at Penn State’s DataFest. A national event held through the American Statistical Association, DataFest is a 48-hour competition for teams of two to five undergraduate and graduate students to find meaning in large and complex datasets.
Dubbing themselves the Analytical Assassins, the team arrived in University Park on a Friday evening to compete against 77 registered teams. At 6 p.m., they received the dataset from Indeed.com, a job search website. Each row of data represented one day of a posting, and teams were tasked with providing advice or insight to job seekers.
They immediately started cleaning data to begin their analysis. To evaluate the dataset, they looked at features like dates, job description length, minimum experience, required education level, and clicks per day.
“We were able to quickly load and read the huge data through libraries in R that are especially useful for reading such datasets,” said Rajagopalan. “After that, we went through a whole cyclical process of cleaning, exploration and modeling to come up with the best possible solutions.”
Each team member had a specialization, but the group also worked collaboratively. Markle and Rajagopalan cleaned and aggregated the data, Chauhan and Rajagopalan implemented machine learning methods, and Polisetty and Thakkar were responsible for data visualization. They relied on skills cultivated through class projects, independent studies, and research assistantships in the campus’ Big Data Lab, and they used various tools including Tableau, Python, Java, and R.
“As part of my independent study, I learned a lot of tricks in data cleaning and integration using R,” said Rajagopalan. “A course in data-driven decision-making gave me a strong foundation in core concepts, and I was able to successfully apply them in real time.”
But the students also learned new skills as they worked under extreme time constraints. Deeply focused on their analysis, the team worked around the clock. Team members slept in shifts so that the work continued without interruption and remained a true team effort.
“Working on real-time data helped us gain more experience in machine learning parameter tuning, data visualization, and data cleaning,” Chauhan said. “But the best part in this event was taking a break and walking through the streets of the University Park campus and planning further steps to execute our model.”
Running on little to no sleep, the team delivered a three-minute presentation that Sunday, summarizing their findings to a judging panel of faculty and industry professionals. After being announced as finalists, the Analytical Assassins presented once more before being named runners-up. They also received acknowledgement as the most creative team.
It was an exciting and surprising experience for the students.
“With the competition we had, we didn’t expect to win,” Polisetty said. “I enjoyed learning from all the participants’ presentations. Everyone had a unique approach for their project. As an aspiring data scientist, I enjoyed the opportunity to learn new skills.”
The other Analytical Assassins agreed — participating in DataFest was worthwhile and rewarding.
“DataFest was so grueling, yet fun-filled at the same time,” said Rajagopalan.
“It was an amazing experience,” added Thakkar. “In the end, it was all possible because of great teamwork.”
This was the second year that students from Penn State Great Valley attended the event. Markle participated last year along with three other students in the data analytics, information science, and engineering management programs. The team placed in the top five.