UNIVERSITY PARK, Pa. — Debarati Das, assistant professor of computer science and engineering in the Penn State College of Engineering, earned a five-year, $647,681 U.S. National Science Foundation (NSF) Early Career Development (CAREER) Award for a project titled, “A Theoretical Exploration of Efficient and Accurate Clustering Algorithms.”
Q: What do you want to understand or solve through this project?
Das: In today's era of unprecedented data generation, big data analysis excels at extracting valuable insights from unimaginably vast and complex datasets, thereby empowering effective decision-making, fostering innovation and solving complex problems. Clustering is a fundamental technique in data analysis and machine learning that groups data points into clusters based on their similarities.
Despite recent progress in clustering algorithms, analyzing large-scale, noisy, complex and high-dimensional datasets remains challenging. This restricts the practical application of large datasets in critical areas of science, technology, engineering and mathematics (STEM). This project aims to overcome these challenges and advance clustering theory by establishing connections among diverse problems, creating a unified framework and designing tailored tools and techniques. This will enable us to design optimal approximation algorithms and expand the practical scope of clustering across diverse problem domains.
Q: How will advances in this area impact society?
Das: The proposed research will advance state-of-the-art clustering algorithms by focusing on specific problems related to rank aggregation and hierarchical clustering, which are extensively applied in diverse fields such as information science, computational biology, language learning, business, statistics and social science.
While heuristic approaches exist for these computationally hard problems, this project will build a theoretical foundation for them. The empirical study associated with this research will bridge the gap between theory and practice.
Q: Will undergraduate or graduate students contribute to this research? How?
Das: The proposed research will entail close collaboration with both graduate and undergraduate students. Advanced clustering algorithm courses primarily focus on applications and offer limited exposure to theoretical aspects, so this project will introduce new courses on clustering aimed at bridging this gap. Such courses will provide an extensive understanding of both theoretical and practical aspects of clustering algorithms while offering hands-on experience in data science and clustering.
Q: The NSF CAREER award not only funds a research project, but it also recognizes the potential of the recipient as a researcher, educator and leader in their field. How do you hope to fulfill that potential?
Das: The NSF CAREER award will kickstart my long-term research plan to advance the theory of clustering problems, deepen our understanding of algorithm design and theoretical computer science, and become a renowned researcher in the field. I will integrate my research into educational activities that will inspire and engage students.
I am committed to continuing my association with TCS for All: Theoretical Computer Science without Barriers, and to inspiring and supporting women and students from underrepresented communities in STEM. My goal is to work toward bridging the gender gap and fostering diversity, equity and inclusion in academia.