Data science pioneer and IST icon C. Lee Giles to retire after 24 years

C. Lee Giles, David Reese Professor of Information Sciences and Technology in the College of IST, will retire from Penn State, effective June 30, 2024. Credit: Penn State. Creative Commons

UNIVERSITY PARK, Pa. — The Penn State College of Information Sciences and Technology (IST) has announced the retirement of C. Lee Giles, the David Reese Professor of Information Sciences and Technology. While he will end his teaching career on June 30, he will continue to pursue his research and advise graduate students at University Park.  

“We have been so fortunate to have had Lee in our college for two decades,” said Andrea Tapia, dean of the College of IST. “There is so much to say about his trailblazing career: I encourage everyone to search ‘C. Lee Giles’ on Google and Wikipedia. He has been an asset to Penn State and to the Information Age.”  

Giles came to the College of IST less than a year after it opened its doors, joining with tenure and the named professorship. A self-described physicist-turned-computer scientist, he is a pioneer in data sciences and neural networks. As an “information retrieval person,” he has been involved in the creation and development of an array of search engines and digital libraries. 

Early in his career, Giles moved from physics to computer science and became well-known for his work on neural networks, just as the web was beginning to explode. In 1997, he co-created CiteSeer — the first automated citation indexing system and a predecessor of Google Scholar and Microsoft Academic Search.  

“CiteSeer has been my greatest career accomplishment,” Giles said. “I began that work at Princeton with Steve Lawrence, who later helped to get Google up and running, and Kurt Bollacker, who created Freebase. We worked to put computer science papers on the web, rather than keep them hidden in journals with no open access. We indexed them. Made them searchable. It was the start of great things.” 

Giles and Lawrence were estimating the size of the web, building search engines and generating a lot of publicity, including a front-page story in The New York Times in 1999. Their capture/recapture efforts revealed that existing search engines weren’t being truthful, and in a story for The Wall Street Journal, they “told people what was really out there.”

In 2003, Giles moved the CiteSeer equipment to Penn State via sneakernet — that is, in the back of a station wagon — and has since built a second version, CiteSeerX, with support from the College of IST.  

“Thanks to Lee, we can easily do forward literature search, something that was impossible before CiteSeer,” said John Yen, professor in the College of IST. “And thanks to Lee, citation counts can be generated automatically, and reliably, and used as an important measure for impacts of scholarship worldwide; the field of ‘big scholarly data’ was created; and we have recurrent neural networks, a precursor to deep neural networks, transformers, large language models and generative pre-trained transformers.”

Yen credited Giles for helping him find his way to the “exhilarating start-up College of IST” in 2000. Dongwon Lee, professor in IST, said he was also inspired by Giles to come to Penn State.   

“When I was searching for my first academic job, I was hesitant about joining a young new college that wasn’t the pure computer science I was trained in,” he said. “The presence of Lee and his group convinced me to join IST. I have never regretted that decision and have since become an enthusiastic advocate for the interdisciplinary and social impacting type of research that Lee has been doing for decades. He has been a wonderful mentor and colleague.”  

Beyond CiteSeer and CiteSeerX, Giles has co-developed other specialty search engines, including ChemXSeer for chemistry, BotSeer for robots.txt, CollabSeer for collaboration searches, ArchSeer for archaeology, RefSeer for citations, AckSeer for acknowledgements and BizSeer for academic business.  

“Lee’s contributions to the field are legendary — he is a visionary and a great human being,” said Prasenjit Mitra, professor in the College of IST. “Lee was my mentor, and I thoroughly enjoyed working on CiteSeerX, ChemXSeer and a few smaller seers. I will cherish the fact that I am possibly the one who has co-authored the most with him. And I am glad he will still be research-active — our students need him.” 

Giles has been a big draw for students pursuing a doctorate in informatics and computer science. He has graduated 37 such students and has more in the pipeline, and he remains in touch with many of them long after they leave IST. Many of his publications involve collaborations with other faculty.

“Ph.D. students come to me by reputation, but they have to be independent learners,” he said. “I don’t tell them exactly what to do — I guide them in the right direction. My students have gone on to do great things, not the least of which is creating upwards to a billion dollars in wealth.” 

Giles earned bachelor’s degrees from Rhodes College and the University of Tennessee, a master’s degree in physics from the University of Michigan and a doctoral degree in optical sciences from the University of Arizona. He taught at Princeton University, the University of Pennsylvania, Columbia University, the University of Pisa, the University of Trento and Clarkson University. He served as a program manager at the Air Force Office of Scientific Research (AFOSR), a research scientist at the Naval Research Laboratory, a senior research scientist at the NEC Research Institute and a research engineer at Ford Motor Company’s Scientific Research Laboratory.  

“In a way, Lee got me started on my career in AI,” said Vasant Honavar, professor in the College of IST. “When he was a program manager at AFOSR, he funded my advisor’s work — and hence my doctoral research on constructive learning — at the University of Wisconsin when neural networks were at the very fringes of AI and machine learning.” 

Giles’s current research and consulting interests involve intelligent information search for specialty and niche domains and big data, using AI and machine learning methods, as well as the learning of sequences and formal grammars.

“I can’t wait to see what he does next,” said Tapia. 

Last Updated July 10, 2024
