UNIVERSITY PARK, Pa. — A paper authored by students and faculty from the Penn State College of Information Sciences and Technology (IST) received the Best Long Paper award at the 16th International Natural Language Generation (INLG) Conference, which took place Sept. 11-15 in Prague, Czech Republic.
“Only one long paper award was given at this conference, and we were pleased to receive the honor,” said Ting-Hao “Kenneth” Huang, assistant professor of IST. Huang and co-author C. Lee Giles, David Reese Professor of Information Sciences and Technology, served as faculty co-advisers on the research project.
The paper, “Summaries as Captions: Generating Figure Captions for Scientific Documents With Automated Text Summarization,” was selected based on its originality, impact and contribution to the field of natural language generation, according to the conference website.
“Despite their importance, writing captions for figures in a scientific paper is not often a priority for researchers,” Kenneth Huang said. “A significant portion of the captions we reviewed in our corpus of published papers were terrible. Automatic caption generation could aid paper writers by providing good starting captions that can be refined for better quality.”
In their work, the researchers addressed the limitations of existing tools that treat automatic caption generation as a vision-to-language task. They aimed to show that using natural language processing (NLP) tools to summarize a paper’s textual content would generate better figure captions than vision-based algorithms can produce from the image alone.
“A vision-to-language approach creates captions based on the image,” said co-author Ting-Yao Hsu, a computer science doctoral student in the College of Engineering. “We fine-tuned a pre-trained abstractive summarization model to specifically summarize paragraphs that reference figures — for example, ‘as shown in Figure 1’ — into captions.”
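The general shape of that text-based approach is easy to sketch. The example below is a minimal, illustrative inference pass, assuming the Hugging Face transformers library and a generic pre-trained summarization model; “facebook/bart-large-cnn” is a stand-in, not necessarily the model the team fine-tuned, and the fine-tuning step itself (standard sequence-to-sequence training on figure-mentioning paragraphs paired with their captions) is omitted.

```python
# Illustrative sketch, not the authors' released code: generate a
# caption-length summary of a paragraph that mentions a figure.
# Model choice (BART) is an assumption for demonstration purposes.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# A paragraph that references the figure, e.g. "as shown in Figure 1".
mentioning_paragraph = (
    "As shown in Figure 1, the proposed model outperforms the baseline "
    "on all three datasets, with the largest gain on the smallest "
    "dataset, suggesting the text-based approach helps most when "
    "training data is scarce."
)

# Constrain output length so the summary reads like a caption.
caption = summarizer(
    mentioning_paragraph, max_length=30, min_length=8, do_sample=False
)[0]["summary_text"]
print(caption)
```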
A good caption should help readers understand the complexity of a paper’s figures, such as bar charts, line charts or pie charts. Using context from the document’s full text makes sense, according to lead co-author Chieh-Yang Huang, who earned his doctorate in informatics from Penn State this month.
“Scientific papers typically include extensive text that can aid caption generation,” he said. “Our analysis showed that more than 75% of words in figure captions could be aligned with the words in the paragraphs mentioning those figures.”
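The article does not detail the alignment procedure behind that figure, but a rough measure in the same spirit, the fraction of caption words that also appear in the paragraphs mentioning the figure, can be sketched as follows (the function name and the simple token-level matching are illustrative assumptions, not the paper’s exact method):

```python
# Rough sketch of a caption-to-paragraph word overlap measure.
# Token-level matching here is an assumption; the paper's alignment
# procedure may be more sophisticated.
import re

def caption_overlap(caption: str, mentioning_paragraphs: list[str]) -> float:
    tokenize = lambda text: re.findall(r"[a-z0-9]+", text.lower())
    caption_tokens = tokenize(caption)
    paragraph_vocab = set()
    for paragraph in mentioning_paragraphs:
        paragraph_vocab.update(tokenize(paragraph))
    if not caption_tokens:
        return 0.0
    matched = sum(token in paragraph_vocab for token in caption_tokens)
    return matched / len(caption_tokens)

print(caption_overlap(
    "Figure 1: Accuracy of the proposed model versus the baseline.",
    ["As shown in Figure 1, the proposed model achieves higher accuracy "
     "than the baseline across all datasets."],
))  # prints 0.8 for this toy pair
```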
According to the researchers, automatic evaluation showed that summarizing the paragraphs that reference a figure produced better captions than vision-based methods did. Captions generated by the researchers’ model also outperformed vision-generated captions when evaluated by external domain experts.