UNIVERSITY PARK, Pa. — This spring, students in the data sciences capstone course at the College of Information Sciences and Technology (IST) worked with a team of MIT Lincoln Laboratory Beaver Works researchers to make real-world contributions toward improving image classifiers for disaster response.
Through the semester-long project, the student team analyzed the group’s Low Altitude Disaster Imagery (LADI) dataset, a collection of aerial images taken above disaster scenes, and tagged images based on each photograph’s content. This dataset was developed by the New Jersey Office of Homeland Security and Preparedness and MIT Lincoln Laboratory, with support from the National Institute of Standards and Technology and Amazon Web Services (AWS). The project is based upon work supported by the United States Air Force.
“We met with the MIT Lincoln Lab team last June and recognized shared goals around improving annotation models for satellite and LADI objects as we’ve been developing similar computer vision (CV) solutions here at AWS,” said Mike Shim, general manager at Amazon Mechanical Turk and Augmented AI. “We connected the team with MLRA and the Open Data Program and funded MTurk credits for the development of MIT Lincoln Laboratory’s ground truth data set.”
He added, “We supported the development of an annotation UI that aligned with FEMA disaster categories, which enabled them to pilot real-time CAP image annotation following Hurricane Dorian.”
Using the dataset, which is hosted as part of the AWS Open Data program, the students then developed a computer model that classifies the images automatically. This work has led to a trained model with a precision of 79%, and the students’ code and models are officially being integrated into the LADI project as the baseline classifier and tutorial. The work of anyone who uses LADI will be evaluated against the students’ models.
"They worked on training it with the full data set, and I anticipate the precision will get even better," said Jeff Liu, technical staff at Lincoln Laboratory. "So we’ve seen, just over the course of a couple of weeks, very significant improvements in precision. It's very promising for the future of classifiers."
"The students using this new data set have shown better than 70% accuracy, potentially outperforming some of the current commercial capabilities for this very specific use case," added 2009 alumnus Andrew Weinert, a staff research associate at Lincoln Laboratory who helped facilitate the project with the College of IST.
According to the researchers, this improved precision could lead to shortened response times for first responders in a disaster.
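The researchers quote both a precision figure and an accuracy figure, and the two measure different things. The article does not say how either was computed, so the following is only a minimal sketch using the standard definitions. The function names and the five example labels are made up for illustration and are not drawn from LADI or the students' code.

```python
def precision(y_true, y_pred, positive):
    """Of the images predicted as `positive`, the fraction that truly are."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred)
                   if p == positive and t == positive)
    pred_pos = sum(1 for p in y_pred if p == positive)
    return true_pos / pred_pos if pred_pos else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all images whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true) if y_true else 0.0


# Hypothetical labels for five images (illustrative only).
y_true = ["flood", "flood", "none", "flood", "none"]
y_pred = ["flood", "none", "flood", "flood", "none"]

flood_precision = precision(y_true, y_pred, "flood")  # 2 of 3 "flood" calls are right
overall_accuracy = accuracy(y_true, y_pred)           # 3 of 5 labels are right
```

In this toy case, a high precision would mean that an image flagged as a flood can usually be trusted, which is the property that matters when responders act on the flagged images.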
"During a disaster, a lot of data can be collected very quickly," said Weinert. "But collecting data and actually putting information together for decision makers is a very different thing."
He explained that for a large-scale disaster, such as a hurricane, there could be up to 100,000 aerial images for emergency officers to analyze. For example, a logistics officer may be seeking images of bridges to assess damage or flooding nearby and needs a way to find and review the relevant images quickly.
"What the students demonstrated with this data set is that this capability is close to becoming a reality," said Weinert.
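The bridge example Weinert describes amounts to filtering a large set of classified images down to those tagged with one label. The sketch below is one way that could look, assuming the classifier produces a score between 0 and 1 for each label on each image. The file names, scores, threshold, and the `select_images` helper are all hypothetical and are not part of the LADI tooling.

```python
def select_images(predictions, label, threshold=0.5):
    """Return (filename, score) pairs whose score for `label` meets the
    threshold, highest score first."""
    hits = [
        (name, scores[label])
        for name, scores in predictions.items()
        if scores.get(label, 0.0) >= threshold
    ]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical classifier output: one score per label for each image.
predictions = {
    "img_001.jpg": {"bridge": 0.91, "flood": 0.40},
    "img_002.jpg": {"bridge": 0.12, "flood": 0.88},
    "img_003.jpg": {"bridge": 0.67, "flood": 0.75},
}

# A logistics officer looking for bridges reviews only the likely matches.
bridge_images = select_images(predictions, "bridge", threshold=0.6)
```

Raising the threshold shortens the review list at the cost of missing some relevant images; the right setting would depend on how the classifier's scores behave on real data.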
"Say you have a picture that at first glance looks like a lake," continued Marc Rigas, assistant teaching professor of information sciences and technology. "Then you see trees sticking out of it and realize it's a flood zone. The computer has to know that and be able to distinguish what is a lake and what isn't."
The students used real data, specialized software, and machine learning methods in a big data cloud environment to build new analytics, an approach that Weinert's team was not previously tasked to explore.
"When we gave them this assignment, we thought it might work but we didn't know," said Weinert. "The students did real cutting-edge work."
In addition to their trailblazing work on this class project, the students have posted a paper describing it on arXiv, the academic preprint website.
Nae-Rong Chang, who earned her degree in data sciences in May and served on the project team, said that the problem had both professional and personal significance. She grew up in a region in Asia that frequently experienced flooding and earthquakes, and has always wanted to use her knowledge in data science to help society.
“This project gave me the opportunity to leverage the disaster imagery and provide some important insights,” she said. “I can see myself using the knowledge I gained [through this project] working with deep learning and AWS cloud service in my future career as a data scientist.”
The data sciences capstone course helps students transition from school to the work environment by giving them real-world problems to solve that don’t have textbook answers, according to Rigas. In a pilot partnership with the Bernard M. Gordon Learning Factory at Penn State, the students are placed on multidisciplinary teams to tackle open-ended problems for a project sponsor. Given a project goal and a budget, they develop a work plan, present on specific deliverables, and manage financial resources.
Continued opportunities are being planned for Penn State students to enrich their education through the collaboration.
“Going forward, we plan to provide support in the form of access to our machine learning and data scientists for the joint internship program with Penn State so that students can continue with remote educational opportunities through the summer offsite,” said Shim.