Jean Park Presents Groundbreaking Research on Multimodal AI Reasoning at AAAI 2025: Introducing the Modality Importance Score

News Item
Monday, March 24, 2025

Last Friday, Jean Park, our 3rd-year PhD student, presented her research at AAAI 2025!

Missed her poster session? Check out her work on multimodal AI reasoning here: https://lnkd.in/eciY6peu

Jean introduces the Modality Importance Score (MIS), a pioneering metric that quantifies each modality’s contribution to answering a question. MIS exposes biases in Video Question Answering (VidQA) datasets and helps drive AI toward true multimodal understanding.

THE PROBLEM
Despite advancements in Video Question Answering (VidQA), many AI models fail to truly integrate multiple modalities. For instance, in a medical setting, if a patient verbally denies drinking while their spouse subtly nods in disagreement, today’s AI might overlook the contradiction. This happens because existing datasets enable models to rely on biases rather than truly reasoning across modalities—posing risks in healthcare, security, and autonomous systems.

THE INNOVATION
MIS is the first quantitative measure of a modality’s importance in answering a question. It:

• Uncovers hidden biases in VidQA datasets and models
• Provides a more practical and scalable method than manual annotation
• Guides the development of more balanced, robust multimodal datasets
• Holds promise for researchers to build AI that actually integrates multiple signals rather than relying on single-modality shortcuts
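
To make the idea of "a modality's importance in answering a question" concrete, here is a minimal sketch of an ablation-style proxy. It is not the MIS definition from Jean's paper, which this news item does not spell out. The function name modality_gain, the is_correct callback, and the modality names "video" and "subtitles" are all hypothetical and chosen only for illustration. The sketch assumes we can ask a model the same question with any subset of modalities and check whether its answer is correct.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Sequence


def modality_gain(
    modalities: Sequence[str],
    is_correct: Callable[[FrozenSet[str]], bool],
) -> Dict[str, float]:
    """Illustrative proxy only (NOT the paper's MIS formula).

    For each modality m, average the change in correctness when m is
    added to every possible subset of the other modalities.
    `is_correct` is a hypothetical callback: given the set of modalities
    shown to the model, it returns whether the model answered correctly.
    """
    gains: Dict[str, float] = {}
    for m in modalities:
        others = [x for x in modalities if x != m]
        diffs = []
        for r in range(len(others) + 1):
            for subset in combinations(others, r):
                without_m = frozenset(subset)
                with_m = without_m | {m}
                diffs.append(float(is_correct(with_m)) - float(is_correct(without_m)))
        gains[m] = sum(diffs) / len(diffs)
    return gains


# Toy question that can be answered from subtitles alone (a single-modality shortcut).
shortcut = modality_gain(["video", "subtitles"], lambda shown: "subtitles" in shown)

# Toy question that needs both modalities together.
joint = modality_gain(
    ["video", "subtitles"], lambda shown: {"video", "subtitles"} <= shown
)
```

In this toy setup, the shortcut question gives all of the gain to subtitles and none to video, while the joint question splits the gain evenly. A dataset dominated by questions of the first kind would be one where models can score well without integrating modalities.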

As Multimodal Large Language Models (MLLMs) are rapidly deployed in real-world settings, biased training data means biased models. MIS isn’t just an academic contribution—it’s a practical tool that can revolutionize industries where integrating multiple modalities is critical:

• Healthcare – Ensuring AI correctly interprets verbal + visual cues in diagnostics and patient interactions
• Film & Media – Enhancing AI-driven scene understanding and video summarization
• Autonomous Driving – Improving how vehicles synthesize multimodal sensor data for decision-making

This work was advised by Professors Eric Eaton, Insup Lee, and Kevin Johnson, whose guidance was instrumental in shaping Jean's research.