Join the next MICCAI Industry Talk: Aug 29, 2023

Sunday 20th August 2023

Join the next MICCAI Industry Talk: Benchmark for Medical Vision Language Modeling Enriched with Clinical Expertise

Date: August 29, 2023
Time: 8:00 am - 9:00 am (PT) / 11:00 am - 12:00 pm (ET)

Registration (required): Register here

Abstract: In this presentation, I will share our recent research on developing an advanced benchmark for multi-modal medical vision language modeling, enriched with clinical expert knowledge. Existing medical vision language models have notable limitations: they simplify complex clinical text into binary labels, leading to a significant loss of crucial clinical information; they attempt to generate clinical notes directly from medical images, resulting in redundant information that lacks verification; and they lack proper definition and handling of the uncertainties present in clinical text. While medical Visual Question Answering (VQA) models can offer accurate responses to specific inquiries, current medical VQA datasets are either quite small or limited in question variety.

To tackle these challenges, we propose a novel medical vision language modeling benchmark. This benchmark draws on the insights of clinical experts to formulate essential questions that follow logical clinical reasoning. It covers questions about disease type, location, severity, and disease correlations. Furthermore, we introduce questions about differences between clinical images, emulating how clinical experts compare current images against historical ones when assessing interventions. We also account for uncertainties acknowledged by clinical experts and extract the corresponding uncertainty labels. These labels are then compared with state-of-the-art uncertainty labeling methods and consistently demonstrate improvements. Fundamentally, our work introduces a benchmark that integrates clinical expertise to address the limitations of existing large medical vision language models. Through this approach, our goal is to improve the quality and relevance of these models in medical contexts using current large language models.

Speaker: Dr. Yingying Zhu

Bio: Dr. Yingying Zhu is an assistant professor in the Computer Science and Engineering Department at the University of Texas at Arlington and a guest researcher at the Clinical Center, NIH. She was a Staff Scientist working with Ronald M. Summers at the Clinical Center, National Institutes of Health. She works at the intersection of computer vision, medical image analysis, bioinformatics, and machine learning, with the goal of developing machine learning tools for solving real-world problems. She is currently looking for a PhD student to work on machine learning, computer vision, and medical data analysis.

She did a postdoc at Cornell University working with Mert Sabuncu and a postdoc at UNC Chapel Hill working with Guorong Wu. She obtained her Ph.D. from the University of Queensland, Australia, under the supervision of Simon Lucey (currently a research associate professor at CMU, Pittsburgh, USA). She received her B.S. from Sichuan University and her M.S. from the University of Electronic Science and Technology of China.