Next MICCAI Industry Talk: Foundation/Big Models for Medical Image Segmentation, Sept 28, 2023

Monday 25th September 2023

Industry Talk: Foundation/Big Models for Medical Image Segmentation
Thursday, Sept 28, 2023: 6:30 - 8:00 am PT / 9:30 - 11:00 am ET

Registration (required): Register here

If you have a keen interest in foundation/big models for medical image segmentation, we highly recommend attending this event. This MICCAI Industry Talk will feature research contributions from three works, all focusing on 3D medical image models using extensive datasets. The topics covered will include:

  1. Continual Segmentation: Progressing towards a unified and easily accessible segmentation model encompassing 143 whole-body organs in CT scans (ICCV 2023).
  2. Anatomical Embedding Learning from 3D Medical Images: Exploring both self-supervised and supervised approaches to enhance anatomical embedding learning (ICCV 2023 Oral).
  3. Advancing Whole Brain Segmentation with 3D Medical Visual Foundation Models: Pushing the boundaries in whole brain segmentation using advanced 3D medical visual foundation models (Medical Image Analysis 2023).

Details of each talk are given below.

Talk 1: Continual segment: Towards a single, unified and accessible continual segmentation model of 143 whole-body organs in CT scans (ICCV 2023)


Deep learning powers mainstream medical image segmentation methods. Nevertheless, current deep segmentation approaches cannot efficiently and effectively adapt and update trained models when new segmentation classes need to be added incrementally (with or without new training datasets). In real clinical environments, it is preferable for segmentation models to be dynamically extended to segment new organs/tumors without (re-)accessing previous training datasets, given the obstacles of patient privacy and data storage. This process can be viewed as a continual semantic segmentation (CSS) problem, which is understudied for multi-organ segmentation. In this work, we propose a new architectural CSS learning framework to learn a single deep segmentation model for segmenting a total of 143 whole-body organs. Using the encoder/decoder network structure, we demonstrate that a continually trained, then frozen, encoder coupled with incrementally added decoders can extract and preserve sufficiently representative image features for new classes to be subsequently and validly segmented. To maintain the complexity of a single network model, we trim each decoder progressively using neural architecture search and teacher-student based knowledge distillation. To accommodate both healthy and pathological organs appearing in different datasets, a novel anomaly-aware and confidence learning module is proposed to merge the overlapping organ predictions originating from different decoders. Trained and validated on 3D CT scans of 2500+ patients from four datasets, our single network can segment a total of 143 whole-body organs with very high accuracy, closely approaching the upper-bound performance obtained by training four separate segmentation models (i.e., one model per dataset/task).
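
The frozen-encoder, incremental-decoder idea in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation: the class name `ContinualSegmenter`, the toy encoder and decoders, and the "keep the most confident prediction" merge rule are all assumptions made for illustration. The paper's anomaly-aware and confidence learning module, neural architecture search and knowledge distillation are not modeled here.

```python
from typing import Callable, Dict, List, Tuple

Features = List[float]
# Hypothetical decoder type: maps shared features to {organ: confidence in [0, 1]}.
Decoder = Callable[[Features], Dict[str, float]]


class ContinualSegmenter:
    """Toy model: one frozen encoder shared by incrementally added decoders."""

    def __init__(self, encoder: Callable[[List[float]], Features]) -> None:
        self._encoder = encoder  # treated as frozen: never replaced afterwards
        self._decoders: Dict[str, Decoder] = {}

    def add_decoder(self, task: str, decoder: Decoder) -> None:
        """Register a decoder for a new task without touching existing ones."""
        if task in self._decoders:
            raise ValueError(f"decoder for task {task!r} already exists")
        self._decoders[task] = decoder

    def predict(self, voxel: List[float]) -> Dict[str, Tuple[float, str]]:
        """Return {organ: (confidence, source_task)} for one voxel.

        If several decoders predict the same organ, the most confident
        prediction wins (a simple stand-in for a learned merge module).
        """
        features = self._encoder(voxel)
        merged: Dict[str, Tuple[float, str]] = {}
        for task, decoder in self._decoders.items():
            for organ, conf in decoder(features).items():
                if organ not in merged or conf > merged[organ][0]:
                    merged[organ] = (conf, task)
        return merged


# Hypothetical toy components, for illustration only.
def toy_encoder(voxel: List[float]) -> Features:
    return [sum(voxel) / len(voxel)]


def abdomen_decoder(features: Features) -> Dict[str, float]:
    return {"liver": 0.9 * features[0], "spleen": 0.2}


def chest_decoder(features: Features) -> Dict[str, float]:
    return {"liver": 0.6 * features[0], "lung": 0.7}


model = ContinualSegmenter(toy_encoder)
model.add_decoder("abdomen", abdomen_decoder)
before = model.predict([1.0, 1.0])

# Adding a new task later needs no access to the earlier task's data.
model.add_decoder("chest", chest_decoder)
after = model.predict([1.0, 1.0])
```

In this sketch, adding the chest decoder leaves the earlier spleen prediction unchanged, and the overlapping liver prediction is resolved in favor of the more confident abdomen decoder.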

Speaker's Bio: 

Mr. Zhanghexuan Ji is a Ph.D. candidate and research assistant in the Department of Computer Science and Engineering at the University at Buffalo, SUNY. He is an AI/deep learning researcher and enthusiast with a strong passion for developing practical AI solutions for real-world vision problems. His research interests mainly span deep learning and its applications in computer vision, medical image analysis and radiology. Applications include 3D CT/MRI organ and lesion segmentation, continual whole-body multi-organ segmentation, weakly supervised tumor segmentation, interactive segmentation, multimodality segmentation, vision-language representation learning, and pathological cell segmentation and phenotyping. His work has been accepted at recent MICCAI, ICLR, ICCV and other conferences. This is joint work with Dr. Dazhou Guo and Dr. Dakai Jin at Alibaba DAMO Academy.

Talk 2: Anatomical Embedding Learning from 3D Medical Images (ICCV 2023 Oral)


We propose a new self-supervised learning framework, namely Alice, that explicitly fulfills Anatomical invariance modeling and semantic alignment. Alice introduces a new contrastive learning strategy that encourages similarity between views that are diversely mined but share consistent high-level semantics, in order to learn invariant anatomical features. Moreover, we design an anatomical feature alignment module to complement corrupted embeddings with globally matched semantics and inter-patch topology information, conditioned on the distribution of local image content, which makes it possible to create better contrastive pairs. Beyond this, we will introduce an effective spatially steerable anatomical embedding that is learned in a supervised fashion. It allows for the direct retrieval of any instance of an anatomy of interest within the known anatomy set. This goal is achieved by combining a concise (non-hierarchical) 9-DoF object detection solution with a steerable binding technique between query embeddings and anatomical semantics. This method may have strong practical implications for various applications, especially for fast localization/parsing of oblique objects/anatomies in the full 3D space.
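
To make the contrastive learning idea above concrete, here is a minimal sketch of a generic InfoNCE-style contrastive loss on embedding vectors. It is not Alice's actual objective, which the abstract does not specify in detail. The function names, the temperature value and the toy embeddings are assumptions for illustration.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    temperature: float = 0.1,  # assumed value, not taken from the talk
) -> float:
    """Generic InfoNCE loss: -log(exp(s_pos) / sum over all exp(s_i)).

    The loss is small when the anchor is closer to the positive view
    than to any negative view, and large otherwise.
    """
    pos_logit = cosine_similarity(anchor, positive) / temperature
    logits = [pos_logit] + [
        cosine_similarity(anchor, neg) / temperature for neg in negatives
    ]
    # Numerically stable log-sum-exp over all logits.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in logits)
    )
    return log_denominator - pos_logit


# Toy embeddings: two views of the same anatomy should be close.
anchor = [1.0, 0.0]
same_anatomy_view = [0.9, 0.1]
other_anatomy_views = [[0.0, 1.0], [-1.0, 0.0]]

aligned_loss = info_nce_loss(anchor, same_anatomy_view, other_anatomy_views)
# Swapping roles: a dissimilar vector is (wrongly) treated as the positive.
misaligned_loss = info_nce_loss(anchor, [0.0, 1.0], [same_anatomy_view, [-1.0, 0.0]])
```

Under these toy inputs the aligned pairing yields a lower loss than the misaligned one, which is the behavior a contrastive objective relies on to pull semantically consistent views together.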

Speaker's Bio: 

Mr. Heng Guo is a senior algorithm engineer at Alibaba DAMO Academy. His research interests lie at the intersection of deep learning, medical image analysis and their clinical applications. During the COVID-19 pandemic, his team was awarded the “Advanced Collaborative Achievement in Technological Anti-Epidemic Efforts”. He holds a B.S. and an M.S. degree from Shanghai Jiao Tong University.

Talk 3: Advancing Whole Brain Segmentation with 3D Medical Visual Foundation Models (Medical Image Analysis 2023)


Visual foundation models have undergone a transformation in their ability to handle a wide range of image domains, showcasing their efficiency in processing large-scale images and versatility in addressing various end-tasks. However, when these models venture into the intricate domain of 3D medical image analysis, they face challenges that hinder their robustness and efficiency, particularly in dealing with numerous anatomy classes and effectively modeling complex interconnected structures, as in whole-brain and whole-body segmentation. In this intricate landscape, we strive to enhance the effectiveness of visual foundation models in the context of “heavy-duty” 3D medical image segmentation. The Nvidia MONAI team and Vanderbilt University developed a visual foundation model (UNesT) for brain imaging. This model can simultaneously segment all 133 brain structures from brain MRI. It offers a state-of-the-art solution, segmenting the full spectrum of these structures in just 4 seconds. This offers a valuable resource for improving the accuracy of brain measurements in clinical settings and represents a promising development in medical imaging foundation model research.

Speakers' Bios: 

Dr. Yucheng Tang is a research scientist at Nvidia and Vanderbilt University. He completed his Ph.D. at Vanderbilt University and his B.S. at Tianjin University. He was a member of the Vanderbilt Institute for Surgery and Engineering and a teaching affiliate at the computer science department. His research focuses on innovative, clinically useful medical image analysis techniques and on developing data-centered artificial intelligence for clinical decision making. He has published over 50 peer-reviewed articles, including papers at top conferences such as CVPR, ICCV, NeurIPS, WACV and MICCAI, and in journals such as Nature, TMI, MedIA and Radiology.

Ms. Xin Yu is a computer science Ph.D. candidate at Vanderbilt University. She has also served as a research affiliate at Vanderbilt University Medical Center (VUMC) and the Vanderbilt Institute for Surgery and Engineering (VISE).