MICCAI Industrial Talk: Open-source Foundation Models for 3D Medical Image Segmentation and Generation

Wednesday 13th November 2024

Join us for the next exciting Industrial Talk: Open-source Foundation Models for 3D Medical Image Segmentation and Generation

Monday, November 25, 2024
10:30 - 11:30 am EST / 4:30 - 5:30 pm CET
Speakers: Dr. Yufan He and Dr. Can Zhao, NVIDIA
Register here 

Overview:
Participants will learn about two cutting-edge open-source foundation models for 3D medical imaging from NVIDIA. The first model, VISTA3D (Versatile Imaging SegmenTation and Annotation), addresses 3D medical image segmentation, supporting automatic and interactive segmentation across 127 anatomical classes as well as zero-shot segmentation of novel structures. The second model, MAISI (Medical AI for Synthetic Imaging), is a 3D foundation diffusion model designed to generate image volumes with flexible sizes and voxel resolutions. Both models provide open-source code and model weights, fostering accessibility and collaboration in the field.

Talk 1: VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging

Speaker: Dr. Yufan He is an applied research scientist at NVIDIA. His research interests include medical image segmentation, AutoML, and foundation models. He obtained his Ph.D. from Johns Hopkins University and is a recipient of the 2019 MICCAI Young Scientist Award.

Abstract:
Foundation models for interactive segmentation in 2D natural images and videos have sparked significant interest in building 3D foundation models for medical imaging. However, the domain gaps and clinical use cases of 3D medical imaging call for a dedicated model that diverges from existing 2D solutions. Specifically, such a foundation model should support a full workflow that actually reduces human effort. Treating 3D medical images as sequences of 2D slices and reusing interactive 2D foundation models seems straightforward, but slice-by-slice 2D annotation is too time-consuming for 3D volumes. Moreover, for large-cohort analysis, it is highly accurate automatic segmentation models that save the most human effort. However, these models lack support for interactive corrections and lack the zero-shot ability to handle novel structures, a key property of any "foundation" model. While reusing pre-trained 2D backbones in 3D enhances zero-shot potential, performance on complex 3D structures still lags behind leading 3D models.

To address these issues, we present VISTA3D (Versatile Imaging SegmenTation and Annotation), which aims to solve all of these challenges and requirements with one unified foundation model. VISTA3D is built on top of a well-established 3D segmentation pipeline and is the first model to achieve state-of-the-art performance in both 3D automatic segmentation (supporting 127 classes) and 3D interactive segmentation, even when compared with top 3D expert models on large and diverse benchmarks. Additionally, VISTA3D's 3D interactive design allows efficient human correction, and a novel 3D supervoxel method that distills 2D pre-trained backbones gives VISTA3D top 3D zero-shot performance. We believe the model, recipe, and insights represent a promising step toward a clinically useful foundation model for 3D imaging. Code and weights are publicly available.
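To make the unified design concrete, below is a minimal, runnable sketch of how a single backbone can serve both automatic (class-prompted) and interactive (click-prompted) 3D segmentation. All names and shapes here (Vista3DLike, class_ids, points) are illustrative assumptions for exposition, not the released VISTA3D API; the official code and weights are the authoritative reference.

```python
# Toy sketch of a unified automatic + interactive 3D segmentation interface.
# NOT the released VISTA3D code: module names, heads, and prompt encodings
# are simplified assumptions to illustrate the workflow described above.
import torch
import torch.nn as nn

class Vista3DLike(nn.Module):
    """One shared backbone feeding two prompt-driven heads:
    - class prompts -> automatic segmentation (one channel per class id)
    - point clicks  -> interactive segmentation / correction
    """

    def __init__(self, num_classes: int = 127, width: int = 8):
        super().__init__()
        self.backbone = nn.Conv3d(1, width, kernel_size=3, padding=1)
        self.class_head = nn.Conv3d(width, num_classes, kernel_size=1)
        self.point_head = nn.Conv3d(width + 1, 1, kernel_size=1)

    def forward(self, volume, class_ids=None, points=None):
        feats = torch.relu(self.backbone(volume))        # (B, width, D, H, W)
        out = {}
        if class_ids is not None:                        # automatic mode
            logits = self.class_head(feats)              # all class channels
            out["auto"] = logits[:, class_ids]           # keep requested classes
        if points is not None:                           # interactive mode
            # Rasterize clicks into a +1/-1 guidance channel (fg/bg).
            guide = torch.zeros_like(volume)
            for (z, y, x), label in points:
                guide[..., z, y, x] = 1.0 if label else -1.0
            out["interactive"] = self.point_head(torch.cat([feats, guide], dim=1))
        return out

model = Vista3DLike()
ct = torch.randn(1, 1, 32, 64, 64)                       # toy CT volume
auto = model(ct, class_ids=torch.tensor([1, 5]))["auto"] # two requested classes
fix = model(ct, points=[((16, 32, 32), 1)])["interactive"]  # one foreground click
print(auto.shape, fix.shape)
```

The point of the sketch is the shared-backbone design: the same features answer both automatic class prompts and interactive corrections, which is what lets one model cover the full annotation workflow.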

Talk 2: MAISI: Medical AI for Synthetic Imaging

Speaker: Dr. Can Zhao is an applied research scientist at NVIDIA. Her research interests include medical image synthesis and foundation models. She obtained her Ph.D. from Johns Hopkins University. She has organized multiple MICCAI workshops and was a Session Chair at MICCAI 2024.

Abstract:
Medical imaging analysis faces challenges such as data scarcity, high annotation costs, and privacy concerns. This talk introduces Medical AI for Synthetic Imaging (MAISI), an innovative approach that uses a diffusion model to generate synthetic 3D computed tomography (CT) images to address those challenges. MAISI combines a foundation volume compression network with a latent diffusion model to produce high-resolution CT images (up to a landmark volume dimension of 512 × 512 × 768) with flexible volume dimensions and voxel spacing. By incorporating ControlNet, MAISI can take organ segmentation masks covering 127 anatomical structures as an additional condition, enabling the generation of accurately annotated synthetic images for various downstream tasks. Experimental results show that MAISI generates realistic, anatomically accurate images across diverse body regions and conditions, demonstrating its promise for mitigating these challenges with synthetic data.
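As a rough illustration of the pipeline described above (volume compression, latent diffusion, mask conditioning), the sketch below shows the overall data flow. Everything here is a toy assumption for exposition: the module names, the ControlNet-style mask branch, and the simplified denoising update are not MAISI's actual implementation.

```python
# Schematic sketch of a MAISI-style generation pipeline: compress to a
# latent, denoise with a segmentation-conditioned model, then decode.
# All modules and the update rule are simplified stand-ins, not MAISI's code.
import torch
import torch.nn as nn

class Compressor(nn.Module):
    """Toy volume compression network: 4x downsampling per axis.
    `enc` would map real CT volumes to latents during training."""
    def __init__(self, latent_ch: int = 4):
        super().__init__()
        self.enc = nn.Conv3d(1, latent_ch, kernel_size=4, stride=4)
        self.dec = nn.ConvTranspose3d(latent_ch, 1, kernel_size=4, stride=4)

class MaskConditionedDenoiser(nn.Module):
    """Toy latent denoiser; the segmentation mask enters as extra channels,
    standing in for the ControlNet branch (timestep embedding omitted)."""
    def __init__(self, latent_ch: int = 4, num_classes: int = 127):
        super().__init__()
        self.mask_embed = nn.Conv3d(num_classes, latent_ch, kernel_size=4, stride=4)
        self.net = nn.Conv3d(2 * latent_ch, latent_ch, kernel_size=3, padding=1)

    def forward(self, z_t, t, mask_onehot):
        cond = self.mask_embed(mask_onehot)       # mask at latent resolution
        return self.net(torch.cat([z_t, cond], dim=1))

@torch.no_grad()
def generate(compressor, denoiser, mask_onehot, steps: int = 50):
    """Heavily simplified ancestral sampling loop in latent space."""
    b = mask_onehot.shape[0]
    z = torch.randn(b, 4, *[s // 4 for s in mask_onehot.shape[2:]])
    for t in reversed(range(steps)):
        eps = denoiser(z, t, mask_onehot)         # predicted noise
        z = z - eps / steps                       # toy update, not real DDPM math
    return compressor.dec(z)                      # decode back to image space

# One 127-class segmentation condition at a flexible (here, small toy) size.
mask = torch.zeros(1, 127, 32, 64, 64)
ct = generate(Compressor(), MaskConditionedDenoiser(), mask)
print(ct.shape)  # torch.Size([1, 1, 32, 64, 64])
```

The design point this illustrates is why compression matters: diffusing in a 4x-per-axis compressed latent shrinks the sampling volume by 64x, which is what makes volume dimensions like 512 × 512 × 768 tractable.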