Previous research has investigated the application of Multimodal Large Language Models (MLLMs) in understanding 3D scenes by interpreting them as videos. These approaches generally depend on ...
Abstract: We present an enhanced approach for scene decomposition based on 3D Gaussian Splatting. Our method maintains the efficiency of prompt-driven mask propagation while achieving competitive ...