Here, we visualize static scenes aligned across frames, together with the dynamic point cloud at a selected timestamp.
Instead of ground-truth (GT) dynamic masks, we use the estimated dynamic masks to filter out dynamic points at other timestamps.
Click tabs below to explore the results for each baseline.
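The filtering described above can be sketched as follows. This is a minimal illustration, not the code behind this page. It assumes per-frame point maps already expressed in a shared world frame and per-frame boolean dynamic masks. The function name `merge_frames` and its arguments are hypothetical.

```python
import numpy as np

def merge_frames(points, dynamic_masks, t_sel):
    """Merge per-frame point maps into one cloud (hypothetical helper).

    points:        list of (H, W, 3) arrays, assumed to be in a shared world frame
    dynamic_masks: list of (H, W) boolean arrays, True where a pixel is dynamic
    t_sel:         index of the selected timestamp
    """
    kept = []
    for t, (pts, dyn) in enumerate(zip(points, dynamic_masks)):
        pts = np.asarray(pts).reshape(-1, 3)
        dyn = np.asarray(dyn, dtype=bool).reshape(-1)
        if t == t_sel:
            # Selected timestamp: keep static and dynamic points.
            kept.append(pts)
        else:
            # Other timestamps: drop points flagged as dynamic.
            kept.append(pts[~dyn])
    return np.concatenate(kept, axis=0)

# Tiny usage example with two 2x2 frames.
frames = [np.zeros((2, 2, 3)), np.ones((2, 2, 3))]
masks = [
    np.array([[True, False], [False, False]]),
    np.array([[True, True], [False, False]]),
]
cloud = merge_frames(frames, masks, t_sel=0)
```

With this scheme, an under-segmented mask leaves dynamic points from other timestamps in the merged cloud, which shows up as ghosting.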
Our method achieves better structure alignment and fewer artifacts than MonST3R, owing to more robust dynamic segmentation.
MonST3R
Ours
Results are downsampled by a factor of 10 for efficient online rendering.
Our method provides clean reconstructions, while DAS3R suffers from structure misalignment and ghosting artifacts caused by inaccurate dynamic segmentation. For example, it under-segments the dog and the goose.
DAS3R
Ours
Results are downsampled by a factor of 10 for efficient online rendering.
CUT3R does not estimate dynamic masks, so points from different frames blend together when ground-truth masks are not used. Our approach also estimates camera poses more reliably.
CUT3R
Ours
Results are downsampled by a factor of 10 for efficient online rendering.
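The page does not say how the downsampling is done. One simple possibility, shown here only as an assumed sketch, is to keep every tenth point of the flattened cloud. The helper name `downsample` and the `factor` parameter are hypothetical.

```python
import numpy as np

def downsample(points, colors=None, factor=10):
    """Keep every `factor`-th point (and matching color) of an (N, 3) cloud.

    Assumption: plain stride subsampling; the actual viewer may use a
    different scheme such as random or voxel-grid downsampling.
    """
    points = np.asarray(points)[::factor]
    if colors is not None:
        colors = np.asarray(colors)[::factor]
    return points, colors

# Usage: 25 points reduce to 3 (indices 0, 10, 20).
pts = np.arange(75, dtype=float).reshape(25, 3)
small_pts, _ = downsample(pts, factor=10)
```

Stride subsampling is cheap but follows the pixel ordering of the point maps, so a random subset may give a more even appearance.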