RoDUS: Robust Decomposition of Static and Dynamic Elements in Urban Scenes

Huawei Paris Research Center

RoDUS is a neural scene representation designed to decompose 4D dynamic scenes into two elements: moving foreground and static background. This decomposition remains consistent across photometric, geometric, and semantic aspects.

Abstract

The task of separating dynamic objects from static environments using NeRFs has been widely studied in recent years. However, capturing large-scale scenes still poses a challenge due to their complex geometric structures and unconstrained dynamics. Without the help of 3D motion cues, previous methods often require simplified setups with slow camera motion and only a single or a few dynamic actors, leading to suboptimal solutions in most urban settings. To overcome these limitations, we present RoDUS, a pipeline for decomposing static and dynamic elements in urban scenes, with carefully separated NeRF models for moving and non-moving components. Our approach combines a robust kernel-based initialization with 4D semantic information to selectively guide the learning process. This strategy enables accurate capture of scene dynamics, reducing NeRF artifacts in the background reconstruction, all through self-supervision. Notably, experimental evaluations on the KITTI-360 and PandaSet datasets demonstrate the effectiveness of our method in decomposing challenging urban scenes into precise static and dynamic components.
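The robust kernel-based initialization mentioned above can be illustrated with a minimal sketch: a robust kernel downweights large photometric residuals, which in a static-only reconstruction typically correspond to moving objects. The Cauchy kernel and the scale parameter `c` here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def cauchy_weight(residual, c=0.1):
    """Cauchy robust kernel: assigns low weight to large residuals.

    In a static-only NeRF fit, pixels covering dynamic objects produce
    large photometric errors and are thus softly excluded from the loss.
    """
    return 1.0 / (1.0 + (residual / c) ** 2)

# Toy per-pixel photometric residuals between a static-only rendering
# and the observed image (real residuals come from NeRF training).
residuals = np.array([0.02, 0.05, 0.4, 0.8])  # last two: likely dynamic pixels
weights = cauchy_weight(residuals)
robust_loss = np.mean(weights * residuals ** 2)
```

Pixels with small residuals keep weights near 1, while likely-dynamic pixels are strongly attenuated, so the background model is not corrupted by moving content.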

Results

Dynamic scene reconstruction and decomposition

Static background extraction

BibTeX

@article{nguyen2024rodus,
  title={RoDUS: Robust Decomposition of Static and Dynamic Elements in Urban Scenes},
  author={Nguyen, Thang-Anh-Quan and Rold{\~a}o, Luis and Piasco, Nathan and Bennehar, Moussab and Tsishkou, Dzmitry},
  journal={arXiv preprint arXiv:2403.09419},
  year={2024}
}