Recent advances in 3D semantic scene understanding have shown impressive progress in 3D instance segmentation, enabling object-level reasoning about 3D scenes; however, interacting with objects and understanding their function require a finer-grained understanding. We therefore propose the task of part-based scene understanding of real-world 3D environments: from an RGB-D scan of a scene, we detect objects and, for each object, predict its decomposition into geometric part masks which, composed together, form the complete geometry of the observed object. We leverage an intermediary part graph representation to enable robust completion as well as the construction of part priors, which we use to build the final part mask predictions. Our experiments demonstrate that guiding part understanding from a part graph to part prior-based predictions significantly outperforms alternative approaches to the task of semantic part completion.
We evaluate our approach for semantic part completion on the ScanNet dataset against state-of-the-art methods for part decomposition, including scan completion followed by part segmentation. Our approach produces more consistent and accurate part decompositions.
@inproceedings{bokhovkin2021towards,
title={Towards Part-Based Understanding of RGB-D Scans},
author={Bokhovkin, Alexey and Ishimtsev, Vladislav and Bogomolov, Emil and Zorin, Denis and Artemov, Alexey and Burnaev, Evgeny and Dai, Angela},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7484--7494},
year={2021}
}