SCFlow2: Plug-and-Play Object Pose Refiner with Shape-Constraint Scene Flow

Qingyuan Wang¹, Rui Song¹, Jiaojiao Li¹, Kerui Cheng², David Ferstl³, Yinlin Hu³

¹State Key Laboratory of ISN, Xidian University ²Taiyuan University of Technology ³MagicLeap

Abstract

We introduce SCFlow2, a plug-and-play refinement framework for 6D object pose estimation. Most recent 6D object pose methods rely on refinement to obtain accurate results. However, most existing refinement methods either suffer from noise when establishing correspondences or require retraining for novel objects. SCFlow2 builds on SCFlow, a model designed for iterative RGB refinement with a shape constraint, and for RGBD frames it additionally formulates depth as a regularization in the iteration via 3D scene flow. The key design of SCFlow2 is the introduction of geometric constraints into the training of the recurrent matching network, by combining the rigid-motion embeddings in 3D scene flow with the 3D shape prior of the target. We train the refinement network on a combination of the Objaverse, GSO, and ShapeNet datasets, and demonstrate on BOP datasets with novel objects that applying our method significantly improves the results of most state-of-the-art methods, without any retraining or fine-tuning.

Design overview of SCFlow and SCFlow2. Given the object's 3D mesh, we render an image I₁ and a depth map D₁ from an initial pose, and then use networks to compare these rendered outputs with the real inputs I₂ and D₂ to refine the pose. (a) Although SCFlow adds a 3D shape constraint to the optimization loop, it formulates the matching process as a pure 2D problem, which is less effective at capturing 3D motions. Moreover, it cannot work with RGBD images. A common practice is to use RANSAC Kabsch as a second stage to consume the additional depth, which, however, is only locally optimal within each stage. (b) SCFlow2 tackles these problems. We introduce an intermediate representation based on 3D scene flow to capture 3D motions in network optimization. Furthermore, we embed depth into the loop by formulating it as an additional regularization that guides the correlation look-up iteratively, producing a system that is end-to-end trainable on RGBD images.
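To make the geometry in this overview concrete, the following is a minimal NumPy sketch, not the SCFlow2 implementation. It back-projects a depth map into 3D points, computes the scene flow that a known rigid motion would induce on them, and then recovers that motion with the Kabsch algorithm (without the RANSAC wrapper). The function names, the intrinsics `K`, the synthetic depth map, and the test motion are all assumptions chosen for illustration. In SCFlow2 the flow is predicted by a network rather than computed from a known motion.

```python
import numpy as np


def backproject(depth, K):
    """Lift a depth map (H, W) to camera-frame 3D points (H*W, 3), pinhole model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


def rigid_scene_flow(points, R, t):
    """Per-point 3D displacement induced by the rigid motion p -> R p + t."""
    return points @ R.T + t - points


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~= src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


# Hypothetical inputs: intrinsics, a synthetic non-planar depth map, a small motion.
K = np.array([[500.0, 0.0, 32.0],
              [0.0, 500.0, 24.0],
              [0.0, 0.0, 1.0]])
rng = np.random.default_rng(0)
depth = 1.0 + 0.1 * rng.random((48, 64))

angle = 0.1  # radians, rotation about the camera y-axis
R_true = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(angle), 0.0, np.cos(angle)]])
t_true = np.array([0.01, -0.02, 0.03])

points = backproject(depth, K)
flow = rigid_scene_flow(points, R_true, t_true)

# With noise-free flow, Kabsch recovers the rigid motion up to numerical precision.
R_est, t_est = kabsch(points, points + flow)
print(np.abs(R_est - R_true).max(), np.abs(t_est - t_true).max())
```

With noisy or partially wrong flow, a closed-form fit like this is typically wrapped in RANSAC, which is the two-stage practice the overview contrasts with an end-to-end trainable design.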

Main results

Effect of SCFlow2. All the baselines have their own refinement strategy. After applying our method as a post-processing step ("+ Ours"), the results consistently improve across all baselines on all datasets.

Qualitative results. Given the same pose initialization as in GenFlow ("GFlow") and FoundPose ("FPose"), denoted "GFlow (init)" and "FPose (init)" respectively, our refinement method ("+ Ours") produces considerably more accurate results than the refinement approaches in their original methods (note how our reprojected 3D mesh aligns better with the object contours).

More results

SCFlow2 demonstrates robustness to diverse pose initialization errors and can be seamlessly integrated as a plug-and-play refinement module into existing object pose estimation pipelines.

Citation

If you find this work useful in your research, please consider citing:

@inproceedings{wang2025scflow2,
  title     = {SCFlow2: Plug-and-Play Object Pose Refiner with Shape-Constraint Scene Flow},
  author    = {Wang, Qingyuan and Song, Rui and Li, Jiaojiao and Cheng, Kerui and Ferstl, David and Hu, Yinlin},
  booktitle = {CVPR},
  year      = {2025}
}

@inproceedings{yang2023scflow,
  title     = {Shape-Constraint Flow for 6D Object Pose Estimation},
  author    = {Hai, Yang and Song, Rui and Li, Jiaojiao and Hu, Yinlin},
  booktitle = {CVPR},
  year      = {2023}
}