GHOST teaser showing hand-object reconstructions

Abstract

Understanding realistic hand-object interactions from monocular RGB videos is essential for AR/VR, robotics, and embodied AI. Existing methods rely on category-specific templates or heavy computation, yet still produce physically inconsistent hand-object alignment in 3D. We introduce GHOST (Gaussian Hand-Object Splatting), a fast, category-agnostic framework for reconstructing dynamic hand-object interactions using 2D Gaussian Splatting. GHOST represents both hands and objects as dense, view-consistent Gaussian discs and introduces three key innovations: (1) a geometric-prior retrieval and consistency loss that completes occluded object regions, (2) a grasp-aware alignment that refines hand translations and object scale to ensure realistic contact, and (3) a hand-aware background loss that prevents penalizing hand-occluded object regions. GHOST achieves complete, physically consistent, and animatable reconstructions from a single RGB video while running an order of magnitude faster than prior category-agnostic methods. Extensive experiments on ARCTIC, HO3D, and in-the-wild datasets demonstrate state-of-the-art accuracy in 3D reconstruction and 2D rendering quality.
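To make the third contribution concrete, here is a minimal sketch of a hand-aware background loss. It is an illustration under stated assumptions, not the authors' implementation: the function name, the use of per-pixel rendered object opacity, and the mean-opacity penalty are all hypothetical choices. The idea it shows is that pixels covered by the hand are left out of the background penalty, so object Gaussians hidden behind the hand are not pushed toward transparency.

```python
import numpy as np

def hand_aware_background_loss(alpha, object_mask, hand_mask):
    """Mean rendered object opacity over pixels that are neither object nor hand.

    alpha:       (H, W) rendered object opacity in [0, 1] (assumed input).
    object_mask: (H, W) boolean, True where the object is visible.
    hand_mask:   (H, W) boolean, True where a hand is visible.
    """
    alpha = np.asarray(alpha, dtype=np.float64)
    object_mask = np.asarray(object_mask, dtype=bool)
    hand_mask = np.asarray(hand_mask, dtype=bool)

    # Only pixels that are true background are penalized. Hand pixels are
    # skipped because the object may legitimately continue behind the hand.
    background = ~object_mask & ~hand_mask
    count = background.sum()
    if count == 0:
        return 0.0
    return float(alpha[background].sum() / count)

alpha = [[0.9, 0.8], [0.4, 0.0]]
object_mask = [[True, False], [False, False]]
hand_mask = [[False, True], [False, False]]
no_hand = [[False, False], [False, False]]

hand_aware = hand_aware_background_loss(alpha, object_mask, hand_mask)
naive = hand_aware_background_loss(alpha, object_mask, no_hand)
```

In this toy case the pixel with opacity 0.8 lies under the hand. The naive variant counts it as background and penalizes it, while the hand-aware variant ignores it and yields a smaller loss.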

Results

Input Video: Drill, Mixer, Mug, Waffle Iron
GHOST Reconstruction: Drill, Mixer, Mug, Waffle Iron

Method

GHOST pipeline overview

GHOST pipeline: Preprocessing (segmentation, SfM, hand reconstruction, geometric prior retrieval), optimization, and 2D Gaussian Splatting with hand-object losses.

Qualitative Results

Qualitative comparison 1 Qualitative comparison 2

Quantitative Results

3D Reconstruction Metrics (ARCTIC Bi-CAIR)

Method  MPJPE_RA,h  MPJPE_RA,l  MPJPE_RA,r  CD_ICP  CD_r    CD_l    CD_h    F_10mm  F_5mm
HOLD    25.91       27.13       24.70       2.07    123.54  105.92  114.73  63.92   37.13
BIGS    24.49       24.63       24.35       1.36    31.28   46.11   38.69   81.78   56.41
GHOST   24.07       25.42       22.71       2.26    13.40   23.41   18.40   60.88   34.67

MPJPE in mm, CD in cm². Green = best, orange = second best.
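For readers unfamiliar with these metrics, the sketch below shows one common way to compute them. It rests on assumptions that this page does not confirm: that RA denotes root alignment, that the Chamfer distance sums mean squared nearest-neighbor distances in both directions (which would match the cm² unit), and that the F-score is the harmonic mean of precision and recall at a distance threshold. The function names are illustrative, and the benchmark's exact protocol (for example the ICP alignment behind CD_ICP) is not reproduced. In the table above, each h column equals the mean of the corresponding l and r columns.

```python
import numpy as np

def mpjpe_root_aligned(pred, gt, root=0):
    """Mean per-joint position error after subtracting the root joint."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    pred = pred - pred[root]
    gt = gt - gt[root]
    return float(np.linalg.norm(pred - gt, axis=1).mean())

def nearest_distances(a, b):
    """For each point in a, the Euclidean distance to its nearest point in b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pairwise = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return pairwise.min(axis=1)

def chamfer_distance(a, b):
    """Symmetric Chamfer distance using squared nearest-neighbor distances."""
    return float((nearest_distances(a, b) ** 2).mean()
                 + (nearest_distances(b, a) ** 2).mean())

def f_score(pred, gt, threshold):
    """Harmonic mean of precision and recall at a distance threshold."""
    precision = (nearest_distances(pred, gt) < threshold).mean()
    recall = (nearest_distances(gt, pred) < threshold).mean()
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))

# Toy data: two joints and two small point sets.
pred_joints = [[0, 0, 0], [1, 0, 0]]
gt_joints = [[5, 5, 5], [6, 5, 4]]
points_a = [[0, 0, 0], [1, 0, 0]]
points_b = [[0, 0, 0], [1, 0, 2]]

joint_error = mpjpe_root_aligned(pred_joints, gt_joints)
cd = chamfer_distance(points_a, points_b)
f = f_score(points_a, points_b, threshold=1.5)
```

The brute-force pairwise distance matrix is fine for small point sets. For dense reconstructions a spatial index such as a k-d tree would be the usual choice.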

2D Rendering Quality

Method  ARCTIC PSNR↑  ARCTIC SSIM↑  ARCTIC LPIPS↓  HO3D PSNR↑  HO3D SSIM↑  HO3D LPIPS↓  Runtime↓
HOLD    12.83         0.66          0.32           16.20       0.74        0.21         16h
BIGS    24.87         0.96          0.05           24.51       0.92        0.07         13h
GHOST   25.93         0.88          0.02           21.37       0.75        0.03         1h

GHOST runs in 1h versus 13h for BIGS and 16h for HOLD, a 13-16x speedup, while maintaining competitive or superior rendering quality.
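The speedup figures follow directly from the Runtime column, and PSNR has a simple closed form. The short sketch below reproduces the arithmetic. The runtimes are taken from the table above. The `psnr` helper is a generic textbook definition that assumes images scaled to [0, 1], and may differ from the evaluation code used for the reported numbers.

```python
import math

def psnr(mse, max_value=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)

# Runtimes in hours, as listed in the 2D Rendering Quality table.
runtime_hours = {"HOLD": 16, "BIGS": 13, "GHOST": 1}

# Speedup of GHOST relative to each baseline.
speedups = {
    name: hours / runtime_hours["GHOST"]
    for name, hours in runtime_hours.items()
    if name != "GHOST"
}

example_psnr = psnr(0.01)  # MSE of 0.01 on a [0, 1] image
```

Because PSNR is logarithmic in the error, a gap of about 1 dB (for example 25.93 versus 24.87 on ARCTIC) corresponds to roughly a 20% reduction in mean squared error under this definition.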

Citation

@inproceedings{aboukhadra2026ghost,
  title     = {GHOST: Fast Category-agnostic Hand-Object Interaction Reconstruction from RGB Videos using Gaussian Splatting},
  author    = {Aboukhadra, Ahmed Tawfik and Rogge, Marcel and Robertini, Nadia and Arafa, Abdalla and Malik, Jameel and Elhayek, Ahmed and Stricker, Didier},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings},
  year      = {2026}
}