Muhammad Asad Ali · Nadia Robertini · Didier Stricker
In this work, we present HandMvNet, one of the first real-time methods designed to estimate 3D hand motion and shape from multi-view camera images. Unlike previous monocular approaches, which suffer from scale-depth ambiguities, our method ensures consistent and accurate absolute hand poses and shapes. This is achieved through a multi-view attention-fusion mechanism that effectively integrates features from multiple viewpoints. In contrast to previous multi-view methods, our approach eliminates the need for camera parameters as input to learn 3D geometry. HandMvNet also achieves a substantial reduction in inference time while delivering results competitive with state-of-the-art methods, making it suitable for real-time applications. Evaluated on publicly available datasets, HandMvNet outperforms previous methods both qualitatively and quantitatively under identical settings.
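The core fusion idea can be sketched as follows: per-view feature vectors attend to one another, so each view's representation is refined by the others without any camera calibration as input. The module below is a minimal illustration under our own assumptions (layout, dimensions, and the single `nn.MultiheadAttention` layer), not the released HandMvNet architecture:

```python
import torch
import torch.nn as nn

class MultiViewAttentionFusion(nn.Module):
    """Minimal sketch of cross-view attention fusion (illustration only)."""

    def __init__(self, feat_dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Each view token attends to all views; no camera parameters needed.
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (B, V, D) -- one feature vector per camera view.
        fused, _ = self.attn(view_feats, view_feats, view_feats)
        return self.norm(view_feats + fused)  # residual connection + norm

feats = torch.randn(2, 4, 256)  # batch of 2, 4 views, 256-dim features
print(MultiViewAttentionFusion()(feats).shape)  # torch.Size([2, 4, 256])
```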
- CUDA 11.7
- Python 3.8
- PyTorch 2.0.1
- Lightning 2.0.6
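To check that the installed stack matches these versions (standard introspection calls; expect `2.0.1`, `11.7`, `True`, and `2.0.6` on a matching setup):

```python
import torch
import lightning

print(torch.__version__, torch.version.cuda, torch.cuda.is_available())
print(lightning.__version__)
```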
We use the following three datasets (see the loader sketch after the list):
- DexYCB
- HO3Dv3: the packed dataset tars used for training and evaluation can be downloaded here.
- MVHand
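Until the remaining loaders are released (see the to-do below), the sketch below shows one plausible multi-view sample layout. The class, field names, and shapes are hypothetical, chosen only to illustrate what a loader has to provide: all synchronized views of a frame plus 3D annotations in a shared frame.

```python
import torch
from torch.utils.data import Dataset

class MultiViewHandDataset(Dataset):
    """Hypothetical multi-view sample layout (illustration only)."""

    def __init__(self, num_samples: int = 8, num_views: int = 4):
        self.num_samples = num_samples
        self.num_views = num_views

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        V = self.num_views
        return {
            "images": torch.zeros(V, 3, 256, 256),  # V synchronized RGB crops (dummy)
            "joints_3d": torch.zeros(21, 3),         # absolute 3D joints, shared frame
            "mano_shape": torch.zeros(10),           # MANO shape coefficients (betas)
        }
```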
Train a model:

```bash
python src/train.py --config configs/release/HO3D_HandMvNet.yaml
```

Evaluate a trained checkpoint:

```bash
python src/eval.py --config configs/release/HO3D_HandMvNet.yaml --checkpoint /path/to/checkpoint.ckpt
```

Measure inference speed:

```bash
python src/eval_fps.py --config configs/release/HO3D_HandMvNet.yaml
```
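`eval_fps.py` reports inference speed. The usual way to time a GPU model is a warm-up phase followed by a synchronized timing loop, as in this generic sketch (not necessarily the script's exact implementation; `model` and `sample` are placeholders already on the GPU):

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, sample, warmup: int = 10, iters: int = 100) -> float:
    """Warm up, synchronize, then time repeated forward passes."""
    model.eval()
    for _ in range(warmup):       # amortize CUDA init and cudnn autotuning
        model(sample)
    torch.cuda.synchronize()      # wait for all queued kernels to finish
    start = time.perf_counter()
    for _ in range(iters):
        model(sample)
    torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)
```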
TODO:

- Add code for the DexYCB-MV and MVHand dataset loaders.
If you find our work useful in your research, please consider citing:
```bibtex
@inproceedings{Ali2025,
  author    = {Ali, Muhammad Asad and Robertini, Nadia and Stricker, Didier},
  title     = {HandMvNet: Real-Time 3D Hand Pose Estimation Using Multi-View Cross-Attention Fusion},
  booktitle = {Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP},
  year      = {2025},
  pages     = {555--562},
  isbn      = {978-989-758-728-3},
  issn      = {2184-4321},
  doi       = {10.5220/0013107300003912}
}
```
HandMvNet builds on the following repositories, either in close-to-original form or as inspiration: