Muhammad Asad Ali · Nadia Robertini · Didier Stricker
In this work, we present HandMvNet, one of the first real-time methods designed to estimate 3D hand motion and shape from multi-view camera images. Unlike previous monocular approaches, which suffer from scale-depth ambiguities, our method ensures consistent and accurate absolute hand poses and shapes. This is achieved through a multi-view attention-fusion mechanism that effectively integrates features from multiple viewpoints. In contrast to previous multi-view methods, our approach eliminates the need for camera parameters as input to learn 3D geometry. HandMvNet also achieves a substantial reduction in inference time while delivering results competitive with state-of-the-art methods, making it suitable for real-time applications. Evaluated on publicly available datasets, HandMvNet outperforms previous methods both qualitatively and quantitatively under identical settings.
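The core fusion idea can be sketched as follows: per-view feature vectors attend to one another, so each view's representation is refined by the others without any camera calibration as input. The module below is a minimal illustration under our own assumptions (layout, dimensions, and the single `nn.MultiheadAttention` layer), not the released HandMvNet architecture:

```python
import torch
import torch.nn as nn

class MultiViewAttentionFusion(nn.Module):
    """Minimal sketch of cross-view attention fusion (illustration only)."""

    def __init__(self, feat_dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Each view token attends to all views; no camera parameters needed.
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (B, V, D) -- one feature vector per camera view.
        fused, _ = self.attn(view_feats, view_feats, view_feats)
        return self.norm(view_feats + fused)  # residual connection + norm

feats = torch.randn(2, 4, 256)  # batch of 2, 4 views, 256-dim features
print(MultiViewAttentionFusion()(feats).shape)  # torch.Size([2, 4, 256])
```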
- CUDA 11.7
- Python 3.8
- PyTorch 2.0.1
- Lightning 2.0.6
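To check that the installed stack matches these versions (standard introspection calls; expect `2.0.1`, `11.7`, `True`, and `2.0.6` on a matching setup):

```python
import torch
import lightning

print(torch.__version__, torch.version.cuda, torch.cuda.is_available())
print(lightning.__version__)
```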
We use the following three datasets (see the loader sketch after the list):
- DexYCB
- HO3Dv3: the packed dataset tars used for training and evaluation can be downloaded here.
- MVHand
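Until the remaining loaders are released (see the to-do below), the sketch below shows one plausible multi-view sample layout. The class, field names, and shapes are hypothetical, chosen only to illustrate what a loader has to provide: all synchronized views of a frame plus 3D annotations in a shared frame.

```python
import torch
from torch.utils.data import Dataset

class MultiViewHandDataset(Dataset):
    """Hypothetical multi-view sample layout (illustration only)."""

    def __init__(self, num_samples: int = 8, num_views: int = 4):
        self.num_samples = num_samples
        self.num_views = num_views

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        V = self.num_views
        return {
            "images": torch.zeros(V, 3, 256, 256),  # V synchronized RGB crops (dummy)
            "joints_3d": torch.zeros(21, 3),         # absolute 3D joints, shared frame
            "mano_shape": torch.zeros(10),           # MANO shape coefficients (betas)
        }
```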
Train a model:

```bash
python src/train.py --config configs/release/HO3D_HandMvNet.yaml
```

Evaluate a trained checkpoint:

```bash
python src/eval.py --config configs/release/HO3D_HandMvNet.yaml --checkpoint /path/to/checkpoint.ckpt
```

Measure inference speed:

```bash
python src/eval_fps.py --config configs/release/HO3D_HandMvNet.yaml
```
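`eval_fps.py` reports inference speed. The usual way to time a GPU model is a warm-up phase followed by a synchronized timing loop, as in this generic sketch (not necessarily the script's exact implementation; `model` and `sample` are placeholders already on the GPU):

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, sample, warmup: int = 10, iters: int = 100) -> float:
    """Warm up, synchronize, then time repeated forward passes."""
    model.eval()
    for _ in range(warmup):       # amortize CUDA init and cudnn autotuning
        model(sample)
    torch.cuda.synchronize()      # wait for all queued kernels to finish
    start = time.perf_counter()
    for _ in range(iters):
        model(sample)
    torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)
```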
TODO:

- Add code for the DexYCB-MV and MVHand dataset loaders.
If you find our work useful in your research, please consider citing:
```bibtex
@inproceedings{Ali2025,
  author    = {Ali, Muhammad Asad and Robertini, Nadia and Stricker, Didier},
  title     = {HandMvNet: Real-Time 3D Hand Pose Estimation Using Multi-View Cross-Attention Fusion},
  booktitle = {Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP},
  year      = {2025},
  pages     = {555--562},
  isbn      = {978-989-758-728-3},
  issn      = {2184-4321},
  doi       = {10.5220/0013107300003912}
}
```
HandMvNet builds on the following repositories, either in close-to-original form or as inspiration: