This is a small project that demonstrates image captioning (a machine learning task; model used: https://huggingface.co/nlpconnect/vit-gpt2-image-captioning) in WebVR. It was inspired by Scene Reader, a similar project by Misslivirose that demonstrates image captioning with Three.js and a Microsoft Azure service.
To see image captioning at work, click the camera icon. Each click generates an image of the scene along with a caption. To see the magic happen, try to find the answer to the riddle.
The project is built on A-Frame and uses the Hugging Face API for captioning.
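As a rough illustration of the captioning step, the sketch below posts an image to a hosted captioning endpoint and reads the caption from the reply. It is not this project's actual code. The function names (`captionImage`, `extractCaption`), the `endpoint` and `token` parameters, and the assumed response shape (a JSON array whose first item has a `generated_text` field) are all assumptions made for this example.

```typescript
// Hypothetical shape of one item in the captioning response.
// The `generated_text` field is an assumption about the hosted model's output.
interface CaptionItem {
  generated_text: string;
}

// Pull the caption out of an already parsed JSON response.
// Throws if the payload does not match the assumed shape.
function extractCaption(payload: unknown): string {
  if (Array.isArray(payload) && payload.length > 0) {
    const first = payload[0] as Partial<CaptionItem> | null;
    if (first && typeof first.generated_text === "string") {
      return first.generated_text.trim();
    }
  }
  throw new Error("unexpected response shape");
}

// Send the raw image bytes to the captioning endpoint and return the caption.
// `endpoint` and `token` are supplied by the caller; both are placeholders here.
async function captionImage(
  image: Blob,
  endpoint: string,
  token: string
): Promise<string> {
  const response = await fetch(endpoint, {
    method: "POST",
    headers: { Authorization: `Bearer ${token}` },
    body: image,
  });
  if (!response.ok) {
    throw new Error(`captioning request failed: ${response.status}`);
  }
  return extractCaption(await response.json());
}
```

In a browser, the `image` argument could come from a canvas snapshot of the scene (for example via `canvas.toBlob`), and the endpoint would point at the hosted `nlpconnect/vit-gpt2-image-captioning` model. Keeping the response parsing in its own function makes it easy to check without a network call.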
The 3D model of the room was created by Francesco Coldesina and taken from Sketchfab.com.
To see the application at work: Demo application