Language-enhanced RNR-Map: Querying Renderable Neural Radiance Field Maps with Natural Language.

Cite Le-RNR-Map

If you find our work useful, please consider citing:

@InProceedings{Taioli_2023_ICCV,
    author    = {Taioli, Francesco and Cunico, Federico and Girella, Federico and Bologna, Riccardo and Farinelli, Alessandro and Cristani, Marco},
    title     = {{Language-Enhanced RNR-Map: Querying Renderable Neural Radiance Field Maps with Natural Language}},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {4669-4674}
}

Version

  • Python version: 3.8
  • Habitat-lab: 0.2.1
  • Habitat-sim: 0.2.1
  • Torch: pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117
  • CLIP: pip install git+https://github.com/openai/CLIP.git
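
To confirm that the installed PyTorch packages match the versions listed above, a small check along these lines may help. It is only a sketch: it assumes the pip distribution names are `torch`, `torchvision` and `torchaudio`, and it leaves out Habitat because its distribution names are not stated here. The helper names `base_version` and `check_versions` are hypothetical.

```python
from importlib import metadata

# Versions listed in this README; distribution names are assumed to match pip's.
EXPECTED = {
    "torch": "1.13.0",
    "torchvision": "0.14.0",
    "torchaudio": "0.13.0",
}


def base_version(version):
    """Drop a local build tag such as '+cu117' so '1.13.0+cu117' compares as '1.13.0'."""
    return version.split("+", 1)[0]


def check_versions(expected, lookup=metadata.version):
    """Return {package: (expected, found)} for every package that does not match."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = base_version(lookup(name))
        except metadata.PackageNotFoundError:
            found = None  # package is not installed
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_versions(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

An empty result from `check_versions(EXPECTED)` means the three packages match the versions above.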

Habitat simulator

For the Habitat setup, see https://github.com/facebookresearch/habitat-lab/tree/v0.2.1#Gibson

Dataset

Project structure

├── data
│   └── gibson
│       └── scene_dataset
│           └── gibson_habitat
├── pretrained
├── src
│   └── model
│       └── autoencoder
│           └── GSN
├── heat_maps
│   ├── multi_search
│   └── single_search
├── images
│   ├── map_generation
│   └── query_map
├── lernr_maps
├── lernr
├── requirements.txt
├── evaluation.py
├── map_creation.py
├── query_map.py
└── README.md
  • data: Contains the dataset.
  • pretrained: Contains the pretrained autoencoder models.
  • src: Contains the encoding and decoding code.
  • heat_maps: Contains the output heat maps for single and multi search.
  • lernr_maps: Contains the lernr maps, stored as dictionaries.
  • images: Contains the videos generated during navigation and querying.
  • lernr: Contains utility files.
  • evaluation.py: Evaluates the maps in the lernr_maps folder.
  • map_creation.py: Generates lernr maps.
  • query_map.py: Queries the maps in single or multi search mode.

Download the lernr maps from here and put them in the lernr_maps folder. Download the pretrained checkpoints (ckpt) from here and put them in the pretrained folder.
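
Before running the scripts, it can be useful to check that the downloaded data sits where the project tree expects it. The following is a minimal sketch, not part of the repository: the folder list is copied from the tree above, and the helper name `missing_dirs` is hypothetical.

```python
from pathlib import Path

# Folders taken from the project tree above, relative to the repository root.
REQUIRED_DIRS = [
    "data/gibson/scene_dataset/gibson_habitat",
    "pretrained",
    "lernr_maps",
]


def missing_dirs(root, required=REQUIRED_DIRS):
    """Return the required folders that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All expected folders are present.")
```

Run it from the repository root; any folder it reports still needs to be created or filled.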

Launch map creation

  • Example:

    python map_creation.py --scene="Wiconisco" --n_goals=25 --img_res=128 --map_size=128

  • scene: Name of the scene (default: Cantwell)

  • n_goals: Number of goal points to reach (default: 25)

  • img_res: Resolution of the input images fed to the encoder (default: 128)

  • map_size: Resolution of the lernr map (default: 128)

  • make_video: If present, a video of the navigation is generated (default: False)
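
The options above map naturally onto `argparse`. The sketch below shows one way such a parser could be declared, using only the flag names and defaults documented here. It is an illustration, not the actual contents of map_creation.py, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Sketch of a parser for the documented map_creation.py options."""
    parser = argparse.ArgumentParser(description="Create a lernr map for one scene")
    parser.add_argument("--scene", type=str, default="Cantwell",
                        help="Name of the scene")
    parser.add_argument("--n_goals", type=int, default=25,
                        help="Number of goal points to reach")
    parser.add_argument("--img_res", type=int, default=128,
                        help="Resolution of the input images fed to the encoder")
    parser.add_argument("--map_size", type=int, default=128,
                        help="Resolution of the lernr map")
    # A store_true flag is False unless it appears on the command line.
    parser.add_argument("--make_video", action="store_true",
                        help="Generate a video of the navigation")
    return parser


# Same arguments as the example command; the shell strips the quotes around the scene name.
args = build_parser().parse_args(
    ["--scene=Wiconisco", "--n_goals=25", "--img_res=128", "--map_size=128"]
)
print(args.scene, args.n_goals, args.img_res, args.map_size, args.make_video)
```

Declaring `make_video` with `action="store_true"` matches the "if present" wording: the flag takes no value and defaults to False.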

Query map

  • Outputs:

    • Single object search:
      • A heat map is saved for each query word.
    • Multi object search:
      • A heat map showing the different locations of the given query word is saved.
  • Example

    • Single object search:
      • python query_map.py --scene="Wiconisco" --query_words="toilet, couch" --negative_prompt="objects, textures" --make_video --smooth
    • Multi object search:
      • python query_map.py --scene="Cantwell" --query_words="Window" --make_video --smooth --erased_area=500 --th=0.6 --multi_search
  • scene: Name of the map (default: "Cantwell")

  • query_words: Objects to search (default: "Window")

  • negative_prompt: List of negative prompts (default: "stuff, things, objects, textures")

  • make_video: If present, a video of the navigation is generated (default: False)

  • smooth: If present, the generated video is smooth (default: False)

  • multi_search: If present, a multi object search is performed; otherwise a single object search is performed

  • erased_area: Area to erase in order to perform multi object search (default: 500)

  • th: Threshold used in the multi object search to stop the navigation (default: 0.6)
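
To give an intuition for how `th` and `erased_area` could interact in a multi object search, here is a sketch of greedy peak picking on a heat map. It is not the code in query_map.py. It assumes that `erased_area` is the area, in map cells, of a disk erased around each found location, and that `th` is the minimum score a location must reach. The function name `find_peaks` is hypothetical.

```python
import math


def find_peaks(heat_map, th=0.6, erased_area=500):
    """Greedy peak picking: take the hottest cell, erase a disk around it, repeat.

    Assumptions (not confirmed by the README): `erased_area` is the area in
    cells of the erased disk, and `th` is the minimum score a peak must reach.
    """
    grid = [list(row) for row in heat_map]  # work on a copy of the input
    radius_sq = erased_area / math.pi  # disk area = pi * r^2
    peaks = []
    while True:
        # Hottest remaining cell as (score, row, col); the default covers an empty map.
        score, r, c = max(
            ((value, i, j) for i, row in enumerate(grid) for j, value in enumerate(row)),
            default=(float("-inf"), -1, -1),
        )
        if score < th:
            break  # nothing left above the threshold
        peaks.append((r, c, score))
        # Erase the disk around the peak so the next iteration finds another location.
        for i, row in enumerate(grid):
            for j in range(len(row)):
                if (i - r) ** 2 + (j - c) ** 2 <= radius_sq:
                    row[j] = float("-inf")
    return peaks
```

Under these assumptions, a larger `erased_area` merges nearby responses into one detection, while a higher `th` stops the search earlier.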

Acknowledgements

@InProceedings{Kwon_2023_CVPR,
    author    = {Kwon, Obin and Park, Jeongho and Oh, Songhwai},
    title     = {Renderable Neural Radiance Map for Visual Navigation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {9099-9108}
}