
MeTTA: Single-View to 3D Textured Mesh Reconstruction with Test-Time Adaptation (BMVC 2024)

Install

  • Python version 3.9.
# conda setup
conda create -n metta python=3.9
conda activate metta
  • Install a CUDA build of PyTorch that matches your GPU environment; a quick sanity check is sketched after this list.
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio --extra-index-url https://download.pytorch.org/whl/cu118 --no-cache-dir
  • Install cuDNN following this guideline.

  • Install additional packages.

pip install -r requirements.txt

# for zero123
pip install taming-transformers-rom1504 --no-deps
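
As a quick sanity check (not part of the original instructions), you can verify that the installed PyTorch build actually sees your GPU and the expected CUDA version:

# hedged sanity check for the PyTorch / CUDA install above
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# for the cu118 wheels above, this should print something like: 2.0.1+cu118 11.8 True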

Troubleshooting

  • Install gridencoder manually.
pip install ./gridencoder
  • If there is a problem with the ninja package, check that the build tools and OpenGL development libraries are installed on your system; a quick check is sketched after this list.
sudo apt-get install -y build-essential
sudo apt-get install freeglut3-dev libglu1-mesa-dev mesa-common-dev
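
If builds still fail, the following minimal checks (our own suggestion, not from the original instructions) confirm that the compiler and ninja are on your PATH; glxinfo is optional and comes from the separate mesa-utils package:

# check that the build toolchain is visible
gcc --version
ninja --version
# optional OpenGL check; requires mesa-utils (sudo apt-get install mesa-utils)
glxinfo -B | grep "OpenGL version"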

Prerequisites

To use the multi-view diffusion models, you need to download some pretrained checkpoints. A quick check that they end up in the expected locations is sketched after this list.

  • Zero-1-to-3 for multi-view diffusion priors. We use zero123-xl.ckpt.
cd pretrained/zero123
wget https://zero123.cs.columbia.edu/assets/zero123-xl.ckpt
  • Omnidata for depth and normal prediction.
mkdir pretrained/omnidata
cd pretrained/omnidata
# assumes gdown is installed (pip install gdown)
gdown '1Jrh-bRnJEjyMCS7f-WsaFlccfPjJPPHI&confirm=t' # omnidata_dpt_depth_v2.ckpt
gdown '1wNxVO4vVbDEMEpnAi_jwQObf2MFodcBR&confirm=t' # omnidata_dpt_normal_v2.ckpt
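
A quick check (run from the repository root, assuming the download locations above) that all three checkpoints are where the code expects them:

# list the downloaded checkpoints
ls -lh pretrained/zero123/zero123-xl.ckpt \
       pretrained/omnidata/omnidata_dpt_depth_v2.ckpt \
       pretrained/omnidata/omnidata_dpt_normal_v2.ckpt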

Prerequisites (optional)

Before the optimization, you need object-centric segmented images extracted from real-scene images. We use Grounded-Segment-Anything to obtain the segmentation masks, but you can also use your own segmentation module; a quick check of the segmentation setup is sketched at the end of this section.

  • We use SAM-HQ as the segmentation model. Please download sam_hq_vit_h.pth into the directory below.
cd pretrained/sam
  • Install SAM-HQ following their GitHub repository.
pip install segment-anything-hq
  • Install Grounded-Segment-Anything following their GitHub repository.
git clone https://github.com/IDEA-Research/Grounded-Segment-Anything.git --recursive
cd Grounded-Segment-Anything
export AM_I_DOCKER=False
export BUILD_WITH_CUDA=True
export CUDA_HOME=/usr/local/cuda-11.8/  # your cuda version

pip install git+https://github.com/facebookresearch/segment-anything.git
pip install git+https://github.com/IDEA-Research/GroundingDINO.git
pip install --upgrade diffusers[torch]
git submodule update --init --recursive
cd grounded-sam-osx && bash install.sh
sh shells/preprocess_grounded_sam.sh

The shell file is organized as follows.

python utils/segment_grounded.py \
    {your_image_path} \
    --single_image
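
Before running the preprocessing on your own images, a minimal check (run from the MeTTA repository root; the import names are assumptions based on the pip packages installed above) that the SAM-HQ weights and segmentation packages are in place:

# verify the SAM-HQ checkpoint and that the segmentation packages import
ls -lh pretrained/sam/sam_hq_vit_h.pth
python -c "import segment_anything_hq, groundingdino; print('segmentation packages OK')"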

Usage

First, create a config file under configs; you can copy and edit one of the provided examples, like this:

{
    "random_textures": true,
    "iter": 1500,
    "texture_res": [ 1024, 1024 ],
    "train_res": [512, 512],
    "batch": 8,
    "learn_light" : false,
    "envmap": "data/irrmaps/mud_road_puresky_4k.hdr",
    "pixel_loss": true,
    "write_video": true,
    "hard_time": true,
    "optim_radius": true,
    "train_location": true
}

The default settings were tested on a 48GB A6000. Lower batch and increase iter if your GPU memory is limited; a sketch of one way to do this follows.
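
For example (a hedged sketch on our side; the file name configs/main.json comes from the training script below, and batch 4 with iter 3000 is just one memory-friendly setting), you could copy a provided config and edit those two fields:

# copy a provided example config and lower the memory footprint
cp configs/main.json configs/my_run.json
sed -i 's/"batch": 8/"batch": 4/; s/"iter": 1500/"iter": 3000/' configs/my_run.json

Remember to point --config in the training script at the new file if you go this route.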

Then you can run training by:

# single GPU
sh shells/train.sh

The shell file is organized as follows:

# Your input image file path: "./src/pix3d_im3d/chair/image/pix3d_chair_0178_img_rgba.png"
# It can be converted to "./src/{_DATA}/{_CLASS}/image/pix3d_chair_{_ID}_img_rgba.png"

_CLASS="chair"  # category name
_ID="0178"  # ID in the input file name
_DATA="pix3d_im3d"  # folder name
_geo_lr=0.001
_tex_lr=0.001
_loc_lr=0.0001
_sdf=0.6
_normal=2
_lap=2
_wandb_user=temp # your user name

## 1-stage
CUDA_VISIBLE_DEVICES=0 python main.py \
    --config configs/main.json \
    --out_dir ${_DATA}_${_CLASS}_id-${_ID} \
    --save_interval 100 \
    --sds_interval 100 \
    --geo_lr ${_geo_lr} \
    --tex_lr ${_tex_lr} \
    --loc_lr ${_loc_lr} \
    --geo_range 0 1 \
    --tex_range 0 1 \
    --geo_schedule 0.4 \
    --init_mesh_thicker 0 \
    --pix3d_class ${_CLASS} \
    --pix3d_id ${_ID} \
    --lambda_rgb 1 \
    --lambda_mask 1 \
    --data_type ${_DATA} \
    --radius 3.2 \
    --sdf_regularizer ${_sdf} \
    --lambda_mesh_normal ${_normal} \
    --lambda_mesh_laplacian ${_lap} \
    --wandb_user ${_wandb_user}

On a single GPU, training a single model takes about 30 minutes (1500 iterations at a batch size of 8).

The validation results, checkpoints, and final mesh will be stored in ./work_dirs/<out_dir>.

Testing on your images

  • We use Implicit3DUnderstanding as the image-to-3D model.
  • Preprocessing code that uses this model will be released soon.

Acknowledgement

  • The awesome original papers:
@misc{chen2023fantasia3d,
      title={Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation}, 
      author={Rui Chen and Yongwei Chen and Ningxin Jiao and Kui Jia},
      year={2023},
      eprint={2303.13873},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{liu2023zero1to3,
      title={Zero-1-to-3: Zero-shot One Image to 3D Object}, 
      author={Ruoshi Liu and Rundi Wu and Basile Van Hoorick and Pavel Tokmakov and Sergey Zakharov and Carl Vondrick},
      year={2023},
      eprint={2303.11328},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Citation

@inproceedings{yu2024metta,
    title={MeTTA: Single-View to 3D Textured Mesh Reconstruction with Test-Time Adaptation},
    author={Yu-Ji, Kim and Ha, Hyunwoo and Youwang, Kim and Surh, Jaeheung and Ha, Hyowon and Oh, Tae-Hyun},
    booktitle={The British Machine Vision Conference (BMVC)},
    year={2024}
}
