
MeTTA: Single-View to 3D Textured Mesh Reconstruction with Test-Time Adaptation (BMVC 2024)

Install

  • Python version 3.9.
# conda setup
conda create -n metta python=3.9
conda activate metta
  • Install a CUDA build of PyTorch that matches your GPU environment; a quick sanity check is sketched after this list.
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio --extra-index-url https://download.pytorch.org/whl/cu118 --no-cache-dir
  • Install cuDNN following this guideline.

  • Install additional packages.

pip install -r requirements.txt

# for zero123
pip install taming-transformers-rom1504 --no-deps
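
As a quick sanity check (not part of the original instructions), you can verify that the installed PyTorch build actually sees your GPU and the expected CUDA version:

# hedged sanity check for the PyTorch / CUDA install above
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# for the cu118 wheels above, this should print something like: 2.0.1+cu118 11.8 True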

Troubleshooting

  • Install gridencoder manually.
pip install ./gridencoder
  • If there is a problem with the ninja package, check that the build tools and OpenGL development libraries are installed on your system; a quick check is sketched after this list.
sudo apt-get install -y build-essential
sudo apt-get install freeglut3-dev libglu1-mesa-dev mesa-common-dev
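
If builds still fail, the following minimal checks (our own suggestion, not from the original instructions) confirm that the compiler and ninja are on your PATH; glxinfo is optional and comes from the separate mesa-utils package:

# check that the build toolchain is visible
gcc --version
ninja --version
# optional OpenGL check; requires mesa-utils (sudo apt-get install mesa-utils)
glxinfo -B | grep "OpenGL version"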

Prerequisites

To use the multi-view diffusion models, you need to download some pretrained checkpoints. A quick check that they end up in the expected locations is sketched after this list.

  • Zero-1-to-3 for multi-view diffusion priors. We use zero123-xl.ckpt.
cd pretrained/zero123
wget https://zero123.cs.columbia.edu/assets/zero123-xl.ckpt
  • Omnidata for depth and normal prediction.
mkdir pretrained/omnidata
cd pretrained/omnidata
# assumes gdown is installed (pip install gdown)
gdown '1Jrh-bRnJEjyMCS7f-WsaFlccfPjJPPHI&confirm=t' # omnidata_dpt_depth_v2.ckpt
gdown '1wNxVO4vVbDEMEpnAi_jwQObf2MFodcBR&confirm=t' # omnidata_dpt_normal_v2.ckpt
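
A quick check (run from the repository root, assuming the download locations above) that all three checkpoints are where the code expects them:

# list the downloaded checkpoints
ls -lh pretrained/zero123/zero123-xl.ckpt \
       pretrained/omnidata/omnidata_dpt_depth_v2.ckpt \
       pretrained/omnidata/omnidata_dpt_normal_v2.ckpt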

Prerequisites (optional)

Before the optimization, you need object-centric segmented images extracted from real-scene images. We use Grounded-Segment-Anything to obtain the segmentation masks, but you can also use your own segmentation module; a quick check of the segmentation setup is sketched at the end of this section.

  • We use SAM-HQ as the segmentation model. Please download sam_hq_vit_h.pth into the directory below.
cd pretrained/sam
  • Install SAM-HQ following their GitHub repository.
pip install segment-anything-hq
  • Install Grounded-Segment-Anything following their GitHub repository.
git clone https://github.com/IDEA-Research/Grounded-Segment-Anything.git --recursive
cd Grounded-Segment-Anything
export AM_I_DOCKER=False
export BUILD_WITH_CUDA=True
export CUDA_HOME=/usr/local/cuda-11.8/  # your cuda version

pip install git+https://github.com/facebookresearch/segment-anything.git
pip install git+https://github.com/IDEA-Research/GroundingDINO.git
pip install --upgrade diffusers[torch]
git submodule update --init --recursive
cd grounded-sam-osx && bash install.sh
sh shells/preprocess_grounded_sam.sh

The shell file is organized as follows.

python utils/segment_grounded.py \
    {your_image_path} \
    --single_image
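
Before running the preprocessing on your own images, a minimal check (run from the MeTTA repository root; the import names are assumptions based on the pip packages installed above) that the SAM-HQ weights and segmentation packages are in place:

# verify the SAM-HQ checkpoint and that the segmentation packages import
ls -lh pretrained/sam/sam_hq_vit_h.pth
python -c "import segment_anything_hq, groundingdino; print('segmentation packages OK')"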

Usage

First, create a config file under configs; you can copy and edit one of the provided examples, like this:

{
    "random_textures": true,
    "iter": 1500,
    "texture_res": [ 1024, 1024 ],
    "train_res": [512, 512],
    "batch": 8,
    "learn_light" : false,
    "envmap": "data/irrmaps/mud_road_puresky_4k.hdr",
    "pixel_loss": true,
    "write_video": true,
    "hard_time": true,
    "optim_radius": true,
    "train_location": true
}

The default settings were tested on a 48GB A6000. Lower batch and increase iter if your GPU memory is limited; a sketch of one way to do this follows.
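
For example (a hedged sketch on our side; the file name configs/main.json comes from the training script below, and batch 4 with iter 3000 is just one memory-friendly setting), you could copy a provided config and edit those two fields:

# copy a provided example config and lower the memory footprint
cp configs/main.json configs/my_run.json
sed -i 's/"batch": 8/"batch": 4/; s/"iter": 1500/"iter": 3000/' configs/my_run.json

Remember to point --config in the training script at the new file if you go this route.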

Then you can run training by:

# single GPU
sh shells/train.sh

The shell file is organized as follows:

# Your input image file path: "./src/pix3d_im3d/chair/image/pix3d_chair_0178_img_rgba.png"
# It can be converted to "./src/{_DATA}/{_CLASS}/image/pix3d_chair_{_ID}_img_rgba.png"

_CLASS="chair"  # category name
_ID="0178"  # ID in the input file name
_DATA="pix3d_im3d"  # folder name
_geo_lr=0.001
_tex_lr=0.001
_loc_lr=0.0001
_sdf=0.6
_normal=2
_lap=2
_wandb_user=temp # your user name

## 1-stage
CUDA_VISIBLE_DEVICES=0 python main.py \
    --config configs/main.json \
    --out_dir ${_DATA}_${_CLASS}_id-${_ID} \
    --save_interval 100 \
    --sds_interval 100 \
    --geo_lr ${_geo_lr} \
    --tex_lr ${_tex_lr} \
    --loc_lr ${_loc_lr} \
    --geo_range 0 1 \
    --tex_range 0 1 \
    --geo_schedule 0.4 \
    --init_mesh_thicker 0 \
    --pix3d_class ${_CLASS} \
    --pix3d_id ${_ID} \
    --lambda_rgb 1 \
    --lambda_mask 1 \
    --data_type ${_DATA} \
    --radius 3.2 \
    --sdf_regularizer ${_sdf} \
    --lambda_mesh_normal ${_normal} \
    --lambda_mesh_laplacian ${_lap} \
    --wandb_user ${_wandb_user}

On a single GPU, training a single model takes about 30 minutes (1500 iterations at a batch size of 8).

The validation results, checkpoints, and final mesh will be stored in ./work_dirs/<out_dir>.

Testing on your images

  • We use Implicit3DUnderstanding as the image-to-3D model.
  • Preprocessing code that uses this model will be released soon.

Acknowledgement

  • The awesome original papers:
@misc{chen2023fantasia3d,
      title={Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation}, 
      author={Rui Chen and Yongwei Chen and Ningxin Jiao and Kui Jia},
      year={2023},
      eprint={2303.13873},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{liu2023zero1to3,
      title={Zero-1-to-3: Zero-shot One Image to 3D Object}, 
      author={Ruoshi Liu and Rundi Wu and Basile Van Hoorick and Pavel Tokmakov and Sergey Zakharov and Carl Vondrick},
      year={2023},
      eprint={2303.11328},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Citation

@inproceedings{yu2024metta,
    title={MeTTA: Single-View to 3D Textured Mesh Reconstruction with Test-Time Adaptation},
    author={Yu-Ji, Kim and Ha, Hyunwoo and Youwang, Kim and Surh, Jaeheung and Ha, Hyowon and Oh, Tae-Hyun},
    booktitle={The British Machine Vision Conference (BMVC)},
    year={2024}
}
