
Issue with reproducing experiment results. #18

Open
gaohan-cmd opened this issue May 23, 2024 · 13 comments

Comments

@gaohan-cmd

Hi, I think your work is very meaningful to me, but I encountered some issues while trying to replicate it. Are you using the pre-trained weights from https://huggingface.co/CH3COOK/LL3DA-weight-release/tree/main for the table 5 experiment? I used the following command to evaluate the ScanRefer results, as shown in the first screenshot.
python main.py \
    --use_color --use_normal \
    --detector detector_Vote2Cap_DETR \
    --captioner ll3da \
    --checkpoint_dir ./ckpts/opt-1.3b/ll3da-generalist \
    --test_ckpt ./ckpts/opt-1.3b/ll3da-generalist/ll3da-opt-1.3b.pth \
    --dataset unified_densecap_scanrefer \
    --vocab facebook/opt-1.3b \
    --qformer_vocab bert-base-embedding \
    --dist_url tcp://localhost:222 \
    --criterion 'CiDEr@0.5' \
    --freeze_detector --freeze_llm \
    --batchsize_per_gpu 8 --ngpus 2 \
    --max_des_len 256 \
    --max_prompt 1 \
    --use_beam_search \
    --test_only

I fine-tuned it first using the following command.
python main.py \
    --use_color --use_normal \
    --detector detector_Vote2Cap_DETR \
    --captioner ll3da \
    --pretrained_weights ./ckpts/opt-1.3b/ll3da-generalist/ll3da-opt-1.3b.pth \
    --warm_lr_epochs 0 \
    --dataset unified_densecap_scanrefer \
    --vocab facebook/opt-1.3b \
    --qformer_vocab bert-base-embedding \
    --checkpoint_dir ./ckpts/opt-1.3b/ll3da-scanrefer-tuned \
    --max_epoch 16 \
    --dist_url tcp://localhost:222 \
    --eval_every_iteration 4000 \
    --start_eval_after -1 \
    --save_every 10000 \
    --criterion 'CiDEr@0.5' \
    --freeze_detector --freeze_llm \
    --batchsize_per_gpu 8 --ngpus 2 --base_lr 1e-6 --final_lr 1e-6 \
    --max_des_len 256 \
    --max_prompt 1 --use_beam_search
After training finishes, I use checkpoint_best.pth for evaluation with the command below, but my results did not reach the 65.19 reported in the paper. What could be the issue?
python main.py \
    --use_color --use_normal \
    --detector detector_Vote2Cap_DETR \
    --captioner ll3da \
    --checkpoint_dir ./ckpts/opt-1.3b/ll3da-scanrefer-tuned \
    --test_ckpt ./ckpts/opt-1.3b/ll3da-scanrefer-tuned/checkpoint_best.pth \
    --dataset unified_densecap_scanrefer \
    --vocab facebook/opt-1.3b \
    --qformer_vocab bert-base-embedding \
    --dist_url tcp://localhost:222 \
    --criterion 'CiDEr@0.5' \
    --freeze_detector --freeze_llm \
    --batchsize_per_gpu 8 --ngpus 2 \
    --max_des_len 256 \
    --max_prompt 1 \
    --use_beam_search \
    --test_only

[screenshots: evaluation logs showing the fine-tuned and pre-trained results]

@ch3cook-fdu
Contributor

Please see #11 for more details.

@YiwuZhong

YiwuZhong commented May 30, 2024

@ch3cook-fdu Thanks for your explanation and nice work!

However, I met the same issue as @gaohan-cmd when using the pre-trained model weights you uploaded to Hugging Face. After fine-tuning, I got 61.8@CIDEr and 35.0@B4, while the results reported in the paper are 65.2@CIDEr and 36.8@B4.

I understand that there is some randomness involved. But a ~3% gap in CIDEr and a ~2% gap in B4 are already large. Could you please also verify the reproduction of the paper results on your side?

@ch3cook-fdu
Contributor

ch3cook-fdu commented May 30, 2024

To unleash the full potential of LL3DA, I encourage you to:

  1. Train the Vote2Cap-DETR model to align your copy of 3D points with the scene encoder weights.
  2. Train the LL3DA generalist to see whether the results align.

Because the scene encoder is frozen, the ability to perceive the 3D scene might be the bottleneck for reproduction.

@YiwuZhong

@ch3cook-fdu Thanks for your response and suggestion!

Is there any script in this repo (LL3DA) that I can follow to train the detector?

@ch3cook-fdu
Contributor

You can follow the instructions in https://github.com/ch3cook-fdu/Vote2Cap-DETR, and copy the pretrained weights to this repo.

I might try uploading the point cloud data I processed as well, to see whether your reproduction aligns.

@YiwuZhong

YiwuZhong commented May 30, 2024

Uploading your processed data would be helpful to me and other researchers. Thanks!

@ch3cook-fdu
Contributor

> Uploading your processed data would be helpful to me and other researchers. Thanks!

Hi, we finally managed to upload the processed data to https://huggingface.co/CH3COOK/LL3DA-weight-release/blob/main/scannet_data.zip.

@YiwuZhong

> Uploading your processed data would be helpful to me and other researchers. Thanks!
>
> Hi, we finally managed to upload the processed data to https://huggingface.co/CH3COOK/LL3DA-weight-release/blob/main/scannet_data.zip.

@ch3cook-fdu Thank you for uploading the data. One more thing I noticed is that the detector uses the "aligned" version of vertices and boxes, unlike 3DETR. Is there any reason for doing this?

@ch3cook-fdu
Contributor

For 3D-VL studies, it is a common practice to use the axis-aligned 3D data. You can refer to other repos like ScanRefer and Scan2Cap.
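For context, the "aligned" vertices come from the 4x4 `axisAlignment` matrix that ScanNet ships in each scene's meta `.txt` file; applying it is a homogeneous-coordinate transform. A minimal sketch (illustrative only; the example matrix below is made up, and this is not the repo's exact preprocessing code):

```python
import numpy as np

def align_vertices(vertices, axis_align_matrix):
    """Apply a 4x4 axis-alignment matrix to (N, 3) vertices
    using homogeneous coordinates."""
    ones = np.ones((vertices.shape[0], 1))
    homogeneous = np.hstack([vertices, ones])      # (N, 4)
    aligned = homogeneous @ axis_align_matrix.T    # (N, 4)
    return aligned[:, :3]

# Toy alignment matrix: 90-degree rotation about z, then translate by (1, 2, 0)
theta = np.pi / 2
M = np.array([
    [np.cos(theta), -np.sin(theta), 0.0, 1.0],
    [np.sin(theta),  np.cos(theta), 0.0, 2.0],
    [0.0,            0.0,           1.0, 0.0],
    [0.0,            0.0,           0.0, 1.0],
])
pts = np.array([[1.0, 0.0, 0.0]])
aligned = align_vertices(pts, M)  # -> approximately [[1., 3., 0.]]
```

Axis-aligned box annotations (as used by ScanRefer and Scan2Cap) only make sense after the same transform is applied to the point cloud, which is why both sides must use the aligned version consistently.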

@YiwuZhong

YiwuZhong commented Jul 10, 2024

@ch3cook-fdu Thanks for providing the data; the paper results can now be reproduced using the pre-trained weights.

On the other hand, we tried to train the detector using the LL3DA repo, but failed to reproduce the detection performance of the detector trained with your Vote2Cap-DETR repo (~2.0 mAP50 gap). We already tried fixing the random seed and re-enabling `use_random_cuboid`. What other differences exist between these two repos in the detector part? Thanks!

@ch3cook-fdu
Contributor

Could you provide me with more details on your choice of hyperparameters? Trying 1 GPU with a batch size of 8 might help.

@YiwuZhong

@ch3cook-fdu The following script is used in the LL3DA repo to train the detector. Did we miss anything from Vote2Cap-DETR?

python main.py \
    --use_color \
    --use_normal \
    --detector detector_Vote2Cap_DETR \
    --warm_lr_epochs 9 \
    --dataset scannet \
    --checkpoint_dir ./Vote2Cap_DETR_XYZ_COLOR_NORMAL \
    --max_epoch 1080 \
    --eval_every_iteration 2000 \
    --start_eval_after 1999 \
    --save_every 2000 \
    --criterion 'mAP@0.5' \
    --batchsize_per_gpu 8 \
    --ngpus 1 \
    --base_lr 5e-4 \
    --final_lr 1e-6 \
    --lr_scheduler 'cosine' \
    --weight_decay 0.1 \
    --clip_gradient 0.1

@ch3cook-fdu
Contributor

The random cuboid augmentation is disabled in our implementation:
https://github.com/Open3DA/LL3DA/blob/main/datasets/scannet.py#L27

Please try using the original Vote2Cap-DETR repo for reproduction.
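For readers unfamiliar with the augmentation under discussion: random-cuboid cropping samples an axis-aligned sub-volume of the scene and keeps only the points inside it, retrying until the crop contains enough points. A minimal self-contained sketch (parameter names, defaults, and retry logic here are assumptions for illustration, not the Vote2Cap-DETR or LL3DA implementation):

```python
import numpy as np

class RandomCuboid:
    """Toy random-cuboid crop: pick a random axis-aligned box inside the
    scene's bounding box and return only the points that fall inside it."""

    def __init__(self, min_points=100, min_crop=0.5, max_crop=1.0, seed=0):
        self.min_points = min_points  # reject crops with fewer points
        self.min_crop = min_crop      # minimum per-axis crop fraction
        self.max_crop = max_crop      # maximum per-axis crop fraction
        self.rng = np.random.default_rng(seed)

    def __call__(self, points):
        # points: (N, 3) array of xyz coordinates
        lo, hi = points.min(axis=0), points.max(axis=0)
        for _ in range(100):
            frac = self.rng.uniform(self.min_crop, self.max_crop, size=3)
            size = (hi - lo) * frac
            start = lo + self.rng.uniform(0.0, 1.0, size=3) * ((hi - lo) - size)
            mask = np.all((points >= start) & (points <= start + size), axis=1)
            if mask.sum() >= self.min_points:
                return points[mask]
        return points  # fall back to the full scene if no crop keeps enough points
```

Because this augmentation changes which points the detector sees each epoch, enabling it in one repo but not the other is exactly the kind of discrepancy that can produce a small mAP gap between reproductions.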
