
Issue with reproducing experiment results. #18

Open
gaohan-cmd opened this issue May 23, 2024 · 13 comments

Comments

@gaohan-cmd

Hi, I think your work is very meaningful to me, but I encountered some issues while trying to replicate it. Are you using the pre-trained weights from https://huggingface.co/CH3COOK/LL3DA-weight-release/tree/main for the table 5 experiment? I used the following command to evaluate the ScanRefer results, as shown in the first screenshot.
python main.py \
    --use_color --use_normal \
    --detector detector_Vote2Cap_DETR \
    --captioner ll3da \
    --checkpoint_dir ./ckpts/opt-1.3b/ll3da-generalist \
    --test_ckpt ./ckpts/opt-1.3b/ll3da-generalist/ll3da-opt-1.3b.pth \
    --dataset unified_densecap_scanrefer \
    --vocab facebook/opt-1.3b \
    --qformer_vocab bert-base-embedding \
    --dist_url tcp://localhost:222 \
    --criterion 'CiDEr@0.5' \
    --freeze_detector --freeze_llm \
    --batchsize_per_gpu 8 --ngpus 2 \
    --max_des_len 256 \
    --max_prompt 1 \
    --use_beam_search \
    --test_only

I fine-tuned it first using the following command.
python main.py \
    --use_color --use_normal \
    --detector detector_Vote2Cap_DETR \
    --captioner ll3da \
    --pretrained_weights ./ckpts/opt-1.3b/ll3da-generalist/ll3da-opt-1.3b.pth \
    --warm_lr_epochs 0 \
    --dataset unified_densecap_scanrefer \
    --vocab facebook/opt-1.3b \
    --qformer_vocab bert-base-embedding \
    --checkpoint_dir ./ckpts/opt-1.3b/ll3da-scanrefer-tuned \
    --max_epoch 16 \
    --dist_url tcp://localhost:222 \
    --eval_every_iteration 4000 \
    --start_eval_after -1 \
    --save_every 10000 \
    --criterion 'CiDEr@0.5' \
    --freeze_detector --freeze_llm \
    --batchsize_per_gpu 8 --ngpus 2 --base_lr 1e-6 --final_lr 1e-6 \
    --max_des_len 256 \
    --max_prompt 1 --use_beam_search
After training finishes, I use checkpoint_best.pth for evaluation with the command below, but my results did not reach the 65.19 reported in the paper. What could be the issue?
python main.py \
    --use_color --use_normal \
    --detector detector_Vote2Cap_DETR \
    --captioner ll3da \
    --checkpoint_dir ./ckpts/opt-1.3b/ll3da-scanrefer-tuned \
    --test_ckpt ./ckpts/opt-1.3b/ll3da-scanrefer-tuned/checkpoint_best.pth \
    --dataset unified_densecap_scanrefer \
    --vocab facebook/opt-1.3b \
    --qformer_vocab bert-base-embedding \
    --dist_url tcp://localhost:222 \
    --criterion 'CiDEr@0.5' \
    --freeze_detector --freeze_llm \
    --batchsize_per_gpu 8 --ngpus 2 \
    --max_des_len 256 \
    --max_prompt 1 \
    --use_beam_search \
    --test_only

[screenshots: evaluation logs showing the fine-tuned and pre-trained results]

@ch3cook-fdu
Contributor

Please see #11 for more details.

@YiwuZhong

YiwuZhong commented May 30, 2024

@ch3cook-fdu Thanks for your explanation and nice work!

However, I met the same issue as @gaohan-cmd when using the pre-trained model weights you uploaded to Hugging Face. After fine-tuning, I got 61.8@CIDEr and 35.0@B4, while the results reported in the paper are 65.2@CIDEr and 36.8@B4.

I understand that there is some randomness involved. But a ~3% gap in CIDEr and a ~2% gap in B4 are already large. Could you please also verify the reproduction of the paper results on your side?

@ch3cook-fdu
Contributor

ch3cook-fdu commented May 30, 2024

To unleash the full potential of LL3DA, I encourage you to:

  1. Train the Vote2Cap-DETR model to align your copy of 3D points with the scene encoder weights.
  2. Train the LL3DA generalist to see whether the results align.

Because the scene encoder is frozen, the ability to perceive the 3D scene might be the bottleneck for reproduction.

@YiwuZhong

@ch3cook-fdu Thanks for your response and suggestion!

Is there any script in this repo (LL3DA) that I can follow to train the detector?

@ch3cook-fdu
Contributor

You can follow the instructions in https://github.com/ch3cook-fdu/Vote2Cap-DETR, and copy the pretrained weights to this repo.

I might try uploading the point cloud data I processed as well, to see whether your reproduction aligns.

@YiwuZhong

YiwuZhong commented May 30, 2024

Uploading your processed data would be helpful to me and other researchers. Thanks!

@ch3cook-fdu
Contributor

> Uploading your processed data would be helpful to me and other researchers. Thanks!

Hi, we finally managed to upload the processed data to https://huggingface.co/CH3COOK/LL3DA-weight-release/blob/main/scannet_data.zip.

@YiwuZhong

> Uploading your processed data would be helpful to me and other researchers. Thanks!
>
> Hi, we finally managed to upload the processed data to https://huggingface.co/CH3COOK/LL3DA-weight-release/blob/main/scannet_data.zip.

@ch3cook-fdu Thank you for uploading the data. One more thing I noticed is that the detector uses the "aligned" version of vertices and boxes, unlike 3DETR. Is there any reason for doing this?

@ch3cook-fdu
Contributor

For 3D-VL studies, it is a common practice to use the axis-aligned 3D data. You can refer to other repos like ScanRefer and Scan2Cap.
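For context, the "aligned" vertices come from the 4x4 `axisAlignment` matrix that ScanNet ships in each scene's meta `.txt` file; applying it is a homogeneous-coordinate transform. A minimal sketch (illustrative only; the example matrix below is made up, and this is not the repo's exact preprocessing code):

```python
import numpy as np

def align_vertices(vertices, axis_align_matrix):
    """Apply a 4x4 axis-alignment matrix to (N, 3) vertices
    using homogeneous coordinates."""
    ones = np.ones((vertices.shape[0], 1))
    homogeneous = np.hstack([vertices, ones])      # (N, 4)
    aligned = homogeneous @ axis_align_matrix.T    # (N, 4)
    return aligned[:, :3]

# Toy alignment matrix: 90-degree rotation about z, then translate by (1, 2, 0)
theta = np.pi / 2
M = np.array([
    [np.cos(theta), -np.sin(theta), 0.0, 1.0],
    [np.sin(theta),  np.cos(theta), 0.0, 2.0],
    [0.0,            0.0,           1.0, 0.0],
    [0.0,            0.0,           0.0, 1.0],
])
pts = np.array([[1.0, 0.0, 0.0]])
aligned = align_vertices(pts, M)  # -> approximately [[1., 3., 0.]]
```

Axis-aligned box annotations (as used by ScanRefer and Scan2Cap) only make sense after the same transform is applied to the point cloud, which is why both sides must use the aligned version consistently.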

@YiwuZhong

YiwuZhong commented Jul 10, 2024

@ch3cook-fdu Thanks for providing the data; the paper results can now be reproduced using the pre-trained weights.

On the other hand, we tried to train the detector using the LL3DA repo, but failed to reproduce the detection performance of the detector trained with your Vote2Cap-DETR repo (~2.0 mAP50 gap). We already tried fixing the random seed and re-enabling `use_random_cuboid`. What other differences exist between these two repos in the detector part? Thanks!

@ch3cook-fdu
Contributor

Could you provide me with more details on your choice of hyperparameters? Trying 1 GPU with a batch size of 8 might help.

@YiwuZhong

@ch3cook-fdu The following script is used in the LL3DA repo to train the detector. Did we miss anything from Vote2Cap-DETR?

python main.py \
    --use_color \
    --use_normal \
    --detector detector_Vote2Cap_DETR \
    --warm_lr_epochs 9 \
    --dataset scannet \
    --checkpoint_dir ./Vote2Cap_DETR_XYZ_COLOR_NORMAL \
    --max_epoch 1080 \
    --eval_every_iteration 2000 \
    --start_eval_after 1999 \
    --save_every 2000 \
    --criterion 'mAP@0.5' \
    --batchsize_per_gpu 8 \
    --ngpus 1 \
    --base_lr 5e-4 \
    --final_lr 1e-6 \
    --lr_scheduler 'cosine' \
    --weight_decay 0.1 \
    --clip_gradient 0.1

@ch3cook-fdu
Contributor

The random cuboid augmentation is disabled in our implementation:
https://github.com/Open3DA/LL3DA/blob/main/datasets/scannet.py#L27

Please try using the original Vote2Cap-DETR repo for reproduction.
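For readers unfamiliar with the augmentation under discussion: random-cuboid cropping samples an axis-aligned sub-volume of the scene and keeps only the points inside it, retrying until the crop contains enough points. A minimal self-contained sketch (parameter names, defaults, and retry logic here are assumptions for illustration, not the Vote2Cap-DETR or LL3DA implementation):

```python
import numpy as np

class RandomCuboid:
    """Toy random-cuboid crop: pick a random axis-aligned box inside the
    scene's bounding box and return only the points that fall inside it."""

    def __init__(self, min_points=100, min_crop=0.5, max_crop=1.0, seed=0):
        self.min_points = min_points  # reject crops with fewer points
        self.min_crop = min_crop      # minimum per-axis crop fraction
        self.max_crop = max_crop      # maximum per-axis crop fraction
        self.rng = np.random.default_rng(seed)

    def __call__(self, points):
        # points: (N, 3) array of xyz coordinates
        lo, hi = points.min(axis=0), points.max(axis=0)
        for _ in range(100):
            frac = self.rng.uniform(self.min_crop, self.max_crop, size=3)
            size = (hi - lo) * frac
            start = lo + self.rng.uniform(0.0, 1.0, size=3) * ((hi - lo) - size)
            mask = np.all((points >= start) & (points <= start + size), axis=1)
            if mask.sum() >= self.min_points:
                return points[mask]
        return points  # fall back to the full scene if no crop keeps enough points
```

Because this augmentation changes which points the detector sees each epoch, enabling it in one repo but not the other is exactly the kind of discrepancy that can produce a small mAP gap between reproductions.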
