Performance on KIT dataset #211
Is the issue you are pointing out the improvement in MultiModal-Dist and R-precision? If so, this is due to an evaluation bug fix (#189). You can also see the corrected results for HumanML in the README.
In future publications, cite the results after the bug fix, i.e. those you got and those in the README.
Thanks for your helpful response.
Thanks for your wonderful work, but I have some problems accurately reproducing the results on the KIT dataset.
I use the KIT dataset and the pre-trained model provided in this repository, and I evaluate with the provided command "python -m eval.eval_humanml --model_path ./save/kit_trans_enc_512/model000400000.pt". However, I cannot accurately reproduce the performance reported in the main paper's Table 2, not even for the ground truth.
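For reference, these are the two evaluation runs I compare below (I'm assuming the evaluation mode is exposed as an --eval_mode flag, which is how I set it; otherwise args.eval_mode can be set in eval/eval_humanml.py directly):

python -m eval.eval_humanml --model_path ./save/kit_trans_enc_512/model000400000.pt --eval_mode wo_mm
python -m eval.eval_humanml --model_path ./save/kit_trans_enc_512/model000400000.pt --eval_mode mm_short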
Here are my results (args.eval_mode = 'wo_mm'):
========== Matching Score Summary ==========
---> [ground truth] Mean: 2.7784 CInterval: 0.0108
---> [vald] Mean: 3.0964 CInterval: 0.0238
========== R_precision Summary ==========
---> [ground truth](top 1) Mean: 0.4266 CInt: 0.0068;(top 2) Mean: 0.6539 CInt: 0.0044;(top 3) Mean: 0.7827 CInt: 0.0054;
---> [vald](top 1) Mean: 0.4037 CInt: 0.0053;(top 2) Mean: 0.6065 CInt: 0.0044;(top 3) Mean: 0.7312 CInt: 0.0041;
========== FID Summary ==========
---> [ground truth] Mean: 0.0271 CInterval: 0.0029
---> [vald] Mean: 0.5130 CInterval: 0.0459
========== Diversity Summary ==========
---> [ground truth] Mean: 10.9842 CInterval: 0.1113
---> [vald] Mean: 10.7319 CInterval: 0.1032
Here are my results (args.eval_mode = 'mm_short'):
========== Matching Score Summary ==========
---> [ground truth] Mean: 2.7831 CInterval: 0.0125
---> [vald] Mean: 3.0739 CInterval: 0.0177
========== R_precision Summary ==========
---> [ground truth](top 1) Mean: 0.4173 CInt: 0.0080;(top 2) Mean: 0.6531 CInt: 0.0114;(top 3) Mean: 0.7793 CInt: 0.0092;
---> [vald](top 1) Mean: 0.4037 CInt: 0.0196;(top 2) Mean: 0.6156 CInt: 0.0130;(top 3) Mean: 0.7369 CInt: 0.0052;
========== FID Summary ==========
---> [ground truth] Mean: 0.0213 CInterval: 0.0041
---> [vald] Mean: 0.5465 CInterval: 0.0697
========== Diversity Summary ==========
---> [ground truth] Mean: 10.8986 CInterval: 0.1735
---> [vald] Mean: 10.7483 CInterval: 0.2033
========== MultiModality Summary ==========
---> [vald] Mean: 1.8062 CInterval: 0.1764
I did not change anything in the code.
I wonder if anyone else has run into this problem.
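As a side note, for anyone comparing these numbers: as far as I understand, the reported CInterval is a 95% confidence interval over the evaluation replications, i.e. 1.96 * std / sqrt(num_replications). Here is a minimal sketch of that computation (the function name and the example values are hypothetical, not taken from the actual logs):

import numpy as np

def mean_and_conf_interval(per_replication_values, z=1.96):
    # Mean and 95% normal-approximation confidence interval over replications.
    values = np.asarray(per_replication_values, dtype=np.float64)
    mean = float(values.mean())
    conf_interval = float(z * values.std() / np.sqrt(len(values)))
    return mean, conf_interval

# Hypothetical FID values from several evaluation replications:
fid_per_replication = [0.51, 0.49, 0.55, 0.50, 0.52]
mean, ci = mean_and_conf_interval(fid_per_replication)
print(f"---> [vald] Mean: {mean:.4f} CInterval: {ci:.4f}")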