Performance on KIT dataset #211
Is the issue you are pointing out the improvement in MultiModal-Dist and R-precision? If so, this is due to an evaluation bug fix (#189). You can also see the corrected results for HumanML in the README.
In future publications, cite the results after the bug fix, i.e. those you got and those in the README.
Thanks for your helpful response.
Thanks for your wonderful work, but I have some problems accurately reproducing the results on the KIT dataset.
I use the KIT dataset and the pre-trained model provided in this repository, and I evaluate with the provided command "python -m eval.eval_humanml --model_path ./save/kit_trans_enc_512/model000400000.pt". However, I cannot accurately reproduce the performance reported in the main paper's Table 2, not even for the ground truth.
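For reference, these are the two evaluation runs I compare below (I'm assuming the evaluation mode is exposed as an --eval_mode flag, which is how I set it; otherwise args.eval_mode can be set in eval/eval_humanml.py directly):

python -m eval.eval_humanml --model_path ./save/kit_trans_enc_512/model000400000.pt --eval_mode wo_mm
python -m eval.eval_humanml --model_path ./save/kit_trans_enc_512/model000400000.pt --eval_mode mm_short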
Here are my results (args.eval_mode = 'wo_mm'):
========== Matching Score Summary ==========
---> [ground truth] Mean: 2.7784 CInterval: 0.0108
---> [vald] Mean: 3.0964 CInterval: 0.0238
========== R_precision Summary ==========
---> [ground truth](top 1) Mean: 0.4266 CInt: 0.0068;(top 2) Mean: 0.6539 CInt: 0.0044;(top 3) Mean: 0.7827 CInt: 0.0054;
---> [vald](top 1) Mean: 0.4037 CInt: 0.0053;(top 2) Mean: 0.6065 CInt: 0.0044;(top 3) Mean: 0.7312 CInt: 0.0041;
========== FID Summary ==========
---> [ground truth] Mean: 0.0271 CInterval: 0.0029
---> [vald] Mean: 0.5130 CInterval: 0.0459
========== Diversity Summary ==========
---> [ground truth] Mean: 10.9842 CInterval: 0.1113
---> [vald] Mean: 10.7319 CInterval: 0.1032
Here are my results (args.eval_mode = 'mm_short'):
========== Matching Score Summary ==========
---> [ground truth] Mean: 2.7831 CInterval: 0.0125
---> [vald] Mean: 3.0739 CInterval: 0.0177
========== R_precision Summary ==========
---> [ground truth](top 1) Mean: 0.4173 CInt: 0.0080;(top 2) Mean: 0.6531 CInt: 0.0114;(top 3) Mean: 0.7793 CInt: 0.0092;
---> [vald](top 1) Mean: 0.4037 CInt: 0.0196;(top 2) Mean: 0.6156 CInt: 0.0130;(top 3) Mean: 0.7369 CInt: 0.0052;
========== FID Summary ==========
---> [ground truth] Mean: 0.0213 CInterval: 0.0041
---> [vald] Mean: 0.5465 CInterval: 0.0697
========== Diversity Summary ==========
---> [ground truth] Mean: 10.8986 CInterval: 0.1735
---> [vald] Mean: 10.7483 CInterval: 0.2033
========== MultiModality Summary ==========
---> [vald] Mean: 1.8062 CInterval: 0.1764
I did not change anything in the code.
I wonder if anyone else has run into this problem.
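As a side note, for anyone comparing these numbers: as far as I understand, the reported CInterval is a 95% confidence interval over the evaluation replications, i.e. 1.96 * std / sqrt(num_replications). Here is a minimal sketch of that computation (the function name and the example values are hypothetical, not taken from the actual logs):

import numpy as np

def mean_and_conf_interval(per_replication_values, z=1.96):
    # Mean and 95% normal-approximation confidence interval over replications.
    values = np.asarray(per_replication_values, dtype=np.float64)
    mean = float(values.mean())
    conf_interval = float(z * values.std() / np.sqrt(len(values)))
    return mean, conf_interval

# Hypothetical FID values from several evaluation replications:
fid_per_replication = [0.51, 0.49, 0.55, 0.50, 0.52]
mean, ci = mean_and_conf_interval(fid_per_replication)
print(f"---> [vald] Mean: {mean:.4f} CInterval: {ci:.4f}")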