
Evaluation script Bugs? #15

Closed
voldemortX opened this issue Jun 15, 2022 · 5 comments

@voldemortX (Contributor) commented Jun 15, 2022

Hi guys, thanks for the amazing dataset!
However, my colleagues and I have encountered several issues with your evaluation script, which prevent us from getting 100% accuracy when testing GT against GT:

  1. The distance to an invisible point (annotated invisible in the GT, or out-of-range and therefore invisible in the pred) is set to dist_th:

https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L159
https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L179
https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L190

So the x & z error counting will be off, they will be at least dist_th = 1.5 for invisible points, I'm guessing these distances should be ignored here.

  2. Because of 1, if a GT line is entirely invisible, every pred's distance to that GT line will be exactly dist_th = 1.5, so it won't pass the initial check here:
    https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L203
    and will be accumulated as FP/FN. Simply removing this check could have other consequences, such as a division by zero later in:
    https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L208
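
For point 1, here is a minimal sketch (not the repository's actual code; the array names x_gt, z_gt, x_pred, z_pred, vis_gt, vis_pred are illustrative assumptions) of how invisible points could be excluded from the x/z error statistics instead of being assigned dist_th:

```python
import numpy as np

def xz_errors_visible_only(x_gt, z_gt, x_pred, z_pred, vis_gt, vis_pred):
    """Accumulate x/z errors only over points visible in both GT and prediction."""
    both_visible = np.logical_and(vis_gt, vis_pred)  # drop invisible points entirely
    if not np.any(both_visible):
        # No commonly visible points: nothing to accumulate for this lane pair.
        return np.array([]), np.array([])
    x_err = np.abs(x_gt[both_visible] - x_pred[both_visible])
    z_err = np.abs(z_gt[both_visible] - z_pred[both_visible])
    return x_err, z_err
```

The per-lane error would then be averaged over the commonly visible points rather than over all y_samples, so invisible anchors no longer inflate the error by dist_th.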

Anyway, the entirely-invisible-line problem from point 2 should not normally appear, because the script filters lines to have at least 2 visible points. However, the x range filtering is inconsistent between:
https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L104
and

https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L121
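
One way to keep the two checks consistent would be to factor the x-range test into a single helper that both call sites use. A minimal sketch, where the bounds and the minimum-visible-points rule are illustrative assumptions rather than the script's actual values:

```python
import numpy as np

def lane_is_valid(x_vals, vis, x_min=-10.0, x_max=10.0, min_visible=2):
    """Keep a lane only if it has at least `min_visible` visible points inside [x_min, x_max]."""
    in_range = np.logical_and(x_vals >= x_min, x_vals <= x_max)
    return int(np.logical_and(in_range, vis).sum()) >= min_visible
```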

Also, there is no filtering after interpolation: if a line has 2 visible points before interpolation but not afterwards, it will also produce an entirely invisible line. For example, a line with y coordinates [23.5, 23.8] is valid beforehand, but since y_samples contains only integers, it has no visible point left after (ex)interpolation (see the toy sketch below).
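
A toy sketch of that failure mode (the names and the visibility rule are illustrative assumptions, not the benchmark's exact logic):

```python
import numpy as np

y_orig = np.array([23.5, 23.8])      # two visible points -> passes the pre-interpolation filter
y_samples = np.arange(0, 100, 1.0)   # integer evaluation anchors

# Treat a resampled point as visible only if its anchor lies inside the
# originally annotated y span; everything outside is extrapolated.
vis_after = np.logical_and(y_samples >= y_orig.min(), y_samples <= y_orig.max())
print(vis_after.sum())               # 0 -> entirely invisible after (ex)interpolation
```

Re-applying the "at least 2 visible points" filter after interpolation would drop such lanes before matching.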


By the way, when testing GT against GT I can only get around 87% F1 (I saved the GT after the coordinate transform and filtering). If you can clarify the intended ignore mechanism, I can make a pull request to fix this for you. There are two popular ignore mechanisms in metrics; I think the first one sounds better and aligns more closely with your original metric (only a suggestion):

  1. Ignore the region and let the prediction predict anything there (e.g., the 255 ignore index in segmentation datasets).
  2. Neither encourage nor discourage a pred, provided it matches an ignored GT (e.g., in MOTChallenge, non-pedestrian classes are ignored if matched with IoU 0.5; otherwise the pred counts as an FP). A sketch of this second mechanism follows the list.
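
To illustrate the second mechanism, here is a minimal sketch of TP/FP/FN counting with ignored GT lanes, assuming the matcher has already produced (pred_idx, gt_idx) pairs; none of these names are the repository's real API:

```python
def count_tp_fp_fn(matches, num_pred, num_gt, gt_ignored):
    """MOT-style ignore: a pred matched to an ignored GT is neither a TP nor an FP."""
    matched_pred = {p for p, _ in matches}
    matched_gt = {g for _, g in matches}
    tp = sum(1 for _, g in matches if not gt_ignored[g])   # matches to real GT lanes
    fp = num_pred - len(matched_pred)                      # unmatched preds are FPs
    fn = sum(1 for g in range(num_gt)                      # only non-ignored, unmatched GT are FNs
             if not gt_ignored[g] and g not in matched_gt)
    return tp, fp, fn
```

Under the first mechanism, predictions in an ignored region would never be penalized at all, regardless of whether they meet the matching threshold.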

I think these issues may have been inherited from the synthetic benchmark, and they could non-trivially influence the results you have already reported.

cc @ChonghaoSima @dyfcalid @hli2020

@ChonghaoSima (Collaborator)

Thank you for raising this issue. We'll check it and reply to you later.

@ChonghaoSima (Collaborator)

  1. For lanes that are entirely invisible, yes, the problem exists. We are trying to fix it by simply ignoring such lanes in evaluation (see the sketch after this list). However, we maintain that these annotations are meaningful, since invisible lanes are still part of the local map, and we will keep them in the GT json.
  2. For the inconsistent x range filtering, we are checking whether it influences the evaluation results. But yes, it is inconsistent.
  3. For filtering after interpolation, we are going to test it and see how the results differ.
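
A minimal sketch of that ignore step (illustrative names only; the annotations themselves would stay untouched in the json):

```python
def lanes_for_eval(gt_lanes, gt_visibility):
    """Drop GT lanes with no visible point before matching; keep them in the annotation file."""
    return [lane for lane, vis in zip(gt_lanes, gt_visibility) if any(vis)]
```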

Again, thank you for pointing out these issues. We previously intended to stay consistent with the Apollo evaluation code so that adaptation would be easy. We are fixing these bugs; could you make a pull request for them so we can double-check?

@voldemortX (Contributor, Author)

@ChonghaoSima So far I've only fixed the F score, not the x & z errors. I will make a WIP pull request for you to cross-check.

voldemortX mentioned this issue Jun 15, 2022
@ChonghaoSima (Collaborator)

We've fixed this issue, and GT-against-GT evaluation now matches perfectly. We will update all related results in our paper. Thank you for pointing this out.

@voldemortX (Contributor, Author)

Great!
