
Evaluation script Bugs? #15

Closed
voldemortX opened this issue Jun 15, 2022 · 5 comments

@voldemortX (Contributor) commented Jun 15, 2022

Hi guys, thanks for the amazing dataset!
However, my colleagues and I have encountered several issues with your evaluation script, which prevent us from getting 100% accuracy when testing GT against GT:

  1. The distance to an invisible point (annotated invisible in the GT, or out-of-range and therefore invisible in the pred) is set to dist_th:

https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L159
https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L179
https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L190

So the x & z error counting will be off, they will be at least dist_th = 1.5 for invisible points, I'm guessing these distances should be ignored here.

  2. Because of 1, if a GT line is entirely invisible, every pred's distance to that GT line will be exactly dist_th = 1.5, so it won't pass the initial check here:
    https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L203
    and will be accumulated as FP/FN. Simply removing this check could have other consequences, such as a division by zero later in:
    https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L208
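
For point 1, here is a minimal sketch (not the repository's actual code; the array names x_gt, z_gt, x_pred, z_pred, vis_gt, vis_pred are illustrative assumptions) of how invisible points could be excluded from the x/z error statistics instead of being assigned dist_th:

```python
import numpy as np

def xz_errors_visible_only(x_gt, z_gt, x_pred, z_pred, vis_gt, vis_pred):
    """Accumulate x/z errors only over points visible in both GT and prediction."""
    both_visible = np.logical_and(vis_gt, vis_pred)  # drop invisible points entirely
    if not np.any(both_visible):
        # No commonly visible points: nothing to accumulate for this lane pair.
        return np.array([]), np.array([])
    x_err = np.abs(x_gt[both_visible] - x_pred[both_visible])
    z_err = np.abs(z_gt[both_visible] - z_pred[both_visible])
    return x_err, z_err
```

The per-lane error would then be averaged over the commonly visible points rather than over all y_samples, so invisible anchors no longer inflate the error by dist_th.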

Anyway, the entirely-invisible-line problem from point 2 should not normally appear, because the script filters lines to have at least 2 visible points. However, the x range filtering is inconsistent between:
https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L104
and

https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L121
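
One way to keep the two checks consistent would be to factor the x-range test into a single helper that both call sites use. A minimal sketch, where the bounds and the minimum-visible-points rule are illustrative assumptions rather than the script's actual values:

```python
import numpy as np

def lane_is_valid(x_vals, vis, x_min=-10.0, x_max=10.0, min_visible=2):
    """Keep a lane only if it has at least `min_visible` visible points inside [x_min, x_max]."""
    in_range = np.logical_and(x_vals >= x_min, x_vals <= x_max)
    return int(np.logical_and(in_range, vis).sum()) >= min_visible
```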

Also, there is no filtering after interpolation: if a line has 2 visible points before interpolation but not afterwards, it will also produce an entirely invisible line. For example, a line with y coordinates [23.5, 23.8] is valid beforehand, but since y_samples contains only integers, it has no visible point left after (ex)interpolation (see the toy sketch below).
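
A toy sketch of that failure mode (the names and the visibility rule are illustrative assumptions, not the benchmark's exact logic):

```python
import numpy as np

y_orig = np.array([23.5, 23.8])      # two visible points -> passes the pre-interpolation filter
y_samples = np.arange(0, 100, 1.0)   # integer evaluation anchors

# Treat a resampled point as visible only if its anchor lies inside the
# originally annotated y span; everything outside is extrapolated.
vis_after = np.logical_and(y_samples >= y_orig.min(), y_samples <= y_orig.max())
print(vis_after.sum())               # 0 -> entirely invisible after (ex)interpolation
```

Re-applying the "at least 2 visible points" filter after interpolation would drop such lanes before matching.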


By the way, when testing GT against GT I can only get around 87% F1 (I saved the GT after the coordinate transform and filtering). If you can clarify the intended ignore mechanism, I can make a pull request to fix this for you. There are two popular ignore mechanisms in metrics; I think the first one sounds better and aligns more closely with your original metric (only a suggestion):

  1. Ignore the region and let the prediction predict anything there (e.g., the 255 ignore index in segmentation datasets).
  2. Neither encourage nor discourage a pred, provided it matches an ignored GT (e.g., in MOTChallenge, non-pedestrian classes are ignored if matched with IoU 0.5; otherwise the pred counts as an FP). A sketch of this second mechanism follows the list.
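
To illustrate the second mechanism, here is a minimal sketch of TP/FP/FN counting with ignored GT lanes, assuming the matcher has already produced (pred_idx, gt_idx) pairs; none of these names are the repository's real API:

```python
def count_tp_fp_fn(matches, num_pred, num_gt, gt_ignored):
    """MOT-style ignore: a pred matched to an ignored GT is neither a TP nor an FP."""
    matched_pred = {p for p, _ in matches}
    matched_gt = {g for _, g in matches}
    tp = sum(1 for _, g in matches if not gt_ignored[g])   # matches to real GT lanes
    fp = num_pred - len(matched_pred)                      # unmatched preds are FPs
    fn = sum(1 for g in range(num_gt)                      # only non-ignored, unmatched GT are FNs
             if not gt_ignored[g] and g not in matched_gt)
    return tp, fp, fn
```

Under the first mechanism, predictions in an ignored region would never be penalized at all, regardless of whether they meet the matching threshold.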

I think these issues may have been inherited from the synthetic benchmark, and they could non-trivially influence the results you have already reported.

cc @ChonghaoSima @dyfcalid @hli2020

@ChonghaoSima (Collaborator)

Thank you for raising this issue. We'll check it and reply to you later.

@ChonghaoSima (Collaborator)

  1. For lanes that are entirely invisible, yes, the problem exists. We are trying to fix it by simply ignoring such lanes in evaluation (see the sketch after this list). However, we maintain that these annotations are meaningful, since invisible lanes are still part of the local map, and we will keep them in the GT json.
  2. For the inconsistent x range filtering, we are checking whether it influences the evaluation results. But yes, it is inconsistent.
  3. For filtering after interpolation, we are going to test it and see how the results differ.
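
A minimal sketch of that ignore step (illustrative names only; the annotations themselves would stay untouched in the json):

```python
def lanes_for_eval(gt_lanes, gt_visibility):
    """Drop GT lanes with no visible point before matching; keep them in the annotation file."""
    return [lane for lane, vis in zip(gt_lanes, gt_visibility) if any(vis)]
```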

Again, thank you for pointing out these issues. We previously intended to stay consistent with the Apollo evaluation code so that adaptation would be easy. We are fixing these bugs; could you make a pull request for them so we can double-check?

@voldemortX (Contributor, Author)

@ChonghaoSima So far I've only fixed the F score, not the x & z errors. I will make a WIP pull request for you to cross-check.

voldemortX mentioned this issue Jun 15, 2022
@ChonghaoSima (Collaborator)

We've fixed this issue, and GT-against-GT evaluation now matches perfectly. We will update all related results in our paper. Thank you for pointing this out.

@voldemortX (Contributor, Author)

Great!
