Understanding detection model training and metrics #1725
Hi @hanshupe007 👋🏼,
Loss: doctr/doctr/models/detection/fast/pytorch.py, Line 213 in 9045dcf
It's a combination of text loss + kernel loss: the text loss is computed over the mask created from the boxes you provide, while the kernel loss is the loss for the "inner text", for which we create a shrunken mask for each box. Additionally, OHEM (Online Hard Example Mining) is used, which forces the model to focus on the parts that are harder to detect. Attention: while building the targets, we ignore boxes that would produce box masks smaller than 3x3 pixels.
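For intuition, here is a minimal sketch of what such a combined text + kernel loss with OHEM can look like. It is not doctr's actual implementation (see the linked line for that); the function names, the 3:1 negative-to-positive ratio, and the plain BCE base loss are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def ohem_bce(logits, targets, neg_ratio=3.0):
    """BCE with Online Hard Example Mining: keep every positive pixel
    and only the hardest negatives, at a fixed neg:pos ratio (assumed 3:1)."""
    loss = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    pos = targets > 0.5
    n_pos = int(pos.sum())
    n_neg = min(int((~pos).sum()), max(int(neg_ratio * n_pos), 1))
    hard_neg, _ = loss[~pos].topk(n_neg)  # hardest negatives = highest loss
    return (loss[pos].sum() + hard_neg.sum()) / max(n_pos + n_neg, 1)

def detection_loss(text_logits, kernel_logits, text_mask, kernel_mask):
    """Text loss over the full box masks + kernel loss over the
    shrunken ("inner text") masks, as described above."""
    return ohem_bce(text_logits, text_mask) + ohem_bce(kernel_logits, kernel_mask)
```

Here `text_logits`/`kernel_logits` stand for the two prediction maps of the model, and the masks are the targets built from your ground-truth boxes.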
Metrics: Line 230 in 9045dcf
It is covered by mean IoU (Intersection over Union), which gets really hard to increase past a certain point, but, as the name says, it describes how accurately the predictions match your target boxes :)
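As a quick reference, the IoU of a single box pair is just intersection area over union area. A minimal sketch (the helper name and the `(xmin, ymin, xmax, ymax)` convention are assumptions here, not doctr's API):

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # intersection width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # intersection height
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

# A prediction shifted by 10px against a 100x20 ground-truth box:
print(box_iou((0, 0, 100, 20), (10, 0, 110, 20)))  # ~0.818
```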
The base model seems to work pretty well already in this case, so I would try to freeze the feature extractor; a sketch of how to do this is shown below.
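A minimal sketch of freezing the backbone in PyTorch, assuming the detection model exposes it as `feat_extractor` (the attribute name and the optimizer settings are assumptions; check them against your doctr version):

```python
import torch
from doctr.models.detection import fast_base

model = fast_base(pretrained=True)

# Freeze the feature extractor so only the detection head is fine-tuned
for param in model.feat_extractor.parameters():
    param.requires_grad = False

# Hand only the still-trainable parameters to the optimizer
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```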
Regarding padding: optimally none; the boxes should be as close as possible to the text. Hope this helps :)
I am fine-tuning the fast_base model, which already worked pretty well but missed some individual special characters (*) in a table. The base model returns:
Validation loss: 0.370893 (Recall: 32.82% | Precision: 36.06% | Mean IoU: 47.00%)
After fine-tuning the pretrained model on 200 synthetic images, the result is:
Validation loss: 0.191029 (Recall: 31.34% | Precision: 31.33% | Mean IoU: 42.00%)
This leads me to the following questions: