
Add Keypoint R-CNN #69

Merged
merged 14 commits into from
Feb 12, 2019

Conversation

fmassa
Contributor

@fmassa fmassa commented Oct 30, 2018

This is a work-in-progress PR and requires cleanup and a few improvements.

This PR implements Keypoint R-CNN and adds the necessary building blocks for it to work.
It includes a PersonKeypoint class, as well as the

Notable changes:

  • [breaking change] adds support for multi-scale training. This changes INPUT.MIN_SIZE_TRAIN to be a tuple instead of a number. I need to update all the other configs to reflect this change before merging the PR (see the sketch below).
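For illustration, a minimal sketch of what the multi-scale setting means in practice; the tuple values and the helper are assumptions, not the PR's actual defaults:

```python
import random

# Hypothetical values; the PR changes INPUT.MIN_SIZE_TRAIN from a single int
# (e.g. 800) to a tuple of candidate sizes for multi-scale training.
MIN_SIZE_TRAIN = (640, 672, 704, 736, 768, 800)

def pick_train_min_size(sizes=MIN_SIZE_TRAIN):
    # One minimum size is sampled per image at training time.
    return random.choice(sizes)
```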

Differences wrt Detectron

  • In Detectron, we skip iterations that do not have enough keypoints. Here, we still compute the loss for the boxes and only skip the keypoint head.

Here are the statistics:

| backbone | type | lr sched | im / gpu | train mem (GB) | train time (s/iter) | total train time (hr) | inference time (s/im) | box AP | keypoint AP | model id |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R-50-FPN | Fast | 1x | 2 | 6.7 | 0.5351 | 13.3 | 0.15268 | 50.9 | 58.7 | 6605296 |
| R-50-FPN | Fast | 1x | 2 | 6.7 | 0.5379 | 13.4 | 0.15083 | 50.9 | 59.0 | 6605301 |

We are still a few mAP points worse than the Caffe2 Detectron implementation, but we use less memory and are significantly faster.

I'll keep this PR open until I fix the mAP differences

@facebook-github-bot added the CLA Signed label on Oct 30, 2018
@rbgirshick

Do you know if the difference is in training, inference, or both? I.e., have you loaded trained C2 Detectron keypoint models and verified that inference results are within epsilon?

@fmassa
Contributor Author

fmassa commented Oct 31, 2018

@rbgirshick the difference is in training: I have loaded a pre-trained C2 model, and I reproduced exactly the accuracies reported in the C2 model zoo

@Darknesszlx

Have you fixed the mAP differences?

@fmassa
Contributor Author

fmassa commented Dec 25, 2018

@Darknesszlx I haven't had the chance to look further into it again, but @wat3rBro used it for his modified backbones and it gave the same mAPs

@Darknesszlx

@fmassa Thank you for your reply! But sorry, I can't find keypoint-related code in @wat3rBro 's repository: https://github.com/wat3rBro/maskrcnn-benchmark
Could you please give me a more detailed link?

@wat3rBro
Contributor

@Darknesszlx sorry the code is not published.

@ChangErgou

> @Darknesszlx sorry the code is not published.

Can you share the differences between your implementation and this PR? We are eager to see the modifications. What did you do to improve the performance?

@wat3rBro
Contributor

wat3rBro commented Jan 4, 2019

@ChangErgou It uses mobile models as the backbone and heads, and it also replaces the upsampling part of the keypoint head. I haven't investigated the reason for the accuracy drop with the original model, but I guess the difference might come from the latter part.

@ChangErgou

According to the implementation of mask_head, the negative indices are computed as `neg_inds = matched_idxs == Matcher.BELOW_LOW_THRESHOLD`. However, in the implementation of the keypoint head they are computed as `neg_inds = matched_idxs == Matcher.BETWEEN_THRESHOLDS`.

I think BETWEEN_THRESHOLDS should be replaced by BELOW_LOW_THRESHOLD. This code is meant to remove the negative RoIs whose IoU with the targets is below the low threshold. I have limited GPUs to run the validation, so I will report results later. Maybe you can try this modification.
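For reference, a simplified sketch of the proposed change (not the verbatim PR code); the sentinel values below are the ones the Matcher in maskrcnn-benchmark assigns to unmatched proposals:

```python
# matched_idxs and labels_per_image are torch tensors produced by the Matcher
# and the target preparation step, respectively.
BELOW_LOW_THRESHOLD = -1
BETWEEN_THRESHOLDS = -2

def label_keypoint_proposals(matched_idxs, labels_per_image):
    # Previous keypoint head behaviour: only zero proposals between the thresholds.
    # neg_inds = matched_idxs == BETWEEN_THRESHOLDS
    # Proposed fix (same as mask_head): zero proposals below the low threshold,
    # so poorly matched proposals cannot later be sampled as positives.
    neg_inds = matched_idxs == BELOW_LOW_THRESHOLD
    labels_per_image = labels_per_image.clone()
    labels_per_image[neg_inds] = 0
    return labels_per_image
```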

@fmassa
Contributor Author

fmassa commented Jan 10, 2019

Hi @ChangErgou ,

Thanks for the comment! I'm not sure that that line has an effect, because there are no negative proposals selected for keypoints, and my understanding is that all boxes that are not person will be removed by the visibility check.

But I might be wrong here, and I should definitely try this change out!

@ChangErgou

After replacing BETWEEN_THRESHOLDS with BELOW_LOW_THRESHOLD, I trained the model on COCO 2017 train and validated it on COCO 2017 val. The results are:

```
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.516
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.844
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.561
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.376
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.598
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.636
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.178
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.527
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.608
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.489
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.667
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.717
Loading and preparing results...
DONE (t=3.17s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type keypoints
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.665
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.887
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.727
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.615
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.749
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.731
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.930
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.787
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.673
```

@fmassa
Contributor Author

fmassa commented Jan 14, 2019

@ChangErgou this is awesome!

This might indeed be the issue. Did you change anything else in the implementation?

I'll try making this change and launch some trainings. It might be the remaining ingredient to get the PR merged, thanks a lot!

@fmassa
Contributor Author

fmassa commented Jan 16, 2019

For information, I've kicked off a few trainings with the fix from @ChangErgou .
If I reproduce his numbers, I'll clean up the code and (finally) merge it.

@fmassa
Contributor Author

fmassa commented Jan 17, 2019

@ChangErgou quick question: Did you change anything else in the code?

I've made the change you mentioned, and while I got better results than before, they didn't match your keypoint numbers. Here is what I got:

Run 1
Box:      0.514
Keypoint: 0.609

Run 2
Box:      0.517
Keypoint: 0.605

@ChangErgou

Honestly, I know nothing about keypoint detection, so I didn't change anything special. First, is the dataset defined in "DatasetCatalog" an old one? I see something like "person_keypoints_train2017_valminusminival_mod" together with "coco2014"; it looks like you did something to them and named them with "mod". I downloaded and used the original COCO 2017 dataset, so maybe the dataset on your server is wrong. Second, I only have one GPU: I set the learning rate to 0.001 and trained the network for 720k iterations, with 2 images per GPU. In the end I did nothing special, because I am new to keypoint detection.

@fmassa
Contributor Author

fmassa commented Jan 17, 2019

Thanks for the information.

So you effectively used a smaller learning rate (it should have been 0.0025 for 1 GPU, I believe).
Did you also change the learning rate schedule, i.e., the iterations at which the learning rate decreases?

@ChangErgou

Yes. The learning rate schedule and training steps are adjusted according to the learning rate.

@Yishun99

> Hi @ChangErgou ,
>
> Thanks for the comment! I'm not sure that that line has an effect, because there are no negative proposals selected for keypoints, and my understanding is that all boxes that are not person will be removed by the visibility check.
>
> But I might be wrong here, and I should definitely try this change out!

Hi @fmassa

The previous version

```python
neg_inds = matched_idxs == Matcher.BETWEEN_THRESHOLDS
labels_per_image[neg_inds] = 0
```

leaves the proposals that are not matched well (below the low threshold) with a chance of being chosen as positives by the fg_bg_sampler.

@Yishun99

Yishun99 commented Jan 24, 2019

> Yes. The learning rate schedule and training steps are adjusted according to the learning rate.

Hi @ChangErgou ,

How did you set SOLVER.STEPS and SOLVER.MAX_ITER when lr=0.001?
Comparing your results with Detectron's:

R-50-FPN

bbox:  51.6 vs. 52.7
keypoint: 66.5 vs. 64.1

I think your box and keypoint results are inconsistent with each other, and the keypoint mAP is even higher than X-101-64x4d-FPN (66.0) in Detectron. I would be very grateful if you could provide more details.


When I ran with the config (R-50-FPN, lr=0.001, schedule=(480000, 640000), max_iter=720000, num_GPUs=1, batch_size=2, train_data=COCO_2017), I got:

  • COCO_2017_val mAP:

    box:  52.7
    keypoint: 62.7
    
  • COCO_2017_val mAP(keypoint) curve:

    [screenshot: keypoint mAP curve]

    (It's a bit early to drop the learning rate, but judging by the trend it seems hard to reach 66.5 even if we delay the lr drop; see the scaling sketch below.)
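The single-GPU settings above follow the usual linear scaling of the 8-GPU 1x schedule. A small illustrative helper (the 8-GPU baselines are the standard 1x values, assumed here rather than taken from this PR):

```python
def scale_schedule(num_gpus, ims_per_gpu=2, base_lr=0.02, base_max_iter=90000,
                   base_steps=(60000, 80000), base_ims_per_batch=16):
    """Linearly rescale lr/steps/iterations when shrinking the global batch."""
    factor = base_ims_per_batch / float(num_gpus * ims_per_gpu)
    return {
        "SOLVER.BASE_LR": base_lr / factor,                          # 1 GPU -> 0.0025
        "SOLVER.MAX_ITER": int(base_max_iter * factor),              # 1 GPU -> 720000
        "SOLVER.STEPS": tuple(int(s * factor) for s in base_steps),  # -> (480000, 640000)
    }

print(scale_schedule(num_gpus=1))
```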

@jonmorton

BTW, one difference between this implementation and the caffe2 one is in the handling of invalid minibatches (batches with <20 total keypoints). Detectron will retry minibatch loading until a valid batch is found [1], whereas in this diff, we just zero the keypoint loss (but presumably still backprop all the other losses) [2].

Not sure how much of a difference this makes, though.

[1] https://github.com/facebookresearch/Detectron/blob/cbb0236dfdc17790658c146837215d2728e6fadd/detectron/roi_data/loader.py#L130-L135
[2] https://github.com/facebookresearch/maskrcnn-benchmark/pull/69/files#diff-da5a88b627656a466855dbea03e86b75R165
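For illustration, a hedged sketch of the "zero the keypoint loss" behaviour described in [2]; the function name, shapes, and the handling of the empty case are assumptions based on this thread, not the PR's exact code:

```python
import torch
import torch.nn.functional as F

def keypoint_loss_or_zero(keypoint_logits, keypoint_targets, valid_mask):
    """Return the keypoint loss, or a zero that stays connected to the graph
    when no visible keypoints are available in the batch."""
    if valid_mask.sum() == 0:
        # Zero out the keypoint loss; the box/mask/RPN losses still backprop.
        return keypoint_logits.sum() * 0
    return F.cross_entropy(keypoint_logits[valid_mask], keypoint_targets[valid_mask])
```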

@fmassa
Contributor Author

fmassa commented Jan 25, 2019

@jonmorton precisely, this is one difference that I was aware of when I implemented Keypoints.

I printed the number of times that we zeroed the keypoint loss but not the other losses (compared to just fetching a new batch), and over 90k iterations we would zero the loss 2k times, which I'm not sure would account for all the remaining difference (but who knows?).

Making the batch be retried is kind of tricky and doesn't fit well with the PyTorch DataLoader.

@jonmorton

jonmorton commented Jan 25, 2019

EDIT: I realize now that I'm seeing a larger drop rate because I'm using the single-GPU, single-image-per-batch setup. With two images per GPU, the drop rate is around what you saw.

It's a bit unfortunate that Detectron had such a sensitivity to batch size, though.

@fmassa
Contributor Author

fmassa commented Jan 28, 2019

I think for this particular case what I might end up doing is just running for a few more iterations than we had in Detectron (e.g., 95k instead of 90k), and slightly adapting the learning rate schedule to take this into account. This might not reproduce Detectron's results as exactly as the other models do, but it will still be significantly faster and will already add value.

@jonmorton

jonmorton commented Jan 29, 2019

@fmassa Another alternative suggested by @ppwwyyxx when we discussed it was simply filtering out images with fewer than X keypoints before training (X=10 would give roughly the same skip rate as before, right?). This has the advantage of not depending on the batch size.
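A minimal sketch of that pre-filtering idea, assuming COCO-style annotations; the helper name and the threshold are illustrative:

```python
def has_enough_keypoints(annotations, min_keypoints=10):
    """True if the image's annotations contain at least `min_keypoints`
    labeled keypoints (COCO visibility flag v > 0)."""
    total = 0
    for ann in annotations:
        kps = ann.get("keypoints", [])
        # COCO stores keypoints as [x1, y1, v1, x2, y2, v2, ...]
        total += sum(1 for v in kps[2::3] if v > 0)
    return total >= min_keypoints

# Usage sketch: filter image ids once, before building the dataset.
# ids = [i for i in ids if has_enough_keypoints(coco.loadAnns(coco.getAnnIds(imgIds=i)))]
```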

@jonmorton

jonmorton commented Jan 29, 2019

Another small difference between this PR and C2: you have initialized the ConvTranspose2d using kaiming_normal. But in C2 the weights of the transpose conv are initialized using a std=0.1 Gaussian distribution (unlike the convs): https://github.com/facebookresearch/Detectron/blob/master/detectron/modeling/keypoint_rcnn_heads.py#L62
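To make the difference concrete, a small PyTorch sketch of the two initializations being compared; the layer shapes are placeholders, not the PR's actual head:

```python
import torch.nn as nn

# Placeholder shapes (e.g. 17 COCO keypoint classes); only the init differs.
deconv = nn.ConvTranspose2d(512, 17, kernel_size=4, stride=2, padding=1)

# This PR at the time of the comment: Kaiming/MSRA-style init.
nn.init.kaiming_normal_(deconv.weight, mode="fan_out", nonlinearity="relu")
nn.init.constant_(deconv.bias, 0)

# Caffe2 Detectron (when USE_DECONV is enabled): Gaussian with std=0.1.
nn.init.normal_(deconv.weight, std=0.1)
nn.init.constant_(deconv.bias, 0)
```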

@fmassa
Contributor Author

fmassa commented Jan 30, 2019

@jonmorton thanks for the messages and for looking into the potential reasons for the performance differences!

Yes, maybe filtering out the images with fewer than X keypoints would be a cleaner solution, even though I think it is not exactly equivalent to counting the keypoints visible inside the proposals: the proposals are sampled dynamically by the RPN, and they might be bad and contain few keypoints. But maybe just skipping those images has the same intended behavior, and it is much simpler!

About the initialization, I unfortunately think that we don't actually use that configuration in our models, because USE_DECONV is false in Detectron; see the config and the default config.
We instead add the kps_score_lowres, which uses MSRAFill, and I believe it is equivalent to the kaiming_normal_ that we have here (but I might be wrong).

@fmassa
Contributor Author

fmassa commented Feb 6, 2019

With the help of @jonmorton , I discovered what was wrong with my keypoint experiments that made me unable to reproduce Detectron's results: I had a problem with my training data and was using ~15k fewer images during training!

Now that I've reproduced the results I'll clean the code a bit and (finally!) merge this PR.

Thank you all for your help in identifying the issue.

@fmassa
Contributor Author

fmassa commented Feb 6, 2019

For completeness, here are the results I got:

boxes

```
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.538
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.829
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.584
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.367
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.616
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.696
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.184
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.543
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.620
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.475
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.684
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.763
```

keypoints

```
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.643
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.862
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.698
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.593
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.724
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.707
Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.905
Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.758
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.658
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.777
```

Still need further cleanups and improvements, like adding fields support for the other ops in Keypoints.

@zimenglan-sysu-512
Contributor

hi @fmassa
when are you going to merge this PR into master?

@fmassa
Contributor Author

fmassa commented Feb 11, 2019

@zimenglan-sysu-512 I was just cleaning up the code a bit, rebasing against master, and running some more trainings to verify that I didn't break anything.

It all looks good now, so I'll be merging it early this week.

@fmassa fmassa merged commit e0a525a into master Feb 12, 2019
@fmassa fmassa deleted the keypoints branch February 12, 2019 14:48
@fmassa fmassa changed the title [WIP] Add Keypoint R-CNN Add Keypoint R-CNN Feb 12, 2019
Lyears pushed a commit to Lyears/maskrcnn-benchmark that referenced this pull request Jun 28, 2020
* [WIP] Keypoints inference on C2 models work

* Training seems to work

Still gives slightly worse results

* e2e training works but gives 3 and 5 mAP less

* Add modification proposed by @ChangErgou

Improves mAP by 1.5 points, to 0.514 and 0.609

* Keypoints reproduce expected results

* Clean coco.py

* Linter + remove unnecessary code

* Merge criteria for empty bboxes in has_valid_annotation

* Remove trailing print

* Add demo support for keypoints

Still need further cleanups and improvements, like adding fields support for the other ops in Keypoints

* More cleanups and misc improvements

* Fixes after rebase

* Add information to the readme

* Fix md formatting