Can you train for keypoint detection? #2
Keypoint detection is not implemented in this release, but it should be easy to add if you want to try it. Simply change the mask head to use cross-entropy loss rather than binary cross-entropy, and extend the Dataset class to load a dataset that includes keypoints.
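A minimal sketch of the suggested change, assuming a keypoint head whose target for each ROI is a one-hot map over spatial locations. The function name and shapes are illustrative, not the repository's code:

```python
import tensorflow as tf
import keras.backend as K

def keypoint_loss_graph(target_keypoints, pred_keypoints):
    """target_keypoints: [num_rois, num_keypoints, height * width], one-hot over locations.
    pred_keypoints: [num_rois, num_keypoints, height * width], raw logits."""
    # Softmax over the spatial locations instead of the mask head's
    # per-pixel sigmoid + binary cross-entropy.
    loss = tf.nn.softmax_cross_entropy_with_logits(
        labels=target_keypoints, logits=pred_keypoints)
    return K.mean(loss)
```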
@waleedka: Thanks a lot for your rapid reply. I will try keypoint detection. Could you guide me a little on where (in which files) to add the code? I am reading your code... :)
I think most of your changes will be in model.py: …

Which dataset are you going to be using?
Thank you so much @waleedka. I will train on my own data, but in the same format as COCO. By the way, could you tell me why you do `... = np.where(m >= 128, 1, 0)` in both minimize_mask and expand_mask? In minimize_mask, you may want to convert the data type to boolean, since the data type in mini_mask is boolean, but why 128? In expand_mask, m from mini_mask is already boolean, so doing `mask[y1:y2, x1:x2, i] = np.where(m >= 128, 1, 0)` would leave all the elements 0. My understanding must be wrong somewhere; can you help me understand it correctly?
The function …
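The reply above is truncated in this archive, but the likely explanation follows from how `scipy.misc.imresize` (used by early versions of utils.py) behaves: it returns uint8 output rescaled to the 0-255 range, so the resized mask must be re-binarized, and 128 is simply the midpoint threshold. A small sketch, assuming that resize function:

```python
import numpy as np
import scipy.misc

m = np.zeros((56, 56), dtype=bool)
m[10:40, 10:40] = True

# imresize rescales its float input to uint8 in [0, 255], producing
# interpolated in-between values along the mask edges.
resized = scipy.misc.imresize(m.astype(float), (28, 28), interp='bilinear')

# Hence the re-binarization with a midpoint threshold of 128.
mini = np.where(resized >= 128, 1, 0).astype(bool)
```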
@waleedka: Is it possible to train on custom datasets that have only bounding boxes, but no segmentation? Mmm, let me rephrase: I've seen object detection models based on ResNet101, but for some reason this one is better. I'd like to use it for object detection on a dataset that does not have image segmentation.

My dataset isn't huge like COCO: 5 classes, 400 images per class = 2k images.
@taewookim You'll need to change the same places as in the change for keypoint detection, but rather than modifying the mask branch and loss, you'd remove them completely. See my comment above about which functions to modify. Alternatively, for a quicker hack that would be okay if your dataset is small and the extra processing load of the mask branch is not an issue, you could simply have your dataset return stand-in masks.

In terms of accuracy, you should expect to get similar accuracy to other object detection frameworks built on the Faster R-CNN architecture, because the basic building blocks are the same.

Another related point: if your dataset is small, you could use ResNet50 instead of ResNet101 to make training faster. There is a discussion about that in issue #5.
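A hedged sketch of that quick hack, since the exact wording is truncated here: have the dataset's `load_mask()` return stand-in masks, for example masks that just fill each bounding box. `MyDataset` and its annotation fields are hypothetical:

```python
import numpy as np
import utils  # matterport Mask_RCNN's utils.py (flat layout of this era)

class MyDataset(utils.Dataset):
    def load_mask(self, image_id):
        info = self.image_info[image_id]
        boxes = info["boxes"]            # assumed list of (y1, x1, y2, x2)
        class_ids = np.array(info["class_ids"], dtype=np.int32)
        h, w = info["height"], info["width"]
        masks = np.zeros((h, w, len(boxes)), dtype=bool)
        for i, (y1, x1, y2, x2) in enumerate(boxes):
            masks[y1:y2, x1:x2, i] = True  # box-filled stand-in mask
        return masks, class_ids
```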
@waleedka Hi, thanks for your great work!! Now I want to predict human keypoints using your code, but I have a few questions. The input mask ground truth for segmentation is [batch, img_height, img_width, num_instances], so for keypoints the input should be [batch, img_height, img_width, keypoint_num * num_instances].

There is also a problem around lines 531-539: the ground-truth mask has only one point set to 1, but after crop, resize, and rounding it may have more than one point set to 1, so maybe we need to set only the point with the max value to 1.

You said we just need to change … My naive thought is: …
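A small sketch of the fix proposed above: rather than drawing a one-pixel mask and resizing it (which can smear the single 1 across several cells), map each keypoint's coordinates from the ROI straight into the head's output grid. The function name, the 28x28 grid, and the negative-coordinate convention for missing points are all assumptions:

```python
import numpy as np

def keypoint_to_onehot(kp_x, kp_y, roi, grid=28):
    """roi: (y1, x1, y2, x2) in image coordinates. Returns a flattened
    one-hot vector of length grid*grid; all zeros if the point is missing."""
    y1, x1, y2, x2 = roi
    target = np.zeros(grid * grid, dtype=np.float32)
    if kp_x < 0:                       # convention: negative coords = unlabeled
        return target
    gx = int((kp_x - x1) / max(x2 - x1, 1) * grid)
    gy = int((kp_y - y1) / max(y2 - y1, 1) * grid)
    gx = np.clip(gx, 0, grid - 1)
    gy = np.clip(gy, 0, grid - 1)
    target[gy * grid + gx] = 1.0       # exactly one cell set to 1
    return target
```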
I said most of the changes are in those functions, but I didn't mean that these are the only places to touch. It's been a long time since I read the paper, so I'm afraid I can't give you a precise and concrete list of places to change. But I'm happy to answer questions or review any changes you make and offer feedback. I think your intuition is correct about adding a new head for keypoints. You can use the mask code as a template and modify it as necessary.
@waleedka …

Then when I use model.fit_generator(...), the error seems to occur at build_fpn_mask_graph, which I changed to: …

So do you think there is any problem with the Cut layer?
Have you trained and evaluated on human pose estimation successfully?
After fixing some details, and reformulating the loss function to:

```python
# Gather the masks (predicted and true) that contribute to loss
loss = []
…
```

I got the following results.
As you can see, the neural network does not distinguish between right and left shoulder, right and left knee, etc.
Maybe more training, and also be careful with data augmentation. Flipping may not be a good idea for human pose.
@RodrigoGantier, which files did you change to include keypoint detection? I would like to start on this approach. BTW, aren't you forgetting to include a background class? If you have a softmax activation, you should include a background class; otherwise the net always has to predict one of the keypoint classes, even when it's background.
@filipetrocadoferreira …
That would be helpful. I'm stuck on the loss function, but I'm planning to develop this next week. I hope we can share some opinions.
Hi @filipetrocadoferreira, I am also developing Mask R-CNN for human pose estimation, but there are some bugs. If you want, I can share my code with you so that we can debug together. The following is the code of the mask loss function; the tensor shapes are the same as in @RodrigoGantier's configuration:

```python
target_class_ids = K.reshape(target_class_ids, (-1,))
…
```

My program focuses on 12 keypoints (without the keypoints on the face).
@QtSignalProcessing My main problem is the same: the missing keypoint labels. Since there are photographs that do not have all the points, these labels (28 x 28 vectors containing only zeros) turn the loss `loss = -mean(y_label * log(y_predict))` into zero. The result, in my opinion, is that when facing a point that is not visible or does not exist, the neural network looks for the closest or most similar point, which explains the bad results.

My actual loss function (with this I solve the crash problem, in my case):

```python
target_masks = K.reshape(target_masks, (-1, 784, 14))
y_true = tf.gather(target_masks, positive_ix)
loss = K.switch(tf.size(y_true) > 0, …
```

P.S.: I have already tried a loss function based on Euclidean distance, which gives worse results.
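Putting the fragments above together, a hedged reconstruction of the kind of loss being described: gather only the positive ROIs and guard the empty case with `K.switch` (the same pattern the repository's mask loss uses), with the softmax taken over the 784 spatial locations per keypoint. The transpose and exact shapes are my assumptions:

```python
import tensorflow as tf
import keras.backend as K

def keypoint_loss(target_masks, pred_masks, positive_ix):
    # 784 = 28 * 28 grid cells; 14 keypoints in this commenter's setup.
    target_masks = K.reshape(target_masks, (-1, 784, 14))
    pred_masks = K.reshape(pred_masks, (-1, 784, 14))
    # Keep only positive ROIs, then move the keypoint axis ahead of the
    # spatial axis so the softmax runs over the 784 locations.
    y_true = tf.transpose(tf.gather(target_masks, positive_ix), [0, 2, 1])
    y_pred = tf.transpose(tf.gather(pred_masks, positive_ix), [0, 2, 1])
    loss = K.switch(tf.size(y_true) > 0,
                    tf.nn.softmax_cross_entropy_with_logits(
                        labels=y_true, logits=y_pred),
                    tf.constant(0.0))
    return K.mean(loss)
```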
I have a doubt: if num_rois != proposals, how can we gather with positive_ix from both tensors?
Hi @RodrigoGantier, in my mask loss function I add additional operations to get only the non-zero labels:

```python
pred = []
…
```

I am waiting for the results and I will update my progress as soon as possible.
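A sketch of that "keep only non-zero labels" idea, under the assumption that each keypoint's ground truth is a one-hot row that is all zeros when the point is unlabeled:

```python
import tensorflow as tf

def visible_keypoint_loss(y_true, y_pred):
    """y_true, y_pred: [num_rois * num_keypoints, 784], one row per keypoint;
    all-zero rows of y_true mark unlabeled keypoints."""
    # A keypoint is labeled iff its one-hot row has a non-zero sum.
    labeled_ix = tf.where(tf.reduce_sum(y_true, axis=1) > 0)[:, 0]
    y_true = tf.gather(y_true, labeled_ix)
    y_pred = tf.gather(y_pred, labeled_ix)
    # Guard against all keypoints being unlabeled in this batch.
    return tf.cond(tf.size(labeled_ix) > 0,
                   lambda: tf.reduce_mean(
                       tf.nn.softmax_cross_entropy_with_logits(
                           labels=y_true, logits=y_pred)),
                   lambda: tf.constant(0.0))
```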
Hi @filipetrocadoferreira, num_rois should be equal to the number of proposals, according to this implementation.
@QtSignalProcessing Assuming y_pred = [positive_rois, 784, 12] and y_pred is in one-hot encoded format, I think `pos_lsh_ix = tf.where(l_sh_t > 0)[:, 0]` is not correct, because you just erase the other zeros in the one-hot vector. I suppose what you intend is not to take into account the labels that are all zeros, leaving y_pred with a shape of [positive_rois, 784, 10] if two keypoints are missing, for example.
@filipetrocadoferreira In my understanding, for training the neural network, first the positive proposals are selected, then the final tensor is filled (padded) with negative proposals and zeros to reach the maximum number of proposals set in the config class. positive_ix contains the indices of the positive proposals, so in the training stage only the positive proposals are selected (in the inference stage, in the case of y_pred, the proposals with the biggest probability are selected, which usually correspond to the positive indices).
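A tiny illustration of that padding scheme, mirroring how the repository's own mask loss recovers positive ROIs:

```python
import tensorflow as tf

# ROI class ids for one image, padded with zeros up to the configured
# maximum (config.TRAIN_ROIS_PER_IMAGE in this repository).
target_class_ids = tf.constant([3., 1., 5., 0., 0., 0.])
target_class_ids = tf.reshape(target_class_ids, (-1,))

# Rows with class id > 0 are the positive proposals; only these
# contribute to the mask/keypoint loss.
positive_ix = tf.where(target_class_ids > 0)[:, 0]   # -> [0, 1, 2]
```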
This is my attempt to address the problem of empty keypoints: …
I'm not able to get the mask loss to converge.
@Superlee506 My keypoint loss function: …
@QtSignalProcessing The number of keypoints in the COCO dataset is 17; why did you use 19?
@Superlee506 I am not using COCO.
@QtSignalProcessing Finally, I checked the original Detectron code for human pose estimation and changed my code. The loss converged, but the detection result isn't as good as the original paper, and the model can't distinguish between right and left shoulder, right and left knee, etc., whether or not I used flipping augmentation.
What did you change?
@filipetrocadoferreira A lot of places, and I found many mistakes in RodrigoGantier's code. Firstly, I changed the ground-truth keypoints to a label (integer) type. Secondly, the loss function: I added weights to the keypoint loss function as Detectron does, and then sparse_softmax_cross_entropy_with_logits converges quickly. More importantly, the flipping method isn't right for keypoints, and we need some modifications. However, my results aren't as good as the original paper, and I'm confused about it. I plan to submit my code when the results look good.
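A hedged sketch of a Detectron-style keypoint loss as described: integer labels fed to sparse softmax cross-entropy, weighted by visibility so unlabeled keypoints contribute nothing. The names and shapes are assumptions, not the commenter's actual code:

```python
import tensorflow as tf

def keypoint_loss_detectron_style(kp_labels, kp_logits, kp_weights):
    """kp_labels:  [N] int32, index of the target cell in the flattened grid.
    kp_logits:  [N, grid_size] raw scores from the keypoint head.
    kp_weights: [N] float32, 1.0 for visible keypoints, 0.0 otherwise."""
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=kp_labels, logits=kp_logits)
    loss = loss * kp_weights
    # Normalize by the number of visible keypoints, avoiding divide-by-zero.
    num_visible = tf.maximum(tf.reduce_sum(kp_weights), 1.0)
    return tf.reduce_sum(loss) / num_visible
```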
Nice! I also found that the way to deal with keypoint ground truth can't be the same as for the mask (because resizes and crops will probably make that clear). The code would be amazing.
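On the flipping problem raised above, a sketch of the usual fix: when mirroring an image horizontally, the left/right keypoint identities must be swapped as well, otherwise the model is taught that "left shoulder" sometimes appears on the right side. The index pairs assume the COCO 17-keypoint ordering:

```python
import numpy as np

# (left, right) index pairs in COCO order: eyes, ears, shoulders,
# elbows, wrists, hips, knees, ankles.
FLIP_PAIRS = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10),
              (11, 12), (13, 14), (15, 16)]

def flip_keypoints(keypoints, image_width):
    """keypoints: [num_keypoints, 3] array of (x, y, visibility)."""
    flipped = keypoints.copy()
    flipped[:, 0] = image_width - 1 - flipped[:, 0]   # mirror x
    for l, r in FLIP_PAIRS:                           # swap left <-> right
        flipped[[l, r]] = flipped[[r, l]]
    return flipped
```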
@filipetrocadoferreira @QtSignalProcessing @RodrigoGantier @racinmat I open-sourced my project with detailed code comments. The loss converges quickly, but the predicted results are not as good as the original paper. I just have one 980 graphics card, so I am releasing my code, and any contribution or improvement is welcome and appreciated. https://github.com/Superlee506/Mask_RCNN
@Superlee506 It's really hard to achieve the results reported in the original paper, since the training parameters must be carefully selected (I read something like this in one of the issues in Detectron). BTW, distinguishing left/right keypoints relies on the geometric information of the human body; this could be done by post-processing.
@QtSignalProcessing How do you do the post-processing? In my case, my model usually outputs the left/right keypoints together.
I have changed the name of the repository: https://github.com/Superlee506/Mask_RCNN_Humanpose
@Superlee506 The positions of the nose and eyes provide information that you can use to distinguish left and right; otherwise you need some assumptions. Sorry for my last comment, I used the wrong words. The best way to distinguish left and right is to change the keypoint head to model the relationships between keypoints.
@RodrigoGantier, thanks for your advice; it has been really helpful to me. But I have a small problem with the following code of yours:

```python
pred_masks = K.reshape(pred_masks, (-1, 784, 14))
# Gather the masks (predicted and true) that contribute to loss
loss = []
…
```
@rujiao @waleedka @RodrigoGantier, thanks for your advice. I have now changed the segmentation part for keypoint detection, but I found that rcnnL1Loss becomes very big, such as 234677418896143419441152.0000. Do you know what the reason is? Any advice will be appreciated. Thank you.
@rujiao …
Has anyone been successful in using Mask R-CNN to detect only keypoints?
Yes, I have used Mask R-CNN to detect bounding boxes and keypoints. It works quite well. You can simply remove the mask part.
@rujiao Could you share your code on GitHub?
Hi @waleedka: Thanks for the great work! Is it possible to train for keypoint detection? Sorry for the wrong title of the issue; I can't correct it.