YOLO #7496
Conversation
cc @NicolasHug
Seems like I was able to fix most of the unit tests. The biggest job was getting the TorchScript compilation to work. To fix it, I had to make the code a bit uglier in some places. For example, the target matching classes cannot be subclasses, because JIT compilation can't handle subclassing. Also, the IoU function and the cross-entropy function cannot be passed as function objects, which makes the loss computation a bit awkward. I don't know if there would be some way to make function objects work.
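As a rough illustration of the kind of restriction this refers to (a minimal, hypothetical sketch, not the actual code in this PR): a plain Python callable such as an IoU function can't simply be stored on a module and used under torch.jit.script, so the choice ends up being dispatched explicitly.

```python
import torch
import torchvision
from torch import Tensor, nn


class BoxMatcher(nn.Module):
    # Hypothetical example: instead of storing an arbitrary IoU callable as a
    # module attribute (which TorchScript scripting does not accept), the
    # choice is encoded as a string and dispatched explicitly in forward().
    def __init__(self, iou_name: str = "iou") -> None:
        super().__init__()
        self.iou_name = iou_name

    def forward(self, boxes1: Tensor, boxes2: Tensor) -> Tensor:
        if self.iou_name == "giou":
            return torchvision.ops.generalized_box_iou(boxes1, boxes2)
        return torchvision.ops.box_iou(boxes1, boxes2)


scripted = torch.jit.script(BoxMatcher("giou"))
print(scripted(torch.tensor([[0.0, 0.0, 2.0, 2.0]]), torch.tensor([[1.0, 1.0, 3.0, 3.0]])))
```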
def compute_mean_std(tensor):
    # can't compute mean of integral tensor
    tensor = tensor.to(torch.double)
Is there a reason for using torch.double instead of torch.float32? Also, just in case: there exists a torch.std_mean function.
Thanks for reviewing, @vadimkantorov. This function used to be an inner function of test_detection_model() and I just copied it outside the function so that other functions can call it. So I don't know if there was a reason to cast to double instead of float32, but based on the above comment it seems like float32 would be fine. Should I change it to float32 and also switch to std_mean()?
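For reference, a sketch of what that change might look like (just an illustration of the suggestion, not code that's in the PR):

```python
import torch


def compute_mean_std(tensor):
    # Mean/std are not defined for integral tensors, so cast first;
    # torch.std_mean computes both in a single pass and returns (std, mean).
    tensor = tensor.to(torch.float32)
    std, mean = torch.std_mean(tensor)
    return mean, std
```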
I'm not one of the maintainers, so let's wait for a more official review :) This is also part of the test code, so it's less important...
@vadimkantorov all right, thanks for pointing that out though.
I'm attempting to test these models by training them on the COCO dataset, but I am unable to get close to the results reported in the original papers, for example YOLOv7 here: https://github.com/WongKinYiu/yolov7?tab=readme-ov-file#performance. I'm wondering if I am doing something wrong in training these, or if you can provide any guidance on what you are doing to train them and the results you are seeing?
Hi @patches11. I haven't trained YOLO models recently, and I don't have access to a compute cluster for training these models anymore. I did check earlier that I can use YOLOv4 weights, so the forward pass should be correct. But there are lots of details involved in model training, like mosaic and copy-paste augmentation, and I feel like not all of the details are mentioned in the papers. I'm not even sure how exactly the models were tested (what data and resolution were used). I also found gradient clipping to be important, even though it's not used in the papers. Maybe @FateScript can comment on what we're still missing?
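Regarding gradient clipping, here's a minimal, self-contained illustration of clipping the global gradient norm before the optimizer step (the model, data and max_norm value are arbitrary placeholders, not what was used for YOLO training):

```python
import torch
from torch import nn

# Toy model and data, just to make the snippet runnable.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
inputs, targets = torch.randn(8, 4), torch.randn(8, 2)

loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()
# Rescale gradients so that their global L2 norm does not exceed max_norm.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
optimizer.step()
optimizer.zero_grad()
```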
@senarvi thanks for the details, I will take a look at implementing those augmentations. It definitely seems like we don't get all the details in the papers.
Yeah, it will be interesting to see if we can get all the augmentations and training tricks exactly as in the papers. That way we could get a fair comparison between YOLO and other architectures. I think YOLOv7 uses a 1280x1280 input size, which consists of four 640x640 tiles. So for each network input you sample four images, which makes things a bit more complicated. For the copy-paste augmentation we need segmentation masks in addition to the bounding boxes. I think during testing they sort the images so that the images in each batch are as similar in size as possible, so that they don't have to be resized as much. Still, if I'm right, the test results vary a little bit depending on the batch size.
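For concreteness, a rough sketch of how the four-tile mosaic could be assembled (a hypothetical helper; it assumes the four images are already resized to the tile size and the boxes are float tensors in (x1, y1, x2, y2) pixel coordinates):

```python
from typing import List, Tuple

import torch
from torch import Tensor


def mosaic(images: List[Tensor], boxes: List[Tensor], tile: int = 640) -> Tuple[Tensor, Tensor]:
    # Stitch four tile x tile images into one 2*tile x 2*tile canvas and shift
    # each tile's box coordinates by the offset of its top-left corner.
    canvas = torch.zeros(3, 2 * tile, 2 * tile)
    offsets = [(0, 0), (0, tile), (tile, 0), (tile, tile)]  # (dy, dx) per tile
    shifted_boxes = []
    for image, image_boxes, (dy, dx) in zip(images, boxes, offsets):
        canvas[:, dy : dy + tile, dx : dx + tile] = image
        shifted = image_boxes.clone()
        shifted[:, 0::2] += dx  # x1 and x2
        shifted[:, 1::2] += dy  # y1 and y2
        shifted_boxes.append(shifted)
    return canvas, torch.cat(shifted_boxes)
```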
@senarvi
@discort I was left waiting for someone to review this PR. I'm not sure if there's interest in merging this into torchvision... there hasn't been much activity in 1.5 years. More or less the same implementation was merged into lightning-bolts, though: https://github.com/Lightning-Universe/lightning-bolts/tree/master/src/pl_bolts/models/detection/yolo

I'm still happy to help with the code if there's interest, but I don't work for Groke Technologies anymore and I don't have the computational resources, so I cannot really continue developing the features that we're still missing to achieve better performance on the COCO dataset.

I did go through other YOLO implementations, however, and noticed one detail that has been added at some point and that I have missed: Task Alignment Learning (TAL). Assigning ground-truth boxes to anchors is based on a metric that combines both the score given to the correct class and the IoU. In case someone has the resources to continue the development, that's one thing to look at.
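To make that concrete, the TAL alignment metric from the TOOD paper combines the classification score of the target class with the IoU roughly like this (a sketch; the alpha and beta defaults follow the paper and are not something tuned for this PR):

```python
import torch
from torch import Tensor


def tal_alignment_metric(cls_scores: Tensor, ious: Tensor, alpha: float = 1.0, beta: float = 6.0) -> Tensor:
    # Alignment metric t = s**alpha * u**beta, where s is the predicted score
    # of the ground-truth class and u is the IoU with the ground-truth box.
    # Anchors with the highest t get assigned to that ground-truth box.
    return cls_scores.pow(alpha) * ious.pow(beta)
```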
A generic YOLO implementation that supports the most important features of YOLOv3, YOLOv4, YOLOv5, YOLOv7, Scaled-YOLOv4, and YOLOX. It includes networks written in PyTorch, but the user can also load a network from a Darknet configuration file. Features such as matching predictions to targets have been implemented in a modular way, so that they can easily be replaced or reused in different models. Target class labels may be specified as a matrix of class probabilities, allowing multi-label classification. Includes unit tests and complete type hints for static type checking.
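As an illustration of the probability-matrix targets (the key names follow the usual torchvision detection convention and are assumptions here, not a specification of this implementation's exact interface):

```python
import torch

# One training image with two boxes and three classes. Instead of integer
# class indices, "labels" is a (num_boxes, num_classes) probability matrix,
# so the second box can belong to two classes at once.
target = {
    "boxes": torch.tensor([[10.0, 20.0, 110.0, 220.0],
                           [50.0, 60.0, 150.0, 260.0]]),  # (x1, y1, x2, y2)
    "labels": torch.tensor([[1.0, 0.0, 0.0],
                            [0.0, 1.0, 1.0]]),
}
```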
This code is contributed with the permission of my employer, Groke Technologies.
Fixes #6341