YOLO #7496

Open
wants to merge 17 commits into
base: main

@senarvi senarvi commented Apr 4, 2023

A generic YOLO implementation that supports the most important features of YOLOv3, YOLOv4, YOLOv5, YOLOv7, Scaled-YOLOv4, and YOLOX. It includes networks written in PyTorch, but the user can also load a network from a Darknet configuration file. Features such as matching predictions to targets are implemented in a modular way, so that they can easily be replaced or reused in different models. Target class labels may be specified as a matrix of class probabilities, allowing multi-label classification. The PR includes unit tests and complete type hints for static type checking.

This code is contributed with the permission of my employer Groke Technologies.

Fixes #6341

pytorch-bot bot commented Apr 4, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/7496

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Contributor

oke-aditya commented Apr 5, 2023

cc @NicolasHug

Author

senarvi commented Apr 20, 2023

Seems like I was able to fix most of the unit tests. The biggest job was to get the TorchScript compilation working. In order to fix it, I had to make the code a bit uglier in some places. For example, the target matching classes cannot be subclasses, because the JIT compilation can't handle subclasses. Also, the iou function and the cross-entropy function cannot be passed as function objects, which makes the loss computation a bit awkward. I don't know if there would be some way to make function objects work.


def compute_mean_std(tensor):
    # can't compute mean of integral tensor
    tensor = tensor.to(torch.double)

Is there a reason for using torch.double instead of torch.float32?
Also, just in case: there is a torch.std_mean function.

Author

Thanks for reviewing, @vadimkantorov. This function used to be an inner function of test_detection_model(), and I just moved it outside so that other functions can call it. So I don't know whether there was a reason to cast to double instead of float32, but based on the code comment above it seems like float32 would be fine. Should I change it to float32 and also switch to std_mean()?

@vadimkantorov vadimkantorov May 28, 2023

I'm not one of the maintainers, so let's wait for a more official review :) This is also part of the test code, so it's less important...

Author

@vadimkantorov all right, thanks for pointing that out though.

dxoigmn added a commit to IntelLabs/MART that referenced this pull request Jun 23, 2023
@patches11

I'm attempting to test these models by training them on the COCO dataset. However, I am unable to get close to the results reported in the original papers, for example YOLOv7 here:

https://github.com/WongKinYiu/yolov7?tab=readme-ov-file#performance

Wondering if I am doing something wrong in training these, or if you can provide any guidance on what you are doing to train them and the results you are seeing?

Author

senarvi commented May 2, 2024

Hi @patches11. I haven't trained YOLO models recently. Also, I don't have access to a compute cluster for training these models anymore. I did check earlier that I can use YOLOv4 weights, so the forward pass should be correct. But there are lots of details involved in model training, like mosaic and copy-paste augmentation. I feel like not all of the details are mentioned in the papers. I'm not even sure how exactly the models were tested (what data and resolution were used). I also found gradient clipping to be important, even though it's not used in the papers. Maybe @FateScript can comment on what we're still missing?
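
For readers unfamiliar with the gradient clipping mentioned above, here is a minimal, framework-free sketch of clipping by global L2 norm. The function name `clip_by_global_norm` and the `max_norm` value are illustrative assumptions and are not taken from this PR. In PyTorch training code the corresponding utility is `torch.nn.utils.clip_grad_norm_`.

```python
import math
from typing import List


def clip_by_global_norm(grads: List[List[float]], max_norm: float) -> List[List[float]]:
    """Rescale all gradients by one common factor so that their joint
    L2 norm does not exceed max_norm. Hypothetical helper for illustration."""
    # Joint L2 norm over every element of every gradient vector.
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if total_norm <= max_norm:
        # Already within the limit: return an unchanged copy.
        return [list(grad) for grad in grads]
    scale = max_norm / total_norm
    return [[g * scale for g in grad] for grad in grads]


# Two "parameters" whose joint gradient norm is 5.0, clipped to norm 1.0.
clipped = clip_by_global_norm([[3.0, 4.0], [0.0]], max_norm=1.0)
print(clipped)
```

Because every gradient is scaled by the same factor, the update direction is preserved and only its length is limited.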

@patches11

@senarvi thanks for the details, I will take a look at implementing those augmentations.

It definitely does seem like we don't get all the details in the papers.

Author

senarvi commented May 6, 2024

Yeah, it will be interesting if we can get all the augmentations and training tricks exactly as in the papers. That way we could get a fair comparison between YOLO and other architectures. I think YOLOv7 uses 1280x1280 input size, which consists of four 640x640 tiles. So for each network input you sample four images, which makes it a bit more complicated. For the copy-paste augmentation we need segmentation masks, in addition to the bounding boxes. I think during testing they sort the images so that for each batch you get as similar sizes as possible, so that you don't have to resize the images as much. Still, if I'm right, the test results vary a little bit, depending on the batch size.
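
To illustrate the four-tile layout described above, the sketch below shifts bounding boxes from four 640x640 images into the coordinate system of a 1280x1280 canvas. It is a simplified assumption of how such a mosaic might be assembled (fixed 2x2 grid, no random centre, scaling, or cropping). The function name `mosaic_boxes` and the `(x1, y1, x2, y2)` pixel box format are hypothetical and are not taken from this PR.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def mosaic_boxes(boxes_per_tile: List[List[Box]], tile_size: int = 640) -> List[Box]:
    """Shift the boxes of four tiles into the coordinates of a 2x2 mosaic.

    Tiles are assumed to be ordered top-left, top-right, bottom-left,
    bottom-right (an assumption made for this sketch).
    """
    if len(boxes_per_tile) != 4:
        raise ValueError("a 2x2 mosaic needs exactly four tiles")
    merged: List[Box] = []
    for index, boxes in enumerate(boxes_per_tile):
        row, col = divmod(index, 2)
        # Offset of this tile's top-left corner inside the mosaic.
        dx, dy = col * tile_size, row * tile_size
        for x1, y1, x2, y2 in boxes:
            merged.append((x1 + dx, y1 + dy, x2 + dx, y2 + dy))
    return merged


tiles = [
    [(0.0, 0.0, 10.0, 10.0)],   # top-left image
    [(0.0, 0.0, 10.0, 10.0)],   # top-right image
    [],                         # bottom-left image has no objects
    [(5.0, 5.0, 20.0, 20.0)],   # bottom-right image
]
print(mosaic_boxes(tiles))
```

The pixel data would be pasted into the canvas with the same offsets, which is why each network input requires sampling four images.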

discort commented Dec 2, 2024

@senarvi
Thanks for sharing your implementation. Any plans to finish this PR?

Author

senarvi commented Dec 2, 2024

@discort I was left waiting for someone to review this PR. I'm not sure if there's interest in merging this to torchvision... there hasn't been much activity in 1.5 years. More or less the same implementation was merged to lightning-bolts, though: https://github.com/Lightning-Universe/lightning-bolts/tree/master/src/pl_bolts/models/detection/yolo

I'm still happy to help with the code if there's interest, but I don't work for Groke Technologies anymore and I don't have the computational resources, so I cannot really continue to develop any features that we're missing to achieve a better performance on the COCO dataset.

I did go through other YOLO implementations, however, and noticed one detail that was added at some point but that I missed: Task Alignment Learning (TAL). Assigning ground-truth boxes to anchors is based on a metric that combines both the score given to the correct class and the IoU. In case someone has the resources to continue the development, that's one thing to look at.
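
A rough sketch of such an alignment metric, assuming the form t = s^alpha * u^beta, where s is the score predicted for the ground-truth class and u is the IoU between the predicted box and the ground-truth box. The function names, the `(x1, y1, x2, y2)` box format, and the default exponents `alpha` and `beta` are assumptions for illustration. This is not code from this PR.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def alignment_metric(
    class_score: float,
    pred_box: Box,
    gt_box: Box,
    alpha: float = 1.0,  # assumed exponent for the class score
    beta: float = 6.0,   # assumed exponent for the IoU
) -> float:
    """Combine classification score and localisation quality into one value."""
    return class_score ** alpha * iou(pred_box, gt_box) ** beta


gt = (0.0, 0.0, 2.0, 2.0)
# A confident but poorly localised prediction versus a less confident,
# well localised one.
print(alignment_metric(0.9, (1.0, 1.0, 3.0, 3.0), gt))
print(alignment_metric(0.6, (0.0, 0.0, 2.0, 2.0), gt))
```

With a metric like this, the predictions that score highest for a ground-truth box would be the candidates for its positive matches, so a prediction needs both a good class score and a good box to be selected.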

Successfully merging this pull request may close these issues.

[RFC] Support YOLOX detection model
6 participants