question about yolov3 tiny occlusion track? #2553

Open
derekwong66 opened this issue Mar 7, 2019 · 16 comments
@derekwong66

Hi @AlexeyAB ,

Thanks for your contribution. Can you explain more about how to train the tiny-occlusion track cfg? Which pre-trained weights can we use, and what about crnn?

thanks in advance.

@AlexeyAB
Owner

AlexeyAB commented Mar 7, 2019

@derekwong66 Hi,

It is just an experimental detector (it isn't well tested yet)

  1. that should have higher accuracy for detection on a video stream

  2. should solve the blinking issue and two related problems: track_id migration from one object to another, and false object counting on video when a detection disappears and then reappears with a new track_id (if you use yolo_console_dll.cpp, i.e. ./uselib or yolo_console_dll.exe, for counting objects)

  3. and can solve the occlusion issue (for two objects with different class_id, or object & background) for at least 20 frames (time_steps=20 in the cfg-file).


(I have not yet tested it with random=1 in the cfg-file)

You should train it on sequential frames from one or several videos:

  • yolo_mark.exe data/occlusion cap_video occlusion.mp4 1 - grabs every 1st frame from the video (i.e. all frames). Usually you should set this value to max(1, camera_fps / detection_fps); prefer a lower value over a higher one, since the sequence will be augmented by speed=3 in the cfg-file (1x-3x faster) during training. Preferably use the latest version of Yolo_mark: https://github.com/AlexeyAB/Yolo_mark

  • yolo_mark.exe data/occlusion data/occlusion_train.txt data/occlusion.names - to mark bboxes, even if at some point the object is invisible (occluded/obscured by another type of object)

  • darknet.exe detector train data/occlusion.data yolov3-tiny_occlusion_track.cfg yolov3-tiny.conv.15 -map - to train the detector & tracker

  • darknet.exe detector demo data/occlusion.data yolov3-tiny_occlusion_track.cfg backup/yolov3-tiny_occlusion_track_10000.weights forward.avi - run detection
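
The frame-step rule for the first command, max(1, camera_fps / detection_fps), can be sketched in Python as follows. This is only an illustration: the function name `frame_step` is hypothetical, and rounding down is an assumption based on the advice to prefer a lower value over a higher one.

```python
def frame_step(camera_fps: float, detection_fps: float) -> int:
    """Return N so that every N-th frame is grabbed from the video.

    Hypothetical helper: implements max(1, camera_fps / detection_fps),
    rounded down so that we err on the side of a lower step.
    """
    if camera_fps <= 0 or detection_fps <= 0:
        raise ValueError("frame rates must be positive")
    return max(1, int(camera_fps // detection_fps))


# A 30 fps camera with a detector that runs at 10 fps: grab every 3rd frame.
print(frame_step(30, 10))
# A detector faster than the camera: the step is clamped to 1 (all frames).
print(frame_step(25, 30))
```

The result would be passed as the last argument of the yolo_mark.exe cap_video command.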

The only condition: the frames from each video must appear sequentially in the train.txt file.
You should also validate results on a separate validation dataset. For example, divide your dataset in two:

  1. train.txt - first 80% of frames (80% from video1 + 80% from video 2, if you use frames from 2 videos)
  2. valid.txt - last 20% of frames (20% from video1 + 20% from video 2, if you use frames from 2 videos)
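
A minimal sketch of this split, assuming the frame paths of each video are already listed in playback order. The function name and the file names are hypothetical; only the 80/20 rule and the requirement of sequential order come from the description above.

```python
from typing import Dict, List, Tuple


def split_sequential(frames_by_video: Dict[str, List[str]],
                     train_fraction: float = 0.8) -> Tuple[List[str], List[str]]:
    """Put the first part of every video into train and the rest into valid.

    Frames are never shuffled, so each list stays in sequential order.
    """
    train: List[str] = []
    valid: List[str] = []
    for frames in frames_by_video.values():
        cut = int(len(frames) * train_fraction)
        train.extend(frames[:cut])   # first 80% of this video
        valid.extend(frames[cut:])   # last 20% of this video
    return train, valid


# Hypothetical frame names for two short videos.
videos = {
    "video1": [f"data/occlusion/video1_{i:05d}.jpg" for i in range(10)],
    "video2": [f"data/occlusion/video2_{i:05d}.jpg" for i in range(5)],
}
train, valid = split_sequential(videos)
print(len(train), len(valid))
```

Each list can then be written out one path per line as train.txt and valid.txt.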

Idea in simple words:

  • Inside, this network works like a super-resolution algorithm based on RNN/LSTM layers (without image reconstruction): it builds a more detailed representation of objects from multiple images (frames from the video), which can give higher accuracy: https://arxiv.org/abs/1704.02738 , https://arxiv.org/abs/1801.04590 , https://arxiv.org/abs/1805.02704

  • Recurrent RNN/LSTM layers allow the network to memorize state: if an object was detected on several frames, then it is highly likely that the object is present on several subsequent frames: https://arxiv.org/pdf/1607.05781.pdf

  • The recurrent conv-RNN/conv-LSTM layers (with kernel_size=3) also allow the network to predict and extrapolate the trajectory of an object's motion.


Example of a state-of-the-art super-resolution network: https://arxiv.org/abs/1704.02738
[image: super_resolution]

Example of object Detection and Tracking:

[image: det_n_track]


Your training process may look like this:

[image: chart_full_occlusion]

@diennv

diennv commented Mar 20, 2019

Dear AlexeyAB,

I also have a problem with track_id when using Yolov3 for counting objects on video.

I used yolov3.cfg for my own dataset.
I saw that yolov3-tiny_occlusion_track.cfg has some modifications compared with yolov3-tiny.cfg.
For example, a [crnn] layer is added...
I have a question. For yolov3.cfg, could i do same way to solve track_id issue ?

@AlexeyAB
Owner

@diennv Hi,

> For yolov3.cfg, could i do same way to solve track_id issue ?

Yes, you can.
It just may require a lot of GPU-RAM for Yolov3 + CRNN, so it can be trained only on high-end GPUs.

Also, later I will add a conv-LSTM layer, which can work with higher accuracy.

@PythonImageDeveloper

@AlexeyAB
Hi,
I have one 1080 GPU. Is this device enough for training yolov3-tiny_occlusion_track? And yolo3-occlusion-track too?
Is the yolov3-tiny_occlusion_track algorithm better than the existing tracking algorithms in OpenCV, such as KCF, MOSSE, ...?

@AlexeyAB
Owner

@zeynali Hi,

GTX 1080 is enough for training yolov3-tiny_occlusion_track.cfg.
But if you want to create your custom yolo3-occlusion-track.cfg, then you should use one of: Quadro RTX 8000 (48 GB GPU-RAM), Quadro GV100 (32 GB GPU-RAM), Nvidia TITAN V - CEO Edition (32 GB GPU-RAM), Tesla V100 32GB-Edition, DGX-2 with V100 32GB-edition

yolov3-tiny_occlusion_track.cfg isn't well tested yet, and will be modified to use conv-LSTM instead of conv-RNN later. So I can't compare it with other algorithms.

@kamarulhairi

Hi,

For the 3rd step, which is to train the detector and tracker, where do I get data/occlusion.data?

Thank You
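
For reference, a Darknet .data file is a short plain-text file that you write yourself; it is not downloaded. A minimal sketch that builds one is shown below. The keys classes, train, valid, names and backup are the usual Darknet ones; the function name, the class count of 1 and the path data/occlusion_valid.txt are assumptions made only for illustration.

```python
def make_data_file(num_classes: int, train_list: str, valid_list: str,
                   names_file: str, backup_dir: str = "backup/") -> str:
    """Return the text of a Darknet .data file (hypothetical helper)."""
    lines = [
        f"classes = {num_classes}",   # number of object classes
        f"train = {train_list}",      # list of training images, in frame order
        f"valid = {valid_list}",      # list of validation images
        f"names = {names_file}",      # one class name per line
        f"backup = {backup_dir}",     # where .weights files are saved
    ]
    return "\n".join(lines) + "\n"


# Assumed paths; adjust them to your own dataset layout.
text = make_data_file(1,
                      "data/occlusion_train.txt",
                      "data/occlusion_valid.txt",
                      "data/occlusion.names")
print(text)
```

Saving the printed text as data/occlusion.data gives the file expected by the darknet.exe detector train command.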

@jacklin602

Hi @AlexeyAB ,

Can you explain more about the rules of labeling bboxes for occlusion tracking?
If the object is totally invisible (occluded by another type of object or background), should i estimate and mark the object no matter how long it is occluded? If the object is partially occluded, should i only mark the visible area of the object or the estimated total extent of the object?

Thank you

@AlexeyAB
Owner

@jacklin602 Hi,

> If the object is totally invisible (occluded by another type of object or background), should i estimate and mark the object no matter how long it is occluded?

Yes, you should mark totally invisible objects in Training and Validation datasets.


> If the object is partially occluded, should i only mark the visible area of the object or the estimated total extent of the object?

You can mark it the way you want it to be detected. Usually I mark the estimated total extent of the object.

Also, in several days I will commit a new version of yolov3-tiny_occlusion_track.cfg with convolutional-LSTM, which works much better for occlusions.

@alexanderfrey

@AlexeyAB Hi, I'm also very interested in your LSTM implementation, to compare it with the Deep SORT tracker.
Best
Alexander

@Tangotrax

@AlexeyAB I'm also very much interested in your LSTM implementation

@AlexeyAB
Owner

AlexeyAB commented May 23, 2019

@alexanderfrey @Tangotrax @diennv @derekwong66 @PythonImageDeveloper @kamarulhairi

You can try to use LSTM-models with the latest version of Darknet: #3114 (comment)

For example, this model: https://github.com/AlexeyAB/darknet/files/3199631/yolo_v3_tiny_pan_lstm.cfg.txt

How to train: #3114 (comment)

@alexanderfrey

alexanderfrey commented Jun 2, 2019

@AlexeyAB Thanks, I appreciate your work very much! I am very excited to see how it performs. Any recommendations on how many sequences it should be trained with?

@AlexeyAB
Owner

AlexeyAB commented Jun 2, 2019

@alexanderfrey There are no exact recommendations yet.
Only that each sequence should have more than 200 frames.

@alexanderfrey

alexanderfrey commented Jun 26, 2019

@AlexeyAB
Hi Alexey,
thanks for your great work! I trained a yolo_v3_tiny_pan_lstm as you explained in #3114, and the results look promising for the first 1000 iterations. I train on 15 video sequences with approx. 180 frames per sequence. (I know this should be higher.)

Unfortunately, the avg. loss becomes NaN after roughly 1200 iterations. I use the yolov3-tiny.conv.14 weights, and I already set state_constrain=16 for each [conv_lstm] layer, sequential_subdivisions=8 and sgdr_cycle=10000. I train on 3 classes and adjusted the filters accordingly to 24 (the ones before the yolo layers). Anchors are default and I set batch and subdivision to 1. What else can I do to make the training run through?

Thanks for any help
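
The filters value mentioned above follows the usual YOLOv3 rule filters = (classes + 5) * anchors_per_layer. A quick check in Python, assuming three anchors (mask entries) per [yolo] layer as in the stock yolov3-tiny.cfg; the function name is hypothetical:

```python
def yolo_filters(num_classes: int, anchors_per_layer: int = 3) -> int:
    """Filters of the [convolutional] layer placed right before a [yolo] layer.

    Each anchor predicts 4 box coordinates + 1 objectness score + class scores.
    anchors_per_layer=3 is an assumption; check the mask= line of your cfg.
    """
    return (num_classes + 5) * anchors_per_layer


print(yolo_filters(3))  # 3 classes, as in the comment above
```

For 3 classes this gives 24, which matches the value reported in the comment.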

@AlexeyAB
Owner

@alexanderfrey Hi,

> Anchors are default and I set batch and subdivision to 1.

Did you set batch=1 subdivisions=1 for training?
You must use

batch=16
subdivisions=4

You can increase batch, but you shouldn't decrease it.
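
As a rough illustration of how these two settings relate: in Darknet, batch is the number of images that contribute to one weight update, and it is processed in subdivisions chunks to fit into GPU memory. The sketch below only does this arithmetic; the function name is hypothetical, and it ignores time_steps, which recurrent cfg-files also factor in.

```python
def images_per_chunk(batch: int, subdivisions: int) -> int:
    """Images sent to the GPU at once (hypothetical helper, ignores time_steps)."""
    if batch <= 0 or subdivisions <= 0:
        raise ValueError("batch and subdivisions must be positive")
    if batch % subdivisions != 0:
        raise ValueError("batch should be divisible by subdivisions")
    return batch // subdivisions


# (batch, subdivisions) pairs that appear in this thread.
for batch, subdivisions in [(16, 4), (1, 1), (64, 64)]:
    chunk = images_per_chunk(batch, subdivisions)
    print(f"batch={batch} subdivisions={subdivisions}: "
          f"{chunk} image(s) per chunk, {batch} image(s) per weight update")
```

With batch=1 every weight update is based on a single image, whereas batch=64 subdivisions=64 keeps the same memory footprint per chunk but averages 64 images per update.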

What GPU do you use?


If it doesn't help - then try to set

learning_rate=0.001
burn_in=1000
max_batches = 10000

policy=steps
steps=8000,9000
scales=.1,.1

If it doesn't help, then try to set learning_rate=0.0005 or learning_rate=0.0002
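
To see what the suggested schedule does, here is a small Python sketch of the learning rate over the iterations. It assumes Darknet's usual behaviour: a polynomial warm-up with power 4 during burn_in, then multiplication by each scale once its step is reached. The function name is hypothetical, and power=4 is an assumed default that is not stated in this thread.

```python
from typing import Sequence


def learning_rate_at(iteration: int,
                     base_lr: float = 0.001,
                     burn_in: int = 1000,
                     steps: Sequence[int] = (8000, 9000),
                     scales: Sequence[float] = (0.1, 0.1),
                     power: float = 4.0) -> float:
    """Approximate learning rate for policy=steps with burn_in (sketch)."""
    if iteration < burn_in:
        # Warm-up: the rate grows from 0 to base_lr over burn_in iterations.
        return base_lr * (iteration / burn_in) ** power
    lr = base_lr
    for step, scale in zip(steps, scales):
        if iteration < step:
            break
        lr *= scale  # each reached step multiplies the rate by its scale
    return lr


for it in (500, 1000, 5000, 8000, 9500):
    print(it, learning_rate_at(it))
```

The steps and scales lists must have the same length, since each step is paired with one scale.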

@alexanderfrey

alexanderfrey commented Jun 27, 2019

@AlexeyAB
Thank you very much for the answer. I started a new run overnight with batch=64 and subdivisions=64. So far it's running. Should steps have 2 or 4 values? In the default config file it has 4 values.
By the way, is it possible to get the track_id of an object?
Thanks !
