Case Study: Face Detection
YonghaoHe edited this page Mar 4, 2021 · 2 revisions
On this page, we describe how to obtain a face detection model for practical use. Essentially, you have to balance two conflicting goals, accuracy and speed, sometimes under limited memory.
Before diving into the details, we must know the requirements clearly. Here, we set up some hypothetical requirements that guide the following design:
- face detection using HD surveillance cameras
- outdoor scenes
- faces far away from the camera must be detected; assume the face size ranges from 16 to 256 pixels (longer side)
- the resolution of the video stream is 4K (3840x2160)
- the deployment GPU is an RTX 2080Ti, and the speed must exceed 30 FPS
- accuracy: AP > 0.9
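The requirements above imply a per-frame time budget and a pixel throughput that the detector has to sustain. The following is a minimal sketch of that arithmetic; the `Requirements` class and its field names are hypothetical and are not part of the LFD code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    # Values copied from the requirement list above; the names are illustrative.
    frame_width: int = 3840
    frame_height: int = 2160
    min_face: int = 16     # shortest "longer side" to detect, in pixels
    max_face: int = 256    # longest "longer side" to detect, in pixels
    min_fps: float = 30.0
    min_ap: float = 0.9

    @property
    def latency_budget_ms(self) -> float:
        # Upper bound on the time available for one frame.
        return 1000.0 / self.min_fps

    @property
    def pixels_per_second(self) -> int:
        # Raw pixels to process per second at full resolution.
        return int(self.frame_width * self.frame_height * self.min_fps)


req = Requirements()
print(f"per-frame budget: {req.latency_budget_ms:.1f} ms")
print(f"pixel throughput: {req.pixels_per_second / 1e6:.1f} Mpx/s")
```

Note that the budget is an upper bound: decoding, pre-processing and post-processing share the same time slice, so the network itself has less than this available.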
In practice, data should be collected in the real world and well annotated. Here, we assume that you already have enough data.
- the video frame resolution is 4K, and we need to detect small faces (16 pixels), so we cannot downsample the original frame, yet we still have to sustain 30 FPS
- faces are nearly square, and given the first point above, we choose an FCN-style structure
- the RTX 2080Ti is adequate for this workload
- use FP16 mode in deployment; FP16 can greatly increase the speed while largely preserving the accuracy
- four heads are created for the ranges [16, 32], [32, 64], [64, 128] and [128, 256], with strides 4, 8, 16 and 32, respectively
- based on the results above, we can design the whole structure and confirm whether it meets the speed requirement using timing_inference_latency.py. In the end, we obtain a satisfactory network.
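The head layout described above can be illustrated with a short sketch that maps a face size to a head and estimates the feature map size of each head for a 4K frame. The function names are hypothetical. Two choices are assumptions of this sketch rather than the actual LFD behaviour: faces on a shared boundary (32, 64, 128) are assigned to the smaller-range head, and feature map sizes use ceiling rounding.

```python
import math

# (min_side, max_side, stride) for each head, as listed in the analysis.
HEADS = [(16, 32, 4), (32, 64, 8), (64, 128, 16), (128, 256, 32)]


def head_for_face(longer_side: float) -> int:
    """Return the index of the first head whose range covers the face, or -1."""
    for idx, (lo, hi, _stride) in enumerate(HEADS):
        if lo <= longer_side <= hi:
            return idx
    return -1  # outside the supported 16..256 pixel range


def feature_map_size(width: int, height: int, stride: int) -> tuple:
    # Assumes ceiling rounding; the real size depends on padding in the network.
    return math.ceil(width / stride), math.ceil(height / stride)


for lo, hi, stride in HEADS:
    fm_w, fm_h = feature_map_size(3840, 2160, stride)
    print(f"range [{lo}, {hi}] stride {stride}: {fm_w} x {fm_h} locations")
```

The stride-4 head dominates the cost because it operates on the largest feature map, which is why the smallest face size drives the overall design.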
For example, WIDERFACE_LFD_L is suitable for this case.
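The speed check mentioned in the analysis can be approximated with a generic timing loop. The sketch below is not the content of timing_inference_latency.py; it only shows the general idea (warm-up runs, then the median over repeated runs), and `dummy_forward` is a hypothetical stand-in for a real forward pass.

```python
import statistics
import time


def measure_latency_ms(run_once, warmup: int = 10, iterations: int = 50) -> float:
    """Return the median wall-clock latency of run_once() in milliseconds."""
    for _ in range(warmup):
        run_once()  # warm-up runs are not timed
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def meets_fps(latency_ms: float, min_fps: float = 30.0) -> bool:
    # The speed requirement holds when one frame takes less than 1000 / min_fps ms.
    return latency_ms < 1000.0 / min_fps


def dummy_forward() -> None:
    # Hypothetical placeholder workload; replace with the model's forward pass.
    sum(i * i for i in range(10_000))


latency = measure_latency_ms(dummy_forward)
print(f"median latency: {latency:.3f} ms, meets 30 FPS: {meets_fps(latency)}")
```

When timing a GPU model, the device has to be synchronized before each clock reading (for example with `torch.cuda.synchronize()` in PyTorch), otherwise only the kernel launch time is measured.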