Optimize ssd detector #1321

Merged
merged 2 commits into from
Aug 28, 2024


kremnik (Contributor) commented Aug 27, 2024

What has been done

With this PR, pandas has been removed from the SSD detector script. All computations are now performed in matrix form using numpy, which speeds up calculations and reduces memory consumption.
Additionally, an SSD test was added to test_extract_faces.py, and a bug in the black-image check was fixed: cv2.dnn.blobFromImage expects np.uint8 input, while np.zeros produces np.float64 by default.
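
A minimal numpy-only illustration of the dtype mismatch described above. The helper name black_image is hypothetical. It only shows that np.zeros returns float64 unless a dtype is given, so a black test image meant for the SSD path should request np.uint8 explicitly, matching the 8-bit images that cv2.imread returns.

```python
import numpy as np

# np.zeros defaults to float64, not the 8-bit image type the SSD path expects
default_black = np.zeros((300, 300, 3))
print(default_black.dtype)  # float64


def black_image(height: int = 300, width: int = 300) -> np.ndarray:
    """Hypothetical helper: an all-black 3-channel image with the uint8 dtype cv2.imread produces."""
    return np.zeros((height, width, 3), dtype=np.uint8)


img = black_image()
print(img.dtype, img.shape)  # uint8 (300, 300, 3)
```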

How to test

make lint && make test

serengil (Owner) commented

IMO, the implementation with pandas is much cleaner and more readable.

I have no idea what is going on in these lines:

        faces = detections[0][0]
        faces = faces[(faces[:, 1] == 1) & (faces[:, 2] >= 0.90)]
        faces[:, 3:7] = np.int32(faces[:, 3:7] * 300)
        faces[:, 3:7] = np.int32(faces[:, 3:7] * [aspect_ratio_x, aspect_ratio_y, aspect_ratio_x, aspect_ratio_y])
        faces[:, 5:7] -= faces[:, 3:5]
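
For readers in the same position as the reviewer, here is a sketch that runs the same five lines on a toy detection tensor, with a comment on each step. It assumes OpenCV's SSD output layout, where each row is [img_id, is_face, confidence, left, top, right, bottom] with coordinates normalized to [0, 1], and that aspect_ratio_x and aspect_ratio_y are the original width and height divided by the 300x300 network input. The wrapper name postprocess_ssd and the toy values are made up for illustration.

```python
import numpy as np


def postprocess_ssd(detections: np.ndarray, img_width: int, img_height: int) -> np.ndarray:
    # Hypothetical wrapper around the five quoted lines.
    aspect_ratio_x = img_width / 300   # assumed: original size / SSD input size
    aspect_ratio_y = img_height / 300
    # detections has shape (1, 1, N, 7); take the N x 7 matrix of candidate boxes
    faces = detections[0][0]
    # column 1 is the class (0: background, 1: face), column 2 the confidence;
    # boolean indexing returns a copy, so `detections` is not modified below
    faces = faces[(faces[:, 1] == 1) & (faces[:, 2] >= 0.90)]
    # columns 3:7 are left, top, right, bottom in [0, 1]; scale to the 300x300 input
    faces[:, 3:7] = np.int32(faces[:, 3:7] * 300)
    # rescale from the 300x300 input back to the original image size
    faces[:, 3:7] = np.int32(faces[:, 3:7] * [aspect_ratio_x, aspect_ratio_y, aspect_ratio_x, aspect_ratio_y])
    # turn (right, bottom) into (width, height)
    faces[:, 5:7] -= faces[:, 3:5]
    return faces


# one confident face, one low-confidence face, one background detection
toy = np.array([[[[0, 1, 0.95, 0.25, 0.5, 0.75, 1.0],
                  [0, 1, 0.50, 0.10, 0.1, 0.20, 0.2],
                  [0, 0, 0.99, 0.10, 0.1, 0.20, 0.2]]]], dtype=np.float32)
print(postprocess_ssd(toy, img_width=600, img_height=300)[0, 3:7].tolist())
# [150.0, 150.0, 300.0, 150.0] -> left, top, width, height
```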

PS: would you please raise a ticket to discuss first before creating a PR? Otherwise the PR may not be merged, and your effort goes to waste.

kremnik (Contributor, Author) commented Aug 27, 2024

@serengil Ok, I will try to make it more readable.
The purpose of this PR is to avoid pandas where possible, since it is slow. Also, this is not a piece of code that users actively edit.

serengil (Owner) commented

TBH, I am satisfied with pandas because it is readable. It doesn't matter that users aren't touching this.

kremnik (Contributor, Author) commented Aug 28, 2024

@serengil pandas is great for data analysis and some data pre-processing, but it can be quite slow for production purposes, especially when filtering.
For example, the following code:

    detections_df = detections_df[detections_df["is_face"] == 1]  # 0: background, 1: face
    detections_df = detections_df[detections_df["confidence"] >= 0.90]

takes around 26 ms on selfie-many-people.jpg on my CPU.

However, this code:

    faces = faces[(faces[:, 1] == 1) & (faces[:, 2] >= 0.90)]

takes around 16 μs on the same photo and the same CPU.
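
A rough benchmark sketch for reproducing this kind of comparison. It filters a synthetic N x 7 float32 array (random values standing in for SSD detections, not the selfie-many-people.jpg output) once with the two-step pandas filter and once with the numpy boolean mask. pandas is treated as optional, the helper names are hypothetical, the column names other than is_face and confidence are assumed, and absolute timings will differ from the figures above depending on hardware and library versions.

```python
import timeit

import numpy as np

try:
    import pandas as pd  # optional: only needed for the pandas side of the comparison
except ImportError:
    pd = None

# Assumed column names for the pandas version of the detector output
COLUMNS = ["img_id", "is_face", "confidence", "left", "top", "right", "bottom"]


def make_detections(n: int = 200, seed: int = 0) -> np.ndarray:
    """Synthetic stand-in for detections[0][0]: n rows of 7 float32 values."""
    rng = np.random.default_rng(seed)
    faces = rng.random((n, 7), dtype=np.float32)
    faces[:, 1] = rng.integers(0, 2, size=n)  # class id: 0 background, 1 face
    return faces


def filter_numpy(faces: np.ndarray) -> np.ndarray:
    # one combined boolean mask over the raw matrix
    return faces[(faces[:, 1] == 1) & (faces[:, 2] >= 0.90)]


def filter_pandas(df):
    # the two-step DataFrame filter quoted above
    df = df[df["is_face"] == 1]  # 0: background, 1: face
    df = df[df["confidence"] >= 0.90]
    return df


if __name__ == "__main__":
    faces = make_detections()
    runs = 200
    t_np = timeit.timeit(lambda: filter_numpy(faces), number=runs) / runs
    print(f"numpy:  {t_np * 1e6:.1f} us per call")
    if pd is not None:
        df = pd.DataFrame(faces, columns=COLUMNS)
        # both filters should select exactly the same rows, in the same order
        assert np.array_equal(filter_numpy(faces), filter_pandas(df).to_numpy())
        t_pd = timeit.timeit(lambda: filter_pandas(df), number=runs) / runs
        print(f"pandas: {t_pd * 1e6:.1f} us per call")
```

Only the filtering is timed; building the DataFrame happens outside the timed call, which mirrors timing just the filter lines.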

serengil (Owner) commented

Then you've convinced me to switch :)

But please at least make it readable with comments, because I have to deal with the issues, and understanding code I did not write myself is hard.

Storing the column indices in named variables would be good.

For example, idx_face = 1 and then faces[:, idx_face] == 1.
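
Following that suggestion, a minimal sketch of the face filter with the column indices stored in named constants. The names IDX_IS_FACE, IDX_CONFIDENCE and filter_faces are hypothetical, and the column order assumes OpenCV's SSD row layout [img_id, is_face, confidence, left, top, right, bottom].

```python
import numpy as np

# Hypothetical names for the columns of one SSD detection row:
# [img_id, is_face, confidence, left, top, right, bottom]
IDX_IS_FACE = 1     # 0: background, 1: face
IDX_CONFIDENCE = 2


def filter_faces(faces: np.ndarray, threshold: float = 0.90) -> np.ndarray:
    """Keep only rows classified as a face with confidence >= threshold."""
    is_face = faces[:, IDX_IS_FACE] == 1
    is_confident = faces[:, IDX_CONFIDENCE] >= threshold
    return faces[is_face & is_confident]
```

The mask is the same one-liner as before, only split into two named booleans, so the speed-up is kept while each condition reads on its own.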

serengil (Owner) commented

LGTM

serengil merged commit 14158d3 into serengil:master on Aug 28, 2024
2 checks passed