RasPiDets: A Quasi-Real-Time Defect Detection Method with End-Edge-Cloud Collaboration
- 1.Introduction
- 2.RasPiDets
- 3.An easy starting instance
- 4.ACDO algorithm for End-Edge-Cloud Collaboration
- 5.PDD System Implementation
- 6.Datasets
- 7.STE for Audio Extraction
- 8.Loss Function
- 9.Experiments
- 10.Performance
- 11.Visualization of Detection Results
- 12.Citation
Product Defect Detection (PDD) appears in many stages of industrial production and is an important workflow for sorting out unqualified products. We focus on the PDD problem at multiple production stages, each of which produces specific data types and requires strict product quality control. In this work, we developed a lightweight PDD method (RasPiDets) with end-edge-cloud collaboration to detect product defects in industrial scenarios.
Specifically, Audio Anomaly Detection (AAD) for Air Conditioner (AC) internal units and Appearance Defect Detection (ADD) for AC external units are very important and time-consuming quality control processes. They greatly restrict the pace of the assembly line, which in turn reduces production efficiency. To solve this problem, we developed a lightweight PDD method (RasPiDets) with End-Edge-Cloud (EEC) collaboration to accelerate detection on edge nodes.
The main contributions are summarized as follows:
- A novel lightweight defect detection model (RasPiDets) is specially designed for PDD tasks. It runs smoothly on the Raspberry Pi and makes edge nodes plug-and-play. The detection time per round of RasPiDets (2.85 seconds) is 22% lower than that of the advanced YOLOv7 (3.64 seconds) under the same conditions. This ensures low-cost and rapid deployment of defect detection algorithms in industrial scenarios.
- We propose two innovative modules for RasPiDets that reduce detection time while maintaining high detection accuracy. The Deep Cascade U-shaped Network (DCUN) quickly extracts rich multi-scale features, and Adaptive Multi-Scale Squeeze-and-Excitation (AMSE) facilitates multi-scale feature reuse and fusion. By integrating them, RasPiDets (98.24%) outperforms YOLOv7 (97.50%) by 0.74 percentage points in terms of mAP@0.5. This is particularly noteworthy because improvements in the high-accuracy range are extremely challenging.
- This work treats end-edge-cloud resources as a holistic system in the field of defect detection and employs a reinforcement learning algorithm (ACDO) for unified scheduling. The detection time of PDD tasks is reduced by an average of 64% when using the proposed ACDO compared to other scheduling strategies.
- The proposed methods offer a 1.2% improvement in detection accuracy over current state-of-the-art (SOTA) models and an average 64% reduction in detection time.
- Two PDD datasets (SDU-Haier-AQD and SDU-Haier-ND) from AC manufacturing are open-sourced. This is significant for accelerating research and development in the manufacturing industry.
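The headline numbers above can be reproduced with simple arithmetic. The short check below uses only the figures quoted in the list; the helper name `relative_reduction` is introduced here for illustration.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Fractional reduction of `new` relative to `baseline`."""
    return (baseline - new) / baseline

# Seconds per detection round: YOLOv7 vs. RasPiDets (values quoted above)
time_cut = relative_reduction(3.64, 2.85)
# Difference in mAP@0.5, expressed in percentage points
map_gain = 98.24 - 97.50

print(f"detection time reduction: {time_cut:.1%}")
print(f"mAP@0.5 gain: {map_gain:.2f} points")
```

The time reduction comes out just under 22%, which matches the rounded figure in the first contribution.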
Considering also the "where are the defects" problem that arises for image data, we propose an Easy-to-Deploy defect detection Network (RasPiDets) that solves the "what" and "where" detection problems in a unified network.
The innovations of RasPiDets include:
- A cascaded U-Net architecture is designed to quickly obtain feature maps of various sizes;
- The lightweight deep architecture can be easily obtained by stacking U-Nets;
- Numerous shortcut paths are added between the feature maps in U-Nets to reduce model over-fitting;
- RasPiDets is a lightweight network architecture that can be deployed on Raspberry Pi.
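To make the cascaded U-shaped idea concrete, here is a minimal NumPy sketch. It is an illustration only, not the RasPiDets architecture: it has no convolutions or learned weights, and the names `u_block` and `cascade`, the depth, and the number of blocks are all assumptions. It only shows how one U-shaped pass yields feature maps at several sizes, how shortcut paths merge same-resolution maps, and how stacking such passes deepens the model.

```python
import numpy as np

def downsample(x):
    """2x2 average pooling on an (H, W) map; H and W must be even."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """Nearest-neighbour 2x upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def u_block(x, depth):
    """One U-shaped pass: encode to `depth` coarser scales, then decode
    while adding the same-resolution encoder map back in (the shortcut)."""
    skips = []
    for _ in range(depth):
        skips.append(x)
        x = downsample(x)
    pyramid = [x]                      # coarsest map first
    for skip in reversed(skips):
        x = upsample(x) + skip         # shortcut path
        pyramid.append(x)
    return x, pyramid

def cascade(x, num_blocks=2, depth=3):
    """Stack U-blocks; each consumes the previous block's output."""
    all_maps = []
    for _ in range(num_blocks):
        x, pyramid = u_block(x, depth)
        all_maps.append(pyramid)
    return x, all_maps

image = np.random.default_rng(0).random((32, 32))
out, maps = cascade(image)
print([m.shape for m in maps[0]])      # sizes produced by the first U-block
```

Each block returns a small pyramid of maps (4x4 up to 32x32 for this input), which is the kind of multi-scale output a detection head could consume.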
In the following list, the symbol $\checkmark$ marks an advantage of a model and $\times$ marks a limitation.
- DETR: This model is an encoder-decoder structure that adopts Transformer as its backbone.
  - $\checkmark$ It transforms detection into a set prediction problem, simplifying the detection process.
  - $\times$ Its prediction performance suffers from inadequate utilization of multiscale features. For example, it has a lower mAP in small object recognition. In addition, it requires a longer training process.
- MobileNetv3: The model is a mobile network obtained via a neural architecture search (NAS) algorithm.
  - $\checkmark$ It searches for the optimal number of channels and filters. Moreover, it adds many effective modules, such as the Squeeze-and-Excitation (SE) module and the hard-swish activation function.
  - $\times$ The absence of multiscale (MS) features and dense connections (DC), coupled with the heavy reliance on depthwise convolutions, makes it inadequate at extracting multi-resolution features, which degrades its performance in detecting small objects.
- YOLO v3, v4 and v7: The one-stage design enables faster detection, and larger-resolution input improves detection performance.
  - $\checkmark$ These are single-stage detection algorithms, and all have specially designed network structures and activation functions, e.g., DarkNet-53 and the SiLU activation function.
  - $\times$ The network structure is complex and contains numerous redundant features, which slows detection. In addition, high-resolution features are transferred to later layers slowly, which lowers detection accuracy for small target objects.
- RasPiDets: A lightweight model that runs smoothly on Raspberry Pi and handles both audio and image detection problems.
  - $\checkmark$ RasPiDets contains many efficient components, such as multiscale (MS), dense connections (DC), inverted modules (IM), MB Cascade and Adaptive MS. These components are crucial for enhancing performance and speed.
  - $\times$ The mAP of RasPiDets is not the best in every category, for example in the cyclone net category.
- Configure the Darknet environment to accelerate RasPiDets; for configuration details, refer to this.
- Make the dataset directory like this:
  - train (directory)
    - xx1.jpg (image sample)
    - xx1.txt (description file)
    - xx2.jpg (image sample)
    - xx2.txt (description file)
    - ......
  - train (directory)
- Run the following code for training:
cd RasPiDets
./darknet detector train ./dataconfigs/ok.data ./configs/RasPiDets.cfg ./configs/RasPiDets_best.weights
- Run the following code for validation
./darknet detector valid ./dataconfigs/ok.data ./configs/RasPiDets.cfg ./configs/RasPiDets_best.weights
- Run the following code for testing
./darknet detector test ./dataconfigs/ok.data ./configs/RasPiDets.cfg ./configs/RasPiDets_best.weights
- Visualization of network architecture (Zoom Out)
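Before training, it can help to confirm that the dataset directory follows the layout described above, where every image sample has a description file with the same base name. The sketch below is a hypothetical helper (the name `unpaired_samples` is not part of this repository) and it runs against a throwaway directory so that it is self-contained.

```python
from pathlib import Path
import tempfile

def unpaired_samples(split_dir):
    """Return names of .jpg files in `split_dir` lacking a same-named .txt file."""
    split = Path(split_dir)
    return sorted(p.name for p in split.glob("*.jpg")
                  if not p.with_suffix(".txt").exists())

# Self-contained demo on a temporary directory standing in for `train`
with tempfile.TemporaryDirectory() as tmp:
    train = Path(tmp) / "train"
    train.mkdir()
    for name in ("xx1.jpg", "xx1.txt", "xx2.jpg"):   # xx2.txt is deliberately missing
        (train / name).touch()
    missing = unpaired_samples(train)
    print(missing)
```

On a real dataset, pass the path of the `train` directory instead of the temporary one; an empty list means every image has its description file.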
An Actor-Critic based Dynamic Offloading (ACDO) algorithm is designed to reduce the overall delay of RasPiDets on end devices. Ultrasonic sensors, scanners and cameras are used to obtain the location, type and appearance of each AC unit, respectively. They are connected to a Raspberry Pi, an end device deployed with RasPiDets, to form a closed-loop process of perception, decision-making and control. In this end-edge-cloud collaboration scenario, the cloud has abundant computing capabilities but is far away from the end devices, while the edge servers are relatively close to the end devices but need to be connected to them via a 5G network.
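The following toy sketch illustrates the general actor-critic idea behind an offloading decision. It is not the ACDO algorithm itself: the state, reward design and network structure of ACDO are not described here, so this example assumes a single-state problem, a softmax policy over three targets, and made-up mean delays (`MEAN_DELAY`). The critic tracks the expected reward, and the actor shifts probability toward targets whose delay is lower than expected.

```python
import numpy as np

rng = np.random.default_rng(0)
ACTIONS = ["end", "edge", "cloud"]
MEAN_DELAY = np.array([3.0, 1.0, 1.8])   # hypothetical seconds per task

def sample_delay(action):
    """Hypothetical environment: noisy delay for the chosen target."""
    return max(0.0, rng.normal(MEAN_DELAY[action], 0.2))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

theta = np.zeros(3)   # actor: action preferences
value = 0.0           # critic: estimate of the expected reward
alpha_actor, alpha_critic = 0.1, 0.1

for _ in range(3000):
    probs = softmax(theta)
    a = rng.choice(3, p=probs)
    reward = -sample_delay(a)          # lower delay means higher reward
    td_error = reward - value          # single-state task, so no bootstrap term
    value += alpha_critic * td_error   # critic update
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0              # gradient of log pi(a) for a softmax policy
    theta += alpha_actor * td_error * grad_log_pi   # actor update

policy = softmax(theta)
best = ACTIONS[int(policy.argmax())]
print(best)
```

With these assumed delays, the learned policy concentrates on the target with the lowest mean delay. A real scheduler would additionally condition on state such as task priority, queue length and link quality.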
We built an assembly line for industrial production detection to implement the PDD system with end-edge-cloud collaboration. In this PDD system, a Raspberry Pi (4B) is used as the edge node to connect end devices such as ultrasonic sensors, scanners and cameras, which enables low-cost and flexible deployment of the PDD algorithm. The edge nodes are then connected to the cloud via 5G to offload and schedule PDD tasks, further improving the speed of the detection algorithm. This makes deploying PDD units in complex industrial scenarios convenient and inexpensive. Finally, the negative detection results are sent to the PLC to sort out the unqualified products.
The raw AAD data sampled from AC internal units is saved as "wav" files, and the sampling frequency of the audio signal is 48 kHz. To use these audio files efficiently, each long recording is split into many frames, and each frame is converted into a 2D spectrogram by the SG-Gram algorithm. Therefore, the AAD dataset is a multi-label image detection dataset, which includes 562 training samples and 142 test samples.
- We open-sourced the audio dataset with annotation files here.
- Alternatively, you can easily get the dataset by this link.
- The detailed introduction document can be found here.
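The audio-to-image step described for the AAD dataset can be sketched as follows. This example is a stand-in: the SG-Gram algorithm is not specified here, so a plain log-magnitude short-time Fourier transform is used instead, and the FFT size, hop length and synthetic test tone are assumptions. Only the 48 kHz sampling rate comes from the dataset description.

```python
import numpy as np

SAMPLE_RATE = 48_000  # Hz, as stated for the AAD recordings

def split_frames(signal, frame_len, hop):
    """Slice a 1-D signal into overlapping frames of `frame_len` samples."""
    n = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return signal[idx]

def spectrogram(clip, n_fft=1024, hop=512):
    """Log-magnitude STFT; rows are frequency bins, columns are time steps."""
    frames = split_frames(clip, n_fft, hop) * np.hanning(n_fft)
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return 20 * np.log10(mag + 1e-10).T

# Synthetic 1-second, 1 kHz tone standing in for a recorded wav clip
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
clip = np.sin(2 * np.pi * 1000 * t)
spec = spectrogram(clip)
peak_hz = spec.mean(axis=1).argmax() * SAMPLE_RATE / 1024
print(spec.shape, peak_hz)
```

The resulting 2D array can be saved as an image and annotated like any other detection sample; the strongest row lands near the 1 kHz tone, within the resolution of one frequency bin.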
The ADD dataset includes 9401 training samples and 1408 testing samples. The dataset covers 11 types in total, each with about 1000 images. It contains 16 classes of objects to be detected, and each type contains a different number of classes. The number of samples for each of the 16 detected object classes is shown in the following table.
- We open-sourced 10,449 samples with annotation files here.
- Alternatively, you can easily get the dataset by this link.
- The detailed introduction document can be found here.
We utilize the short-time energy method (STE) [18] to quickly extract effective audio signals, which can be formulated as
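As a minimal sketch of the general technique (not necessarily the exact formulation in [18]), short-time energy is commonly computed as the sum of squared samples within each frame, and frames with high energy are kept as effective audio. The frame length, hop and the peak-ratio threshold rule below are assumptions chosen for illustration.

```python
import numpy as np

def short_time_energy(signal, frame_len, hop):
    """Sum of squared samples in each frame of `frame_len` samples."""
    n = 1 + (len(signal) - frame_len) // hop
    return np.array([np.sum(signal[i * hop : i * hop + frame_len] ** 2)
                     for i in range(n)])

def active_frames(signal, frame_len=480, hop=480, ratio=0.1):
    """Flag frames whose energy exceeds `ratio` times the peak frame energy."""
    energy = short_time_energy(signal, frame_len, hop)
    return energy > ratio * energy.max()

# Synthetic example at 48 kHz: 0.1 s near-silence, 0.1 s tone, 0.1 s near-silence
rng = np.random.default_rng(0)
sr = 48_000
quiet = 0.001 * rng.standard_normal(sr // 10)
tone = np.sin(2 * np.pi * 440 * np.arange(sr // 10) / sr)
signal = np.concatenate([quiet, tone, quiet])
mask = active_frames(signal)
print(mask.sum(), "of", mask.size, "frames flagged as active")
```

Consecutive flagged frames mark the start and end of a valid clip, which can then be cut out and passed on to the spectrogram step.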
Comparisons of different anomaly detection methods for extracting valid audio clips.
RasPiDets achieves the best performance and a fast running speed compared to other SOTA models, realizing the best trade-off between performance and speed. The following figure compares the performance (mAP), running speed (FLOPs) and model size (parameters, indicated by bubble size) of different SOTA models on AAD tasks.
The average mAP of RasPiDets is much better than that of other lightweight object detection models. The following figure shows the mAP of different models when IoU ≥ 0.75.
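The IoU thresholds used in these metrics (0.5 for mAP@0.5 and 0.75 for the stricter comparison) decide whether a predicted box counts as a correct detection. The sketch below shows the standard IoU computation for axis-aligned boxes; the box coordinates are hypothetical and chosen so that the prediction passes the 0.5 threshold but fails the 0.75 one.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

truth = (0, 0, 10, 10)     # hypothetical ground-truth box
pred = (2, 0, 12, 10)      # hypothetical prediction, shifted right by 2
score = iou(truth, pred)
print(score >= 0.5, score >= 0.75)   # True False
```

This is why results at IoU ≥ 0.75 are a harder test of localization quality than mAP@0.5: the same prediction can be a hit under one threshold and a miss under the other.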
AAD and ADD tasks with 3 priorities reach the lowest delay at different time intervals when ACDO is adopted.
You are welcome to cite our paper:
Daojun Liang, Haixia Zhang, Qiaojian Han, Dongfeng Yuan and Minggao Zhang, "RasPiDets: A Quasi-Real-Time Defect Detection Method with End-Edge-Cloud Collaboration," in IEEE Transactions on Industrial Informatics, 2025, early access.