Abstract: Tea sorting is one of the most important steps in tea production. In traditional processing, manual sorting is often used to remove impurities (such as branches and grains) from freshly collected tea. However, manual sorting is labor-intensive and costly, and it cannot fully meet the high requirements for taste and quality in finished tea products. Machine vision has gradually been applied to tea impurity sorting, particularly for fully automatic sorting during tea collection. Among the available methods, single-stage lightweight networks (represented by YOLOv5) offer high detection speed and accuracy on small targets. However, the conventional YOLOv5 network cannot adequately extract the features of tea impurities, because the impurities lie in disorderly clusters, are generally small, vary widely in type, and are similar in color to the tea. In particular, overlapping small targets can produce inaccurate prediction boxes, leading to low accuracy or missed detections. The conventional YOLOv5 network therefore needs to be improved to meet the requirements of tea impurity detection. In this study, an improved YOLOv5 model was proposed to detect tea impurities with higher accuracy and detection speed than the conventional model. YOLOv5 was taken as the baseline network. K-Means clustering was applied to the ground-truth boxes of the impurities to obtain anchor box sizes suited to the characteristics of tea impurities. A Convolutional Block Attention Module (CBAM) was introduced into the backbone feature extraction network (CSPDarkNet) to capture key features along the channel and spatial dimensions of the feature maps. A Spatial Pyramid Pooling (SPP) module was added to the neck network to extract and fuse multi-scale features from different receptive fields.
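The anchor clustering step described above can be illustrated with a minimal sketch. The abstract does not state the distance metric, the number of clusters, or the initialization, so the choices here are assumptions: clustering on (width, height) pairs with IoU as the similarity measure, k = 9 anchors (the number used by the default YOLOv5 configuration), and random initialization. The names `iou_wh`, `kmeans_anchors`, and `demo_boxes` are hypothetical, and the boxes are synthetic rather than taken from the tea dataset.

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (N, 2) boxes and (K, 2) anchors given as (width, height),
    compared as if every box shared the same top-left corner."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_a = (anchors[:, 0] * anchors[:, 1])[None, :]
    return inter / (area_b + area_a - inter)

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """K-Means on box sizes, assigning each box to the anchor with highest IoU.
    Assumption: k, iters, and seed are illustrative, not values from the paper."""
    rng = np.random.default_rng(seed)
    # Initialize anchors from k distinct randomly chosen boxes.
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)
        new = anchors.copy()
        for j in range(k):
            members = boxes[assign == j]
            if len(members):  # keep the old anchor if a cluster is empty
                new[j] = members.mean(axis=0)
        if np.allclose(new, anchors):
            break
        anchors = new
    # Return anchors sorted by area, smallest first.
    return anchors[np.argsort(anchors.prod(axis=1))]

# Synthetic (width, height) pairs in pixels, standing in for labeled impurity boxes.
demo_boxes = np.random.default_rng(1).uniform(8, 64, size=(200, 2))
anchors = kmeans_anchors(demo_boxes, k=9)
print(anchors.round(1))
```

In a setup like this, the sorted anchors would typically be split into three groups of three and assigned to the detection heads from the finest to the coarsest feature map.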
Depthwise separable convolution was adopted to reduce the number of network parameters and thereby increase detection speed. The confidence loss weight of the small-target prediction feature map was increased to improve the detection accuracy of the network on small targets. The dataset consisted of Tieguanyin tea mixed with rice, melon seed shells, bamboo branches, and tea stems. The results show that the improved YOLOv5 produced higher confidence scores than the conventional network, with more accurate localization and no missed detections. The mAP and FPS of the improved YOLOv5 reached 96.05% and 62 frames/s, respectively. Compared with mainstream target detection models, the improved model achieved higher efficiency and robustness. The findings can provide a strong reference for improving the accuracy and speed of small-target impurity detection in tea production.
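The parameter saving from depthwise separable convolution can be made concrete with a short calculation. This is a sketch under stated assumptions: the abstract does not say which layers were replaced or what their sizes were, so the channel counts and kernel size below are illustrative only, and the function names are hypothetical. Bias terms are omitted.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution (bias omitted)."""
    return k * k * c_in + c_in * c_out

# Illustrative layer size, not taken from the paper.
c_in, c_out, k = 128, 256, 3

standard = conv_params(c_in, c_out, k)                  # 294912 weights
separable = depthwise_separable_params(c_in, c_out, k)  # 33920 weights
ratio = separable / standard                            # equals 1/c_out + 1/k**2

print(standard, separable, round(ratio, 3))
```

For a 3 x 3 kernel the ratio approaches 1/9 as the number of output channels grows, which is where most of the parameter reduction comes from.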