Abstract: Manually cutting seed potatoes into seed pieces before sowing can no longer keep pace with the large-scale potato production that has developed in China in recent years, owing to the low level of mechanization and the high labor costs and intensity involved. Automated equipment is expected to take over potato seed cutting. However, mechanized equipment cannot yet accurately locate the potato seed eyes during processing, which results in serious waste of seed potatoes. Because the seed eyes are small targets, they must be detected accurately and rapidly, and the detector must recognize small targets reliably while keeping the number of forward-inference parameters low. In this study, a lightweight convolutional neural network (CNN) target detection model was proposed to recognize potato seed eyes rapidly, accurately, and in real time on seed-cutting equipment. Firstly, the CSPDarkNet-53 backbone of YOLOv4 was replaced with the lightweight feature extraction network GhostNetV2, in order to reduce the forward-inference parameters of the model and focus more on small target objects. Secondly, depthwise separable convolution (DW) modules were used in the neck network of YOLOv4 to further reduce the computational complexity. Finally, the bounding box loss function was replaced with the SCYLLA-IoU (SIoU) loss, which includes an angle cost, so as to avoid the adverse effect of the uncertain prediction-box position on the convergence speed and the overall detection performance of the model. The experimental results showed that, with GhostNetV2 as the backbone feature extraction network of YOLOv4, the model had 12.04 M parameters. On a test dataset collected from the experimental platform, the model achieved an average precision of 89.13%, and detecting a single image took 0.148 s on a laptop CPU. The F1 scores were 0.80 for the buds and 0.99 for the potatoes.
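To illustrate why swapping in depthwise separable convolutions and Ghost-style modules shrinks a network, the following sketch counts the weights of a single layer under three designs. The layer size (256 input channels, 512 output channels, 3 × 3 kernels) and the function names are hypothetical and not taken from the paper; biases and batch normalization are ignored. The Ghost module uses a ratio s = 2 and 3 × 3 cheap depthwise operations, the defaults commonly used in the original GhostNet design that GhostNetV2 builds on. It is a counting sketch under these assumptions, not the authors' network definition.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard k x k convolution from c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k convolution (one filter per input channel) followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def ghost_module_params(c_in: int, c_out: int, k: int, s: int = 2, d: int = 3) -> int:
    """Ghost module: a primary k x k convolution yields c_out // s intrinsic feature maps,
    then cheap d x d depthwise operations generate the remaining (s - 1) * (c_out // s) maps."""
    m = c_out // s
    primary = k * k * c_in * m
    cheap = d * d * m * (s - 1)
    return primary + cheap


if __name__ == "__main__":
    # Hypothetical layer size, not taken from the paper.
    c_in, c_out, k = 256, 512, 3
    std = standard_conv_params(c_in, c_out, k)
    for name, n in [
        ("depthwise separable", depthwise_separable_params(c_in, c_out, k)),
        ("Ghost module (s=2)", ghost_module_params(c_in, c_out, k)),
    ]:
        print(f"{name}: {n:,} weights vs {std:,} standard ({std / n:.1f}x fewer)")
```

For this example layer the depthwise separable version needs about 8.8 times fewer weights; in general the ratio is roughly 1/(1/c_out + 1/k^2), so with 3 × 3 kernels the saving approaches 9 times as the channel count grows. A Ghost module with s = 2 roughly halves the weights of the convolution it replaces.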
Compared with the original CSPDarkNet-53 backbone, the improved backbone network reduced the parameter size to approximately one-third, increased the detection accuracy by 1.85 percentage points, and shortened the detection time by 0.279 s. Compared with the lightweight backbone networks MobileNetV1, MobileNetV2, MobileNetV3, and GhostNetV1, the GhostNetV2 backbone improved the detection accuracy by 0.75, 2.67, 4.17, and 1.89 percentage points, and the F1 score for the buds by 0.06, 0.07, 0.12, and 0.08, respectively. The SIoU bounding box loss function improved the detection accuracy by 2.97, 4.33, 2.38, and 3.18 percentage points compared with the GIoU, CIoU, DIoU, and EIoU losses, respectively. Moreover, the improved YOLOv4 object detection model achieved higher recognition accuracy than similar object detection models, with increases of 23.26, 27.45, 10.51, 18.09, and 2.13 percentage points over SSD, Faster R-CNN, EfficientDet, CenterNet, and YOLOv7, respectively. Its detection time was shorter by 0.007, 6.754, 1.891, 1.745, 0.422, and 0.326 s than those of SSD, Faster R-CNN, EfficientDet, CenterNet, YOLOv7, and the original YOLOv4, respectively, while the model had only 12.04 M parameters. Overall, the findings can provide new technical support for the recognition and model deployment of small target objects such as potato buds.
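Because the SIoU loss outperformed GIoU, CIoU, DIoU, and EIoU in these experiments, a plain-Python sketch of the loss for one predicted box and one ground-truth box may help make its angle, distance, and shape costs concrete. It follows the original SIoU formulation as it is commonly implemented (loss = 1 - IoU + (Δ + Ω)/2, with γ = 2 - Λ and shape exponent θ = 4). The corner box format (x1, y1, x2, y2), the function name siou_loss, and the small eps guard are assumptions for illustration, not the authors' training code.

```python
import math

# Hypothetical box format for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = tuple[float, float, float, float]


def siou_loss(pred: Box, gt: Box, theta: float = 4.0, eps: float = 1e-7) -> float:
    """SIoU loss for one predicted box and one ground-truth box (illustrative sketch)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # IoU term: intersection over union of the two boxes.
    inter = max(0.0, min(px2, gx2) - max(px1, gx1)) * max(0.0, min(py2, gy2) - max(py1, gy1))
    iou = inter / (pw * ph + gw * gh - inter + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, gx2) - min(px1, gx1) + eps
    ch = max(py2, gy2) - min(py1, gy1) + eps

    # Offset between the box centers and its length sigma.
    dx = (gx1 + gx2 - px1 - px2) / 2
    dy = (gy1 + gy2 - py1 - py2) / 2
    sigma = math.hypot(dx, dy) + eps

    # Angle cost Lambda = 1 - 2 sin^2(arcsin(sin a) - pi/4), where a is the smaller angle
    # between the center line and a coordinate axis (0 when aligned, 1 at 45 degrees).
    sin_a = min(abs(dx), abs(dy)) / sigma
    angle_cost = 1 - 2 * math.sin(math.asin(sin_a) - math.pi / 4) ** 2

    # Distance cost: center offsets normalized by the enclosing box, weighted by gamma = 2 - Lambda.
    gamma = 2 - angle_cost
    rho_x, rho_y = (dx / cw) ** 2, (dy / ch) ** 2
    distance_cost = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))

    # Shape cost: relative width/height mismatch, sharpened by the exponent theta.
    omega_w = abs(pw - gw) / (max(pw, gw) + eps)
    omega_h = abs(ph - gh) / (max(ph, gh) + eps)
    shape_cost = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + (distance_cost + shape_cost) / 2


if __name__ == "__main__":
    print(siou_loss((0, 0, 10, 10), (0, 0, 10, 10)))   # identical boxes: close to 0
    print(siou_loss((0, 0, 10, 10), (20, 0, 30, 10)))  # disjoint boxes: above 1
```

In a real detector this computation would be vectorized over all matched box pairs (for example with tensors) and averaged; the scalar version only shows how the angle, distance, and shape costs combine with the IoU term.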