Abstract: Picking robots have been widely used for citrus harvesting in recent years. However, during continuous picking in a natural environment, the citrus remaining on the tree can be irregularly disturbed by wind, by forces applied by the robot, and by changes in the load on the bearing branches. Citrus in this disturbed state cannot be detected and localized online rapidly and accurately, which lowers the efficiency of automatic robotic picking. In this study, a method for online target detection and rapid localization was proposed using an improved YOLOv5s+DeepSORT. The rest position of a disturbed citrus was predicted from its motion-tracking trajectory over a short period of time, so that its coordinates could be obtained rapidly. Firstly, the CBAM (Convolutional Block Attention Module) attention mechanism was added to the YOLOv5s network to improve the detection of small and occluded targets, and the SIoU loss function was used to enhance the direction matching between the predicted box and the annotated box and thereby speed up the convergence of the regression. Secondly, the target re-identification network in DeepSORT was improved to make it more suitable for extracting features of citrus targets, which enhanced the tracking performance on disturbed citrus. A Count counter was used to accumulate the number of frames in which each citrus was tracked, in order to select the optimal target. Since the disturbance of the remaining citrus propagated progressively over time, localization prediction and picking were performed at any one time only for targets with the optimal tracking trajectories, and the selection was updated in real time. Finally, the depth values from the depth camera were used to retain only targets within the critical distance range, which excluded the influence of background citrus on the detection speed. The number of targets tracked at a time was also limited to effectively improve the tracking speed for disturbed citrus.
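The target-selection logic described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `TrackCounter`, the 1.0 m critical distance, and the use of the trajectory mean as the rest-position estimate are all assumptions made for illustration, since the abstract does not give these details. The tracker output is assumed to be one `(track_id, x, y, z)` tuple per tracked fruit per frame, in metres.

```python
from collections import defaultdict
from statistics import mean

# Assumed value for illustration; the abstract does not state the critical distance.
CRITICAL_DISTANCE_M = 1.0


class TrackCounter:
    """Counts tracked frames per fruit and keeps each fruit's trajectory."""

    def __init__(self, max_depth_m=CRITICAL_DISTANCE_M):
        self.max_depth_m = max_depth_m
        self.counts = defaultdict(int)          # track_id -> number of tracked frames
        self.trajectories = defaultdict(list)   # track_id -> list of (x, y, z)

    def update(self, tracks):
        """Add one frame of tracker output: iterable of (track_id, x, y, z)."""
        for track_id, x, y, z in tracks:
            # Skip invalid depth readings and background fruit beyond the
            # critical distance, so they are never counted or tracked further.
            if not (0.0 < z <= self.max_depth_m):
                continue
            self.counts[track_id] += 1
            self.trajectories[track_id].append((x, y, z))

    def best_target(self):
        """Return the track id with the most tracked frames, or None."""
        if not self.counts:
            return None
        return max(self.counts, key=self.counts.get)

    def rest_position(self, track_id):
        """Estimate the rest position as the mean of the tracked positions.

        Assumption: the fruit oscillates roughly symmetrically about its
        rest position, so the trajectory mean approximates it.
        """
        xs, ys, zs = zip(*self.trajectories[track_id])
        return (mean(xs), mean(ys), mean(zs))
```

In use, `update` would be called once per frame during the localization window, after which `best_target` and `rest_position` give the fruit to pick and an estimate of where it will settle. A longer window gives the mean more oscillation cycles to average over, which is in line with the reported trend of higher localization precision at longer prediction times.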
The experimental results show that the P (precision) and mAP (mean average precision) of the improved YOLOv5s increased by 3.9 and 1.1 percentage points, respectively, at a detection speed of 69.3 frames per second. The MOTA (Multi-Object Tracking Accuracy) and MOTP (Multi-Object Tracking Precision) of the improved DeepSORT increased by 9.2 and 5.4 percentage points, respectively, while the average number of target ID (identity) switches was reduced by 32. Grasping experiments were conducted in the laboratory, in which the citrus swung randomly in different directions with an amplitude of about 10 cm. When the time allowed for localization prediction was 1, 2, 3, 5, 7, and 10 s, the average localization precision for disturbed citrus was 21.3%, 53.0%, 81.9%, 83.7%, 86.1%, and 94.9%, respectively. The citrus picking test was then conducted with a localization time of 3 s. The average grabbing time for each citrus was 12.8 s, which was 5.6 s shorter than that without the optimization, an efficiency improvement of 30.4%. The findings can provide technical support and a reference for picking citrus in disturbed states.
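The 30.4% figure is consistent with the two reported times if efficiency is read as the relative reduction in average grabbing time. This reading is an interpretation, since the abstract does not define the measure, and the symbols below are introduced only for this check:

```latex
t_{\text{baseline}} = 12.8\,\text{s} + 5.6\,\text{s} = 18.4\,\text{s}, \qquad
\frac{t_{\text{baseline}} - t_{\text{optimized}}}{t_{\text{baseline}}}
  = \frac{5.6\,\text{s}}{18.4\,\text{s}} \approx 30.4\%
```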