Abstract: Dragon fruit is one of the most popular fruits in Asia. Manual picking is labor-intensive and can no longer fully meet the requirements of large-scale production. Automated picking is expected to greatly reduce labor intensity, and the vision system is one of the most important parts of a picking robot. However, commonly used recognition methods do not consider the complex growth postures of dragon fruit, and the hard branches and complex postures make automatic picking difficult. It is therefore necessary to distinguish dragon fruits in different postures and then guide the robotic arm to approach the fruit along an appropriate path. In this study, a multi-pose detection method for dragon fruit was proposed for automatic picking using the optimal YOLOv7-tiny model. A total of 1 281 images of dragon fruit were taken in the field, including 450, 535, and 296 images under strong, weak, and artificial light conditions, respectively. The dataset was divided into 1 036 images for training, 116 for validation, and 129 for testing, according to the three light conditions. Light conditions were the factor with the largest influence on detection performance. A series of experiments was conducted on the dataset. Firstly, detection performance was compared among the seven models of the YOLOv7 series, and the optimal model for each type of device was identified in terms of the number of model parameters and detection performance. Secondly, the detection performance of the YOLOv7 series models was compared with that of other object detection models. Finally, the YOLOv7-tiny model was deployed on a mobile device, and a depth camera was combined with the robotic arm for picking in the field.
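The split by light condition described above can be illustrated with a minimal sketch. The abstract does not state the exact splitting procedure, so the 80/10/10 ratios, the random seed, the function name `stratified_split`, and the image file names below are all assumptions for illustration only. The sketch is not expected to reproduce the reported 1 036 / 116 / 129 counts.

```python
import random

# Image counts per light condition, as reported in the abstract.
COUNTS_BY_LIGHT = {"strong": 450, "weak": 535, "artificial": 296}


def stratified_split(groups, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split each light-condition group separately into train/val/test.

    The ratios and the seed are assumptions; the paper's procedure is not given.
    """
    rng = random.Random(seed)
    train, val, test = [], [], []
    for light, items in groups.items():
        items = list(items)
        rng.shuffle(items)
        n_train = round(len(items) * fractions[0])
        n_val = round(len(items) * fractions[1])
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Hypothetical file names, one list per light condition.
groups = {
    light: [f"{light}_{i:04d}.jpg" for i in range(n)]
    for light, n in COUNTS_BY_LIGHT.items()
}
train, val, test = stratified_split(groups)
print(len(train), len(val), len(test))
```

Splitting each light condition separately keeps all three conditions represented in the training, validation, and test sets, which matters if lighting strongly affects detection.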
The results showed that, within the YOLOv7 series, the YOLOv7-e6e model achieved the highest precision (85.0%), the YOLOv7x model the highest recall (85.4%), and the YOLOv7 model the highest mean average precision (mAP, 89.3%). The YOLOv7-tiny model had the fewest parameters (6 × 10⁶), the smallest weight file (12 MB), the fewest layers (255), and the shortest inference time (1.8 ms). This indicates that YOLOv7-tiny was the most suitable model for mobile devices, owing to its fast inference speed. The detection precision of YOLOv7-tiny was 83.6%, the recall was 79.9%, the mAP was 88.3%, and the classification accuracy for multi-pose dragon fruits was 80.4%. The precision of YOLOv7-tiny was 16.8, 4.3, and 4.8 percentage points higher than that of YOLOv3-tiny, YOLOv4-tiny, and YOLOX-tiny, respectively, and its mAP was 7.3, 21, and 3.9 percentage points higher, respectively. The precision of YOLOv7-tiny was also 7.3, 4.2, 7.3, 6.5, 3.5, and 3.9 percentage points higher than that of YOLOv5s, YOLOXs, YOLOv4G, YOLOv4M, YOLOv5x, and YOLOXx, respectively. In addition, the mAP of YOLOv7-tiny was 8.2, 5.8, 4.0, and 42.4 percentage points higher than that of EfficientDet, SSD, Faster-RCNN, and CenterNet, respectively, indicating the high detection accuracy of the YOLOv7-tiny model. A dragon fruit picking system was built around the trained YOLOv7-tiny model and verified in picking experiments. The experimental results show that the inference time of the vision system accounted for only 22.6% of the whole picking action time. The picking success rate for dragon fruits in the front view was 90%, indicating better automatic picking performance than before. The findings can provide technical support for fruit picking.
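As a reading aid, the percentage-point differences above can be turned back into the baseline values they imply. The following minimal sketch does only that arithmetic. The implied baseline figures and the F1 score are derived here from the abstract's numbers, not values reported by the authors, and the variable names are hypothetical.

```python
# YOLOv7-tiny metrics reported in the abstract (in %).
TINY_PRECISION = 83.6
TINY_RECALL = 79.9
TINY_MAP = 88.3

# Reported advantages of YOLOv7-tiny, in percentage points.
PRECISION_GAIN = {"YOLOv3-tiny": 16.8, "YOLOv4-tiny": 4.3, "YOLOX-tiny": 4.8}
MAP_GAIN = {
    "YOLOv3-tiny": 7.3, "YOLOv4-tiny": 21, "YOLOX-tiny": 3.9,
    "EfficientDet": 8.2, "SSD": 5.8, "Faster-RCNN": 4.0, "CenterNet": 42.4,
}

# Baseline value implied by each difference: baseline = YOLOv7-tiny - gain.
implied_precision = {m: round(TINY_PRECISION - g, 1) for m, g in PRECISION_GAIN.items()}
implied_map = {m: round(TINY_MAP - g, 1) for m, g in MAP_GAIN.items()}

# F1 is the harmonic mean of precision and recall (derived, not reported).
f1 = 2 * TINY_PRECISION * TINY_RECALL / (TINY_PRECISION + TINY_RECALL)

for model, value in implied_map.items():
    print(f"{model}: implied mAP {value}%")
print(f"YOLOv7-tiny F1 (derived): {f1:.1f}%")
```

Percentage points are absolute differences between two percentages, so the subtraction is direct and no relative scaling is involved.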