Abstract: China has one of the largest orchard cultivation areas in the world, yet traditional orchard management remains time-consuming and labor-intensive. Orchard robots are a promising artificial intelligence (AI) tool for replacing manual labor in orchard management. However, because the orchard environment is complex and changeable, a robot encounters various obstacles during operation and must detect them in real time. In recent years, particularly with the rise of deep learning, target detection systems such as YOLOv4, YOLOv3, and Faster-RCNN have been widely applied to obstacle avoidance for agricultural robots. Nevertheless, these models suffer from unsatisfactory detection accuracy, large numbers of parameters, poor real-time performance, and difficulty detecting densely overlapping targets. In this study, an improved YOLOv4-based target detection model was proposed, combined with the latest vision sensor technology, so that agricultural robots can quickly and accurately identify and classify obstacles in the orchard. Depthwise separable convolutions were used to reduce the number of parameters and further improve the detection speed, and Inverted Residual Units replaced the Residual Units in the CSP-Darknet backbone of the original model. In addition, a Soft DIoU-Non-Maximum Suppression (Soft-DIoU-NMS) algorithm was employed to handle densely overlapping detections. Three common orchard obstacles (pedestrians, fruit trees, and telegraph poles) were selected as detection objects to build an image dataset. The improved model was trained on the TensorFlow deep learning framework, and test images were then fed into the trained model to detect target obstacles at different distances. Under the same evaluation indices, the improved YOLOv4 was compared with the original YOLOv4, YOLOv3, and Faster-RCNN. The results showed that the improved YOLOv4-based detection model for orchard obstacles achieved an average accuracy of 96.92%, which was 0.61 percentage points higher than the original YOLOv4 model, 4.18 percentage points higher than the YOLOv3 model, and 0.04 percentage points higher than the Faster-RCNN model. The recall of the proposed model reached 96.31%, 0.68 percentage points higher than the original YOLOv4, 6.37 percentage points higher than YOLOv3, and 0.18 percentage points higher than Faster-RCNN. The detection speed of the improved YOLOv4 on video streams was 58.5 frames/s, 29.4% faster than the original YOLOv4, 22.1% faster than YOLOv3, and 346% faster than Faster-RCNN. The number of parameters in the improved YOLOv4-based model was reduced by 75% compared with the original YOLOv4 model, and was 68.7% less than that of the YOLOv3 model and 81% less than that of the Faster-RCNN model. In general, the proposed model greatly reduces the model size without losing accuracy, thereby enhancing real-time performance and robustness in the actual orchard environment. The improved YOLOv4-based model achieved satisfactory results at different detection distances, indicating better performance for obstacle detection in the orchard environment. The findings provide a strong guarantee for the obstacle avoidance of intelligent robots in orchards.
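
To make the parameter-reduction claim concrete, the following is a minimal tf.keras sketch (our illustration, not the authors' code) comparing the weight count of a standard 3x3 convolution with that of a depthwise separable convolution; the channel width of 256 is an illustrative assumption.

```python
import tensorflow as tf

def count_params(layer, channels=256):
    """Build a one-layer model on a dummy input and count its trainable weights."""
    inputs = tf.keras.Input(shape=(32, 32, channels))
    return tf.keras.Model(inputs, layer(inputs)).count_params()

standard = tf.keras.layers.Conv2D(256, 3, padding="same")
separable = tf.keras.layers.SeparableConv2D(256, 3, padding="same")

print(count_params(standard))   # 590,080 = 3*3*256*256 weights + 256 biases
print(count_params(separable))  # 68,096  = 3*3*256 depthwise + 1*1*256*256 pointwise + 256 biases
```

At equal channel width, the separable block here carries roughly an eighth of the weights, which is the mechanism behind the smaller model and faster inference reported above.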
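
The abstract states that Inverted Residual Units replace the Residual Units of CSP-Darknet. A sketch of such a block in the MobileNetV2 style is given below; the expansion factor, ReLU6 activations, and batch-normalization placement are illustrative assumptions, not details confirmed by the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def inverted_residual(x, channels, expansion=6, stride=1):
    """Expand with a 1x1 conv, filter spatially with a depthwise 3x3,
    then project back to a narrow representation with a linear 1x1 conv."""
    shortcut = x
    h = layers.Conv2D(channels * expansion, 1, use_bias=False)(x)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(6.0)(h)
    h = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(6.0)(h)
    # Linear projection (no activation) back to `channels` feature maps.
    h = layers.Conv2D(channels, 1, use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    if stride == 1 and shortcut.shape[-1] == channels:
        h = layers.Add()([shortcut, h])  # identity skip between the narrow ends
    return h
```

Unlike a classic residual block, the skip connection joins the narrow ends of the block while the expensive spatial filtering happens in the expanded space with a cheap depthwise convolution, which pairs naturally with the separable convolutions used elsewhere in the model.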
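
For the dense-overlap case, the sketch below shows one plausible reading of Soft-DIoU-NMS: Soft-NMS-style Gaussian score decay driven by DIoU (IoU minus a normalized center-distance penalty) instead of plain IoU. The box format [x1, y1, x2, y2], the Gaussian sigma, the score threshold, and the clipping of negative DIoU are all illustrative assumptions.

```python
import numpy as np

def diou(box, boxes):
    """DIoU of one box against an array of boxes, both as [x1, y1, x2, y2]."""
    ix1 = np.maximum(box[0], boxes[:, 0]); iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2]); iy2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    iou = inter / (area + areas - inter + 1e-9)
    # Squared center distance over squared diagonal of the smallest enclosing box.
    cdist = ((box[0] + box[2]) - (boxes[:, 0] + boxes[:, 2])) ** 2 / 4 \
          + ((box[1] + box[3]) - (boxes[:, 1] + boxes[:, 3])) ** 2 / 4
    ex1 = np.minimum(box[0], boxes[:, 0]); ey1 = np.minimum(box[1], boxes[:, 1])
    ex2 = np.maximum(box[2], boxes[:, 2]); ey2 = np.maximum(box[3], boxes[:, 3])
    diag = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9
    return iou - cdist / diag

def soft_diou_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Keep the best box, then decay (rather than delete) overlapping neighbors."""
    scores = scores.copy()
    keep, idx = [], np.arange(len(scores))
    while len(idx):
        best = idx[np.argmax(scores[idx])]
        keep.append(best)
        idx = idx[idx != best]
        if len(idx) == 0:
            break
        d = diou(boxes[best], boxes[idx])
        # Gaussian soft suppression; far-apart boxes (DIoU <= 0) are untouched.
        scores[idx] *= np.exp(-np.clip(d, 0, None) ** 2 / sigma)
        idx = idx[scores[idx] > score_thresh]
    return keep
```

Because neighbors are down-weighted instead of discarded, two heavily overlapping obstacles (e.g., a pedestrian in front of a tree trunk) can both survive suppression, which matches the stated motivation of detecting densely overlapping target areas.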