Abstract: The litchi picking robot is an important tool for automating litchi harvesting. To pick litchi normally, the robot must first acquire the spatial position of each litchi cluster. To guide the robot to the picking position and improve picking efficiency, a vision pre-positioning method for the litchi picking robot under a large field of view was studied in this paper. Firstly, using a binocular stereo vision system composed of two calibrated industrial cameras, 250 pairs of litchi cluster images under a large field of view were taken in a litchi orchard in Guangzhou, and the spatial positions of key litchi clusters were recorded with a laser range finder as ground truth for comparison with the results obtained in this paper. To expand the sample size, the original images and the epipolar-rectified images were randomly cropped and scaled within a small range, yielding a final data set of 4 000 images, which were then labeled to build the data set for the target detection network. Secondly, based on the YOLOv3 network and the DenseNet classification network, and exploiting the single-target, single-scene characteristics of the litchi cluster detection task (orchard environments only), the network structure was optimized: a Dense Module with a depth of 34 layers was designed, together with a litchi cluster detection network, YOLOv3-DenseNet34, built on this module. Thirdly, because the background under a large field of view is complex, dense stereo matching over the whole image achieves a low matching rate and poor results, and some litchi clusters do not appear in the common view of both images at the same time; therefore, a method for calculating sub-pixel disparity was designed. By fitting a quadratic curve to disparity versus similarity, the sub-pixel disparity was obtained and used to calculate the spatial position of each litchi cluster.
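The quadratic-fit step can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: it assumes the similarity scores at the integer disparity with the best score and its two neighbours are available (the abstract does not specify the similarity measure), fits a parabola through those three points to get the sub-pixel peak, and then converts disparity to depth with the standard binocular relation Z = f·B/d; the focal length and baseline values in the usage comment are hypothetical.

```python
def subpixel_disparity(similarity, d0):
    """Refine the integer disparity d0 by fitting a parabola through the
    similarity scores at d0-1, d0, d0+1 and taking the vertex (illustrative
    method; the paper's actual similarity measure is not specified here)."""
    s_m, s_0, s_p = similarity[d0 - 1], similarity[d0], similarity[d0 + 1]
    denom = s_m - 2.0 * s_0 + s_p  # second difference; 0 means a flat fit
    if denom == 0.0:
        return float(d0)
    # Vertex of the parabola through (-1, s_m), (0, s_0), (1, s_p)
    return d0 + 0.5 * (s_m - s_p) / denom

def depth_from_disparity(d, focal_px, baseline_mm):
    """Standard binocular triangulation: Z = f * B / d (depth in mm)."""
    return focal_px * baseline_mm / d

# Hypothetical usage: a focal length of 800 px and a 120 mm baseline give
# a depth of 3 000 mm (3 m) at a disparity of 32 px.
z = depth_from_disparity(32.0, focal_px=800.0, baseline_mm=120.0)
```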
Compared with the original YOLOv3 network, the YOLOv3-DenseNet34 network improved both the detection accuracy and the detection speed for litchi clusters: the mAP (mean average precision) was 0.943, the average detection speed was 22.11 frames/s, and the model size was 9.3 MB, about 1/26 of that of the original YOLOv3 network. The positioning results of the method were then compared with the laser range finder measurements. At a detection distance of 3 m, the maximum absolute error of the pre-positioning was 36.602 mm, the mean absolute error was 23.007 mm, and the mean relative error was 0.836%. The test results showed that the vision pre-positioning method studied in this paper can basically meet the precision and speed requirements of vision pre-positioning under a large field of view, and it can provide a reference for vision pre-positioning of other fruit and vegetable picking tasks under a large field of view.
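The three reported error metrics can be defined as follows. This is a sketch of the conventional definitions (maximum and mean of the absolute errors, and the mean of the per-sample relative errors against the laser-range-finder reference); the abstract does not state the exact formulas, so this is an assumption, and the sample values are hypothetical.

```python
def positioning_errors(predicted_mm, reference_mm):
    """Maximum absolute error, mean absolute error, and mean relative error
    between vision-based distances and laser-range-finder references,
    both given in millimetres (assumed conventional definitions)."""
    abs_err = [abs(p - r) for p, r in zip(predicted_mm, reference_mm)]
    max_err = max(abs_err)
    mean_err = sum(abs_err) / len(abs_err)
    # Relative error of each sample against its own reference distance
    mean_rel = sum(e / r for e, r in zip(abs_err, reference_mm)) / len(abs_err)
    return max_err, mean_err, mean_rel

# Hypothetical example: two measurements near the 3 m detection distance
max_e, mean_e, rel_e = positioning_errors([3020.0, 2985.0], [3000.0, 3000.0])
```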