Abstract:Banana has been one of the major fruits in the production and consumption in China. But, the banana harvesting is a high labor consuming activity with the low efficiency and large fruit damage. This study aims to improve the operation efficiency and quality of the banana in the picking robot. An accurate and rapid recognition was also proposed to locate the fruit axis at the bottom of banana using the YOLOv5 algorithm. Specifically, a coordinate attention (CA) mechanism was fused into the backbone network. The Concentrated-Comprehensive Convolution Block (C3) feature extraction module was fused with the CA attention mechanism module to form the C3CA module, in order to enhance the extraction of the banana feature information. The original Complete Intersection over Union (CIoU) of loss function was replaced with the Efficient Intersection over Union (EIoU). As such, the convergence of the model was speeded up to reduce the loss value. After that, the anchor point was determined for the test to improve the regression formula of prediction target box. The camera coordinate system of the point was transformed to deal with the three-dimensional coordinates. D435i depth camera was then used to locate the fruit axis at the bottom of banana. The original YOLOv5, Faster R-CNN, and improved YOLOv5 model were trained to verify the model. The accuracy of the improved model increased by 2.8 percentage points, the recall rate reached 100%, and the average accuracy value increased by 0.17 percentage points, compared with the original. There were the 52.96 percentage points higher precision, 17.91 percentage points higher recall, and 21.26 percentage points higher average precision value, compared with the Faster R-CNN model. The size of the improved model was reduced by 1.06MB, compared with the original. The field test was conducted on July 1, 2022 in Dongguan Fruit and Vegetable Research Institute, Guangdong Province, China. A test was realized for the random real-time location of the fruit axis at the bottom of banana in the field environment. The original YOLOv5, Faster R-CNN, and improved YOLOv5 model were used to recognize and localize the single and double plants in the range of 1.0-2.5 m. Each model was tested for 10 times. The estimated and real values were recorded to calculate the mean error, the mean error ratio, and the mean value. The original YOLOv5, Faster R-CNN, and improved YOLOv5 model all performed better to identify the banana in the field of view within the localization range and the estimated values. Among them, the mean errors were 0.085, 0.168, and 0.063 m, respectively, while the mean error ratios were 4.165%, 8.046%, and 2.992%, respectively. The mean values of error and error ratio in the improved model were reduced by 0.105 m, and 5.054 percentage points, respectively, during the original training, compared with the Faster R-CNN model. By contrast, the error and error ratio of the improved YOLOv5 model were reduced by 0.022 m and 1.173 percentage points, respectively, compared with the original. In addition, the measurement error greater than 0.2 m in the test was a locating error. Only test 6 showed the locating errors with the low error rate in the improved YOLOv5 model. The locating errors were found in tests 3 and 4 of the original, while the Faster R-CNN model showed the localization errors in the tests of 1, 4 and 8. Together with the ideal localization, the lower error and higher dimensional accuracy, the improved YOLOv5 model was conducive to the migration application and rapid recognition of bananas in the complex environments. In this case, the vision module of banana picking robot can meet the requirements for the axial locating of the undertaking mechanism at the bottom of banana fruit in the field environment.