Abstract: Intelligent pollination of tomatoes has been widely adopted in plant factories in modern agriculture in recent years. However, low detection accuracy cannot fully meet the needs of large-scale production during robotic pollination, and the small size of tomato flowers and their varied posture orientations can lead to imperfect pollination strategies. In this study, a deep-learning approach was proposed that combines target detection, flowering-stage classification, and posture recognition for tomato flowers. According to the characteristics of tomato flowers, the tomato flower detection and classification network (TFDC-Net) was divided into two parts: target detection of tomato flowers, and classification of flowering stage and flower posture. For flower detection, the YOLOv5s network was selected and two improvements were made to its structure: first, a Convolutional Block Attention Module (CBAM) was added to enhance effective features while suppressing invalid ones; second, Weighted Boxes Fusion (WBF) was adopted to make full use of the prediction information. The network was then trained with offline data augmentation to obtain the ACW_YOLOv5s model, which achieved an accuracy of 0.957, a recall of 0.942, a mAP0.5 of 0.968, and a mAP0.5-0.95 of 0.620, improvements of 0.028, 0.004, 0.012, and 0.066, respectively, over the original YOLOv5s. The actual detection performance of the model was verified on tomato flowers by comparing it with the original YOLOv5s under various complex situations. The tests show that the ACW_YOLOv5s model better handles the missed detection of small distant targets and obscured targets, as well as the false detection of overlapping targets, that occur with the original YOLOv5s.
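The WBF step mentioned above can be sketched as follows. This is a minimal single-model illustration of the general Weighted Boxes Fusion idea, not the authors' implementation: boxes are normalized [x1, y1, x2, y2] lists, and the 0.55 IoU threshold is an assumed default.

```python
# Minimal single-model sketch of Weighted Boxes Fusion (WBF) for one class.
# Unlike NMS, which keeps only the highest-scoring box in each overlapping
# group, WBF replaces the group with a confidence-weighted average, so every
# prediction contributes to the final box.

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def fuse(cluster):
    """Confidence-weighted average of a cluster of (score, box) pairs."""
    total = sum(s for s, _ in cluster)
    box = [sum(s * b[i] for s, b in cluster) / total for i in range(4)]
    return total / len(cluster), box

def weighted_boxes_fusion(detections, iou_thr=0.55):
    """Group overlapping (score, box) detections and fuse each group."""
    clusters = []  # each cluster is a list of (score, box)
    for score, box in sorted(detections, reverse=True):
        for cluster in clusters:
            if iou(fuse(cluster)[1], box) > iou_thr:
                cluster.append((score, box))
                break
        else:
            clusters.append([(score, box)])
    return [fuse(c) for c in clusters]

# Two overlapping detections of one flower plus one distinct detection
# collapse to two fused boxes.
dets = [(0.9, [0.10, 0.10, 0.50, 0.50]),
        (0.6, [0.12, 0.08, 0.52, 0.48]),
        (0.8, [0.70, 0.70, 0.90, 0.90])]
fused = weighted_boxes_fusion(dets)
```

Averaging scores over the cluster size (rather than rescaling by a model count, as the multi-model formulation does) keeps this single-model sketch self-contained.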
Meanwhile, effective pollination requires handling the various flowering stages and stamen orientations of the flowers. The EfficientNetV2 classification network was therefore trained on three flowering stages and five flower postures, yielding a flowering-stage classification model and a posture recognition model with accuracies of 94.5% and 86.9%, respectively. Furthermore, 300 flowering-stage and 200 posture images were selected to further validate the performance of the classification models; the overall accuracies were 97.0% and 90.5% for the flowering-stage and posture models, respectively. TFDC-Net was obtained by integrating the ACW_YOLOv5s target detection model with the flowering-stage and posture classification models, so that the detection of tomato flowers and the classification of flowering stage and posture fully meet the vision requirements of pollination robots. TFDC-Net was also applied to a self-developed pollination robot, where it performed target detection, flowering-stage classification, and posture recognition of flowers. The target was then localized using coordinate conversion, and its true 3D coordinates were obtained in the coordinate system of the robot arm. These coordinates were fed back to the robot arm to pollinate targets in full bloom with a front-facing posture. These findings can provide a technical basis for the target detection and localization of pollination robots.
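The coordinate conversion from a camera-frame detection to the robot-arm frame can be sketched with a homogeneous transform. This is a generic hand-eye illustration under assumed calibration values, not the paper's actual calibration: the matrix `T_cam_to_arm` below (camera rotated 180° about Z and offset 0.3 m along X) is hypothetical.

```python
# Sketch of converting a 3D point from the camera frame to the robot-arm
# frame via a 4x4 homogeneous transform. The transform values are assumed
# for illustration; a real system would obtain them from hand-eye calibration.

def transform_point(T, p):
    """Apply 4x4 homogeneous transform T (row-major nested lists) to point p."""
    x, y, z = p
    v = (x, y, z, 1.0)
    return tuple(sum(T[i][j] * v[j] for j in range(4)) for i in range(3))

# Hypothetical calibration: 180-degree rotation about Z, 0.3 m offset along X.
T_cam_to_arm = [
    [-1.0,  0.0, 0.0, 0.3],
    [ 0.0, -1.0, 0.0, 0.0],
    [ 0.0,  0.0, 1.0, 0.0],
    [ 0.0,  0.0, 0.0, 1.0],
]

# A flower detected 0.1 m right, 0.2 m down, 0.5 m ahead of the camera.
p_arm = transform_point(T_cam_to_arm, (0.1, 0.2, 0.5))
```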