Abstract: An accurate method was proposed to identify the pruning point for greenhouse tomatoes by segmenting stems against the complex background of clustered plants using an improved Mask R-CNN, so that a robot could be guided to operate precisely at the pruning point. The standard requirement for pruning the lateral branches of greenhouse tomato plants was first introduced. The pruning point was determined with reference to the intersection of the centerlines of the main stem and the lateral branch. Mask R-CNN with a ResNet50 feature extraction network was adopted as the segmentation model for the target areas. An image dataset of tomato plants was constructed, covering the growth stage of the plants (growing and productive periods), the field-of-view scale (close and distant views), and the imaging posture (upward and front views). In total, 3 000 images were collected, of which 2 400 were used as the training set. A fine-tuning method was adopted to transfer the pre-trained model from the MMDetection algorithm library, with different learning rates of 0.02 and 0.002 for the head network and the backbone network, respectively. A segmentation model for the main stem and the lateral branch was then trained with Mask R-CNN, and the loss converged as training iterated. The two types of target areas were identified and located in the plant images. A main stem and a lateral branch adjacent to each other were assigned to the same plant according to the distance between their center points. The centerline of each stem was fitted using the second central moment feature. The pruning point was then located by offsetting from the centerline intersection along the centerline of the lateral branch by one main-stem radius. The performance of the model in identifying and locating the pruning point was verified on a test set of 80 images, none of which appeared in the training or validation sets. The experimental results showed that the error rate, precision, and recall of tomato stem recognition were 0.12, 0.93, and 0.94, respectively. In particular, the error rate on the main stem was lower than that on the lateral branch, with leaf occlusion being the main cause of false detection; in the front view, the fruit stem was occasionally misidentified as a lateral branch. Upward-view images of plants in the productive period yielded a higher accuracy rate, since fewer leaves and fruit stems were present. In terms of location, the average error in locating the pruning points was 0.34 times the main-stem diameter, and upward-view images of plants in the productive period showed a smaller locating error. With the main-stem diameter empirically set to 15 mm, the average locating error was 5.12 mm, which can easily be tolerated by an additional movement of the stem grasper. Furthermore, the locating error was also lower for close-range upward-view images. These findings can provide promising technical support for tomato pruning robots.
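
The abstract states the differential learning rates (0.02 for the head networks, 0.002 for the backbone) but not how they were configured. The following is a minimal sketch in plain PyTorch, not the authors' MMDetection configuration; the `model` object and its `backbone` parameter-name prefix are assumptions matching MMDetection's usual module layout.

```python
import torch
from torch.optim import SGD

def build_finetune_optimizer(model: torch.nn.Module,
                             head_lr: float = 0.02,
                             backbone_lr: float = 0.002) -> SGD:
    """Assign the lower learning rate to the pre-trained ResNet50 backbone
    and the higher rate to the detection/segmentation heads, per the
    abstract. The "backbone" name prefix is an assumption."""
    backbone_params, head_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if name.startswith("backbone"):
            backbone_params.append(param)
        else:
            head_params.append(param)
    # Momentum and weight decay are illustrative defaults, not from the paper.
    return SGD(
        [{"params": backbone_params, "lr": backbone_lr},
         {"params": head_params, "lr": head_lr}],
        momentum=0.9,
        weight_decay=1e-4,
    )
```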
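The centerline fitting step relies on the second central moments of each segmented stem region. A minimal sketch of this standard computation is given below, assuming a binary instance mask produced by the segmentation model; OpenCV's `cv2.moments` supplies the raw and central moments, and the principal-axis orientation follows the usual formula theta = 0.5 * atan2(2*mu11, mu20 - mu02).

```python
import numpy as np
import cv2

def fit_centerline(mask: np.ndarray):
    """Fit a stem centerline from a binary instance mask using second
    central moments. Returns (centroid, unit direction vector)."""
    m = cv2.moments(mask.astype(np.uint8), binaryImage=True)
    if m["m00"] == 0:
        raise ValueError("empty mask")
    # Centroid from the raw moments.
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
    # Principal-axis orientation from the central moments mu20, mu02, mu11.
    theta = 0.5 * np.arctan2(2.0 * m["mu11"], m["mu20"] - m["mu02"])
    direction = np.array([np.cos(theta), np.sin(theta)])
    return np.array([cx, cy]), direction
```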
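The pruning-point rule itself, intersecting the two centerlines and stepping outward along the lateral branch by one main-stem radius, can be sketched as below. This is an illustrative reading of the abstract, not the authors' code; each centerline is taken as a (centroid, unit direction) pair such as `fit_centerline` returns, and the two lines are assumed not to be parallel.

```python
import numpy as np

def locate_pruning_point(main_c, main_d, lat_c, lat_d, main_radius):
    """Intersect the main-stem and lateral-branch centerlines, then offset
    along the lateral branch by one main-stem radius."""
    # Solve main_c + t * main_d = lat_c + s * lat_d for (t, s).
    A = np.column_stack([main_d, -lat_d])
    t, _ = np.linalg.solve(A, lat_c - main_c)
    intersection = main_c + t * main_d
    # Step from the intersection toward the lateral branch's centroid,
    # i.e. outward from the main stem, by one main-stem radius.
    outward = lat_c - intersection
    outward /= np.linalg.norm(outward)
    return intersection + main_radius * outward
```

Combined with the moment-based fit above, this reproduces the geometric pipeline the abstract describes; units (pixels versus millimetres) and the degenerate parallel-line case are left to the caller.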