Abstract: Image segmentation has been widely used in recent years for the rapid and accurate detection of plants by robots in modern agriculture. However, fully supervised instance segmentation of plant images requires mask labels as training samples. Because plant species and forms are so diverse, sufficient and effective mask labels cannot be obtained at low cost through manual labeling. In this study, an automatic labelling-based instance segmentation network (AutoLNet) was proposed to improve segmentation accuracy. It uses automatically generated weak labels to train a weakly supervised deep learning model. The network was then applied to instance segmentation of maize seedling images. Top-view images of maize at the seedling stage were collected by an unmanned aerial vehicle (UAV), and data augmentation was used to increase sample diversity. Starting from a weakly supervised instance segmentation model, a weak-label self-generation module was added in front of the backbone network. The module consists of color space conversion, contour tracking, and minimum bounding rectangle generation. First, a color threshold range for maize plants was set to remove the image background, eliminating the influence of ground shadows and soil on the foreground. The foreground maize regions were then dilated and small noise points were removed, yielding a binary image that contains only the foreground maize plants. Second, edge detection was performed on this thresholded binary image to obtain the contour point set of each foreground maize plant. Finally, the minimum bounding rectangle of each foreground object was generated automatically in the original image from the coordinates of its contour point set, and the resulting bounding boxes were filtered by a threshold to obtain the final boxes. These automatically generated weak labels replaced manual labels in network training.
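The weak-label self-generation steps above can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The HSV threshold values, the dilation radius, the `min_area` box filter and every function name are hypothetical. The rectangle is assumed to be axis-aligned, which is the form box-supervised training uses. In addition, 8-connected component labelling stands in for contour tracking. The two give the same box, because an object's outer contour spans the same extent as its pixels.

```python
import colorsys
from collections import deque

# Hypothetical thresholds for green maize leaves: hue in degrees, saturation/value in [0, 1].
# The abstract does not report the actual color threshold range.
HUE_RANGE = (60.0, 180.0)
MIN_SAT = 0.25
MIN_VAL = 0.20


def foreground_mask(rgb):
    """Color space conversion (RGB -> HSV) plus thresholding: 1 = plant, 0 = soil/shadow."""
    mask = []
    for row in rgb:
        out = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            hue = h * 360.0
            is_plant = HUE_RANGE[0] <= hue <= HUE_RANGE[1] and s >= MIN_SAT and v >= MIN_VAL
            out.append(1 if is_plant else 0)
        mask.append(out)
    return mask


def dilate(mask, radius=1):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square structuring element."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = 1
    return out


def connected_regions(mask):
    """8-connected foreground regions, each returned as a list of (x, y) pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(x, y)]), []
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for ny in range(max(0, cy - 1), min(h, cy + 2)):
                    for nx in range(max(0, cx - 1), min(w, cx + 2)):
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            regions.append(pixels)
    return regions


def weak_box_labels(rgb, dilate_radius=1, min_area=20):
    """Hypothetical label generator: boxes (x0, y0, x1, y1) in inclusive pixel coordinates."""
    mask = dilate(foreground_mask(rgb), dilate_radius)
    boxes = []
    for pixels in connected_regions(mask):
        if len(pixels) < min_area:  # threshold filter: drop small noise blobs
            continue
        xs = [p[0] for p in pixels]
        ys = [p[1] for p in pixels]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))  # axis-aligned bounding rectangle
    return boxes
```

Because dilation grows each region, the returned boxes extend up to `dilate_radius` pixels beyond the undilated plant on each side. A practical version would normally rely on an image-processing library's color conversion, contour and bounding-rectangle functions rather than these hand-written loops.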
Instance segmentation of maize seedling images was thus achieved without manual labels, greatly reducing the labor cost of data annotation. Test results showed that the distance intersection over union (DIoU) and the cosine similarity between the self-generated and manual labels reached 95.23% and 94.10%, respectively, so the label quality fully met the requirements of weakly supervised training. The average precision (AP) of AutoLNet's predicted boxes and masks reached 68.69% and 35.07%, respectively. Compared with the model trained on manual labels, AutoLNet's box and mask AP increased by 10.83 and 3.42 percentage points, respectively. Compared with DiscoBox and Box2Mask, its box AP increased by 11.28 and 8.79 percentage points, and its mask AP by 12.75 and 10.72 percentage points, respectively. AutoLNet also reduced the projection loss and pairwise loss during training, which improved the accuracy of weakly supervised learning. Compared with the fully supervised models CondInst and Mask R-CNN, AutoLNet's box and mask AP reached 94.32% and 83.14% of those of CondInst, and were 7.54 and 3.28 percentage points higher than those of Mask R-CNN. At IoU thresholds of 0.5 or higher, AutoLNet segmented better than the fully supervised Mask R-CNN and performed similarly to CondInst. Consequently, AutoLNet can automatically obtain maize plant labels from images through its label self-generation module. This replaces the manual labeling process and enables low-cost instance segmentation of maize seedling images without manual annotation.
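The two label-quality metrics can be illustrated as below. The sketch assumes that "distance intersection ratio" refers to distance-IoU (DIoU). DIoU is the IoU minus the squared distance between box centers divided by the squared diagonal of the smallest box enclosing both. It also assumes that cosine similarity is computed between the corner-coordinate vectors of each matched pair of self-generated and manual boxes. The abstract specifies neither the box matching nor the averaging, so `label_quality` and its one-to-one pairing by list order are hypothetical.

```python
import math


def box_area(b):
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a, b):
    """Intersection over union of boxes given as continuous corners (x0, y0, x1, y1)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def diou(a, b):
    """Distance-IoU: IoU minus (center distance)^2 / (enclosing-box diagonal)^2."""
    # Squared distance between the two box centers.
    d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    # Squared diagonal of the smallest axis-aligned box enclosing both boxes.
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou(a, b) - (d2 / c2 if c2 > 0 else 0.0)


def cosine_similarity(u, v):
    """Cosine of the angle between two coordinate vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def label_quality(auto_boxes, manual_boxes):
    """Hypothetical aggregate: mean DIoU and mean cosine similarity over boxes paired by index."""
    pairs = list(zip(auto_boxes, manual_boxes))
    mean_diou = sum(diou(a, m) for a, m in pairs) / len(pairs)
    mean_cos = sum(cosine_similarity(a, m) for a, m in pairs) / len(pairs)
    return mean_diou, mean_cos
```

For example, boxes (0, 0, 2, 2) and (1, 0, 3, 2) have IoU 1/3, a squared center distance of 1 and a squared enclosing diagonal of 13, so their DIoU is 1/3 - 1/13 = 10/39. Unlike plain IoU, DIoU still distinguishes between non-overlapping boxes through the center-distance term.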
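The abstract attributes part of the accuracy gain to lower projection and pairwise losses. These names match the box-supervised losses introduced with BoxInst, so the sketch below assumes that formulation. The abstract does not confirm this, and it does not give AutoLNet's exact loss definitions. The projection term compares max-projections of the predicted mask and of the box onto the x and y axes using a dice loss. The pairwise term encourages neighboring pixels with similar color to share a label. Several simplifications are assumptions: only right and down 4-neighbor edges are used instead of BoxInst's dilated neighborhood, a grayscale image in [0, 1] replaces LAB color, and `theta=2.0` and `tau=0.3` are illustrative values.

```python
import math


def dice_loss(pred, target, eps=1e-6):
    """Dice loss with squared terms in the denominator (CondInst/BoxInst style)."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(p * p for p in pred) + sum(t * t for t in target)
    return 1.0 - 2.0 * inter / (denom + eps)


def box_to_mask(box, h, w):
    """Binary mask that is 1 inside the box (x0, y0, x1, y1), inclusive pixel coordinates."""
    x0, y0, x1, y1 = box
    return [[1.0 if x0 <= x <= x1 and y0 <= y <= y1 else 0.0 for x in range(w)] for y in range(h)]


def projection_loss(mask, box_mask):
    """Dice loss between max-projections of the predicted mask and the box on both axes."""
    proj_x = lambda m: [max(col) for col in zip(*m)]  # one value per column
    proj_y = lambda m: [max(row) for row in m]        # one value per row
    return dice_loss(proj_x(mask), proj_x(box_mask)) + dice_loss(proj_y(mask), proj_y(box_mask))


def pairwise_loss(mask, gray, box_mask, theta=2.0, tau=0.3, eps=1e-6):
    """Simplified pairwise term: mean -log P(same label) over similar-color edges inside the box."""
    h, w = len(mask), len(mask[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):  # right and down neighbors only
                ny, nx = y + dy, x + dx
                if ny >= h or nx >= w or not (box_mask[y][x] and box_mask[ny][nx]):
                    continue
                similarity = math.exp(-theta * abs(gray[y][x] - gray[ny][nx]))
                if similarity < tau:  # skip edges likely to cross an object boundary
                    continue
                mi, mj = mask[y][x], mask[ny][nx]
                p_same = mi * mj + (1 - mi) * (1 - mj)
                total += -math.log(max(p_same, eps))
                count += 1
    return total / count if count else 0.0
```

A mask that fills its box exactly gives a projection loss near zero, and a mask that is confidently uniform over a uniform-color region gives a pairwise loss of zero. This is the sense in which lower values of both terms indicate masks that better fit the weak box labels.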
These findings can provide technical support for high-precision, low-cost instance segmentation of maize seedling images in field environments.