Abstract:Deep learning has gradually been one of the most important technologies in the field of agriculture in recent years. However, the problems of labeling quality and cost of training samples for supervised deep learning have become the bottleneck of restricting the development of technology. In order to reduce the cost of deep model training and ensure that the model can have high image segmentation accuracy, in this study, a model named Bounding-box Mask Deep Convolutional Neural Network (BM-DCNN) was proposed to realize automatic training and segmentation for maize plant. First of all, using DJI's Genie 4-RTK drone to collect top images of maize seedlings. The flight uses an automatic take-off planned route, and the entire route covers the entire test field. Second of all, using the open source labeling tool called Labelme to label top images of maize seedlings. The top images of the original maize seedling plants need to be labeled twice. In this study, we used bounding boxes as the basic shapes for weakly supervised labels, and pixels within the bounding boxes area were marked as foreground(i.e. the possible effective pixels of a maize plant). Pixels outside the bounding boxes were marked as background. Finally, the information of bounding boxes was used to generate primary pseudo-labels on the images, and the RGB color model of the images was converted to the HSV(Hue-Saturation-Value) color model, and the full connection condition random field(DenceCRF) was used to eliminate the influence of plant shadow and the image noise on the pseudo-labels accuracy in the images. The pseudo-labels were trained on the optimized YoLact model instead of the ground truth labels. The optimized model can be used for the instance segmentation of the plants at the maize seedling stage. We designed an experiment for verification and testing of BM-DCNN. By comparing the similarity between pseudo-labels mask and ground truth, it found that the mean intersection over union (mIoU) was 81.83% and mean cosine similarity (mcos(ɑ)) was 86.14%, which was higher than the accuracy of pseudo-labels generated by Grabcut(the mIoU was 40.49% and mean cosine similarity was 61.84%). For the maize seedling image (top view), the time cost of three manual annotation methods was calculated, with bounding box labels of 2.5 min/sheet, scirbbles labels of 15.8 min/sheet, and pixel-level labels of 32.4 min/sheet. Considering that the ground truth labels had an error in the handing of maize plant details, the pseudo-labels at the accuracy can be used for deep convolutional neural network training. By comparing the accuracy of instance segmentation between BM-DCNN and fully supervised instance segmentation model, when the IoU value of the BM-DCNN was greater than 0.7(AP70), the instance segmentation accuracy corresponding to the BM-DCNN model was higher than that of the supervised model. The average accuracy of the two backbone networks of the BM-DCNN model were 67.57% and 75.37%, respectively, which were close to the supervised instance segmentation results under the same conditions (67.95% and 78.52%, respectively), and the higher average accuracy can reach 99.44% of the supervised segmentation results. Therefore, For the instance segmentation task of the maize seedling plants images(top view), the instance segmentation effect of BM-DCNN can almost achieve the segmentation effect of the supervised instance segmentation model under the same conditions. It can be seen that in the large-area operation scenario of the UAV, it was feasible to use the bounding box labels of the images to replace the ground truth labels to complete the training of deep learning model, which greatly reduced the time cost of manual labeling of the samples, and provided theoretical support for the rapid realization of the application scenarios, such as the number of plants at the seedling stage of maize and the calculation of canopy coverage.