Abstract: Weed control is an indispensable task in field management, and effective recognition of crops and weeds is therefore an essential basis for the development of intelligent weeding equipment. However, apart from the crop itself, the recognition targets in field images are not fixed, owing mainly to the wide variety of weed species and the random distribution of their positions. Detecting the crop among all categories of weeds in an image places high demands on recognition performance: every weed target must be labeled, and the dataset must cover the weed species comprehensively. In practice, human annotators can reliably distinguish only the target crop from the weeds, while the species and number of weeds often remain unidentified. Moreover, crops and weeds frequently overlap in complex field scenes, which makes it difficult to segment object boundaries accurately, especially when heavily overlapping targets produce anchor boxes with large overlapping areas. In this study, a recognition and semantic segmentation method for maize at the seedling stage was proposed that identifies weeds on the premise of maize recognition using a dual attention network, yielding fine segmentation of morphological boundaries. The main contents were as follows. 1) The original architecture of the model was determined by comparing six state-of-the-art semantic segmentation networks. The dual attention network architecture presented the best performance on the training, validation, and test datasets, realizing pixel-wise recognition and segmentation of maize field images. On the validation set, the mean intersection over union (mIoU) and mean pixel accuracy (mPA) at the end of iteration were 92.73% and 96.88%, respectively. On the test set, the mIoU and mPA were 92.8% and 94.66%, respectively, and the segmentation speed was 15.2 frames/s. 2) A semantic segmentation model of maize at the seedling stage was established using the improved network architecture. The model performs binary classification between maize pixels and all other pixels, and is suitable for the recognition and morphological segmentation of maize in complex field scenes at the seedling stage. An improved backbone was used to enhance feature representation, retaining more feature details while reducing the amount of computation. Recurrent criss-cross and channel attention modules were combined into a dual attention mechanism to construct long-range contextual dependencies in the spatial and channel dimensions of the feature map simultaneously, which significantly improved the discriminability of the feature representation. The model was built on an encoder-decoder structure, with an auxiliary head attached to optimize the low-level features; the loss function was improved, and a transfer learning strategy was formulated. 3) The segmentation map of weeds was obtained by image morphological processing of the maize segmentation map, so that weed regions were identified from the maize segmentation without pixel-wise prediction of the weed regions. The results showed that the improved model outperformed the original network throughout the training process. At the end of iteration, the mIoU and mPA were 93.98% and 97.48%, increases of 1.35% and 0.62% over the original network, respectively.
The accuracy of region segmentation, the accuracy of pixel recognition, and the segmentation speed all improved noticeably, indicating better overall performance of the model. On the test set, the mIoU and mPA were 94.16% and 95.68%, exceeding the baseline by 1.47% and 1.08%, respectively, and the segmentation speed reached 15.9 frames/s, 4.61% faster than the original network. These findings can provide a reference for the development of intelligent weeding equipment that accurately recognizes and segments maize and weeds at the seedling stage in complex field scenes.
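The mIoU and mPA figures reported above are not defined in the abstract; the sketch below assumes the standard per-class definitions computed from a pixel-level confusion matrix (rows index ground truth, columns index predictions). The helper names are illustrative, not the authors' code.

```python
import numpy as np

def confusion_matrix(pred, label, num_classes=2):
    """Accumulate a pixel-level confusion matrix for one image.

    pred, label: integer arrays of identical shape with values in [0, num_classes).
    """
    valid = (label >= 0) & (label < num_classes)
    idx = num_classes * label[valid].astype(int) + pred[valid].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def miou_mpa(cm):
    """Mean intersection over union and mean pixel accuracy from a confusion matrix."""
    tp = np.diag(cm).astype(float)
    iou = tp / (cm.sum(axis=0) + cm.sum(axis=1) - tp)   # per-class IoU
    pa = tp / cm.sum(axis=1)                             # per-class pixel accuracy
    return np.nanmean(iou), np.nanmean(pa)

# Toy example: 2 classes (maize vs. background) on a 4-pixel image.
pred = np.array([[1, 1], [0, 0]])
label = np.array([[1, 0], [0, 0]])
print(miou_mpa(confusion_matrix(pred, label)))
```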
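The abstract states only that weed regions are obtained from the maize segmentation map by image morphological processing, without pixel-wise weed prediction. As a minimal sketch of what such post-processing could look like, the code below assumes that weeds are the green vegetation that remains after the predicted maize pixels are removed; the excess-green threshold, kernel size, and the function `weed_mask_from_maize` are illustrative assumptions, not the authors' exact procedure.

```python
import cv2
import numpy as np

def weed_mask_from_maize(bgr_image, maize_mask, kernel_size=5):
    """Illustrative post-processing: derive a weed mask from a predicted maize mask.

    bgr_image : uint8 H x W x 3 field image (OpenCV BGR order).
    maize_mask: H x W array, nonzero where the network predicts maize.
    """
    b, g, r = cv2.split(bgr_image.astype(np.float32))
    exg = 2.0 * g - r - b                      # excess-green index highlights vegetation
    _, veg = cv2.threshold(exg, 20.0, 255.0, cv2.THRESH_BINARY)  # assumed fixed threshold
    veg = veg.astype(np.uint8)

    # Vegetation that is not maize is treated as weed.
    not_maize = cv2.bitwise_not((maize_mask > 0).astype(np.uint8) * 255)
    weeds = cv2.bitwise_and(veg, not_maize)

    # Morphological opening removes small speckles left along the maize/weed boundary.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    return cv2.morphologyEx(weeds, cv2.MORPH_OPEN, kernel)
```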