Abstract: Detection of carrot defects plays an important role in the sale of carrots. Segmenting and extracting carrot crack regions is necessary to automatically evaluate the degree of cracking and to guide trimming of the cracked area. Traditional detection of carrot external quality relies on separate image processing procedures designed around the features of each defect type, which makes the methods complex and not very robust. In this study, a deep multi-branch model fusion network (CS-net) was proposed to integrate the recognition of carrot defects with the segmentation of crack regions. The network contained two parts: classification of carrot defects (C-Net) and segmentation of carrot crack regions (S-Net). In C-Net, ResNet-50 pre-trained on the ImageNet dataset was used as the image feature extractor. The output features of the 1st, 10th, 22nd, 40th and 49th layers of ResNet-50 were processed by different pooling methods, including Average Pooling (AVP), Global Average Pooling (GAP) and Spatial Pyramid Pooling (SPP), and by dimensionality reduction (principal component analysis, ReliefF). The extracted features were then used as input to Support Vector Machines (SVM) to obtain five classification models. The five classification models were then combined using different fusion strategies (hard voting, soft voting and stacking) to obtain the final classification model. In S-Net, the pre-trained ResNet-50 served as the encoder of the segmentation network, and a decoder was designed to complete the segmentation network for carrot crack regions. The results showed that, among the single models, the SVM trained on the output features of the 49th layer of ResNet-50 performed best, with a test accuracy of 94.71%. The fusion model with the stacking ensemble performed best overall, with an accuracy of 98.40%, indicating that the fusion model outperformed the single models.
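To illustrate how two of the fusion strategies named above differ, the following is a minimal sketch of hard and soft voting for a single sample. It is not the authors' implementation: the function names `hard_vote` and `soft_vote` and the example probabilities are hypothetical, and stacking is omitted because it requires a trained meta-learner.

```python
from collections import Counter


def hard_vote(probas):
    """Majority vote over each model's most probable class.

    probas: one list of per-class probabilities per model (hypothetical input).
    """
    votes = [max(range(len(p)), key=p.__getitem__) for p in probas]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(probas):
    """Average the class probabilities across models, then take the argmax."""
    n_classes = len(probas[0])
    mean = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)


# Illustrative values only: three models, two classes.
example = [[0.90, 0.10], [0.40, 0.60], [0.45, 0.55]]
print(hard_vote(example), soft_vote(example))
```

In this made-up case the two strategies disagree: two of the three models favour class 1, so hard voting returns 1, while the single confident model pulls the averaged probabilities towards class 0, so soft voting returns 0.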
Different pooling methods had different effects on model performance. For the low-level feature maps, the pooling methods ranked SPP > AVP > GAP in performance, whereas for the high-level semantic features the pooling method had little impact. Dimensionality reduction reduced the number of features and improved the performance of the model. In the segmentation part, the segmentation network constructed following the U-net design (Res-U-net) performed best, with a Pixel Accuracy (PA), Mean Pixel Accuracy (MPA) and Mean Intersection over Union (MIoU) of 98.31%, 96.05% and 92.81%, respectively. The performance of Res-U-net was not affected by the crack area or the crack position. Compared with Deeplabv3+, Res-U-net achieved similar PA and MIoU and a better MPA, with a model only half the size. In addition, its segmentation speed on a single image was faster than that of Deeplabv3+. Res-U-net thus reached an advanced level in the segmentation of carrot crack defects. The defect recognition and segmentation network is of positive significance for the quantitative evaluation of carrot external quality and the automatic trimming of carrot cracks.
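The three segmentation metrics reported above can be sketched from a pixel-level confusion matrix. The abstract does not state the formulas, so this example assumes the commonly used definitions: PA is the fraction of correctly labelled pixels, MPA is the per-class pixel accuracy averaged over classes, and MIoU is the per-class intersection over union averaged over classes. The function name `segmentation_metrics` and the toy labels are hypothetical, and every class is assumed to occur at least once.

```python
def segmentation_metrics(y_true, y_pred, n_classes=2):
    """Return (PA, MPA, MIoU) for flat sequences of integer class labels."""
    # cm[t][p] counts pixels whose true class is t and predicted class is p.
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1

    total = sum(map(sum, cm))
    diag = [cm[c][c] for c in range(n_classes)]
    rows = [sum(cm[c]) for c in range(n_classes)]  # pixels per true class
    cols = [sum(cm[r][c] for r in range(n_classes)) for c in range(n_classes)]

    pa = sum(diag) / total
    mpa = sum(d / r for d, r in zip(diag, rows)) / n_classes
    # IoU per class = intersection / (true pixels + predicted pixels - intersection)
    miou = sum(d / (r + c - d) for d, r, c in zip(diag, rows, cols)) / n_classes
    return pa, mpa, miou


# Toy example: 0 = background, 1 = crack (values are illustrative only).
y_true = [0] * 6 + [1] * 4
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
pa, mpa, miou = segmentation_metrics(y_true, y_pred)
print(f"PA={pa:.4f}  MPA={mpa:.4f}  MIoU={miou:.4f}")
```

Because MPA and MIoU average over classes rather than over pixels, they are more sensitive than PA to errors on a small class such as a thin crack region.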