Abstract: Chili pepper is one of the most widely planted vegetables in China. Current production of fresh chili peppers, including field management and harvesting, suffers from high labor intensity and low efficiency, and the industry is transitioning towards mechanized and intelligent production. Rapid and accurate detection of chili fruits in the natural environment is therefore of great significance for automatic picking. However, existing detection models still lack adaptability and accuracy under different light and occlusion conditions. In this study, an improved detection model for chili fruit, called YOLOX_Pepper, was proposed on the basis of YOLOX. Firstly, an efficient channel-fusing CA (coordinate attention) mechanism was added to the YOLOX feature fusion network to capture the key features of chili fruits. Secondly, the convolution module in the feature fusion module of the backbone network was replaced with the deformable convolution DCNv2 (Deformable ConvNets v2) to improve the model's perception of the complex geometric features of chili peppers (length, width, and aspect ratio) caused by branch and fruit occlusion. The experimental results showed that the improved YOLOX_Pepper model achieved a mAP (mean average precision) of 93.30%, which was 3.99, 1.58, 3.19, and 2.84 percentage points higher than that of Faster R-CNN, YOLOv5, YOLOv7, and YOLOX, respectively, with an F1 score of 96% and an average single-image detection time of 0.026 s. Under strong light conditions, the mAP of the YOLOX_Pepper model for green and red chili fruits was 69.16% and 89.67%, respectively, and the number of correctly detected green and red peppers was 83 and 304, respectively. Under shadow conditions, the mAP of the YOLOX_Pepper model for green and red peppers was 77.21% and 90.42%, respectively, and the number of correctly detected green and red peppers was 119 and 255, respectively.
Under low-light conditions, the mAP of the YOLOX_Pepper model for green and red peppers was 77.38% and 75.47%, respectively, and the number of correctly detected green and red peppers was 86 and 311, respectively. Compared with the YOLOv5, YOLOv7, and YOLOX models, the YOLOX_Pepper model performed better under the various light conditions, especially in the number and accuracy of detections. Under fruit occlusion conditions, the mAP of YOLOX_Pepper was 71.15% and 94.87% for green and red peppers, respectively, and the number of correctly detected green and red peppers was 79 and 650, respectively. Under branch and foliage occlusion conditions, the mAP of YOLOX_Pepper was 83.98% and 87.10% for green and red peppers, respectively, and the number of correctly detected green and red peppers was 88 and 394, respectively. Compared with the YOLOv5, YOLOv7, and YOLOX models, the improved YOLOX_Pepper model also performed better in chili fruit detection under the different occlusion conditions. Overall, the YOLOX_Pepper model showed excellent detection performance in complex environments, and the improved modules can provide reliable technical support for the intelligent production of chili peppers.
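
The overall comparison reported above can be unpacked with simple arithmetic. The following minimal Python sketch assumes that each percentage-point gain is measured against the 93.30% mAP of YOLOX_Pepper and that the gains are listed in the same order as the models. The function and variable names are illustrative and do not come from the study, and the baseline values it prints are implied by the abstract rather than quoted from it.

```python
# Figures quoted in the abstract.
YOLOX_PEPPER_MAP = 93.30       # overall mAP of YOLOX_Pepper, in percent
SECONDS_PER_IMAGE = 0.026      # average single-image detection time, in seconds
GAINS_PP = {                   # gain of YOLOX_Pepper over each model, in percentage points
    "Faster R-CNN": 3.99,
    "YOLOv5": 1.58,
    "YOLOv7": 3.19,
    "YOLOX": 2.84,
}


def implied_baseline_map(gains_pp, improved_map):
    """Subtract each percentage-point gain from the improved mAP.

    Assumption: every gain is relative to the same improved mAP value.
    """
    return {name: round(improved_map - gain, 2) for name, gain in gains_pp.items()}


def images_per_second(seconds_per_image):
    """Convert an average per-image latency into a throughput estimate."""
    return 1.0 / seconds_per_image


baselines = implied_baseline_map(GAINS_PP, YOLOX_PEPPER_MAP)
fps = images_per_second(SECONDS_PER_IMAGE)

for name, value in baselines.items():
    print(f"{name}: implied mAP {value:.2f}%")
print(f"Approximate throughput: {fps:.1f} images per second")
```

Under these assumptions, the implied baseline mAP values lie between roughly 89% and 92%, and the reported detection time corresponds to a throughput of about 38 images per second.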