Abstract:The number of pigs in the shed often varies continuously in large-scale breeding scenes, due to the elimination, sale, and death. It is necessary to count the number of pigs during breeding. At the same time, the health status of the pigs is closely related to their behavior. The abnormal behavior can be predicted in time from the normal behavior of pigs for better economic benefits. Object detection can be expected to detect and count at the same time. The detection can be the basis of behavioral analysis. However, the current detection and counting performance can be confined to the blur cross-domain at the different shooting angles and distances in the complex environment of various pig houses. In this study, a novel model was proposed for pig individual detection and counting using an improved YOLOv5(You Only Look Once Version 5) in the complex cross-domain scenes. The study integrated CBAM (Convolutional Block Attention Module), a module that combined both channel and spatial attention modules, in the backbone network, and integrated the Transformer, a self-attention module, in the backbone network, and replaced CIoU(Complete IoU) Loss by EIoU(Efficient IoU) Loss, and introduced the SAM (Sharpness-Aware Minimization) optimizer and training strategies for multi-scale training, pseudo-label semi-supervised learning, and test set augment. The experimental results showed that these improvements enabled the model to better focus on the important areas in the image, broke the barrier that traditional convolution can only extract adjacent information within the convolution kernel, enhanced the feature extraction ability, and improved the localization accuracy of the model and the adaptability of the model to different object sizes and different pig house environments, thus improving the performance of the model in cross-domain scenes. In order to verify the effectiveness of the above improved methods, this paper used datasets from real scenes. There was cross-domain between these datasets, not only in the background environment, but also in the object size and the aspect ratio of the object itself. Sufficient ablation experiments showed that the improved methods used in this paper were effective. Whether integrating CBAM, integrating Transformer, using EIoU Loss, using SAM optimizer, using multi-scale training, or using a combination of pseudo-label semi-supervised learning and test set augment, the mAP (mean Average Precision) @0.5 values, the mAP@0.5:0.95 values and the MSE (Mean Square Errors) of the model where improved to varying degrees. After integrating all improvement methods, the mAP@0.5 value of the improved model was increased from 87.67% to 98.76%, the mAP@0.5:0.95 value was increased from 58.35% to 68.70%, and the MSE was reduced from 13.26 to 1.44. Compared with the classic Faster RCNN model, the VarifocalNet model for dense object detection and the YOLOX model belong to anchor-free, the detection performance and counting performance of the improved model in this paper had greater advantages regardless of which evaluation metric was chosen, and was still able to maintain a relatively fast speed. The results showed that the improved model in this paper exhibited strong feature extraction and generalization ability, and could still accurately identify most of the objects to be tested even in cross-domain scenes. The above research results demonstrated that the improved method in this paper could significantly improve the object detection effect of the existing model in complex cross-domain scenes and increase the accuracy of object detection and counting, so as to provide technical support for improving the production efficiency of large-scale pig breeding and reducing production costs.