Abstract: China has vast grasslands, and grazing is the main form of animal husbandry in these areas. Cow manure, a by-product of grassland animal husbandry, is spread across the pastures. Manure left on the pasture harms grass growth, yet it is also an important energy source. In natural grasslands, cow manure is distributed mainly in scattered areas and concentrated areas, and in pastoral areas it is collected chiefly by hand. As computer vision technology matures, it becomes feasible to apply it to cow manure collection. This article proposes a cow manure image detection model based on an improved YOLOv5. First, the New CSP-Darknet53 backbone is replaced with EfficientFormerV2, which improves boundary sensitivity and reduces model complexity; experiments show that computational complexity is significantly lower than in the original model. Second, PANet is replaced with BiFPN, which strengthens feature fusion; experiments show an increase in detection accuracy. Third, CIoU Loss is replaced with Inner-IoU Loss to improve the localization accuracy of bounding box regression. When the ratio is greater than 1, larger auxiliary bounding boxes are generated, which accelerates the convergence of samples. Comparative experiments with different ratio values show the best results at a ratio of 1.10. Fourth, the Bottleneck in the C3 module is replaced with an improved FasterNet block that uses Leaky ReLU instead of ReLU as its activation function. This yields significant improvements in precision, recall, and average precision, while the number of parameters and floating-point operations drops significantly, reducing computational complexity and accelerating inference.
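The role of the ratio in the Inner-IoU step can be illustrated with a minimal sketch. It assumes the commonly published Inner-IoU formulation, in which both boxes are rescaled about their own centers by the ratio before the IoU is computed; the function name `inner_iou` and the `(cx, cy, w, h)` box layout are choices made here for illustration and are not taken from the authors' implementation.

```python
def inner_iou(box, gt, ratio=1.10):
    """IoU of auxiliary boxes obtained by scaling each box about its centre.

    Boxes are (cx, cy, w, h). ratio > 1 enlarges the auxiliary boxes,
    ratio < 1 shrinks them, ratio == 1 gives the ordinary IoU.
    Assumed formulation; not the authors' code.
    """
    def scaled_corners(b):
        cx, cy, w, h = b
        half_w, half_h = w * ratio / 2.0, h * ratio / 2.0
        return cx - half_w, cy - half_h, cx + half_w, cy + half_h

    ax1, ay1, ax2, ay2 = scaled_corners(box)
    bx1, by1, bx2, by2 = scaled_corners(gt)

    # Overlap of the two auxiliary boxes (zero if they do not intersect).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Two boxes that just miss each other: the ordinary IoU is zero,
# but the enlarged auxiliary boxes overlap and still give a signal.
pred = (0.0, 0.0, 2.0, 2.0)
target = (2.1, 0.0, 2.0, 2.0)
plain = inner_iou(pred, target, ratio=1.0)
enlarged = inner_iou(pred, target, ratio=1.10)
```

In this example the enlarged auxiliary boxes overlap even though the original boxes do not, which is one way to see why a ratio above 1 can help poorly aligned samples converge.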
Finally, a picking judgment mechanism is added: the YOLOv5 detection boxes are used to judge the size of each cow manure block and the density of the cow manure group, and the manure in a given area is classified accordingly, providing a basis for intelligent conditional picking. The improved network model achieves a precision of 92.6%, a recall of 87.7%, and an average precision of 87.4%, with 4.02M parameters and 8.1G floating-point operations. It thus significantly reduces the parameter count and operation count while maintaining detection accuracy. This study also established a dataset of cow manure in pastoral areas. To improve data diversity, the collected images were augmented by randomly combining five methods: flipping, scaling and translation, motion blur, random occlusion, and brightness change. The improved model was compared with multiple networks on the self-built dataset, and the results show that its accuracy is superior to that of other models with a comparable number of parameters and comparable computational complexity. During inference, the improved YOLOv5 model classifies cow manure by size and marks each class with a differently colored bounding box. This study enables the recognition and localization of cow manure in pastoral areas and provides technical support for research on intelligent cow manure picking vehicles.
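The picking judgment described above can be sketched roughly as follows. The abstract does not give the size or density criteria, so the thresholds, class names, colors, and function names below are all hypothetical placeholders chosen only to show the structure of such a rule.

```python
# Hypothetical values for illustration; the abstract does not state them.
SIZE_THRESHOLD_PX2 = 40 * 40    # box area separating "small" from "large" blocks
DENSITY_THRESHOLD = 3           # detections per frame: "scattered" vs "concentrated"
BOX_COLORS = {"small": (0, 255, 0), "large": (0, 0, 255)}  # assumed BGR colors


def classify_block(box):
    """Label one detection box (x1, y1, x2, y2) by its pixel area."""
    x1, y1, x2, y2 = box
    area = max(0, x2 - x1) * max(0, y2 - y1)
    return "large" if area >= SIZE_THRESHOLD_PX2 else "small"


def classify_region(boxes):
    """Label a frame by how many manure blocks were detected in it."""
    return "concentrated" if len(boxes) >= DENSITY_THRESHOLD else "scattered"


def picking_summary(boxes):
    """Combine per-block size labels with the frame-level density label."""
    blocks = []
    for box in boxes:
        label = classify_block(box)
        blocks.append({"box": box, "size": label, "color": BOX_COLORS[label]})
    return {"region": classify_region(boxes), "blocks": blocks}


# Example with made-up detection boxes in pixel coordinates.
detections = [(10, 10, 30, 30), (100, 100, 160, 150), (200, 40, 215, 60)]
summary = picking_summary(detections)
```

A picking vehicle could then use the region label together with the per-block size labels to decide whether collection in that area is worthwhile, with the actual decision rule left to the full method.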