Abstract: Cow lameness poses a significant challenge to the economic viability of dairy operations. It impairs the overall performance of affected cows, increases their risk of further health problems, and reduces milk production. Consequently, timely detection of cow lameness is crucial to maintaining both the welfare of the herd and the profitability of dairy farms. Lame cows typically show observable indicators while walking, such as a lowered head position, pronounced head movement, and an arched back, whereas healthy cows show minimal head movement, a straight back, a normal gait, and good body balance. In this study, a deep learning-based algorithm was proposed to automatically detect lameness in cows from these distinctive movement features. Lameness was detected by tracking the movement patterns of six key anatomical points: the head, neck, shoulder, center of the back, loin, and tail. Firstly, two mobile devices were positioned adjacent to the passage leading to the milking area, and video data were collected for 160 walking sequences from 83 cows. YOLOv8n-seg instance segmentation was employed to identify the cows in the images and to extract their coordinates and pixel regions. This step improved computational efficiency and accuracy by reducing the effects of lighting variation in the passage, barbed-wire fence boundaries in the background, and fence occlusion in the foreground. Secondly, six keypoint detection datasets were constructed from the instance segmentation output: RGB images, binary mask images, and segmented images, along with versions of each cropped to the detection bounding box. Four backbone networks (MobileNet-V2, ResNet-50, ResNet-101, and ResNet-152) were trained and tested on these datasets. Segmented images cropped to the detection bounding box were selected as the optimal input format, and ResNet-152 was chosen as the best-performing backbone network.
Then, the DeepLabCut algorithm was used to automatically extract the coordinates of the six key points (head, neck, shoulder, center of the back, loin, and tail) from the video sequences, yielding a lameness detection dataset. Lastly, a comparative analysis was performed to evaluate the performance of Temporal Convolutional Network (TCN), Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), and FN-BiLSTM models for lameness detection. Ablation experiments were conducted on the FN-BiLSTM model to verify the effects of the Filter and Noise layers on lameness detection in cows. The results showed that the FN-BiLSTM model achieved the best performance, with 97.16% accuracy, 95.71% precision, and 99.04% recall for lameness recognition on a test set of 16 videos from 16 cows. Moreover, the instance segmentation model was highly effective at capturing image sequences of cows and their whole-body semantic information, even under variable illumination and at different cow-to-camera distances; its precision, recall, and mAP on the test set reached 99.97%, 100%, and 99.5%, respectively. In the keypoint detection phase, the best performance was achieved with cropped segmented images as input and ResNet-152 as the backbone network, giving root mean square errors of 2.04 pixels and 4.28 pixels on the training and test sets, respectively. These findings offer a technical approach for the automated detection of cow lameness in the livestock industry, with the potential to improve the efficiency and animal welfare of dairy operations and thereby support the sustainable development of the livestock sector.
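
For readers less familiar with the evaluation measures reported above, the following minimal Python sketch illustrates how accuracy, precision, recall, and a keypoint root mean square error are commonly computed. It is an illustration under stated assumptions, not the evaluation code used in this study: the function names `classification_metrics` and `keypoint_rmse`, the label convention (1 = lame, 0 = healthy), and the toy values are all hypothetical, and the RMSE shown is one common definition (root of the mean squared Euclidean distance in pixels), which may differ from the one applied by the keypoint detection toolchain.

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels.

    Assumed convention for this sketch: `positive` marks the lame class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def keypoint_rmse(predicted, labelled):
    """Root mean squared Euclidean distance (pixels) between keypoint pairs."""
    squared = [
        (px - lx) ** 2 + (py - ly) ** 2
        for (px, py), (lx, ly) in zip(predicted, labelled)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Toy data, invented purely for illustration.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
acc, prec, rec = classification_metrics(y_true, y_pred)

rmse = keypoint_rmse([(3, 4), (0, 5)], [(0, 0), (0, 0)])
```

With this convention, a high recall means that few lame cows are missed, while a high precision means that few healthy cows are flagged as lame.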