Abstract: Accurate and continuous identification of individual cattle has become crucial to precision farming in recent years. It is also a prerequisite for monitoring the individual feed intake and feeding time of beef cattle at medium to long distances across different cameras. However, beef cattle tend to move frequently and change their feeding positions during feeding. Furthermore, large variations in head direction and complex environments (light, occlusion, and background) make recognition difficult, particularly given the biological similarity among individual cattle. Among existing Re-Identification (ReID) models, AlignedReID++ uses both global and local information for image matching. In particular, its local branch introduces the Dynamically Matching Local Information (DMLI) algorithm to automatically align horizontal local information. In this research, the AlignedReID++ model was improved to achieve better performance in cattle ReID. First, Triplet Attention (TA) modules were integrated into the BottleNecks of the ResNet50 backbone, enhancing feature extraction through cross-dimensional interactions with minimal computational overhead. Adding the TA modules to the AlignedReID++ baseline model increased the model size and Floating Point Operations (FLOPs) by only 0.005 M and 0.05 G, respectively, while improving the rank-1 accuracy and mean Average Precision (mAP) by 1.0 and 2.94 percentage points, respectively. Compared with the Convolution Block Attention Module (CBAM) and Efficient Channel Attention (ECA) modules, the TA modules achieved rank-1 accuracies higher by 0.86 and 0.12 percentage points, respectively, although 0.94 percentage points lower than that of the Squeeze-and-Excitation (SE) modules.
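The DMLI idea of aligning horizontal stripes can be illustrated with a small dynamic program. The sketch below follows the shortest-path formulation of the original AlignedReID work as we understand it: stripe-to-stripe Euclidean distances are squashed into [0, 1) by (e^d - 1)/(e^d + 1), and the local distance is the cheapest monotone path through that matrix. It is not the authors' code; the function names and the toy one-dimensional stripe features are hypothetical, and a real model would operate on pooled CNN stripe features.

```python
import math


def normalized_distance(a, b):
    """Euclidean distance between two stripe features, squashed into [0, 1)."""
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # tanh(d / 2) is identical to (e^d - 1) / (e^d + 1) but cannot overflow
    return math.tanh(d / 2.0)


def dmli_distance(stripes_a, stripes_b):
    """Local distance between two images, each given as a list of horizontal
    stripe feature vectors ordered from top to bottom.

    The result is the cost of the shortest path from the top-left to the
    bottom-right cell of the stripe-to-stripe distance matrix, moving only
    down or right, so stripes are matched in order but may shift vertically.
    """
    rows, cols = len(stripes_a), len(stripes_b)
    d = [[normalized_distance(fa, fb) for fb in stripes_b] for fa in stripes_a]
    # s[i][j] holds the minimal accumulated cost of reaching cell (i, j)
    s = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                prev = 0.0
            elif i == 0:
                prev = s[i][j - 1]
            elif j == 0:
                prev = s[i - 1][j]
            else:
                prev = min(s[i - 1][j], s[i][j - 1])
            s[i][j] = prev + d[i][j]
    return s[-1][-1]


# Usage: two toy "images" with three 1-D stripes each; b is a shifted version of a
a = [[0.0], [0.0], [5.0]]
b = [[0.0], [5.0], [5.0]]
print(dmli_distance(a, b))
```

Restricting moves to down or right keeps the top-to-bottom order of body parts while tolerating vertical misalignment, for example when a head is lowered toward the feed trough.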
In terms of mAP, the TA modules exceeded the SE, CBAM, and ECA modules by 0.22, 0.86, and 0.12 percentage points, respectively. Additionally, the Cross-Entropy Loss function in the global branch of the baseline model was replaced with the CosFace Loss function, and CosFace Loss and Hard Triplet Loss were jointly employed to train the model to better distinguish similar individuals. AlignedReID++ with CosFace Loss outperformed the baseline model by 0.24 and 0.92 percentage points in rank-1 accuracy and mAP, respectively, whereas AlignedReID++ with ArcFace Loss outperformed it by 0.36 and 0.56 percentage points, respectively. The improved model with the TA modules and CosFace Loss achieved a rank-1 accuracy of 94.42%, rank-5 accuracy of 98.78%, rank-10 accuracy of 99.34%, mAP of 63.90%, FLOPs of 5.45 G, Frames Per Second (FPS) of 5.64, and model size of 23.78 M. Its rank-1 accuracy exceeded those of the baseline model, Part-based Convolutional Baseline (PCB), Multiple Granularity Network (MGN), and Relation-aware Global Attention (RGA) by 1.84, 4.72, 0.76, and 5.36 percentage points, respectively, while its mAP exceeded theirs by 6.42, 5.86, 4.30, and 7.38 percentage points, respectively. Meanwhile, its rank-1 accuracy was 0.98 percentage points lower than that of TransReID, but its mAP was 3.90 percentage points higher. Moreover, the FLOPs of the improved model were only 0.05 G larger than those of the baseline model, and smaller than those of PCB, MGN, RGA, and TransReID by 0.68 G, 6.51 G, 25.4 G, and 16.55 G, respectively. Its model size was smaller than those of the baseline model, PCB, MGN, RGA, and TransReID by 0.03 M, 2.33 M, 45.06 M, 14.53 M, and 62.85 M, respectively.
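To make the loss change concrete, the sketch below shows a single-sample CosFace (Large Margin Cosine Loss) computation and a batch-hard triplet loss in plain Python. CosFace normalizes features and class weights, subtracts a margin m from the true-class cosine, scales by s, and then applies softmax cross-entropy. The values s = 30 and m = 0.35, the triplet margin of 0.3, the batch-hard mining variant, and the unweighted sum of the two losses are illustrative assumptions, not settings confirmed by the paper.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosface_loss(feature, class_weights, label, s=30.0, m=0.35):
    """Large Margin Cosine Loss (CosFace) for a single sample.

    logit_j = s * cos(theta_j) for j != label and s * (cos(theta_y) - m) for the
    true class, followed by ordinary softmax cross-entropy.
    """
    f = l2_normalize(feature)
    logits = []
    for j, w in enumerate(class_weights):
        cos = sum(a * b for a, b in zip(f, l2_normalize(w)))
        logits.append(s * (cos - m) if j == label else s * cos)
    top = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[label]


def batch_hard_triplet_loss(features, labels, margin=0.3):
    """For each anchor, use its farthest positive and closest negative in the batch."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    losses = []
    for i, (anchor, label) in enumerate(zip(features, labels)):
        pos = [dist(anchor, f) for j, (f, lab) in enumerate(zip(features, labels))
               if lab == label and j != i]
        neg = [dist(anchor, f) for f, lab in zip(features, labels) if lab != label]
        if pos and neg:  # anchors without a positive or a negative are skipped
            losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical joint objective for the global branch: CosFace + hard triplet
feats = [[0.0, 1.0], [0.1, 1.0], [1.0, 0.0], [1.0, 0.1]]
ids = [0, 0, 1, 1]
weights = [[0.0, 1.0], [1.0, 0.0]]  # one toy weight vector per identity
total = sum(cosface_loss(f, weights, y) for f, y in zip(feats, ids)) / len(feats)
total += batch_hard_triplet_loss(feats, ids)
print(total)
```

Because the margin is applied to the cosine itself, a sample must beat every other class by at least m in cosine similarity before its loss becomes small, which is the property that helps separate visually similar individuals.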
On a CPU, the inference speed of the improved model was lower than those of PCB, MGN, and the baseline model, but higher than those of TransReID and RGA. The t-SNE visualization of feature embeddings demonstrated that the global and local features achieved better intra-class compactness and inter-class variability. Therefore, the improved model can be expected to effectively re-identify beef cattle in the natural environment of breeding farms, enabling the monitoring of individual feed intake and feeding time.
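Rank-k accuracy and mAP are standard ReID metrics; a minimal sketch of how they are commonly computed from a query-by-gallery distance matrix is given below. It is not the authors' evaluation code: it assumes one identity label per image, ignores queries with no true match in the gallery, and omits the same-camera filtering that some protocols (for example Market-1501) apply. The function name rank_k_and_map and the toy data are hypothetical.

```python
def rank_k_and_map(dist, query_ids, gallery_ids, ks=(1, 5, 10)):
    """Return ({k: rank-k accuracy}, mAP) from a query-by-gallery distance matrix.

    dist[q][g] is the distance between query q and gallery image g.
    """
    hits = {k: 0 for k in ks}
    average_precisions = []
    for q, qid in enumerate(query_ids):
        # Gallery indices sorted from most to least similar (smallest distance first)
        order = sorted(range(len(gallery_ids)), key=lambda g: dist[q][g])
        matches = [gallery_ids[g] == qid for g in order]
        if not any(matches):
            continue  # no true match for this query: excluded from both metrics
        first_hit = matches.index(True)  # 0-based rank of the first correct match
        for k in ks:
            if first_hit < k:
                hits[k] += 1
        found, precisions = 0, []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / rank)  # precision at each correct match
        average_precisions.append(sum(precisions) / len(precisions))
    n = len(average_precisions)
    if n == 0:
        return {k: 0.0 for k in ks}, 0.0
    return {k: hits[k] / n for k in ks}, sum(average_precisions) / n


# Usage: two queries against a gallery of three images
dist = [[0.1, 0.5, 0.9], [0.1, 0.5, 0.3]]
cmc, mean_ap = rank_k_and_map(dist, query_ids=[1, 2], gallery_ids=[1, 2, 2])
print(cmc, mean_ap)
```

Rank-k only checks whether the first correct match appears in the top k, whereas mAP averages precision over all correct matches, which is why a model can trail on rank-1 yet lead on mAP, as in the comparison with TransReID.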