Abstract: Facial expression recognition has been widely used in various real-life scenarios, such as medicine, criminology, and education, and deep learning has made this technology highly efficient and accurate. Much effort has been made to migrate relatively mature facial recognition techniques to animal expressions, because, according to zoologists, animals can also express their emotions through facial expressions. Once the complex emotions expressed by animals can be understood, early signs of injury and illness can be monitored, helping to keep the animals free from hunger, thirst, and distress in a well-protected life. As such, facial expressions can be expected to serve as an indicator of animal welfare, mainly because they comprehensively reflect the physiology, psychology, and behavior of livestock. However, it is difficult to recognize the subtle changes in different areas of the face, particularly given the simple tissue structure of facial muscles in domestic animals. In this study, a Multi-Attention cascaded Long Short-Term Memory (MA-LSTM) model was proposed for the recognition of pig facial expressions. The procedure was as follows: firstly, a simplified multi-task convolutional neural network (SMTCNN) was used to detect and locate the pig face in each frame, removing the influence of non-face regions on recognition performance. Secondly, a multi-attention mechanism was introduced to characterize feature channels with different visual information and peak response regions. The facial salient regions caused by changes of expression were captured by clustering channels with similar peak responses, and these salient regions were then used to focus on subtle changes in the pig face.
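The peak-response clustering step described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature-map dimensions, the use of plain k-means over peak coordinates, and all function names are illustrative assumptions. The idea is that each convolutional channel responds most strongly at some spatial location, and channels whose peaks fall close together are grouped into one salient facial region.

```python
import numpy as np

def peak_positions(feature_maps):
    """For feature maps of shape (C, H, W), return the (row, col)
    of each channel's peak response as a (C, 2) array."""
    C, H, W = feature_maps.shape
    flat = feature_maps.reshape(C, -1).argmax(axis=1)
    return np.stack([flat // W, flat % W], axis=1)

def cluster_channels(peaks, n_clusters=4, n_iter=10, seed=0):
    """Naive k-means over peak coordinates: channels whose peaks lie
    close together form one salient-region cluster (a simplification
    of the paper's peak-response clustering)."""
    rng = np.random.default_rng(seed)
    centers = peaks[rng.choice(len(peaks), n_clusters, replace=False)].astype(float)
    for _ in range(n_iter):
        # squared distance of every peak to every center -> (C, K)
        d = ((peaks[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for k in range(n_clusters):
            if (labels == k).any():
                centers[k] = peaks[labels == k].mean(axis=0)
    return labels

def attention_maps(feature_maps, labels, n_clusters=4):
    """Average the channels in each cluster into one spatial attention
    map, normalized to [0, 1], giving one map per salient region."""
    maps = []
    for k in range(n_clusters):
        chans = feature_maps[labels == k]
        m = chans.mean(axis=0) if len(chans) else np.zeros(feature_maps.shape[1:])
        maps.append(m / (m.max() + 1e-8))
    return np.stack(maps)  # (n_clusters, H, W)
```

Each resulting attention map highlights one region of the pig face (e.g. around the eyes or snout), which can then be used to re-weight the convolutional features so the classifier attends to subtle, expression-driven changes.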
Finally, the convolution and attention features were fused and input into an LSTM for classification. Data augmentation was performed on the original dataset to obtain an expanded, self-annotated expression dataset of domestic pigs, which was then used in the experiments. The experimental results showed that, compared with an ablated model in which the multi-attention mechanism was disabled, the recognition accuracy of the MA-LSTM model increased by 6.3 percentage points on average, while the misclassification rate was also significantly reduced. Additionally, the average recognition accuracy of the MA-LSTM model increased by about 32.6, 18.0, 5.9, and 4.4 percentage points, respectively, compared with four commonly used facial video expression recognition models. Four types of expression were classified: anger, happiness, fear, and neutral. Anger and happiness caused more obvious variation in the facial area of domestic pigs, so their recognition accuracy was higher than that of the other classes; nevertheless, these two expressions were also misclassified as each other more often, mainly because the facial changes they produce are relatively similar. Overall, the proposed MA-LSTM model was verified on all the test data for pig facial expression recognition.
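The final fusion-and-classification step can likewise be sketched in NumPy, again under stated assumptions: the fused per-frame feature dimension, the hidden size, the single-layer LSTM formulation, and all parameter names are illustrative, not the authors' architecture. Each video yields a sequence of fused convolution+attention feature vectors, which a recurrent cell summarizes before a softmax over the four expression classes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_classify(frames, params):
    """Run a single-layer LSTM over per-frame fused feature vectors and
    classify the last hidden state into one of four expressions.
    `frames`: (T, D) array, one fused conv+attention vector per frame."""
    Wx, Wh, b, Wo, bo = params
    H = Wh.shape[1]                        # hidden size
    h, c = np.zeros(H), np.zeros(H)
    for x in frames:
        z = Wx @ x + Wh @ h + b            # all four gates at once, (4H,)
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g                  # update cell state
        h = o * np.tanh(c)                 # update hidden state
    logits = Wo @ h + bo                   # 4 classes: anger, happy, fear, neutral
    e = np.exp(logits - logits.max())
    return e / e.sum()                     # softmax probabilities

def init_params(D, H=32, n_classes=4, seed=0):
    """Random (untrained) parameters for the sketch above."""
    rng = np.random.default_rng(seed)
    return (rng.normal(0, 0.1, (4 * H, D)),   # input-to-gates weights
            rng.normal(0, 0.1, (4 * H, H)),   # hidden-to-gates weights
            np.zeros(4 * H),                  # gate biases
            rng.normal(0, 0.1, (n_classes, H)),  # output projection
            np.zeros(n_classes))              # output bias
```

In practice these parameters would be learned end-to-end together with the convolutional backbone; the sketch only shows how temporal information across frames reaches the four-class decision.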