Abstract: Accurate and rapid counting of sea treasures (high-value seafood) is an important step in modern aquaculture. Taking sea treasures in a real bottom-sowing aquaculture environment as the research object, this study aims to achieve low-cost, efficient, and convenient counting of multiple categories of sea treasures using video-based multi-object tracking, with underwater videos of sea treasures as the data source. First, the high-performing YOLOv7 algorithm was used to build the sea-treasure detector, which provides the input to the multi-object tracker. The detector was trained on real underwater images from the 2017 Underwater Robot Picking Contest (URPC2017) dataset. The 17,655 images, containing holothurians, echinus, and scallops, were randomly divided in a ratio of approximately 8:1:1 into a training set of 14,055 images, a validation set of 1,800 images, and a test set of 1,800 images. The YOLOv7 detector was trained with an adaptively scaled input size of 640×640 pixels, an initial learning rate of 0.01, a momentum of 0.9, a weight decay of 0.0005, a batch size of 16, and 300 training epochs, and the model was evaluated every 10 epochs. Training was performed on Ubuntu 18.04 with the PyTorch deep learning framework, an AMD Ryzen Threadripper 1920X 12-core processor, and an NVIDIA GeForce RTX 2080 graphics card. Second, because sea treasures of the same category are highly similar and have indistinct appearance features in the real breeding environment, the BYTE multi-object tracking algorithm was adopted as a reference, and a multi-category trajectory generation and counting strategy based on trajectory ID numbers was designed for sea-treasure tracking.
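The counting strategy counts each trajectory ID once per category. The sketch below is a minimal illustration, not the authors' implementation. It assumes the tracker emits hypothetical `(frame_index, track_id, class_name)` records; the majority-vote rule for a track's category and the `min_frames` threshold for discarding short-lived tracks are both illustrative assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical tracker output record: (frame_index, track_id, class_name).
Record = Tuple[int, int, str]


def count_by_track_id(records: Iterable[Record], min_frames: int = 3) -> Dict[str, int]:
    """Count targets per category, counting each trajectory ID once.

    Assumptions (not from the source): a track's category is the class
    predicted most often across its frames, and tracks seen in fewer
    than `min_frames` distinct frames are treated as spurious and dropped.
    """
    class_votes: Dict[int, Counter] = defaultdict(Counter)
    frames_seen: Dict[int, Set[int]] = defaultdict(set)
    for frame, track_id, cls in records:
        class_votes[track_id][cls] += 1  # per-frame class vote for this track
        frames_seen[track_id].add(frame)  # distinct frames the track appears in

    counts: Counter = Counter()
    for track_id, votes in class_votes.items():
        if len(frames_seen[track_id]) < min_frames:
            continue  # too short to trust as a real target
        majority_class, _ = votes.most_common(1)[0]
        counts[majority_class] += 1  # one count per trajectory ID
    return dict(counts)


records = [
    (0, 1, "holothurian"), (1, 1, "holothurian"), (2, 1, "echinus"), (3, 1, "holothurian"),
    (0, 2, "scallop"), (1, 2, "scallop"), (2, 2, "scallop"),
    (5, 3, "echinus"),  # seen in one frame only, so filtered out
]
print(count_by_track_id(records))
```

Here track 1 is counted once as a holothurian despite one frame misclassified as echinus, which is the kind of class flicker that majority voting is meant to absorb.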
Finally, the performance of the improved model was evaluated using a combination of metrics. The test results show that the average counting accuracy, mean absolute error (MAE), root mean square error (RMSE), and frame rate were 91.62%, 5.75, 6.38, and 32 frames/s, respectively. All metrics were better than those of YOLOv5+DeepSORT, YOLOv7+DeepSORT, YOLOv5+BYTE, and YOLOv7+BYTE. In particular, compared with YOLOv5+DeepSORT, the average counting accuracy improved by 29.51 percentage points and the frame rate increased by 28, while the MAE and RMSE were reduced by 19.50 and 12.08, respectively. The method can effectively count underwater sea treasures in modern fisheries, and the findings can provide a technical reference for production estimation and for scientific decision-making in the intelligent management of aquaculture. In this study, the detection model was trained on underwater data collected in the same environment in order to reduce false detections caused by differing conditions. However, underwater environments vary greatly with factors such as lighting, so the videos used for counting were always collected in the same environment as the training data. Future work is therefore expected to improve the model's performance in different underwater environments.
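For reference, the counting metrics can be computed from per-video predicted and ground-truth counts as sketched below. MAE and RMSE use their standard definitions; the per-video accuracy 1 − |predicted − true| / true, averaged over videos, is an assumed definition and may differ from the one used in this study. The function name `counting_metrics` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def counting_metrics(predicted: Sequence[int], actual: Sequence[int]) -> Tuple[float, float, float]:
    """Return (average counting accuracy in %, MAE, RMSE) over test videos.

    MAE = mean(|p - a|), RMSE = sqrt(mean((p - a)^2)).
    Accuracy per video = 1 - |p - a| / a (assumed definition), averaged.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equal-length, non-empty count lists")
    if any(a <= 0 for a in actual):
        raise ValueError("ground-truth counts must be positive")
    errors = [p - a for p, a in zip(predicted, actual)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    accuracy = 100.0 * sum(1 - abs(e) / a for e, a in zip(errors, actual)) / n
    return accuracy, mae, rmse


acc, mae, rmse = counting_metrics([48, 60], [50, 64])
print(f"accuracy={acc:.3f}%  MAE={mae:.2f}  RMSE={rmse:.4f}")
```

For these two illustrative videos, the errors are −2 and −4, giving an MAE of 3.0 and an RMSE of √10 ≈ 3.16. RMSE exceeds MAE whenever errors are uneven across videos, because it penalizes larger miscounts more heavily.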