Abstract: Digital imaging has been widely used to detect crop pests and diseases in modern agriculture, particularly in combination with deep learning and intelligent computer vision. However, accurate and rapid detection of insect pests in images remains a great challenge in the field. In this study, a task-specific detector was developed to accurately detect vegetable pests in sticky trap images using an attention-driven deep network based on saliency maps. Prevailing pest detectors mainly adopt anchors to detect pests in sticky trap images. Nevertheless, given the relatively small size and the distribution of crop insect pests in sticky trap images, the accuracy of anchor-based detection depends mainly on the balance between positive and negative samples, as well as on model training. Therefore, a saliency map was established to filter out simple background regions, and an attention-driven neural network was selected to focus on key regions and accurately detect crop insect pests in sticky trap images. First, saliency maps and threshold-based techniques were employed to construct masks, and rough region proposals were obtained from the connected regions of the acquired masks. Second, two fully convolutional neural networks were applied in a sliding-window fashion to produce refined region proposals from the rough region proposals, in order to deal with occlusion. Third, each refined region proposal was classified into one target pest category by a convolutional neural network classifier, thereby yielding the bounding boxes of target vegetable pests. Finally, an enhanced non-maximum suppression was used to eliminate redundant detection bounding boxes, so that each target pest was captured by only one bounding box. As such, the number of target pests was easily obtained by counting the remaining bounding boxes, supporting automatic management of vegetable insect pests.
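The first and last steps described above (threshold-based masks for rough region proposals, and suppression of redundant boxes before counting) can be illustrated with a minimal sketch. The function names, the saliency threshold of 0.5, the IoU threshold of 0.5, and the toy data are all assumptions made for illustration. The suppression shown is standard greedy non-maximum suppression, not the enhanced variant proposed in this study, and the two refinement networks and the classifier are omitted.

```python
from collections import deque

import numpy as np


def rough_proposals(saliency, threshold=0.5):
    """Threshold a saliency map and return one box per 4-connected region.

    Boxes are (x1, y1, x2, y2) in inclusive pixel coordinates.
    The threshold value is an assumption, not taken from the study.
    """
    mask = saliency >= threshold
    seen = np.zeros(mask.shape, dtype=bool)
    height, width = mask.shape
    boxes = []
    for y0, x0 in zip(*np.nonzero(mask)):
        if seen[y0, x0]:
            continue
        # Breadth-first search over one connected region of the mask.
        seen[y0, x0] = True
        queue = deque([(y0, x0)])
        x1 = x2 = x0
        y1 = y2 = y0
        while queue:
            y, x = queue.popleft()
            x1, x2 = min(x1, x), max(x2, x)
            y1, y2 = min(y1, y), max(y2, y)
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                inside = 0 <= ny < height and 0 <= nx < width
                if inside and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        boxes.append((int(x1), int(y1), int(x2), int(y2)))
    return boxes


def iou(a, b):
    """Intersection over union of two inclusive-coordinate boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1 + 1) * max(0, iy2 - iy1 + 1)
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Standard greedy NMS: keep the best box, drop overlapping ones."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Toy saliency map with two salient blobs on an empty background.
saliency = np.zeros((8, 8))
saliency[1:3, 1:3] = 0.9
saliency[5:7, 4:7] = 0.8
proposals = rough_proposals(saliency)

# Hypothetical classifier output: the second box duplicates the first pest.
detections = [(1, 1, 2, 2), (1, 1, 2, 3), (4, 5, 6, 6)]
scores = [0.9, 0.6, 0.8]
kept = greedy_nms(detections, scores)
pest_count = len(kept)  # one remaining box per pest
```

In this toy case the two blobs give two rough proposals, the duplicate detection is suppressed, and the count equals the number of remaining boxes.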
Furthermore, dedicated monitoring equipment was designed to evaluate the vegetable pest detector, and sticky trap images of two vegetable pests were collected, namely Plutella xylostella (Linnaeus) and Bactrocera cucurbitae (Coquillett). Several experiments were conducted on the labeled data set of collected images. The results demonstrate that the vegetable pest detector achieved a mean average precision of 86.40% and an average mean absolute error of 0.111, outperforming commonly used pest detectors such as SSD, R-FCN, CenterNet, Faster R-CNN, and YOLOv4. In addition, two ablation experiments were carried out to verify the saliency-map attention mechanism and the enhanced non-maximum suppression. It was found that the attention mechanism contributed remarkably to the detection accuracy and to the performance of the enhanced non-maximum suppression. In future work, both high- and low-level feature maps could be combined in a convolutional neural network to further enhance the robustness of the attention mechanism in the vegetable pest detector.
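The counting metric reported above can be sketched as follows, assuming that "average mean absolute error" means the per-class mean absolute error between predicted and ground-truth pest counts per image, averaged over the pest classes. That reading, the function names, and the counts below are assumptions made for illustration, not definitions or data from the study.

```python
def mean_absolute_error(predicted, actual):
    """Mean absolute difference between predicted and true counts per image."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("count lists must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def average_mae(counts_by_class):
    """Average the per-class MAE over all pest classes."""
    maes = [mean_absolute_error(p, a) for p, a in counts_by_class.values()]
    return sum(maes) / len(maes)


# Hypothetical (predicted, actual) counts for four images per class.
counts_by_class = {
    "pest_a": ([3, 5, 2, 4], [3, 4, 2, 4]),
    "pest_b": ([1, 0, 2, 2], [1, 0, 2, 1]),
}
result = average_mae(counts_by_class)  # 0.25 for this toy data
```

A lower value means the counts obtained from the remaining bounding boxes are closer to the manually labeled counts.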