Abstract: Accurate mapping of the soybean planting area is of great significance to yield estimation, crop-damage warning, and structural adjustment in modern agriculture. However, there are only a few reports on remote sensing for soybean identification, particularly under frequent cloud cover, diverse summer crop types, and complex field planting structures. In this study, Longshan and Qingtuan towns, situated in a typical soybean-producing area of the North Anhui plain, were taken as the study area, and a hierarchical extraction method was proposed to obtain the spatial distribution of the soybean planting area in the 2019 growing season. A Sentinel-2 image was acquired at the early pod-setting stage of soybean (August 18, 2019). A series of decision-tree filtering rules was first established to eliminate non-agricultural cover types, such as water, sparse trees, bare soil, and artificial objects (buildings, roads), so that the overall distribution of field vegetation was obtained. The Sentinel-2 image was then used to generate 19 candidate features, comprising the reflectance of the 10 spectral bands with a spatial resolution of 20 m or finer and 9 vegetation indices. The ReliefF algorithm was used to evaluate the importance of each candidate feature on samples of typical ground features. ReliefF was then combined with three machine learning algorithms, Random Forest (RF), BP Neural Network (BPNN), and Support Vector Machine (SVM), to establish three models: ReliefF-RF, ReliefF-BPNN, and ReliefF-SVM. The most effective features for soybean identification were screened out, and the performance of the three models in soybean mapping was evaluated. UAV images covering six ground samples (each 1 km × 1 km in size) were used to validate the extraction. Results showed that the ReliefF-RF model performed best, with a Kappa coefficient ranging from 0.72 to 0.81 and an overall accuracy of 85.92% to 91.91%. 
The Kappa coefficient of the ReliefF-RF model was higher than those of the other two models in every ground sample; ReliefF-BPNN and ReliefF-SVM achieved 0.69-0.79 and 0.70-0.78, respectively. The ReliefF-RF model singled out seven optimum features: the near-infrared band B8 (842 nm), the red-edge normalized difference vegetation index (NDVIre2) derived from B8 and B6, the short-wave infrared band B12 (2190 nm), the red-edge position (REP), the red-edge band B6 (740 nm), the green band B3 (560 nm), and the enhanced vegetation index (EVI). This indicates that these seven optimum features are more advantageous for soybean identification than other commonly used spectral bands and remote-sensing vegetation indices, with the red-edge-related variables being particularly prominent. In addition, the mapping results derived from the optimum features significantly outperformed those generated from the 10 spectral bands. Although the optimum feature subset performed slightly worse than the full set of 19 features, the ReliefF-RF model with only seven optimum features showed obvious advantages in time and computation cost, as well as in data volume. Moreover, because the hierarchical extraction focused only on field vegetation, the optimum features were more targeted and free from interference from the proportion of non-agricultural land-cover types. In theory, this gives the method better applicability and generalization. The findings can provide a valuable reference for the extraction of soybean areas under complex planting conditions.
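As a rough illustration of two quantities named in the abstract, the sketch below computes a red-edge normalized difference index and EVI from band reflectances, and derives overall accuracy and the Kappa coefficient from a confusion matrix. It is not the authors' implementation. The abstract states only that NDVIre2 is derived from B8 and B6, so the normalized-difference form used here is an assumption. The EVI expression is the commonly used one, and mapping it to Sentinel-2 bands B8, B4, and B2 is likewise assumed. All function names and the example confusion matrix are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi_re2(b8, b6):
    # Assumed form: the abstract only says NDVIre2 is derived from B8 and B6.
    return normalized_difference(b8, b6)


def evi(b8, b4, b2):
    # Commonly used EVI coefficients; inputs are surface reflectances in [0, 1].
    # Band mapping (NIR=B8, red=B4, blue=B2) is an assumption for Sentinel-2.
    return 2.5 * (b8 - b4) / (b8 + 6.0 * b4 - 7.5 * b2 + 1.0)


def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's Kappa: (p_o - p_e) / (1 - p_e) for a square confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(n)) / total
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Agreement expected by chance from the row and column marginals.
    p_e = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical 2-class matrix (rows: reference, columns: mapped);
    # the counts are made up for illustration, not taken from the study.
    example_cm = [[40, 10],
                  [5, 45]]
    print("OA:", overall_accuracy(example_cm))
    print("Kappa:", cohen_kappa(example_cm))
    print("NDVIre2:", ndvi_re2(0.40, 0.20))
    print("EVI:", evi(0.40, 0.05, 0.03))
```

Kappa discounts the agreement expected by chance from the class proportions, which is why it is reported alongside overall accuracy when soybean and non-soybean areas are unevenly distributed within a ground sample.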