Abstract: Soil organic carbon (SOC) is a critical soil property that has a profound impact on soil quality and plant growth. It is involved in soil structure formation and atmospheric carbon sequestration, especially in arid and semi-arid regions, so detecting SOC accurately is important. Traditionally, SOC determination has been limited to laboratory techniques such as wet or dry combustion, ion-sensing electrodes, loss on ignition, or chemical assays. These approaches often involve expensive testing materials, time-consuming sample preparation, and the production of excessive environmental pollutants. An approach that quantifies SOC content while saving time and cost is therefore needed. Using 140 soil samples acquired from the Ebinur Lake wetland protection area in Xinjiang, China, this research applies 2 algorithms for hyperspectral data mining, namely ant colony optimization - interval partial least squares (ACO-iPLS) and support vector machine - recursive feature elimination (SVM-RFE), to improve the estimation accuracy of SOC content from laboratory visible and near-infrared (VIS/NIR) spectroscopy of soils (350-2500 nm). After Savitzky-Golay (S-G) convolution smoothing, 2 common spectral pre-processing methods, namely the first-order derivative of reflectance and the first-order derivative of the logarithm of the inverse of reflectance, are applied to the hyperspectral data, from which the feature wavelengths are extracted. Results indicate that the feature wavelengths pertaining to SOC are mainly located within 1786-1929 nm with ACO-iPLS and at 745-910, 1677, 1755, and 1911-2254 nm with SVM-RFE. With the extracted feature wavelengths, models are then established with the same 2 approaches, using half of the samples (70 soil samples) as the training set and the other half (70 soil samples) as the testing set.
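The two pre-processing routes described above can be illustrated with a minimal sketch. It is not the authors' implementation: the 5-point quadratic S-G window, the base-10 logarithm, the 1 nm sampling step, the function names, and the synthetic reflectance values are all assumptions made for illustration, since the abstract does not state these settings.

```python
import math

# Standard 5-point, quadratic Savitzky-Golay weights. The window length and
# polynomial order are assumptions; the abstract does not state them.
SG_SMOOTH = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]
SG_FIRST_DERIV = [-2 / 10, -1 / 10, 0.0, 1 / 10, 2 / 10]


def apply_weights(values, weights):
    """Slide the weights over the interior points; edge points are dropped."""
    half = len(weights) // 2
    return [
        sum(w * v for w, v in zip(weights, values[i - half:i + half + 1]))
        for i in range(half, len(values) - half)
    ]


def smoothed_first_derivative(signal, step_nm=1.0):
    """S-G smoothing followed by an S-G first derivative per nanometre."""
    smoothed = apply_weights(signal, SG_SMOOTH)
    return [d / step_nm for d in apply_weights(smoothed, SG_FIRST_DERIV)]


def derivative_of_reflectance(reflectance, step_nm=1.0):
    """Route 1: first-order derivative of the smoothed reflectance R."""
    return smoothed_first_derivative(reflectance, step_nm)


def derivative_of_log_inverse(reflectance, step_nm=1.0):
    """Route 2: first-order derivative of the smoothed log(1/R)."""
    # Base-10 logarithm is an assumption (log10(1/R) is the usual apparent absorbance).
    log_inverse = [math.log10(1.0 / r) for r in reflectance]
    return smoothed_first_derivative(log_inverse, step_nm)


# Hypothetical reflectance values rising linearly with wavelength (not real data).
reflectance = [0.20 + 0.001 * i for i in range(20)]
d_refl = derivative_of_reflectance(reflectance)
d_log = derivative_of_log_inverse(reflectance)
```

Each pass drops two points at either edge of the spectrum, so a 20-point input yields 12 derivative values. A production pipeline would more likely rely on a library routine that also handles the edges.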
The results show that spectra processed with S-G smoothing followed by the first-order derivative of reflectance perform much better than those processed with S-G smoothing followed by the first-order derivative of the logarithm of the inverse of reflectance. Compared with the commonly used linear model, i.e. ACO-iPLS, the nonlinear SVM-RFE model with first-order derivative of reflectance pre-processing produces higher estimation accuracy. For the SVM-RFE approach, the root mean square error of cross validation (RMSECV, training set) and the root mean square error of prediction (RMSEP, testing set) are 0.158% and 0.268%, respectively. The correlation coefficient of cross validation (Rcv) and the correlation coefficient of prediction (Rp) are 0.9687 and 0.9091, respectively. The relative prediction deviation (RPD) of the testing set is 2.41. For the ACO-iPLS approach, the RMSECV and RMSEP are 0.329% and 0.396%, the Rcv and Rp are 0.8647 and 0.8297, and the RPD of the testing set is 1.63. With first-order derivative of the logarithm of the inverse of reflectance pre-processing, SVM-RFE also produces higher estimation accuracy than ACO-iPLS. For SVM-RFE, the RMSECV and RMSEP are 0.033% and 0.448%, the Rcv and Rp are 0.9989 and 0.8111, and the RPD of the testing set is 1.44. For ACO-iPLS, the RMSECV and RMSEP are 0.496% and 0.586%, the Rcv and Rp are 0.7293 and 0.586, and the RPD of the testing set is 1.10. Overall, the good performance of the SVM model can be ascribed to its capability of dealing with non-linear and hierarchical relationships between SOC and the feature wavelengths. The results are fairly satisfactory. This practice provides an efficient, low-cost, and potentially highly accurate approach to estimating SOC content, and hence supports better management and protection strategies for desert wetland ecosystems.
The next step is to apply the VIS/NIR spectroscopy technique in the field.
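For readers unfamiliar with the accuracy measures reported above, the following is a minimal sketch of how a root mean square error, a correlation coefficient, and an RPD can be computed from observed and predicted SOC values. It assumes the common definition of RPD as the standard deviation of the observed values divided by the RMSE, and it uses the sample (n - 1) standard deviation; the abstract does not state which variant the authors used. The function names and the numbers are hypothetical, not data from this study.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    mean_o = statistics.fmean(observed)
    mean_p = statistics.fmean(predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(ss_o * ss_p)


def rpd(observed, predicted):
    """Assumed definition: sample SD of observed values divided by the RMSE."""
    return statistics.stdev(observed) / rmse(observed, predicted)


# Hypothetical SOC contents in percent (illustrative only).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(f"RMSE = {rmse(observed, predicted):.3f}%")
print(f"R    = {pearson_r(observed, predicted):.4f}")
print(f"RPD  = {rpd(observed, predicted):.2f}")
```

Applied to cross-validated predictions on the training set, the same functions would give RMSECV and Rcv; applied to the testing set, they would give RMSEP, Rp, and RPD.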