Abstract: This study explores the feasibility of using the continuous wavelet transform to extract spectral difference information from desiccated cotton seeds of different vigor. A method for screening wavelet features (WFs) based on correlation and feature importance is proposed to extract the fine structure and complex information in the spectral features of seeds with different vigor. Desiccated cotton seeds of different vigor classes were obtained through artificial aging experiments, and their hyperspectral images were collected. The raw spectra were preprocessed with Savitzky-Golay smoothing, multiplicative scatter correction, first-order differentiation, and second-order differentiation. Then, the WFs extracted by wavelet basis functions such as gauss4, mexh, and bior6.8 were compared. Spectral features (SFs) and WFs were reduced in dimensionality using principal component analysis (PCA). Seed vigor detection models were built on the SFs principal components and the WFs principal components using machine learning algorithms including support vector machine (SVM), random forest (RF), extreme learning machine (ELM), and back propagation neural network (BPNN), and the accuracies of the two groups of models were compared. The fine spectral information in the WFs was then further extracted using correlation analysis and random forest feature importance evaluation.
Three WFs feature sets were constructed: the 1% |R|-WFs set, containing the WFs whose absolute correlation with seed vigor ranks in the top 1%; the 1% Importance-WFs set, containing the WFs whose feature importance in seed vigor recognition ranks in the top 1%; and the 1% |R| + 1% Importance-WFs set, which combines the two. These three feature sets were fed into the machine learning models described above. The results showed that: 1) The bior6.8 function extracted better WFs for desiccated cotton seeds of different vigor, whereas the other wavelet basis functions showed a clear ringing effect when extracting WFs. 2) The modeling accuracy of the WFs principal components was higher than that of the SFs principal components in all machine learning models for each variety, and the models based on 1% |R| + 1% Importance-WFs had the highest accuracy. 3) The optimal model for seed vigor detection of Jinke 21 and Jinke 20 was 1% |R| + 1% Importance-WFs + ELM. The optimal models for Xinluzao 64 were 1% |R| + 1% Importance-WFs with any of the machine learning models, and PCA-WFs + ELM/BPNN. The training-set and test-set accuracies of the optimal model for Jinke 21 were 99.63% and 98.28%, respectively, and those of the optimal models for Jinke 20 and Xinluzao 64 were both 100%. The results indicate that the proposed method based on correlation and feature importance can effectively extract spectral difference information from desiccated cotton seeds of different vigor, providing a new spectral characterization approach for hyperspectral detection of seed vigor.
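The screening of WFs by correlation and feature importance might be sketched as follows. This is a minimal illustration, not the implementation used in the study: the function names are hypothetical, the importance scores are assumed to have been computed beforehand (for example by a random forest), and the "combination" of the two feature sets is assumed to be their union.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant input: treat as uncorrelated
    return cov / math.sqrt(var_x * var_y)

def top_fraction(scores, fraction=0.01):
    """Indices of the highest-scoring features; at least one index is kept."""
    k = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])

def screen_wavelet_features(features, vigor, importance, fraction=0.01):
    """Build the three feature-index sets described in the abstract.

    features   : list of samples, each a list of WF values (hypothetical layout)
    vigor      : numeric vigor label of each sample
    importance : one precomputed importance score per WF (assumed input)
    Returns (top |R| indices, top importance indices, their union), each sorted.
    """
    n_features = len(features[0])
    abs_r = [
        abs(pearson_r([row[j] for row in features], vigor))
        for j in range(n_features)
    ]
    by_r = top_fraction(abs_r, fraction)
    by_importance = top_fraction(importance, fraction)
    # Assumption: "combination of the two" is taken to be the set union.
    return sorted(by_r), sorted(by_importance), sorted(by_r | by_importance)
```

With wavelet coefficients flattened over scale and wavelength into one row per seed, the third returned list would index the features passed to the SVM, RF, ELM, or BPNN classifier.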