Abstract: Hyperspectral remote sensing images are rich in spectral information and have great application potential in forestry, agriculture, geosciences, and other fields. In order to address the problems of small samples, high dimensionality, correlation, and nonlinearity, and to improve the accuracy of hyperspectral remote sensing image classification, this study proposed a method of hyperspectral image dimensionality reduction based on t-distributed stochastic neighbor embedding (t-SNE). A convolutional neural network (CNN) was used to extract features from and classify the hyperspectral remote sensing images. Building on the original SNE, t-SNE replaced the Gaussian distribution with a t-distribution in the low-dimensional space and defined a symmetric joint probability distribution, which simplified the gradient calculation. The t-distribution was more sensitive to local features because of its heavy tail. Using the t-distribution instead of the Gaussian distribution ensured that points mapped from the high-dimensional space to the low-dimensional space were almost unaffected by spatial changes. This made intra-class points aggregate closely and inter-class points disperse. Meanwhile, the method exploited the local features of the high-dimensional data and maintained the nonlinear features of the original dataset. To improve the accuracy of hyperspectral remote sensing classification, a novel method combining manifold learning and a CNN was proposed. First, the data points in the original high-dimensional space were mapped into a low-dimensional space. The reduced dimensionality was important for the classification results, so experiments with dimensions ranging from 5 to 30 were conducted to find the best setting. The dimensionality was set to 20 for the Indian Pines dataset, 16 for the Pavia Center dataset, and 18 for the Pavia University dataset. Perplexity was another important parameter, and it was set to 30 according to the tests.
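The dimensionality-reduction step described above can be sketched with scikit-learn's TSNE. This is a minimal illustration under stated assumptions, not the authors' implementation: the input here is synthetic random data standing in for hyperspectral pixels, and only the perplexity (30) and target dimensionality (20, the Indian Pines setting) come from the text.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in for hyperspectral pixels: 100 samples x 200 spectral bands
# (the real Indian Pines scene has 200 usable bands after noisy-band removal).
rng = np.random.default_rng(0)
pixels = rng.normal(size=(100, 200))

# Reduce to 20 dimensions with perplexity 30, as reported in the abstract.
# scikit-learn's default Barnes-Hut solver only supports n_components < 4,
# so the exact solver is required for a 20-dimensional embedding.
tsne = TSNE(n_components=20, perplexity=30, method="exact",
            init="random", random_state=0)
embedded = tsne.fit_transform(pixels)

print(embedded.shape)  # (100, 20)
```

The exact solver is O(n^2) in the number of samples, so in practice the embedding would be computed on the labeled pixels of each scene rather than on an entire image cube.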
Their topological relations were preserved after dimensionality reduction. Second, a CNN with a seven-layer network structure was designed. It consisted of two convolution layers, two pooling layers, two fully connected layers, and one output layer. The convolution and pooling layers alternated, and the end of the network connected to the fully connected layers. A Softmax function was used as the classifier, and the AdaGrad algorithm was used for network optimization. As optimization progressed, the learning rate was reduced for variables that had already accumulated large gradient updates. The rectified linear unit (ReLU) was used as the activation function. The ReLU function is more efficient in gradient descent and backpropagation because it mitigates the problems of gradient explosion and gradient vanishing, simplifies the calculation process, and reduces the overall computational cost of the CNN. The dimension-reduced hyperspectral remote sensing data were used as the input layer, and deep features were extracted by the CNN. Finally, the spatial-spectral features of the hyperspectral images were classified. The robustness of the proposed algorithm was verified on three public datasets: (i) Indian Pines, (ii) Pavia Center, and (iii) Pavia University. The overall classification accuracies on the three datasets reached 99.05%, 99.43%, and 98.90%, respectively. The proposed algorithm showed a better dimensionality-reduction effect than the original CNN. Since t-SNE was more sensitive to local features and considered inter-class differences, remarkable results were achieved for small ground-object samples. Compared with the original CNN, the "salt-and-pepper noise" problem in hyperspectral image classification was effectively alleviated and the overall classification accuracy was significantly improved.
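The two training ingredients named above, AdaGrad's per-parameter learning-rate decay and the ReLU activation, can be sketched in a few lines of NumPy. This is an illustrative sketch of the generic algorithms, not the paper's network; the constant gradient of 2.0 and the step count are arbitrary choices for the demonstration.

```python
import numpy as np

def relu(x):
    # ReLU: max(0, x). Its gradient is 1 for positive inputs and 0
    # otherwise, which avoids the saturation behind vanishing gradients.
    return np.maximum(0.0, x)

def adagrad_step(param, grad, accum, lr=0.1, eps=1e-8):
    # AdaGrad accumulates the squared gradient per parameter and divides
    # the learning rate by its square root, so the effective step shrinks
    # for a variable as its gradient history accumulates.
    accum += grad ** 2
    step = lr * grad / (np.sqrt(accum) + eps)
    return param - step, accum, abs(step)

param, accum = 0.0, 0.0
steps = []
for _ in range(5):
    param, accum, step = adagrad_step(param, 2.0, accum)
    steps.append(step)

print(relu(np.array([-3.0, 0.0, 5.0])))  # negatives zeroed, positives kept
print(steps[0], steps[-1])  # effective step size decays across iterations
```

Note how a parameter that keeps receiving gradient updates sees a monotonically shrinking effective step, which is the behavior the abstract attributes to AdaGrad.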
The combination of manifold learning and convolutional neural networks could also provide a new approach to hyperspectral remote sensing image classification. Labeled sample data for hyperspectral images are usually difficult to obtain, while the performance of deep learning models depends on many labeled samples. In future work, we will consider how to construct a classification model under the condition of limited labeled samples to obtain better classification results.