Abstract: A spectral database system (SDBS) can improve the usage efficiency and expand the application scope of spectra and their feature information, mainly spectral peak information. The spectral matching algorithm (SMA) plays a decisive role in an SDBS because it determines the similarity between the sample spectrum and the reference spectrum and therefore the accuracy of a database query. Traditional full spectral matching algorithms compute the distance or similarity between spectra directly from spectral absorbance or reflectance, so they are vulnerable to noise. To improve the accuracy of full spectral matching, this paper presents a full spectral matching algorithm based on the Jaccard similarity coefficient (JSC). The JSC measures the overlap between two objects A and B that are described by the same set of attributes, each of which must be either 0 or 1. To satisfy this requirement, the first derivative of the raw spectrum is computed, and a transformation maps negative values of the first derivative to 0 and positive values to 1, where 0 means the raw spectrum is descending in the corresponding small region and 1 means it is ascending. Unlike common full spectral matching algorithms, the proposed algorithm calculates the similarity between spectra from the spectral waveform rather than from the absorbance or reflectance directly. The influence of absolute absorbance or reflectance intensity is therefore reduced and the influence of the similarity of the spectral waveform is enhanced. This means that which substances a sample contains matters more than how much of each substance it contains. In this way, the influence of noise and of the differences caused by different spectral collecting areas on solid samples is reduced to a low level.
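The binarization and matching steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`binarize_slope`, `jaccard_similarity`, `match`) are hypothetical, the first derivative is approximated by a simple finite difference, and a flat region (zero difference) is coded as 0, which the abstract does not specify. The JSC is taken in its standard binary form, the number of positions where both codes are 1 divided by the number of positions where at least one is 1.

```python
import numpy as np


def binarize_slope(spectrum):
    """Code each interval of a spectrum as 1 (ascending) or 0 (descending)."""
    diff = np.diff(np.asarray(spectrum, dtype=float))
    # Assumption: a zero difference (flat region) is coded as 0.
    return (diff > 0).astype(np.uint8)


def jaccard_similarity(code_a, code_b):
    """Standard Jaccard coefficient of two equal-length binary vectors."""
    a = np.asarray(code_a, dtype=bool)
    b = np.asarray(code_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("binary codes must have the same length")
    both = np.count_nonzero(a & b)    # positions where both are ascending
    either = np.count_nonzero(a | b)  # positions where at least one is ascending
    # Assumption: two codes with no ascending interval count as identical.
    return 1.0 if either == 0 else both / either


def match(sample, references):
    """Return the best-matching reference name and all similarity scores.

    `references` is a hypothetical dict mapping a label to a spectrum
    sampled at the same wavenumbers as `sample`.
    """
    code = binarize_slope(sample)
    scores = {
        name: jaccard_similarity(code, binarize_slope(ref))
        for name, ref in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0]
    references = {
        "same_shape_scaled": [10.0, 20.0, 30.0, 20.0, 10.0, 20.0],
        "mirrored": [3.0, 2.0, 1.0, 2.0, 3.0, 2.0],
    }
    print(match(sample, references))
```

Because only the sign of the slope is kept, the scaled copy of the sample receives the same binary code as the sample itself, which is the intensity-independence the abstract argues for.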
Common full spectral matching algorithms were compared with the proposed algorithm on four apple varieties with 100 samples each. The proposed algorithm classified 94.5% of the samples correctly, whereas the second highest classification accuracy, obtained with the Euclidean distance (ED) method, was 73%. This result indicates that the proposed algorithm is better suited to classifying different kinds of samples and can help to narrow the database query scope, shorten the query time, and improve the accuracy of the data query. By its principle, the algorithm must be affected by the interval between the data points of the spectra, so the effect of spectral resolution on the proposed algorithm was studied. In total, seven different resolutions (2 to 128 cm⁻¹) were tested. The proposed algorithm proved to be sensitive to spectral resolution, and its optimal resolution for the near-infrared (NIR) spectra of apples is approximately 8 or 16 cm⁻¹. The optimal resolution should therefore be determined first whenever the algorithm is used for the spectral matching of new objects. In short, the proposed spectral matching algorithm can classify NIR spectra of solid samples with higher accuracy, and its application will help to improve the accuracy of spectral database queries.