Abstract: Above-Ground Biomass (AGB) is one of the most important indicators of the status of grassland use, and accurate, rapid monitoring of it is of great significance for scientific management and rational use. Remote sensing technology has been widely used to estimate AGB in recent years. However, estimation errors are often caused by the common remote sensing phenomenon of "same spectrum, different species". One potential solution is to combine spectral and meteorological data to invert grassland AGB. In this study, machine learning models were built on spectral indices and meteorological data, using Landsat 8 remote sensing imagery and ground surveys as data sources, to obtain a high-accuracy remote sensing inversion of grassland biomass. The performance of regression models constructed with five machine learning algorithms was systematically compared. Nine vegetation indices were calculated for the study area, which covers Hulunbuir in Inner Mongolia, China, and Dornod in Mongolia. An optimal Random Forest (RF) regression model was then rebuilt after feature selection. The regression validation showed that the six machine learning models achieved similar overall performance. Performance was lower when spectral data were the only input (Root Mean Square Error (RMSE): 63.852-87.944 g/m2, relative Root Mean Square Error (rRMSE): 33.712%-46.432%, coefficient of determination (R2): 0.388-0.647). Furthermore, the error of all regression models decreased gradually, and their fitting ability improved, as more features were added to the data combination, indicating that the regression models handled the growing number of features effectively through the fusion of multiple data inputs.
Each regression model performed best with the data combination of spectra + precipitation + temperature, and RF performed best overall (RMSE = 51.702 g/m2, rRMSE = 27.297%, R2 = 0.749). The weights of the multi-source data in the model were determined to assess the relative importance of the input features. Precipitation was the most important input feature, with a maximum weight of more than 0.1, much higher than that of the spectral data. Among the spectral data, three vegetation indices (VARI, MSAVI, and GEMI) had weights above 0.09, higher than the remaining indices. The optimized RF regression model performed more stably, with a coefficient of determination (R2) of 0.801 between predicted and measured values, an RMSE of 43.709 g/m2, and an rRMSE of 23.077%. Spatially, AGB in the study area was lower in the central area and higher on the eastern and western sides, with a maximum of 357.2 g/m2 and a minimum of 33.01 g/m2. This pattern was closely related to the spatial heterogeneity of climate and grassland use patterns.
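The three validation metrics quoted above can be computed as in the following minimal sketch. It assumes that rRMSE is the RMSE divided by the mean measured AGB, expressed in percent, and that R2 is one minus the ratio of residual to total sum of squares; the abstract does not state either definition, although the reported RMSE and rRMSE pairs (for example 51.702 g/m2 and 27.297%) are consistent with the first. The function name `regression_metrics` and the sample values are hypothetical and are not data from the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return (RMSE, rRMSE in percent, R2) for paired measured and predicted values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least two values")

    n = len(observed)
    mean_obs = sum(observed) / n

    # Residual and total sums of squares around the measured mean.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if mean_obs == 0 or ss_tot == 0:
        raise ValueError("measured values must have a non-zero mean and variance")

    rmse = math.sqrt(ss_res / n)
    rrmse = 100.0 * rmse / mean_obs  # assumed definition: RMSE relative to the measured mean
    r2 = 1.0 - ss_res / ss_tot       # assumed definition: 1 - SS_res / SS_tot
    return rmse, rrmse, r2


# Hypothetical AGB values in g/m2, for illustration only.
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 310.0]
rmse, rrmse, r2 = regression_metrics(observed, predicted)
print(f"RMSE={rmse:.3f} g/m2, rRMSE={rrmse:.3f}%, R2={r2:.3f}")
```

Because rRMSE normalizes by the measured mean, it allows errors to be compared across sites with different biomass levels, whereas RMSE stays in the units of the measurement.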