Abstract: Streamflow (channel runoff) is one of the most important components of the hydrological cycle, carrying water from the land to waterbodies. Reliable prediction of monthly streamflow at long lead times is of great significance for water resource allocation, flood defense, drought mitigation, and the ecological environment. Streamflow is closely related to precipitation, temperature, potential evapotranspiration, and antecedent streamflow. Vine copulas can readily establish a multivariate distribution function by decomposing multidimensional variables into pair copula constructions, and Bayesian Model Averaging (BMA) offers clear advantages in multi-model ensemble prediction. In this study, a novel streamflow prediction model was proposed that integrates multiple vine copula models with BMA, i.e., the Bayesian model averaging ensemble Vine Copula (BVC) model. The monthly streamflow predictions at the Tangnaihai, Minhe, Hongqi, and Zheqiao hydrological stations in the upstream of the Yellow River basin were selected as four cases. The spatial averages of precipitation, temperature, and potential evapotranspiration were calculated across the watershed controlled by each hydrological station. The precipitation, temperature, potential evapotranspiration, and streamflow in each month were first fitted with the best marginal distribution function from a pool of Normal, Gamma, Weibull, and Log-Normal functions. The vine copula model was then used to couple these variables (four explanatory variables and one predicted variable) in a five-dimensional setting. BMA was then employed to combine the streamflow predictions of the candidate vine copula models, reducing the uncertainty caused by the different variable orderings of the individual vine copula models. Finally, the Random Forest (RF) model and the Long Short-Term Memory neural network (LSTM) model were adopted as two reference models.
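As an illustration of the BMA combination step described above, the following is a minimal sketch, not the implementation used in this study. It assumes that the BMA weights are estimated with a commonly used expectation-maximization (EM) scheme in which each candidate model contributes a Gaussian kernel centered on its prediction, with one variance shared by all models; the abstract does not state how the weights were obtained. The function names `bma_weights` and `bma_mean` and the toy numbers are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma) distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def bma_weights(obs, preds, n_iter=200):
    """Estimate BMA weights and a shared variance by EM.

    obs   : list of T observed streamflow values
    preds : list of K lists, each holding T predictions of one candidate model
    Assumption: Gaussian kernels with a common variance for all models.
    """
    k, t = len(preds), len(obs)
    w = [1.0 / k] * k
    # Start from the pooled mean squared error of all candidate models.
    sigma2 = sum((obs[i] - preds[j][i]) ** 2
                 for j in range(k) for i in range(t)) / (k * t)
    for _ in range(n_iter):
        sigma = math.sqrt(max(sigma2, 1e-12))
        # E-step: posterior probability that model j explains observation i.
        z = []
        for i in range(t):
            dens = [w[j] * normal_pdf(obs[i], preds[j][i], sigma)
                    for j in range(k)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            z.append([d / total for d in dens])
        # M-step: update the weights and the shared variance.
        w = [sum(z[i][j] for i in range(t)) / t for j in range(k)]
        sigma2 = sum(z[i][j] * (obs[i] - preds[j][i]) ** 2
                     for i in range(t) for j in range(k)) / t
    return w, sigma2


def bma_mean(weights, model_preds):
    """BMA point prediction: weighted mean of the candidate predictions."""
    return sum(wj * pj for wj, pj in zip(weights, model_preds))


# Toy example (invented numbers): one accurate and one inaccurate candidate.
obs = [10.0, 12.0, 9.0, 15.0, 20.0, 18.0, 11.0, 14.0]
good = [o + e for o, e in zip(obs, [0.5, -0.4, 0.3, -0.6, 0.2, -0.3, 0.4, -0.5])]
bad = [o + e for o, e in zip(obs, [3.0, -4.0, 5.0, -3.0, 4.0, -5.0, 3.0, -4.0])]
weights, sigma2 = bma_weights(obs, [good, bad])
combined = bma_mean(weights, [good[0], bad[0]])
```

In the BVC setting, each entry of `preds` would hold the predictions of one candidate vine copula model (one variable ordering), so that models that track the observations more closely receive larger weights.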
The results show that, based on the chi-square test, the best-fitted marginal distributions for precipitation, temperature, potential evapotranspiration, and streamflow were Gamma, Normal, Weibull, and Log-Normal, respectively. For the 1-3-month lead streamflow predictions of the BVC model during the validation period (1963-2006), the minimum coefficient of determination (R2) and Nash-Sutcliffe Efficiency coefficient (NSE) exceeded 0.83 and 0.78, respectively, and the Root Mean Squared Error (RMSE) remained low. Compared with the RF model, the BVC model better captured the variations in monthly streamflow at these hydrological stations, especially for extreme streamflow. The prediction performance of the BVC and RF models was further evaluated using the precipitation, temperature, potential evapotranspiration, and streamflow time series of the driest and wettest seasons (the three consecutive months with the lowest and highest average streamflow during 1963-2006, respectively). The driest season was January-March at all four hydrological stations; the wettest season was July-September at the Tangnaihai and Hongqi stations and August-October at the Minhe and Zheqiao stations. Again, compared with the RF model, the BVC model yielded better streamflow predictions at 1-3-month lead times during the driest and wettest seasons, with minimum R2 and NSE values exceeding 0.57 and 0.61, respectively. Moreover, the BVC model also outperformed the RF and LSTM models at 1-3-month lead times during the validation period (2007-2016) in terms of R2, NSE, and RMSE. The findings provide a theoretical framework for streamflow prediction and can serve as guidance for water resources management and risk assessment.
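For reference, the three evaluation metrics named above can be computed as in the following minimal sketch. It assumes that the coefficient of determination is taken as the squared Pearson correlation between observed and predicted streamflow, which is a common convention in hydrology but is not stated in the abstract. The function names and the toy values are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root Mean Squared Error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs, pred):
    """Nash-Sutcliffe Efficiency: 1 minus the ratio of the residual sum of
    squares to the sum of squared deviations of the observations from their mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread


def r_squared(obs, pred):
    """Coefficient of determination, assumed here to be the squared
    Pearson correlation between observations and predictions."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov * cov / (var_obs * var_pred)


# Toy example (invented numbers): predictions biased by a constant +1.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
scores = (r_squared(observed, predicted), nse(observed, predicted),
          rmse(observed, predicted))
```

The toy example shows why the metrics are reported together: a constant bias leaves the squared correlation at 1 but lowers the NSE and raises the RMSE.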