Abstract: Dissolved oxygen plays a vital role in water quality management because it is a key factor in determining the growth status of fish. Both insufficient and excessive levels of dissolved oxygen harm the survival of fish in their habitats. Accurate analysis of the data collected from aquaculture ponds, together with prediction of the anticipated dissolved oxygen level, benefits both water quality management and aquaculture production. Current studies characterize the complex features of the water quality process mainly from the perspective of mathematical statistics. However, they cannot analyze how environmental changes affect water quality, and they perform poorly at predicting dissolved oxygen under changing environmental conditions. This paper proposes a new strategy to predict dissolved oxygen based on K-means clustering and ELM (extreme learning machine) neural networks. Because the dissolved oxygen curves of similar days are highly correlated, the historical samples were divided into several classes to optimize the sample space and improve prediction accuracy. After data normalization, the weights of the environmental factors on dissolved oxygen were determined by the Pearson correlation coefficient. An improved similarity statistic for similar days was then defined, which overcomes the limitations of the Euclidean distance and cosine similarity methods. Based on this similarity statistic, the K-means clustering method was employed to divide the historical samples into several clusters of daily samples. Identifying the cluster most similar to the forecasting day reduces the interference between samples and helps mine the inherent patterns in the dissolved oxygen data.
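The clustering stage described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. Each historical day is assumed to be summarized as one row of daily environmental-factor features. Factor weights are taken as the absolute Pearson correlation with dissolved oxygen, rescaled to sum to 1. A weighted Euclidean distance stands in for the paper's improved similarity statistic, whose exact definition the abstract does not give. All function names are hypothetical.

```python
import numpy as np


def min_max_normalize(X):
    """Scale each column (environmental factor) of X to [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant columns
    return (X - lo) / span


def pearson_weights(X, y):
    """Weight each factor by |Pearson r| with dissolved oxygen y, rescaled to sum to 1."""
    r = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    r = np.nan_to_num(r)  # a constant column has undefined r; treat it as weight 0
    total = r.sum()
    return r / total if total > 0 else np.full(len(r), 1.0 / len(r))


def weighted_sq_dist(X, centers, w):
    """Weighted squared Euclidean distance from every row of X to every center.

    Assumption: a stand-in for the paper's similarity statistic, not that statistic itself.
    """
    return (((X[:, None, :] - centers[None, :, :]) ** 2) * w).sum(axis=2)


def weighted_kmeans(X, w, k, n_iter=100, seed=0):
    """Lloyd's K-means under the weighted distance; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = weighted_sq_dist(X, centers, w).argmin(axis=1)
        # With non-negative per-feature weights the cluster mean still minimizes
        # the within-cluster distance, so the usual mean update applies.
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = weighted_sq_dist(X, centers, w).argmin(axis=1)
    return labels, centers


def nearest_cluster(x, centers, w):
    """Index of the cluster whose center is closest to the forecasting day's features x."""
    return int(weighted_sq_dist(x[None, :], centers, w)[0].argmin())
```

In this sketch the forecasting day would be assigned with `nearest_cluster`, and only that cluster's samples would be passed on to the predictor. The number of clusters `k` (5 in the experiments) is a tuning choice.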
Then, an ELM neural network for the identified cluster was constructed from its training and test data sets, and the future dissolved oxygen level was predicted using the similar sample set and the real-time environmental factors of the forecasting day as input data. A total of 23 424 data records from the aquaculture ponds in Wujin, Changzhou, China, were collected and used in the experiments. Taking 5 clusters as an example, the ELM neural network was compared with a traditional BP (back propagation) neural network and an SVM (support vector machine). Its prediction accuracy was acceptable, and its running time was only 0.1 s, compared with 10.25 s for the BP neural network and even longer for the SVM. This shows that the ELM prediction network has a clear advantage in running time. In addition, the model outperforms the others in computation speed and in prediction accuracy as measured by the root mean square error (RMSE) and the mean absolute percentage error (MAPE). Experimental results showed that the MAPE and RMSE of the proposed prediction method were 1.4% and 10.8%, respectively, under normal climate conditions. Under a sudden change of weather, the MAPE and RMSE were 2.6% and 11.6%, respectively. The method thus achieves higher forecasting accuracy and faster computation, which benefits water quality control in aquaculture.
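The prediction stage and the two error metrics can likewise be sketched in Python with NumPy. This assumes a standard single-hidden-layer ELM with sigmoid hidden units and uniformly random input weights. It also adds a small ridge term to the output-weight solve for numerical stability. That term is a common ELM variant, not a detail stated in the abstract. `ELMRegressor`, `rmse` and `mape` are hypothetical names, and the network size and ridge value are illustrative only.

```python
import numpy as np


class ELMRegressor:
    """Single-hidden-layer extreme learning machine for regression (illustrative sketch)."""

    def __init__(self, n_hidden=30, reg=1e-6, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg  # assumption: small ridge term for numerical stability
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of randomly weighted inputs; W and b are never trained
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.W = self.rng.uniform(-1.0, 1.0, (X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, self.n_hidden)
        H = self._hidden(X)
        # Output weights from one linear solve, with no backpropagation:
        # beta = (H^T H + reg*I)^-1 H^T y. With reg = 0 and full-column-rank H,
        # this equals the classic pseudoinverse solution pinv(H) @ y.
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ y)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


def rmse(y_true, y_pred):
    """Root mean square error, in the units of y."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)
```

In the method described above, a model like this would be fit on the samples of the selected cluster only. The single linear solve in `fit`, used in place of iterative gradient descent, is what typically makes ELM training much faster than BP training.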