In recent years, machine learning (ML) models have increasingly been adopted in hydrology, both for prediction and for understanding the factors that influence variables of interest. One application where deep learning (DL) has been particularly successful is predicting river flows and other variables in unmonitored regions, a significant challenge in hydrological modeling. Large-scale DL models trained on data and physical characteristics from thousands of monitored locations have outperformed traditional numerical models for out-of-sample spatiotemporal predictions. However, selecting optimal inputs for ML models remains challenging, especially when hundreds of site characteristics are potentially relevant. Moreover, many studies do not make effective use of ML ensembles, which limits both their accuracy and their uncertainty quantification (UQ).
Here, I describe our approaches for using ML to predict stream temperatures in unmonitored basins of the contiguous United States. Because relevant input data are not widely available, our goal was to optimize the trade-off between model complexity and predictive accuracy by developing models that make the most accurate predictions with minimal data requirements. We evaluated different approaches for selecting input data and for constructing effective ensembles, and we examined how the choice of inputs affected performance across four ML architectures: long short-term memory (LSTM), gated recurrent unit (GRU), temporal convolutional network (TCN), and extreme gradient boosting (XGB). Model performance and ensemble spread were quantified using both deterministic and probabilistic metrics.
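A minimal sketch of how deterministic and probabilistic scores might be computed for an ensemble of stream-temperature predictions. The abstract does not name its metrics, so the choices here are assumptions: root-mean-square error (RMSE) of the ensemble mean as the deterministic score, the time-averaged across-member standard deviation as "spread", and the empirical continuous ranked probability score (CRPS) as the probabilistic score. The function names and toy values are hypothetical, and only the Python standard library is used.

```python
import math
from statistics import fmean

# Assumed layout: `members` is a list of M ensemble members, each a list of
# T predicted stream temperatures (deg C); `obs` is a list of T observations.

def ensemble_mean(members):
    """Average the M members at each time step."""
    T = len(members[0])
    return [fmean(m[t] for m in members) for t in range(T)]

def rmse(pred, obs):
    """Deterministic score: root-mean-square error of a single prediction series."""
    return math.sqrt(fmean((p - o) ** 2 for p, o in zip(pred, obs)))

def ensemble_spread(members):
    """Assumed spread measure: population std across members, averaged over time."""
    T = len(members[0])
    stds = []
    for t in range(T):
        vals = [m[t] for m in members]
        mu = fmean(vals)
        stds.append(math.sqrt(fmean((v - mu) ** 2 for v in vals)))
    return fmean(stds)

def crps_ensemble(vals, y):
    """Empirical CRPS for one time step: E|X - y| - 0.5 * E|X - X'|."""
    M = len(vals)
    term_obs = fmean(abs(v - y) for v in vals)
    term_pair = sum(abs(a - b) for a in vals for b in vals) / (2 * M * M)
    return term_obs - term_pair

def mean_crps(members, obs):
    """Probabilistic score: CRPS averaged over all time steps."""
    return fmean(crps_ensemble([m[t] for m in members], obs[t])
                 for t in range(len(obs)))

# Hypothetical 3-member ensemble over 3 days (not data from the study).
members = [[10.0, 12.0, 14.0], [11.0, 13.0, 15.0], [9.0, 11.0, 13.0]]
obs = [10.5, 12.5, 13.5]

print(round(rmse(ensemble_mean(members), obs), 3))  # 0.5
print(round(ensemble_spread(members), 3))           # 0.816
print(round(mean_crps(members, obs), 3))            # 0.389
```

For a single-member "ensemble" the CRPS reduces to the absolute error, which makes it a convenient common scale for comparing deterministic models with ensembles.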
Our results clearly show that ensembles improve prediction accuracy compared with single deterministic models. Ensembles created by varying the input data or by combining different model architectures performed best for predicting both average and extreme values. However, we also found that using all available data is not always necessary, and simpler models can sometimes achieve similar predictive performance. Notably, XGB achieved the highest accuracy but produced lower spread than the DL architectures. This raises an important question: is DL necessary for such hydrological applications?
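One general reason averaging members tends to reduce error is a standard decomposition of squared error; it is a property of ensemble averaging, not a derivation taken from this study. The notation is introduced here for illustration: M members with predictions x_{i,t}, ensemble mean \bar{x}_t, and observations y_t at time steps t = 1, ..., T.

```latex
\begin{aligned}
\frac{1}{M}\sum_{i=1}^{M}\left(x_{i,t}-y_t\right)^2
  &= \left(\bar{x}_t-y_t\right)^2
   + \frac{1}{M}\sum_{i=1}^{M}\left(x_{i,t}-\bar{x}_t\right)^2,
  \qquad \bar{x}_t = \frac{1}{M}\sum_{i=1}^{M} x_{i,t} \\
\underbrace{\frac{1}{T}\sum_{t=1}^{T}\left(\bar{x}_t-y_t\right)^2}_{\text{MSE of ensemble mean}}
  &= \underbrace{\frac{1}{M}\sum_{i=1}^{M}\frac{1}{T}\sum_{t=1}^{T}\left(x_{i,t}-y_t\right)^2}_{\text{average member MSE}}
   - \underbrace{\frac{1}{T}\sum_{t=1}^{T}\frac{1}{M}\sum_{i=1}^{M}\left(x_{i,t}-\bar{x}_t\right)^2}_{\text{mean ensemble variance}}
\end{aligned}
```

The cross term vanishes because deviations from the mean sum to zero, so the ensemble mean's MSE can never exceed the average member's MSE. The identity compares against the average member, not the best one, and members with little spread gain little from averaging.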