Managing water quality in the United States involves sizable benefits and costs. Given the substantial public and private welfare at stake, understanding the scientific, social, and policy drivers of water quality is crucial. But understanding the status, trends, and drivers of water pollution may require extraordinary data. This research first assesses the strengths and weaknesses of available national‐scale ambient water quality data. Our longer‐term goal is to understand the drivers of ambient water quality itself.

We seek to understand the strengths and limitations of available water quality data. A point of departure is the simple observation that the size of the data set is deceptive. For example, although our data set contains nearly 50 million observations on dissolved oxygen (DO) alone, significant gaps in coverage exist across time and space. Our first goal is to understand what types of questions can – and cannot – be answered with existing data. Our second goal is to highlight gaps in order to spur more effective, systematic, and statistically representative water quality data collection in the future.

We explore heterogeneity in data availability across officially designated watershed stations, trends stations, and ad hoc stations. We also explore heterogeneity in data availability across source types such as rivers/streams, lakes/ponds/reservoirs, and bays/estuaries, as well as across entire rivers or hydrologic basins. Finally, we explore socio‐demographic correlates of water quality data availability, across space and across time. By examining the record of available data, we are able to separate evidence from inference and suggest improvements to water quality monitoring protocol and policy.