Introduction and Motivation
Gauge data is an essential component of water systems research projects; however, data acquisition, processing, and exploratory (spatio-temporal) data analysis often consume a large share of limited project research time. I developed the EnGauge GitHub repository to reduce the time required to download, process, and explore streamflow, water quality, and weather station gauge data that are hosted primarily on U.S. government servers. This repository compiles and modifies functions from other Packages for Hydrological Data Retrieval and Statistical Analysis, and develops new functions for processing and exploring the data.
Data Acquisition
Given a polygon shapefile of the region of interest and an optional radial buffer size, the types of gauge data downloaded can include:
- USGS streamflow from the NWIS portal
- EPA STORET, USGS, USDA, and other water quality data via the Water Quality Portal
- NOAA ACIS and GHCN weather station data
The USGS R package dataRetrieval and the NOAA rnoaa package contain the primary functions used for data acquisition. Additional references to learn about these packages are available in the EnGauge README file and at the provided web links.
Data Processing
Significant processing is required to use some of these gauge datasets for environmental modeling. The EnGauge repository has functions that may be used to address the following common data processing needs:
- Check for duplicate records
- Check for zeros and negative values
- Check detection limits
- Fill date gaps (add NAs to dates missing from timeseries)
- Aggregate to daily, monthly, and/or annual timeseries
- Project spatial data to a specified coordinate system
- Write processed data to shapefiles, .txt files, and lists that can be loaded into other software for further analysis and/or modeling.
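As a rough illustration of a few of the checks listed above (duplicate records, zero and negative values, date-gap filling, and monthly aggregation), here is a minimal sketch in Python using only the standard library. It is not the EnGauge code, which is written in R. The function names and the list-of-(date, value) record layout are assumptions made for this example, and None stands in for R's NA.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta
from statistics import mean


def find_duplicates(records):
    """Return the sorted dates that appear more than once in (date, value) records."""
    counts = Counter(d for d, _ in records)
    return sorted(d for d, n in counts.items() if n > 1)


def find_nonpositive(records):
    """Return the (date, value) pairs whose value is zero or negative."""
    return [(d, v) for d, v in records if v is not None and v <= 0]


def fill_date_gaps(records):
    """Return one record per calendar day from the first to the last date.

    Days missing from the input get a value of None. If a date is
    duplicated, the last value seen for that date is kept.
    """
    by_date = dict(records)
    start, end = min(by_date), max(by_date)
    days = (start + timedelta(days=i) for i in range((end - start).days + 1))
    return [(d, by_date.get(d)) for d in days]


def monthly_mean(records):
    """Return {(year, month): mean value}, ignoring missing (None) values."""
    groups = defaultdict(list)
    for d, v in records:
        if v is not None:
            groups[(d.year, d.month)].append(v)
    return {key: mean(vals) for key, vals in sorted(groups.items())}


# Hypothetical daily streamflow records with a duplicate, a gap, a zero, and a negative.
example = [
    (date(2020, 1, 30), 2.0),
    (date(2020, 1, 31), 0.0),
    (date(2020, 2, 2), 4.0),
    (date(2020, 2, 2), 4.0),
    (date(2020, 2, 3), -1.0),
]
filled = fill_date_gaps(example)
```

Here the checks only report suspect records and do not remove or alter them, so the decision about how to treat zeros, negatives, and duplicates stays with the analyst.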
Data Visualization and Exploratory Data Analysis – From GitHub Example
This example is applied to the Gwynns Falls watershed in the Baltimore Ecosystem Study Long Term Ecological Research site. The following figures are some of the output from the EnGauge USGSdataRetrieval.R script (as of commit 2fc84cd).
- Record lengths at each gauge
- Locations of sites with zero and/or negative values
- Locations of sites with different water quality information: total nitrogen and total phosphorus in this example
- Locations of sites with certain weather station data: maximum temperature in this example
- Visualizing quality codes on timeseries
- Summary exploratory spatial data analysis for sites
- Summary daily, monthly, annual information
- Monthly heatmap
- Outlier visualization: currently implements a simplistic global spatio-temporal method that flags flows greater than a selected quantile as outliers. Plots of the flows at other stations on the dates with high outliers at the reference station offer a qualitative check on those outliers.
- DEM vs. Gauge Elevation: If you supply a DEM, the reported gauge elevation can be compared to the DEM elevation within the region of interest (ROI)
- Seasonal Scatterplot with Histograms: If you have two timeseries of different data types, e.g. streamflow and water quality, a scatterplot by season may be made (not in example code, but a function is available in the repository).
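The quantile-based outlier idea in the list above can be sketched as follows. This is a simplified Python illustration, not the repository's R implementation. The function names, the default probability of 0.99, and the data layout are assumptions, and the quantile uses linear interpolation between order statistics (the same rule as the default in R's quantile function).

```python
from datetime import date


def quantile(values, p):
    """Quantile by linear interpolation between order statistics, for 0 <= p <= 1."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = int(h)                      # floor, since h is non-negative
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def flag_high_flows(series, p=0.99):
    """Return (threshold, outliers) for a list of (date, flow) pairs.

    Outliers are the pairs whose flow exceeds the p-quantile of all
    non-missing flows in the series.
    """
    flows = [v for _, v in series if v is not None]
    threshold = quantile(flows, p)
    outliers = [(d, v) for d, v in series if v is not None and v > threshold]
    return threshold, outliers


def flows_on_dates(other_station, dates):
    """Look up another station's flows (a dict of date -> flow) on the given dates."""
    return {d: other_station.get(d) for d in dates}


# Hypothetical reference station: flows of 1..9, then a spike on the tenth day.
reference = [(date(2020, 1, day), float(day)) for day in range(1, 10)]
reference.append((date(2020, 1, 10), 100.0))
threshold, outliers = flag_high_flows(reference, p=0.9)

# Hypothetical neighboring station, used to see whether it also ran high that day.
neighbor = {date(2020, 1, 10): 80.0}
comparison = flows_on_dates(neighbor, [d for d, _ in outliers])
```

A high flow at the neighboring station on the same date suggests the flagged value reflects a real event and not a recording error, which is the kind of qualitative support the plots provide.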
Concluding Thoughts
This repository can be used to download gauge data from several sources, to employ standard data processing methods across those sources, and to explore the resulting data. Spend less time getting your data ready to do your research, and more time thinking about what your data are telling you and actually using them for modeling. Check out the EnGauge repository for your next research project!