EnGauge: R Code Repository for Environmental Gauge Data Acquisition, Processing, and Visualization

Introduction and Motivation

Gauge data is an essential component of water systems research projects; however, data acquisition, processing, and exploratory (spatio-temporal) data analysis often consumes a large chunk of limited project research time. I developed the EnGauge GitHub repository to reduce the time required to download, process, and explore streamflow, water quality, and weather station gauge data that are hosted primarily on U.S. government servers. This repository compiles and modifies functions from other Packages for Hydrological Data Retrieval and Statistical Analysis, and develops new functions for processing and exploring the data.

Data Acquisition

Given a polygon shapefile of the region of interest and an optional radial buffer size, the types of gauge data downloaded can include:

  1. USGS streamflow from the NWIS portal
  2. EPA STORET, USGS, USDA and other water quality data via the water quality portal
  3. NOAA ACIS, GHCN weather station data

The USGS R package dataRetrieval and the NOAA rnoaa package contain the primary functions used for data acquisition. Additional references to learn about these packages are available in the EnGauge README file and at the provided web links.

Data Processing

Significant processing is required to use some of these gauge datasets for environmental modeling. The EnGauge repository has functions that may be used to address the following common data processing needs:

  1. Check for duplicate records
  2. Check for zeros and negative values
  3. Check detection limits
  4. Fill date gaps (add NAs to dates missing from timeseries)
  5. Aggregate to daily, monthly, and/or annual timeseries
  6. Project spatial data to a specified coordinate system
  7. Write processed data to shapefiles, .txt files, and lists that can be loaded into other software for further analysis and/or modeling.

Data Visualization and Exploratory Data Analysis – From GitHub Example

This example is applied to the Gwynns Falls watershed in the Baltimore Ecosystem Study Long Term Ecological Research site. The following figures are some of the output from the EnGague USGSdataRetrieval.R script (as of commit 2fc84cd).

  1. Record lengths at each gaugeStremflowGauges_RecordLengths
  2. Locations of sites with zero and/or negative valuesStreamflow_ZerosNegsMap_fn
  3. Locations of sites with different water quality information: total nitrogen and total phosphorus in this exampleTNTPsites
  4. Locations of sites with certain weather station data: maximum temperature in this exampleNOAA_Datatype_TMAX
  5. Visualizing quality codes on timeseriesTP_Timeseries_MDDNR-GWN0115
  6. Summary exploratory spatial data analysis for sitesStreamflowExceedanceTimeseries_Map_01589330 
  7. Summary daily, monthly, annual informationStreamflowEDA_01589330 
  8. Monthly heatmapTNMonthly_MDDNR-GWN0115 
  9. Outlier visualization: currently implements a simplistic global spatio-temporal method defined by flows greater than a selected quantile. Plots offer qualitative support for the flows at other stations on the dates with high outliers at the reference station.Outlier99Quantile_01589320 
  10. DEM vs. Gauge Elevation: If you supply a DEM, the reported gauge elevation can be compared to the DEM elevation within the region of interest (ROI)CompareGaugeElevToDEM_ROI_fn
  11. Seasonal Scatterplot with Histograms: If you have two timeseries of different data types, e.g. streamflow and water quality, a scatterplot by season may be made (not in example code, but a function is available in the repository).TN_ScatterHist01583570 POBR

Concluding Thoughts

This repository can be used to download gauge data from several sources, to employ standard data processing methods across those sources, and to explore the resulting data. Spend less time getting your data ready to do your research, and more time thinking about what your data are telling you and actually using it for modeling. Check out the EnGague repository for your next research project!

Packages for Hydrological Data Retrieval and Statistical Analysis

Packages for Hydrological Data Retrieval and Statistical Analysis

I recently searched for publicly available code to run statistical analyses that are commonly employed to evaluate (eco)hydrology datasets. My interests included data retrieval, data visualization, outlier detection, and regressions applied to streamflow, water quality, weather/climate, and land use data. This post serves as a time snapshot of available code resources.

Package Languages

I focused on packages in R and Python. R seems to have more data analysis and statistical modeling packages, whereas Python seems to have more physical hydrological modeling packages. As a result of my search interests, this post highlights the available R packages. Here are good package lists for both languages:

Hopefully, the time required to prepare your data and learn how to use these functions is less than the time to develop similar functions yourself. Most packages have vignettes (tutorials) that should help to learn the functions, and may also serve as teaching resources. In R, the function browseVignettes(‘packageName’) will open a webpage containing links to vignette PDFs (if available) and source code resources for the specified package. The datasets.load package in R is useful to discover if a specified package has sample datasets available to use.

Example webpage output from browseVignettes(‘smwrQW’):

USGS R Packages

R is currently the primary development language for hydrological statistical analyses within the United States Geological Survey (USGS). There are nearly 100 USGS-R Github repositories, from which I’ve selected a subset that seem to be actively maintained and applicable to the interests listed above. I’m not sure if similar functions as the ones contained within these repositories are available and tested in other programming languages. These USGS-R packages have test functions, which provides a baseline level of confidence in their development. Many packages are contributed or maintained by Laura DeCicco, and by David Lorenz for statistical analysis and water quality.

Selected USGS R Packages

  • dataRetrieval for streamflow and water quality gauge/site data retrieval from the National Water Information System (NWIS) and the Water Quality Portal (WQP). This package has several great vignettes that demonstrate how to use the functions. The package is simple to use, but the data require processing to be useful for research (I’m currently developing a code repository for this purpose).
  • Statistical Methods in Water Resources (smwr). All have vignettes.

smwrBase: Generic hydrology data processing tools

smwrStats: Statistical analysis and probability functions, and a possible companion book by Helsel and Hirsch (2002)

  • WREG: Predictions in ungauged basins w/ OLS, GLS, WLS
  • nsRFA: Regional Frequency Analysis, focusing on the index-value method
  • DVstats: Daily flow statistical analysis and visualization
  • HydroTools: Seems useful based on the function names, but no documentation outside of the functions themselves. Seems to be under development.

 

Additional R Hydrology Packages

  • hydroTSM: Excellent graphics package for time series data. Two examples below of one-line plot functions and summary statistics.
  • fasstr: hydrological time series analysis with a quick-start Cheat Sheet
  • FlowScreen: temporal trends and changepoint analysis
  • hydrostats: Computation of daily streamflow metrics (e.g. low flows, high flows, seasonality)
  • FAdist: Common probability distributions for frequency analysis. These distributions are not available in base R.
  • lmom and lmomRFA: L-moments for common probability distributions, and regional frequency analysis.
  • nhdR: National Hydrography Dataset downloads

 

Water Quality

 

Streamflow & Weather Generators

 

Weather Station Data

  • rnoaa: NOAA station data downloads using R (tip: the meteo_tidy_ghcnd() function provides nicely formated year-month-day datasets)
    • GHCNpy: similar package in Python for GHCN-Daily records. Also has visualization functions.
  • countyweather: US County-level weather data
  • getMet: Seems like a SWAT model companion for weather data
  • meteo: Data Analysis and Visualization

 

Climate Assessment

  • musica: Climate model assessment tools

 

Hydrological Models in R:

  • topmodel: Topmodel for flow modeling
  • swmmr: SWMM model for stormwater
  • swatmodel: SWAT model for ecohydrological studies

 

Land Use / Land Cover Change

  • lulcc: Land Use and Land Cover Change modeling

 

If you know of other good packages, please add them to the comments or write to me so that I can add them to the lists.