October 2014 – Water Programming: A Collaborative Research Blog

Often when working with raster data, you’ll want to find the value of a variable at a particular point on the globe rather than plotting the entire map. This is especially true in water management applications, where the point of interest might be a city or watershed outlet.

Here is a minimum working example of how to do that. I’m using GDAL to work with the raster files (the prior post describes GDAL in more detail), along with geopy to perform lat/lon lookups for cities. (Geopy is not required for this, it just makes it much easier to find coordinates).

	from __future__ import division
	from osgeo import gdal
	from geopy.geocoders import Nominatim

	def get_value_at_point(rasterfile, pos):
	gdata = gdal.Open(rasterfile)
	gt = gdata.GetGeoTransform()
	data = gdata.ReadAsArray().astype(np.float)
	gdata = None
	x = int((pos[0] - gt[0])/gt[1])
	y = int((pos[1] - gt[3])/gt[5])
	return data[y, x]

	city = 'Ithaca, NY'
	g = Nominatim().geocode(city, timeout=5)
	p = (g.longitude,g.latitude)
	print get_value_at_point('path/to/my/raster/file', p)

view raw raster-point.py hosted with ❤ by GitHub

The get_value_at_point function does all of the work. Note that the indices x and y inside the function are being converted to int, which means there may be some loss of accuracy (i.e. it will round to the nearest pixel). This is most likely not an issue, but is worth remembering. You could easily build on this function to explore values at a single point across multiple raster files (for example, comparing climate projections). Hope this helps!

The problem: you have some georeferenced raster data, but you’re only interested in the points that lie inside of a shapefile boundary. This could be a country, state, watershed, etc. — it doesn’t matter, as long as you have a shapefile to define the boundary.

(Aside: georeferenced raster data can come in a number of formats, some are listed here.)

The solution: we’re going to clip the raster data to remove points outside the shapefile boundary, and then plot it with Python.

The ingredients: If you’re following along, we’ll be using some CMIP5 precipitation projections at 5-minute spatial resolution, which you can find here. (In particular, the 2070 projections from the GFDL-CM3 model under the rcp45 emissions scenario, but any of them will do). You’ll also need a shapefile, so if you don’t have one handy you can grab the US states shapefile.

(If you’re interested in watershed boundary shapefiles, here are the latest HUC classifications from USGS. If you grab the WBD_National.zip file, it contains all of HUC{2,4,6,8,10,12} shapefiles. Be warned, it is large, and you may be able to find your specific watershed elsewhere.)

If we plot the raster data—in this case, projected average August precip in millimeters—on top of the shapefile, we’ll get something like this, where the data covers all of North America:

I’ll put an example of making this plot with Python+Basemap at the end of the post.

To clip the raster with our US shapefile, we’ll be using GDAL, a collection of tools for manipulating spatial data. On Windows, the GDAL binaries and Python wrapper libraries all come bundled with the Python(x,y) installation, which is what I’m using. On *nix, pip install gdal should achieve the same thing, but you may need to fiddle around with base packages and installation settings.

Anyway, once you have the GDAL binaries, go to your command line and run this:

gdalwarp -cutline path/to/states.shp \
-crop_to_cutline -dstnodata "-9999.0" \
my_raster_file.tif \
my_raster_file_usa.tif

The -dstnodata option specifies the “nodata” value to use in the destination file, where points outside the shapefile will be masked. If all goes well, you should now have my_raster_file_usa.tif which you can plot:

Which is the same data, of course, but clipped! You should be able to do this with any state, county, or watershed you want, as long as the resolution of the raster isn’t coarser than the area of the shapefile (in which case you would only be clipping one pixel).

As promised, here’s how to make these plots in Python. The line cmap.set_under('1.0') is deceptively important, because it sets all the “nodata” values in the file to a white background (remember, we used -9999.0 as the nodata value). Thanks for reading!

Update 3/14/17: Ben asked a good question about how this approach will treat pixels along the edge of the shapefile. Some visual investigation suggests that it (1) removes all pixels that extend beyond the border, and (2) adds a “border” of zero-valued pixels along the edge. It does not seem to aggregate the fractional pixels.

[gist jdherman/7434f7431d1cc0350dbe]

Water Programming: A Collaborative Research Blog

Tips and tricks on programming, evolutionary algorithms, and doing research

Month: October 2014

Python – Extract raster data value at a point

Python – Clip raster data with a shapefile