The problem: you have some georeferenced raster data, but you’re only interested in the points that lie inside of a shapefile boundary. This could be a country, state, watershed, etc. — it doesn’t matter, as long as you have a shapefile to define the boundary.
(Aside: georeferenced raster data can come in a number of formats, some are listed here.)
The solution: we’re going to clip the raster data to remove points outside the shapefile boundary, and then plot it with Python.
The ingredients: If you’re following along, we’ll be using some CMIP5 precipitation projections at 5-minute spatial resolution, which you can find here. (In particular, the 2070 projections from the GFDL-CM3 model under the rcp45 emissions scenario, but any of them will do). You’ll also need a shapefile, so if you don’t have one handy you can grab the US states shapefile.
(If you’re interested in watershed boundary shapefiles, here are the latest HUC classifications from USGS. If you grab the WBD_National.zip
file, it contains all of HUC{2,4,6,8,10,12} shapefiles. Be warned, it is large, and you may be able to find your specific watershed elsewhere.)
If we plot the raster data—in this case, projected average August precip in millimeters—on top of the shapefile, we’ll get something like this, where the data covers all of North America:
I’ll put an example of making this plot with Python+Basemap at the end of the post.
To clip the raster with our US shapefile, we’ll be using GDAL, a collection of tools for manipulating spatial data. On Windows, the GDAL binaries and Python wrapper libraries all come bundled with the Python(x,y) installation, which is what I’m using. On *nix, pip install gdal
should achieve the same thing, but you may need to fiddle around with base packages and installation settings.
Anyway, once you have the GDAL binaries, go to your command line and run this:
gdalwarp -cutline path/to/states.shp \
-crop_to_cutline -dstnodata "-9999.0" \
my_raster_file.tif \
my_raster_file_usa.tif
The -dstnodata
option specifies the “nodata” value to use in the destination file, where points outside the shapefile will be masked. If all goes well, you should now have my_raster_file_usa.tif
which you can plot:
Which is the same data, of course, but clipped! You should be able to do this with any state, county, or watershed you want, as long as the resolution of the raster isn’t coarser than the area of the shapefile (in which case you would only be clipping one pixel).
As promised, here’s how to make these plots in Python. The line cmap.set_under('1.0')
is deceptively important, because it sets all the “nodata” values in the file to a white background (remember, we used -9999.0
as the nodata value). Thanks for reading!
Update 3/14/17: Ben asked a good question about how this approach will treat pixels along the edge of the shapefile. Some visual investigation suggests that it (1) removes all pixels that extend beyond the border, and (2) adds a “border” of zero-valued pixels along the edge. It does not seem to aggregate the fractional pixels.
[gist jdherman/7434f7431d1cc0350dbe]