Python – Clip raster data with a shapefile

The problem: you have some georeferenced raster data, but you’re only interested in the points that lie inside of a shapefile boundary. This could be a country, state, watershed, etc. — it doesn’t matter, as long as you have a shapefile to define the boundary.

(Aside: georeferenced raster data can come in a number of formats, some are listed here.)

The solution: we’re going to clip the raster data to remove points outside the shapefile boundary, and then plot it with Python.

The ingredients: If you’re following along, we’ll be using some CMIP5 precipitation projections at 5-minute spatial resolution, which you can find here. (In particular, the 2070 projections from the GFDL-CM3 model under the rcp45 emissions scenario, but any of them will do). You’ll also need a shapefile, so if you don’t have one handy you can grab the US states shapefile.

(If you’re interested in watershed boundary shapefiles, here are the latest HUC classifications from USGS. If you grab the WBD_National.zip file, it contains all of HUC{2,4,6,8,10,12} shapefiles. Be warned, it is large, and you may be able to find your specific watershed elsewhere.)

If we plot the raster data—in this case, projected average August precip in millimeters—on top of the shapefile, we’ll get something like this, where the data covers all of North America:

figure_2

I’ll put an example of making this plot with Python+Basemap at the end of the post.

To clip the raster with our US shapefile, we’ll be using GDAL, a collection of tools for manipulating spatial data. On Windows, the GDAL binaries and Python wrapper libraries all come bundled with the Python(x,y) installation, which is what I’m using. On *nix, pip install gdal should achieve the same thing, but you may need to fiddle around with base packages and installation settings.

Anyway, once you have the GDAL binaries, go to your command line and run this:

gdalwarp -cutline path/to/states.shp \
-crop_to_cutline -dstnodata "-9999.0" \
my_raster_file.tif \
my_raster_file_usa.tif

The -dstnodata option specifies the “nodata” value to use in the destination file, where points outside the shapefile will be masked. If all goes well, you should now have my_raster_file_usa.tif which you can plot:

figure_1

Which is the same data, of course, but clipped! You should be able to do this with any state, county, or watershed you want, as long as the resolution of the raster isn’t coarser than the area of the shapefile (in which case you would only be clipping one pixel).

As promised, here’s how to make these plots in Python. The line cmap.set_under('1.0') is deceptively important, because it sets all the “nodata” values in the file to a white background (remember, we used -9999.0 as the nodata value). Thanks for reading!

Update 3/14/17: Ben asked a good question about how this approach will treat pixels along the edge of the shapefile. Some visual investigation suggests that it (1) removes all pixels that extend beyond the border, and (2) adds a “border” of zero-valued pixels along the edge. It does not seem to aggregate the fractional pixels.

[gist jdherman/7434f7431d1cc0350dbe]
Advertisements

14 thoughts on “Python – Clip raster data with a shapefile

  1. This is very useful, thanks Jon. Is there a way to call the GDAL binary from the python script? Maybe using the SUBPROCESS module (or OS.SYSTEM call)? I have not used GDAL before. Looks like I should come up to speed on it.

    Steve

  2. Hi Steve, thanks for reading. Sure, you can call gdalwarp from inside python. Here is an example of doing the clipping with os.system(‘…’):
    http://gis.stackexchange.com/questions/75086/how-to-for-loop-a-folder-to-batch-clip-rasters-by-polygon-using-python-and-qgis?rq=1

    Unfortunately I don’t think there’s a way to do it without a subprocess, I don’t see any functions for gdalwarp in the python wrapper.

    In the past I’ve tried to manipulate binary spatial data manually, which is just a terrible idea. I’m no expert but GDAL is well worth the few minutes of setup time.

  3. Pingback: Python – Extract raster data value at a point | Water Programming: A Collaborative Research Blog

  4. Thanks for the very informative post, Jon. I have a question–I attempted something similar and got the error message below (running on mac/linux), to which I wonder if there were any special installation options needed to get the cutline ogr functionality.

    ERROR 1: Request to load a cutline failed, this build does not support OGR features.

    Thanks

    Ben

  5. Hi Ben, thanks for trying it out. This is a really cryptic error message that makes me think some part of the library didn’t install properly. Did you build from source or install from a package manager? I can’t remember if I tried this on OSX or not.

    For lack of better advice, I’d say try reinstalling GDAL. You can test whether ogr is enabled using `gdal-config –ogr-enabled`. (http://www.gdal.org/gdal-config.html).

    • Hi Jon,

      After a long hiatus, another question–Can you comment on whether this method would aggregate the fractional raster pixels that lie on the boundary of the shapefile?

      Thanks!

      Ben

      • Hi Ben, see the update to the end of the post above (can’t add images to comments).

        This behavior seems unintuitive. I don’t know much more about it other than zooming in on the map, though 🙂

  6. Hi Arnau,

    You don’t need Python to do the cropping. I only mentioned that because installing Python(x,y) was how I got the GDAL library in the first place. But you can install GDAL separately.

    The command line is your regular system terminal. This is the program called “terminal” on Mac or Linux, or “cmd” on Windows. (No guarantees that this will work on Windows, though).

    • Great work ,this is one of the very few desirable solutions on the whole Internet to the problems of Python GIS clipping.Thank you for your work.By the way,is it OK to share the raw codes?Sorry that I did not find it if you have already posted your codes.Looking in eager for your reply.THX again

      • Hi Majiali, sure thing, share away! The actual cropping is done on the command line with gdalwarp, but if you want the python code for plotting it’s here:

        This link was originally included in the article but it looks like wordpress changed its “gist” feature.

  7. Hi,
    Thank you very much for your help.
    I tried it and I got an error message : cannot compute bounding box of cutline
    Do you have any idea why is it this way ?

    I’m working on Linux with a netcdf file over France

    Thank you a lot

  8. Pingback: Water Programming Blog Guide (Part 2) – Water Programming: A Collaborative Research Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s