Using arcpy to calculate area-weighted averages of gridded spatial data over political units (Part 2)

Weather Data Processing

This is the second entry in a two-part blog on using arcpy, the Python package for performing ArcGIS spatial analysis. In the first part, I showed how to calculate area-weighted averages of gridded spatial data over political units using ZonalStatisticsAsTable. This method is only applicable to raw data in the form of an ArcGIS raster. Here I will show another method for when the data does not come in this form, like our weather data. Again, for this discussion, I will assume the data is being processed at the woreda level.

The temperature data used in our study was stored in a netCDF file and the precipitation data in several binary files. While the data is not stored in a GIS raster, it is gridded satellite data, so the first step was to create a polygon grid using arcpy and determine which grid cells corresponded to which locations in the raw data file. Next, this grid was intersected with the woreda shapefile to compute the percentage of each woreda’s area covered by each grid cell. Then, area-weighted average temperature and precipitation values could be computed for each woreda.

All of the temperature and precipitation processing can be run from the scripts makeTempDBF.py and processRainData.sh, respectively. The temperature and precipitation data are downloaded using downloadTempData.py and downloadRainData.py. The temperature data is stored in the netCDF file air.mon.mean.nc, which can be read in Python if you have the netCDF4 package. The precipitation data is stored in separate binary files for each day of the year. The FORTRAN code convert_rfe_bin2asc.f provided by NOAA is used to convert the binary files to human-readable text files. This is done in the shell script with the following commands:

f95 convert_rfe_bin2asc.f -o convert_rfe_bin2asc

BINFILES=$(find . -name all_products.bin\*)

for BINFILE in ${BINFILES}
do
	./convert_rfe_bin2asc ${BINFILE} ${BINFILE}.txt
	rm ${BINFILE}
done

The temperature data is for the entire globe, and the precipitation data for all of Africa, so the first step in the processing is to determine which elements of the temp variable matrix in the netCDF file, and which rows of the daily precipitation text files, correspond to points in Ethiopia. I did this through arcpy with the function intersectGrid.py, shown below.

import arcpy as ap
import os
from convertDBFtoCSV import convertDBFtoCSV

def intersectGrid(AggLevel, workingDir, variable):
    '''Intersects the temperature or precipitation data grid with the AggLevel = "Woreda" or "Kebele" shapefile'''

    #create grid shapefile
    Grid = workingDir + "\\All" + variable + "Grid.shp"
    if not os.path.exists(Grid):
        if variable == "Temp":
            origin_coord = "-180 -90"
            nrows = "360"
            ncols = "720"
            polygon_width = "0.5 degrees"
        else:
            origin_coord = "-20.05 -40.05"
            nrows = "801"
            ncols = "751"
            polygon_width = "0.1 degrees"

        polygon_height = polygon_width
        ap.GridIndexFeatures_cartography(Grid, "", "", "", "", polygon_width, polygon_height, origin_coord, nrows, ncols)
        ap.DefineProjection_management(Grid,coor_system="GEOGCS['GCS_WGS_1984',DATUM['D_WGS_1984',\
        SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]]")

        #add 3 or 4 fields to grid shapefile: latitude (LAT), longitude (LONG) and
        #for precipitation, row (row) of text file corresponding to each grid in the shapefile;
        #for temperature, row (row) and column (col) of netCDF file corresponding to each grid in the shapefile
        ap.AddField_management(Grid, "LAT", "DOUBLE", 7, 2, "", "", "", "", "")
        ap.AddField_management(Grid, "LONG", "DOUBLE", 7, 2, "", "", "", "", "")
        ap.AddField_management(Grid, "row", "SHORT", 6, "", "", "", "", "", "")
        if variable == "Temp":
            ap.AddField_management(Grid, "col", "SHORT", 5, "", "", "", "", "", "")

        #calculate lat and long fields
        expression1 = "float(!SHAPE.CENTROID!.split()[0])"
        expression2 = "float(!SHAPE.CENTROID!.split()[1])"
        ap.CalculateField_management(Grid, "LONG", expression1, "PYTHON")
        ap.CalculateField_management(Grid, "LAT", expression2, "PYTHON")

        #calculate row and col fields
        if variable == "Temp":
            Grid = calcTempFields(Grid)
        else:
            Grid = calcRainFields(Grid)

    #clip the grid to Ethiopia and convert its .dbf to a .csv for later use
    GridClip = workingDir + "\\" + variable + "GridClip" + AggLevel + ".shp"
    if AggLevel == 'Woreda':
        EthiopiaBorders = os.path.dirname(workingDir) + "\\Shapefiles\\WoredasAdindan.shp"
    elif AggLevel == 'Kebele':
        EthiopiaBorders = os.path.dirname(workingDir) + "\\Shapefiles\\Ethiopia Kebeles without Somali region.shp"

    ap.Clip_analysis(Grid, EthiopiaBorders, GridClip)
    dbf = GridClip[0:-4] + ".dbf"
    GridCSV = convertDBFtoCSV(dbf)

    #intersect the clipped grid with the woreda or kebele shapefile and project to Adindan
    GridIntersect = workingDir + "\\" + variable + AggLevel + "Intersect.shp"
    ap.Intersect_analysis([GridClip, EthiopiaBorders],GridIntersect)
    GridIntersectProject = GridIntersect[0:-4] + "Project.shp"
    ap.Project_management(GridIntersect,GridIntersectProject,out_coor_system="PROJCS['Adindan_UTM_Zone_37N',GEOGCS['GCS_Adindan',\
    DATUM['D_Adindan',SPHEROID['Clarke_1880_RGS',6378249.145,293.465]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]],\
    PROJECTION['Transverse_Mercator'],PARAMETER['False_Easting',500000.0],PARAMETER['False_Northing',0.0],\
    PARAMETER['Central_Meridian',39.0],PARAMETER['Scale_Factor',0.9996],PARAMETER['Latitude_Of_Origin',0.0],\
    UNIT['Meter',1.0]]",transform_method="Adindan_To_WGS_1984_1",in_coor_system="GEOGCS['GCS_WGS_1984',DATUM['D_WGS_1984',\
    SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]]")

    #calculate area of intersection between grid and woreda or kebele shapefile after adding a field to store it
    ap.AddField_management(GridIntersectProject, "PartArea", "DOUBLE", 12, 6, "", "", "", "", "")
    expression = "float(!SHAPE.AREA@SQUAREKILOMETERS!)"
    ap.CalculateField_management(GridIntersectProject, "PartArea", expression, "PYTHON")

    #convert GridIntersect's .dbf to a .csv for later use
    dbf = GridIntersectProject[0:-4] + ".dbf"
    intersectCSV = convertDBFtoCSV(dbf)

    return intersectCSV, GridCSV

def calcTempFields(Grid):
    expression1 = "getCol(!LONG!)"
    code_block1 = """def getCol(long):
    if long > 0:
        return 2*(long+0.25)-1
    elif long < 0:
        return 2*(long-0.25)+720"""
    ap.CalculateField_management(Grid, "col", expression1, "PYTHON", code_block1)

    expression2 = "180-2*(!LAT!-0.25)-1"
    ap.CalculateField_management(Grid, "row", expression2, "PYTHON")

    return Grid

def calcRainFields(Grid):
    expression = "750*(10*(!LAT!+40))+10*(!LONG!+20)"
    ap.CalculateField_management(Grid, "row", expression, "PYTHON")

    return Grid

Lines 11-22 of this function define the origin, cell size, and extent of the grid for the variable of interest (“Temp” or “Rain”). These values are used to make a polygon shapefile called AllTempGrid.shp or AllRainGrid.shp in lines 23-25 using the functions ap.GridIndexFeatures_cartography and ap.DefineProjection_management. The first creates the grid, and the second defines its projection. Since the grid was formed from latitude and longitude coordinates, a geographic coordinate system was chosen for the projection.
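To sanity-check the grid layout outside of ArcGIS, the same origin and size parameters can be used to generate the cell centroids in plain Python. This is only an illustrative sketch of the grid that GridIndexFeatures_cartography builds; the function grid_centroids below is my own, not part of the processing scripts.

```python
# Sketch of the polygon grid's cell centroids, mirroring the origin, cell
# size, and dimensions passed to GridIndexFeatures_cartography for "Temp".
def grid_centroids(origin_lon, origin_lat, ncols, nrows, cell_size):
    """Yield (lon, lat) centroids of a regular grid, row by row from the origin."""
    for i in range(nrows):
        for j in range(ncols):
            yield (origin_lon + (j + 0.5) * cell_size,
                   origin_lat + (i + 0.5) * cell_size)

# Temperature grid: 0.5-degree cells covering the globe from (-180, -90)
temp_cells = list(grid_centroids(-180.0, -90.0, ncols=720, nrows=360, cell_size=0.5))
print(len(temp_cells))   # 259200 cells
print(temp_cells[0])     # (-179.75, -89.75)
```

The same function with the "Rain" parameters (origin (-20.05, -40.05), 751 columns, 801 rows, 0.1-degree cells) reproduces the precipitation grid over Africa.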

Confession: I didn’t know the proper arcpy command for this projection, so I first did it in the interface. I used “ArcToolbox\Data Management Tools\Projections and Transformations\Define Projection,” chose AllTempGrid as my “Input Dataset or Feature Class” and then selected “Geographic Coordinate System\World\WGS1984” for the Coordinate System. After executing a command in the interface, you can find the corresponding Python code by opening “Results,” right-clicking on the command and selecting “Copy as Python Snippet.” See the screenshots below.

[Figures 7-10: screenshots of the Define Projection tool in ArcToolbox and the “Copy as Python Snippet” option in the Results window]

After defining the grid’s projection, the function ap.AddField_management is used in lines 30-34 to add columns to the attribute table of this shapefile for the latitude and longitude and, for the temperature grid, the row and column of the temp variable matrix in the netCDF file corresponding to that grid cell. For the precipitation data, just the row of the text files corresponding to that grid cell is added (all of the data is stored in the same column of the text files, but each row is a different point). The latitude and longitude values are calculated and populated in lines 37-40 using the function ap.CalculateField_management. The row (and, for temperature, column) values are calculated in lines 43-46 using the same function, but with conditional statements.
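The index arithmetic inside those conditional expressions can be restated in plain Python for checking outside ArcGIS. The two functions below are my own restatement of the expressions in calcTempFields and calcRainFields, not part of the original scripts:

```python
# Restatement of the CalculateField expressions mapping a cell centroid to
# its position in the raw data files (temperature: netCDF matrix indices;
# precipitation: row of the daily text file).
def temp_row_col(lat, lon):
    """(row, col) in the netCDF temp matrix for a 0.5-degree cell centroid."""
    row = int(180 - 2 * (lat - 0.25) - 1)
    col = int(2 * (lon + 0.25) - 1) if lon > 0 else int(2 * (lon - 0.25) + 720)
    return row, col

def rain_row(lat, lon):
    """Row in the daily rainfall text file for a 0.1-degree cell centroid."""
    return int(round(750 * (10 * (lat + 40)) + 10 * (lon + 20)))

print(temp_row_col(89.75, 0.25))   # (0, 0): top row, first eastern column
print(rain_row(-40.0, -20.0))      # 0: the cell at the grid origin
```

Note how the western-hemisphere branch shifts centroids by 720 half-degree columns (e.g. longitude -179.75 maps to column 360), which matches a netCDF longitude axis running from 0 to 360 rather than -180 to 180.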

The gridded shapefile is then clipped to the woreda shapefile in lines 49-55 to form TempGridClipWoreda.shp or RainGridClipWoreda.shp using ap.Clip_analysis. Clip_analysis operates on shapefiles, whereas Clip_management (which I used to process the soil data) operates on rasters. The database file of TempGridClipWoreda.shp/RainGridClipWoreda.shp is converted to a csv called GridCSV in lines 56-57 for later use. Finally, ap.Intersect_analysis is used to intersect the clipped shapefiles with the woreda shapefile to form TempWoredaIntersect.shp or RainWoredaIntersect.shp in lines 60-61. The intersections look like this:

[Figure 11: the clipped grid intersected with the woreda shapefile]

The last step is to determine the area of intersection between the woredas and the grid cells. This is done by adding a field for the “partial area,” or as I called it on line 71, “PartArea,” using ap.AddField_management. The woreda area given by the field “WorAreaKm2” is in square kilometers, so I wanted to calculate the partial area in square kilometers as well. In order to calculate areas, the coordinate system must be a projected coordinate system, not a geographic one. The coordinate system of TempWoredaIntersect.shp and RainWoredaIntersect.shp is the same as the geographic coordinate system of the grid, so on line 63 it is converted to the projected coordinate system of the woreda shapefile, Adindan_UTM_Zone_37N, using the function ap.Project_management. This command was again found using the interface. The Output Coordinate System was found under “Projected Coordinate System\UTM\Africa\Adindan UTM Zone 37N.”

[Figure 12: screenshot of the Project tool with the Adindan UTM Zone 37N output coordinate system]
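The reason the projection has to happen before the area calculation is that planar area formulas assume coordinates in consistent linear units (meters), not degrees. As a minimal illustration of the kind of computation behind !SHAPE.AREA@SQUAREKILOMETERS! (the shoelace formula applied to projected coordinates; the coordinates below are invented UTM-style values, not taken from the shapefiles):

```python
# Shoelace formula: planar polygon area from projected (x, y) coordinates
# in meters. Applied to raw lat/long degrees this would return meaningless
# "square degrees", which is why the shapefile is projected first.
def polygon_area_km2(ring):
    area = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0 / 1e6   # m^2 -> km^2

# A 1 km x 1 km square near the UTM zone 37N false easting (made-up values)
square = [(500000.0, 0.0), (501000.0, 0.0), (501000.0, 1000.0), (500000.0, 1000.0)]
print(polygon_area_km2(square))   # 1.0
```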

Once the intersection shapefile is projected, its area is calculated on lines 72-73. Finally, the database file of the intersected shapefile is converted to a csv in lines 76-77. The intersected shapefile’s csv gives the area of intersection of each grid cell with each woreda. This, along with GridCSV, is then used by the function findTempWeightMatrix.py (called on line 26 of makeTempDBF.py) or findRainWeightMatrix.py (called on line 22 of makeRainDBF.py) to determine the percentage of each woreda’s area that is covered by each temperature/precipitation grid cell. These percentages are stored in a matrix called WeightMatrix. The code for makeTempDBF.py is shown below.

import os
import numpy as np
from downloadTempData import downloadTempData
from intersectGrid import intersectGrid
from readTempData import readTempData
from findTempWeightMatrix import findTempWeightMatrix

def makeTempDBF(AggLevel):
    '''Calculates weighted average monthly temperature at AggLevel = "Woreda" or "Kebele"'''

    #Set the working directory
    workingDir = os.getcwd()

    #Download the monthly temperature data from NOAA GHCN CAMS
    DataFile = downloadTempData(workingDir)

    #Intersect the GHCN grid with the Woreda or Kebele shapefile and convert the .dbf to a .csv
    intersectCSV, GridCSV = intersectGrid(AggLevel, workingDir, "Temp")

    #Read in raw monthly temperature data at every GHCN grid cell and store it in the matrix 'allData'
    #dimensions are a x b x c where a = # of gridcells, b = 5 (lat, long, yr, mo, temp),
    #and c = # of months
    allData, gridCells = readTempData(DataFile, GridCSV)

    #Read in weights of each data point to each woreda or kebele
    WeightMatrix, ID = findTempWeightMatrix(AggLevel, intersectCSV, gridCells)
    Year = allData[0,2,:]
    Month = allData[0,3,:]

    #Calculate area-weighted average temperature data in each woreda or kebele
    Temp = np.dot(np.transpose(WeightMatrix),allData[:,4,:])

    #Write the data to 'WoredaTempDBF.csv' or 'KebeleTempDBF.csv'
    writeFiles(AggLevel, Temp, ID, Year, Month) 

    return None

def writeFiles(AggLevel, Temp, ID, Year, Month):
    if AggLevel == 'Kebele':
        f = open('KebeleTempDBF.csv','w')
        f.write('AggLevel,RK_CODE,Year,Month,Temp\n')
    elif AggLevel == 'Woreda':
        f = open('WoredaTempDBF.csv','w')
        f.write('AggLevel,WID,Year,Month,Temp\n')

    for i in range(np.shape(Temp)[0]):
        for j in range(np.shape(Temp)[1]):
            if AggLevel == 'Woreda':
                f.write('W,'+ str(ID[i]) + ',' + str(int(Year[j])) + ',' + str(int(Month[j])) + ',')
            elif AggLevel == 'Kebele':
                f.write('K,'+ str(ID[i]) + ',' + str(int(Year[j])) + ',' + str(int(Month[j])) + ',')
            f.write(str(Temp[i,j]))
            f.write('\n')

    f.close()

    return None

All of the raw data is read in by readTempData.py on line 23 of makeTempDBF.py and stored in a matrix allData. The area-weighted average temperature data can then be calculated from the matrix product of the transposed WeightMatrix and allData, as shown on line 31. Finally, all of the area-weighted average woreda data is written to WoredaTempDBF.csv on line 34. A snippet of this CSV is shown below. The process is analogous for precipitation.

[Figure 13: snippet of WoredaTempDBF.csv]
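To make the weighting concrete, here is a toy numpy sketch of both steps: building WeightMatrix from intersection areas (the role findTempWeightMatrix plays, working from intersectCSV) and taking the matrix product with the per-cell data (line 31 of makeTempDBF.py). All cell indices, areas, and temperatures below are invented.

```python
import numpy as np

# (grid cell index, woreda index, intersection area in km^2) -- toy values
intersections = [(0, 0, 30.0), (1, 0, 70.0), (1, 1, 50.0), (2, 1, 150.0)]
n_cells, n_woredas = 3, 2

# Fraction of each woreda's area covered by each grid cell
WeightMatrix = np.zeros((n_cells, n_woredas))
for cell, woreda, area in intersections:
    WeightMatrix[cell, woreda] = area
WeightMatrix /= WeightMatrix.sum(axis=0)   # each column now sums to 1

# Per-cell data: 3 grid cells x 2 months of temperatures
cell_temps = np.array([[20.0, 21.0],
                       [22.0, 23.0],
                       [24.0, 25.0]])

# Area-weighted average per woreda: (woredas x cells) dot (cells x months)
Temp = np.dot(np.transpose(WeightMatrix), cell_temps)
print(Temp[0])   # woreda 0: 0.3*cell0 + 0.7*cell1 for each month
```

Each column of WeightMatrix sums to one, so every woreda’s average is a proper convex combination of the grid cells that cover it.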

As you can see, data processing with GIS is much easier if the data come in the form of a GIS raster or shapefile, like our soil data! If the soil data had been at a lower resolution, though, we probably would have had to use the same method as for the weather data, since the explicit area-weighting is more accurate when grid cells are large relative to the political units.
