Networks on maps: exploring spatial connections using NetworkX and Basemap

This blogpost is about generating network graphs interlaid on spatial maps. I’ll be using the data provided by this paper (in the supplementary material) which estimates flows of food across US counties. All the code I’m using here can be found here.

The dataset included in erl_14_8_084011_sd_3.csv of the supplementary material lists the tons of food transported per food category, using the standard classification of transported goods (SCTG) food categories included in the study. The last two columns, ori and des, indicate the origin and destination counties of each flow, using FIPS codes.

To draw the network nodes (the counties) in their geographic locations I had to identify lat and lon coordinates for each county using its FIPS code, which can be found here 1.

Now, let’s these connections in Python, using NetworkX and Basemap. The entire script is here, I’ll just be showing the important snippets below. In the paper, they limit the visualization to the largest 5% of food flows, which I can confirm is necessary otherwise the figure would be unreadable. We first load the data using pandas (or other package that reads csv files), identify the 95th percentile and restrict the data to only those 5% largest flows.

data = pd.read_csv('erl_14_8_084011_sd_3.csv')
threshold = np.percentile(data['total'], 95)
data = data.loc[(data['total'] > threshold)]

Using NetworkX, we can directly create a network out of these data. The most important things I need to define are the dataframe column that lists my source nodes, the column that lists my destination nodes and which attribute makes up my network edges (the connections between nodes), in this case the total food flows.

G = nx.from_pandas_edgelist(df=data, source='ori', target='des', edge_attr='total',create_using = nx.DiGraph())

Drawing this network without the spatial information attached (using the standard nx.draw(G)) looks something like below, which does hold some information about the structure of this network, but misses the spatial information we know to be associated with those nodes (counties).

To associate the spatial information with those nodes, we’ll employ Basemap to create a map and use its projection to convert the lat and lon values of each county to x and y positions for our matplotlib figure. When those positions are estimated and stored in the pos dictionary, I then draw the network using the specific positions. I finally also draw country and state lines. You’ll notice that I didn’t draw the entire network but only the edges (nx.draw_networkx_edges) in an effort to replicate the style of the figure from the original paper and to declutter the figure.

plt.figure(figsize = (12,8))
m = Basemap(projection='merc',llcrnrlon=-160,llcrnrlat=15,urcrnrlon=-60,
urcrnrlat=50, lat_ts=0, resolution='l',suppress_ticks=True)
mx, my = m(pos_data['lon'].values, pos_data['lat'].values)
pos = {}
for count, elem in enumerate(pos_data['nodes']):
     pos[elem] = (mx[count], my[count])
nx.draw_networkx_edges(G, pos = pos, edge_color='blue', alpha=0.1, arrows = False)
m.drawcountries(linewidth = 2)
m.drawstates(linewidth = 0.2)
m.drawcoastlines(linewidth=2)
plt.tight_layout()
plt.savefig("map.png", dpi = 300)
plt.show()

The resulting figure is the following, corresponding to Fig. 5B from the original paper.

I was also interested in replicating some of the analysis done in the paper, using NetworkX, to identify the counties most critical to the structure of the food flow network. Using the entire network now (not just the top 5% of flows) we can use NetworkX functions to calculate each node’s degree and between-ness centrality. The degree indicates the number of nodes a node is connected to, between-ness centrality is an indicator of the fraction of shortest paths between two nodes that pass through a specific node. These are network metrics that are unrelated to the physical distance between two counties and can be used (along with several other metrics) to make inferences about the importance and the position of a specific node in a network. We can calculate them in NetworkX as shown below and plot them using simple pyplot commands:

connectivity = list(G.degree())
connectivity_values = [n[1] for n in connectivity]
centrality = nx.betweenness_centrality(G).values()

plt.figure(figsize = (12,8))
plt.plot(centrality, connectivity_values,'ro')
plt.xlabel('Node centrality', fontsize='large')
plt.ylabel('Node connectivity', fontsize='large')
plt.savefig("node_connectivity.png", dpi = 300)
plt.show()

The resulting figure is shown below, matching the equivalent Fig. 6 of the original paper. As the authors point out, there are some counties in this network, those with high connectivity and high centrality, that are most critical to its structure: San Berndardino, CA; Riverside, CA; Los Angeles, CA; Shelby, TN; San Joaquin, CA; Maricopa, AZ; San Diego, CA; Harris, TX; and Fresno, CA.

1 – If you are interested in how this is done, I used the National Counties Gazetteer file from the US Census Bureau and looked up each code to get its lat and lon.

Everything you want to know about subplots in Python’s Matplotlib

Sometimes its helpful to get back to basics. I recently created a summer course on data visualization with Python, and the experience made me realize that the workings of Python’s main visualization library, Matplotlib, are often left of out formal Python courses. Over the course of my graduate career I’ve taken several courses on programming and Python, and each time visualization or plotting was viewed as an afterthought. I don’t think my experience is uncommon, creating basic visualizations in Python is fairly straight forward, and a quick Google search will yield tutorials on how to make most common plot types. While constructing the summer course, I realized how much time I could have saved if I had learned how Matplotlib worked earlier in my PhD. With this in mind, I’m writing this post to serve as a guide for those new to plotting with Matplotlib, and to help fill in some gaps for those who are already experienced Matplotlib users.

I’ve chosen to devote this post to making subplots, a task often necessary for scientific visualizations that can be surprisingly frustrating if you don’t understand Matplotlib’s structure. Since Python is an open source language, there are multiple ways of creating and working with subplots, so in this post I’ll outline a few ways that work for me, and provide some context about how things work behind the scenes in Matplotlib.

A brief introduction to Matplotlib’s object oriented syntax

Let’s start with some history. As it’s name suggests, Matplotlib was originally created as a means of replicating Matlab style plotting functionality in Python. As such, one way we can create visualizations using Matplotlib is through “Matlab style” syntax which is contained within Matplotlib’s flagship module, Pyplot. For example, to create a simple line plot we can use the following code:

import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0,10)
y = np.arange(0, 10)

plt.plot(x,y)

Which will generate this plot:

While the Matlab style syntax is easy to use, it is actually quite limiting in its ability to create custom visualizations. Luckily, the Matlab style syntax is simply hiding an object oriented code structure that is at the core of Matplotlib. By directly accessing and working with this object oriented structure we can create highly customized visualizations.

For the purposes of this blog, there are two Matplotlib objects that are important: Figure objects and Axes objects. Figure objects represent the containers that hold a visualization. I think of this as the blank canvas that the plots will be generated on. Figure objects can contain one or multiple Axes objects, each representing an individual chart (this can be a big source of confusion when learning Matplotlib, the term “axes object” is inherited from Matlab and refers to an entire chart rather than a x or y axis) . To create the plot above using Matplotlib’s object oriented syntax we need to make a few small modifications to the original code. First, we’ll generate a Figure object using Pyplot which will automatically create an axes object behind the scenes. Next we access this Axes object using the Figure object’s “gca” method, which stands for “get current axes”. Finally, we’ll use the Axes object to generate the plot:

fig = plt.figure()
ax = plt.gca()
ax.plot(x,y)

This will make the identical plot as above:

Note that we can use the Axes object to create any kind of plot that can be created with the Matlab style syntax. Examples include ax.plot (line plot), ax.scatter (scatter plot), ax.bar (bar plot), ax.imshow (heatmaps and color mesh) etc.

Creating Basic subplots

Now that we have some insight into the object oriented structure of Matplotlib, we can start making some subplots. There are two main ways we can make subplots with Matplotlib, the first is to use Pyplot’s “subplots” function, which allows us to specify the number of rows and columns of plots in your figure. This function will return both a Figure object and a list of Axes objects. Note that the list of Axes objects is arranged in the same way as the subplots are within the figure (i.e. list entry [0,0] is the top left subplot):

fig, [[ax0, ax1],[ax2, ax3]] = plt.subplots(nrows=2, ncols=2)
ax0.text(0.5, 0.5, "This is axes object 0", ha='center')
ax1.text(0.5, 0.5, "This is axes object 1", ha='center')
ax2.text(0.5, 0.5, "This is axes object 2", ha='center')
ax3.text(0.5, 0.5, "This is axes object 3", ha='center')

Which will make the following set of subplots:

Instead of specifying individual names of each axes, we can alternatively store them in a single named list and axes them via indices like this:

fig, axes = plt.subplots(nrows=2, ncols=2)
axes[0,0].text(0.5, 0.5, "This is axes object 0", ha='center')
axes[0,1].text(0.5, 0.5, "This is axes object 1", ha='center')
axes[1,0].text(0.5, 0.5, "This is axes object 2", ha='center')
axes[1,1].text(0.5, 0.5, "This is axes object 3", ha='center')

An alternative way to create subplots is to first make the Figure object on its own, and then add subplots one at a time using the method “add_subplot” from the figure object. This method takes one argument, a three digit number. The first digit represents the number of rows, the second represents the number of columns and the third represents the placement of the individual subplot (the location from left to right, top to bottom, with the first placement being the top left and the last being the lower right. Oddly, this is 1 indexed unlike everything else in the entire Python language):

fig = plt.figure()
ax0 =fig.add_subplot(221)
ax0.text(0.5, 0.5, "This is axes object 0", ha='center')
ax1 =fig.add_subplot(222)
ax1.text(0.5, 0.5, "This is axes object 1", ha='center')
ax2 =fig.add_subplot(223)
ax2.text(0.5, 0.5, "This is axes object 2", ha='center')
ax3 =fig.add_subplot(224)
ax3.text(0.5, 0.5, "This is axes object 3", ha='center')

This script will make the identical set of subplots as shown above:

Giving your plots some room to breath

You may have noticed that the plots above are very cramped. By default, there is very little room between adjacent subplots. One remedy is to use Matplotlib’s tight_layout function, which will automatically fit your subplots into the figure. Just add this one line after creating your subplots. For demonstration, I’ll also add titles and axes labels to each subplot:

fig, [[ax0, ax1],[ax2, ax3]] = plt.subplots(nrows=2, ncols=2)
ax0.text(0.5, 0.5, "This is axes object 0", ha='center')
ax0.set_title('Title 0')
ax0.set_xlabel('X-axis')
ax0.set_ylabel('Y-axis')
ax1.text(0.5, 0.5, "This is axes object 1", ha='center')
ax1.set_title('Title 1')
ax1.set_xlabel('X-axis')
ax1.set_ylabel('Y-axis')
ax2.text(0.5, 0.5, "This is axes object 2", ha='center')
ax2.set_title('Title 2')
ax2.set_xlabel('X-axis')
ax2.set_ylabel('Y-axis')
ax3.text(0.5, 0.5, "This is axes object 3", ha='center')
ax3.set_title('Title 3')
ax3.set_xlabel('X-axis')
ax3.set_ylabel('Y-axis')
plt.tight_layout()
plt.savefig('tight_layout.png')

This will create the following set of plots:

If we want to add some extra padding between subplots, we can add some arguments to override the default tight_layout parameters. The argument “pad” adds padding between subplots and the figure boarders, while “h_pad” and “w_pad” add height and width padding between subplots. The units of this padding are in percentages of the default font size.

plt.tight_layout(pad=1.5, h_pad=2.5, w_pad=2)

Note that to add the necessary padding, tight_layout will make the subplots themselves smaller and smaller. To fix this, we can increase the size of the figure object using the “figsize” argument when we create the Figure object. The units of this function are in inches:

fig, [[ax0, ax1],[ax2, ax3]] = plt.subplots(nrows=2, ncols=2, figsize=(8,6))

Before moving on, I should note that the function: plt.subplots_adjust() has very similar functionality to tight_layout and can allow you to adjust left, right, bottom and top paddings individually. For the sake of brevity I’m omitting it here.

Getting fancy: creating subplots of different sizes

So far we’ve been creating subplots of uniform size. In practice however, it can be helpful to generate plots of varying sizes. We can do this using the Gridspec class. Gridspec will create a grid of subplot locations within a Figure object. When generating subplots, we can assign each a location and size in Gridspec coordinates. Importantly, a single subplot can span multiple rows or columns of Gridspec coordinates. Below, I’ll make a 2×3 Gridspec and use it to create different sized subplots.

fig = plt.figure(figsize=(8,6))
gspec = fig.add_gridspec(nrows=2, ncols=3)

# the first subplot will span one row and two columns
# it will start at the top left
ax0 = fig.add_subplot(gspec[0,:2])
ax0.text(0.5, 0.5, "This is axes object 0, \ngspec coordinates [0,:2]", ha='center')

# the second subplot will span one row and one column
# it will start at the bottom left
ax1 = fig.add_subplot(gspec[1,0])
ax1.text(0.5, 0.5, "This is axes object 1, \ngspec coordinates [1,0]", ha='center')

# the third subplot will span one row and one column
# it will start at the bottom middle
ax2 = fig.add_subplot(gspec[1,1])
ax2.text(0.5, 0.5, "This is axes object 2, \ngspec coordinates [1,1]", ha='center')

# the fourth subplot will span two rows and one column
# it will start at the top right
ax3 = fig.add_subplot(gspec[:,2])
ax3.text(0.5, 0.5, "This is axes object 3, \ngspec coordinates [:,2]", ha='center')

plt.tight_layout()

Which will make this plot:

We can also customize the height and width of each Gridspec coordinate using the arguments “height_ratios” and “width_ratios” respectively when creating the Gridspec object.

fig = plt.figure(figsize=(8,6))
# parameters to specify the width and height ratios between rows and columns
widths= [1, 1.5, 2]
heights = [1, .5]

gspec = fig.add_gridspec(ncols=3, nrows=2, width_ratios = widths, height_ratios = heights)

# the first subplot will span one row and two columns
# it will start at the top left
ax0 = fig.add_subplot(gspec[0,:2])
ax0.text(0.5, 0.5, "This is axes object 0, \ngspec coordinates [0,:2]", ha='center')

# the second subplot will span one row and one column
# it will start at the bottom left
ax1 = fig.add_subplot(gspec[1,0])
ax1.text(0.5, 0.5, "This is axes object 1, \ngspec coordinates [1,0]", ha='center')

# the third subplot will span one row and one column
# it will start at the bottom middle
ax2 = fig.add_subplot(gspec[1,1])
ax2.text(0.5, 0.5, "This is axes object 2, \ngspec coordinates [1,1]", ha='center')

# the fourth subplot will span two rows and one column
# it will start at the top right
ax3 = fig.add_subplot(gspec[:,2])
ax3.text(0.5, 0.5, "This is axes object 3, \ngspec coordinates [:,2]", ha='center')

plt.tight_layout()

Final thoughts

There are many more ways we can customize subplots in Matplotlib, but the material in this post suits my needs 99% of the time. For further reading, check out the Matplotlib documentation and examples: https://matplotlib.org/gallery/subplots_axes_and_figures/subplots_demo.html#sphx-glr-gallery-subplots-axes-and-figures-subplots-demo-py