or how to replicate the New York Times presidential election shift map
This week’s blogpost is a visualization demo replicating a popular map from last year. The map below shows the shift in voter margin between the 2016 and 2020 Presidential Elections for the two major political parties in the United States. The direction and color of each arrow indicate the party, and the length of the arrow indicates the size of the shift. This type of figure can be useful in visualizing many types of spatially distributed changes (e.g. population change in a city, change in GDP per capita, losses and gains). This blogpost shows how to replicate it in Python using commonly used packages.
Even though the creators of the original provide their 2020 data, their 2016 data is not available, so the data I’ll be using comes from the MIT Election Data and Science Lab and can be downloaded here: https://doi.org/10.7910/DVN/VOQCHQ. All the code and data to replicate my figure can be found in this repository: https://github.com/antonia-had/election_data_shift
The main packages we’ll be using for this are cartopy and matplotlib to create the map and annotate elements on it, pandas for some simple data analysis and haversine to convert distances on the map (which you might not need if you’re applying the code to a small spatial scale).
First thing we do is load our packages and data.
counties.csv contains the latitude and longitude for every county we’ll be plotting.
countypres_2000-2020.csv contains our downloaded election data. As you can see in the code comments, I had to clean out some of the datapoints due to inconsistencies or errors. I’ll also only be plotting the contiguous US to simplify the exercise, but you can definitely include code to also plot Alaska and Hawaii in the same figure.
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import pandas as pd
import cartopy.io.shapereader as shpreader
from haversine import inverse_haversine, Direction

# Read in county position data
pos_data = pd.read_csv('./data/counties.csv', delimiter=',', index_col=0)

# Read in county election data
# Data from https://doi.org/10.7910/DVN/VOQCHQ
# Data points without county FIPS code removed
all_election_data = pd.read_csv('./data/countypres_2000-2020.csv')

# Filter data to only keep years 2016 and 2020
# Dataset reports issues with Alaska data so filter those out too
# Hawaii is also dropped because only the contiguous US is plotted
# Missing data for 2020 for some counties
# County with FIPS code 46113 was assigned a new FIPS code (46102) which is changed in the downloaded data
mask = (all_election_data['year'] >= 2016) & \
       (all_election_data['state'] != 'ALASKA') & \
       (all_election_data['state'] != 'HAWAII') & \
       (all_election_data['county_fips'] != 11001) & \
       (all_election_data['county_fips'] != 51515) & \
       (all_election_data['county_fips'] != 36000)
# Take a copy so that adding columns later does not raise a SettingWithCopyWarning
election_data = all_election_data[mask].copy()
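One way to spot counties that need this kind of manual cleaning is to check which FIPS codes do not appear in both election years. The snippet below is a minimal sketch of that idea, not the procedure actually used for this post: the helper name `counties_missing_a_year` and the toy table (with made-up rows) are assumptions for illustration, and only the `year` and `county_fips` column names come from the dataset.

```python
import pandas as pd

def counties_missing_a_year(df, years=(2016, 2020)):
    """Return county_fips codes that lack rows for at least one of `years`."""
    subset = df[df['year'].isin(years)]
    # Count how many distinct election years each county appears in
    years_per_county = subset.groupby('county_fips')['year'].nunique()
    incomplete = years_per_county[years_per_county < len(years)]
    return incomplete.index.tolist()

# Toy table with made-up rows (hypothetical, not real election results)
toy = pd.DataFrame({
    'year': [2016, 2020, 2016],
    'county_fips': [1001, 1001, 1003],
    'party': ['DEMOCRAT', 'DEMOCRAT', 'DEMOCRAT'],
})
missing = counties_missing_a_year(toy)
print(missing)
```

Running the same helper on the full election dataframe would list the counties worth inspecting by hand before deciding whether to drop or fix them.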
Next we calculate the percentage of votes each party gained at each election and compare the results between the two elections to calculate their shift. A simplifying assumption here is that we’re only focussing on the top two parties (but you can do more with different color arrows for example). We’re also copying the latitude and longitude of each county so everything is in one dataframe.
# Calculate vote percentage per party
election_data['percentagevote'] = election_data['candidatevotes'] / election_data['totalvotes'] * 100

# Create new dataframe to store county change results
shift = election_data[['state', 'county_name', 'county_fips']].copy()
# Drop duplicate rows (original dataframe held both 2016 and 2020)
shift = shift.drop_duplicates(['county_fips'])

# Create columns to store change for every party
shift['DEMOCRAT'] = 0.0
shift['REPUBLICAN'] = 0.0

# Create columns for latitude and longitude so everything is in the same dataframe
shift['lat'] = 0.0
shift['lon'] = 0.0

# Iterate through every county and estimate difference in vote share for two major parties
for index, row in shift.iterrows():
    county = row['county_fips']
    for party in ['DEMOCRAT', 'REPUBLICAN']:
        previous_result = election_data.loc[(election_data['year'] == 2016) &
                                            (election_data['county_fips'] == county) &
                                            (election_data['party'] == party)]['percentagevote'].values
        new_result = election_data.loc[(election_data['year'] == 2020) &
                                       (election_data['county_fips'] == county) &
                                       (election_data['party'] == party)]['percentagevote'].values
        # Expect exactly one value per county, party and year;
        # if either result is missing or NaN, assign zero change
        if (len(previous_result) != 1 or len(new_result) != 1
                or pd.isna(previous_result[0]) or pd.isna(new_result[0])):
            shift.at[index, party] = 0
        else:
            shift.at[index, party] = new_result[0] - previous_result[0]
    # Combine lat and long values also so it's all in one dataframe
    shift.at[index, 'lat'] = pos_data.at[county, 'lat']
    shift.at[index, 'lon'] = pos_data.at[county, 'lon']
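The loop above is easy to follow but slow on a few thousand counties. As an alternative, the same percentage-point shift can be sketched with a pandas pivot table. This is only a sketch under stated assumptions: the function name `vote_share_shift` and the toy numbers are made up for illustration, and summing the shares when a county has several rows per party and year is my assumption rather than something the loop above does.

```python
import pandas as pd

def vote_share_shift(df, parties=('DEMOCRAT', 'REPUBLICAN')):
    """Percentage-point change in vote share from 2016 to 2020 per county."""
    df = df[df['party'].isin(parties) & df['year'].isin([2016, 2020])].copy()
    df['percentagevote'] = df['candidatevotes'] / df['totalvotes'] * 100
    # Drop rows with no usable share so they end up as "missing" below
    df = df.dropna(subset=['percentagevote'])
    # One row per county, one column per (party, year) pair
    # Assumption: shares are summed if a county has several rows per pair
    shares = df.pivot_table(index='county_fips', columns=['party', 'year'],
                            values='percentagevote', aggfunc='sum')
    # Make sure every (party, year) column exists even if no data was found
    wanted = pd.MultiIndex.from_product([list(parties), [2016, 2020]])
    shares = shares.reindex(columns=wanted)
    result = pd.DataFrame(index=shares.index)
    for party in parties:
        result[party] = shares[(party, 2020)] - shares[(party, 2016)]
    # A missing result in either year becomes zero change
    return result.fillna(0.0)

# Toy data with made-up vote counts (hypothetical, not real results)
toy = pd.DataFrame({
    'year': [2016, 2016, 2020, 2020, 2016, 2016],
    'county_fips': [1001, 1001, 1001, 1001, 1003, 1003],
    'party': ['DEMOCRAT', 'REPUBLICAN'] * 3,
    'candidatevotes': [40, 60, 45, 55, 30, 70],
    'totalvotes': [100] * 6,
})
toy_shift = vote_share_shift(toy)
print(toy_shift)
```

In the toy table, county 1001 has results for both years, while county 1003 has no 2020 rows and therefore gets zero change for both parties.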
To create our map we do the following.
Set up matplotlib figure with the map extent of the contiguous United States and use cartopy geometries to add the shapes of all states.
fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(1, 1, 1, projection=ccrs.LambertConformal(), frameon=False)
ax.set_extent([-120, -74, 24, 50], ccrs.PlateCarree())

# Add states shape
shapename = 'admin_1_states_provinces_lakes'
states_shp = shpreader.natural_earth(resolution='110m',
                                     category='cultural', name=shapename)
ax.add_geometries(shpreader.Reader(states_shp).geometries(), ccrs.PlateCarree(),
                  facecolor='#e5e5e5', edgecolor='white', zorder=0)
We then need to determine how the shift should be plotted in each county. A simplifying assumption here is that we’re showing only the largest positive shift (if both parties lost vote share, we show just a small grey point). There are several ways to draw an arrow at each point, depending on what you’d like to show and the complexity you’re comfortable with. The way I am showing here exploits the matplotlib annotate function, typically used to annotate a figure with text and arrows.
The way I’m going about this is a little mischievous but works: I’m only using the arrow component of it with a blank text annotation, and I identify the point each arrow should point to using each county’s lat and long and the estimated shift. If this were a simple matplotlib figure using cartesian coordinates, calculating the end point would be simple trigonometry. Since latitude and longitude are not on a cartesian plane, we need to convert them using the haversine formula (or its inverse). It’s fairly easy to implement yourself, but since there already exists a handy Python package for it, I’m using that instead. The transform defined at the top of the code below is necessary for matplotlib to know how to transform the points passed to the annotation function (typically not necessary if using, say, ax.scatter()); some explanation of why that is can be found here. The colors and all other customizations are chosen so the figure looks as close as possible to the original.
transform = ccrs.PlateCarree()._as_mpl_transform(ax)

for index, row in shift.iterrows():
    # Determine arrow color
    dem_shift = shift.at[index, 'DEMOCRAT']
    rep_shift = shift.at[index, 'REPUBLICAN']
    # Check if both lost votes, then set arrow to grey
    if dem_shift < 0 and rep_shift < 0:
        arrow_color = 'grey'
        ax.scatter(shift.at[index, 'lon'], shift.at[index, 'lat'],
                   color=arrow_color, transform=ccrs.PlateCarree(),
                   s=0.5)
    # If at least one of them gained votes
    else:
        if dem_shift >= rep_shift:
            arrow_color = '#1460a8'
            direction = Direction.NORTHWEST
            change = dem_shift
        else:
            arrow_color = '#bb1d2a'
            direction = Direction.NORTHEAST
            change = rep_shift
        # inverse_haversine returns (lat, lon); reverse it to (lon, lat) for plotting
        end_location = inverse_haversine((shift.at[index, 'lat'], shift.at[index, 'lon']),
                                         change*25, direction)[::-1]
        ax.annotate(" ", xytext=(shift.at[index, 'lon'], shift.at[index, 'lat']),
                    xy=end_location,
                    arrowprops=dict(facecolor=arrow_color, edgecolor=arrow_color,
                                    width=0.2, headwidth=3, headlength=5),
                    xycoords=transform, zorder=1)

plt.tight_layout()
plt.savefig('electionshiftmap.png', dpi=300)
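If you’d rather not depend on the haversine package for the end-point calculation, the snippet below is a minimal sketch of the same idea using the standard great-circle destination formula on a spherical Earth. The function name `destination_point` and the radius constant are my own choices for illustration (they are not taken from the package), and bearings are given in degrees clockwise from north, so northeast is 45 and northwest is 315.

```python
import math

# Assumption: spherical Earth with a mean radius in kilometres
EARTH_RADIUS_KM = 6371.0088

def destination_point(lat, lon, distance_km, bearing_deg):
    """Return (lat, lon) in degrees reached from a start point.

    Travels distance_km along a great circle, starting out on the
    given bearing (degrees clockwise from north).
    """
    lat1 = math.radians(lat)
    lon1 = math.radians(lon)
    bearing = math.radians(bearing_deg)
    # Distance expressed as an angle at the centre of the sphere
    angular = distance_km / EARTH_RADIUS_KM
    lat2 = math.asin(math.sin(lat1) * math.cos(angular)
                     + math.cos(lat1) * math.sin(angular) * math.cos(bearing))
    lon2 = lon1 + math.atan2(
        math.sin(bearing) * math.sin(angular) * math.cos(lat1),
        math.cos(angular) - math.sin(lat1) * math.sin(lat2))
    return math.degrees(lat2), math.degrees(lon2)

# A 4-point shift scaled by 25 km per point, drawn towards the northeast
end_lat, end_lon = destination_point(40.0, -100.0, 4 * 25, 45)
print(end_lat, end_lon)
```

As with the package call in the map code, the result comes back as (lat, lon), so it needs reversing to (lon, lat) before being passed to matplotlib.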
The resulting figure looks like this, which I am calling pretty close, considering the dataset differences. Tinkering with colors, widths, lengths and transforms can get you a different look if you’re after that.