12 Years of WaterProgramming: A Retrospective on >500 Blog Posts

Just over 12 years ago, on January 9th 2012, the first WaterProgramming post was published. It was written by Joe Kasprzyk, who is now an Associate Professor at CU Boulder but at the time was a graduate student in the Reed Research Group. The post reads, in its entirety:

Welcome!

This blog shares tips for writing programs and running jobs associated with using multiobjective evolutionary algorithms (MOEAs) for water resources engineering.  It will be informal, with posts on a number of topics by a number of folks.

Since that first post, there have been 538 posts on the WaterProgramming blog!

Since then, the content and style of posts have naturally evolved alongside the group's research foci and training needs. As we transition into a new year, I wanted to take the opportunity to look back at the 12 years of activity on the WaterProgramming blog.

In preparing for this post, I downloaded the entire WaterProgramming blog archive and performed some fun analysis to look more closely at what has been written over the years.

To those of you who are regular readers of the blog, thank you for the continued interest! To those who may be less familiar, I hope this post gives you a bigger picture of what goes on in this niche corner of the internet.

New tools to support the blog, and our top posts of all time

Before going any further, I want to point out a few new tools we have developed to support the blog content and anyone who is interested in our training activities. Given the number of posts on this site, it may be difficult to navigate the different posts and topics.

To make learning with the blog easier, we created the Reed Group Lab Manual (which was highlighted in Andrew’s blog post last fall) that includes:

Now, to kick us off, I want to highlight our five most popular blog posts to date. The top five posts of all time, based on total views, are:

  1. PyCharm as a Python IDE for Generating UML Diagrams by Tom Wild
  2. Converting Latex to MS Word docx (almost perfectly) by Bernardo Trindade
  3. A quick example code to write data to a csv file in C++ by David Gold
  4. Types of Errors in Numerical Methods by Rohini Gupta
  5. Running a Python script using Excel macros by Lillian Lau

Post length over time

Perhaps one of the most obvious changes which has taken place over the last 12 years is the change in average blog post length. The figure below shows the length of each individual post (blue) with the annual average post length overlaid (yellow).

At the start of its life, WaterProgramming posts could be characterized as bite-sized tips and tricks, often 200-500 words in length. In the first year alone, there were more than 80 WaterProgramming posts!

In more recent years, posts have become considerably longer, often coming in at 500-1500 words. Consequently, the posting frequency has decreased (see figure below) and has stabilized at an average of ~40 posts per year (with 35 posts in 2023).
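For anyone who wants to repeat this kind of analysis on their own archive, here is a minimal pandas sketch of how annual post counts and average post lengths could be computed. It assumes the posts have already been loaded into a DataFrame with hypothetical `date` and `text` columns. The four rows below are toy data, not the blog's actual posts.

```python
import pandas as pd

# Toy example data (hypothetical): one row per post with its date and body text.
posts = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-09", "2012-03-02", "2023-05-10", "2023-11-20"]),
    "text": [
        "Welcome to the blog",
        "A short tip about compiling code",
        "A much longer tutorial post " * 100,
        "Another long training post " * 200,
    ],
})

# Approximate word count per post using a simple whitespace split.
posts["n_words"] = posts["text"].str.split().str.len()

# Number of posts and mean post length for each calendar year.
annual = posts.groupby(posts["date"].dt.year)["n_words"].agg(["count", "mean"])
print(annual)
```

Whitespace splitting is a rough proxy for word count. It is usually good enough to show a trend like the one described above.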

Our most common topics

While the original Welcome! post emphasized our focus on “writing programs and running jobs associated with using multiobjective evolutionary algorithms”, there has been a wide variety of posts since then.

Here, I looked at all of the blog post titles over the years and identified the most frequent words (see figure below).

Looking at this plot, one thing stands out very clearly: we like working with Python! The most-frequent words reflect the “WaterProgramming” title and are: Python, Data, Analysis, Borg, Code.

However, I also want to highlight how often our posts provide some sort of demonstration or training activity, which is a focus for our group. This focus on reproducibility and open science is reflected in some of the other most frequent title words:

  • Training
  • Interactive
  • Example

Another theme revealed here is that we aim to keep the content accessible across audiences, with titles frequently including the words “introduction”, “basic”, and “simple”.
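The title word counts discussed above can be sketched in a few lines of standard-library Python. This is a minimal illustration and not necessarily how the figure was produced. The short stopword list and the four sample titles (taken from the contributor table below) are assumptions chosen for demonstration.

```python
import re
from collections import Counter

# A small, illustrative subset of titles; a full analysis would use every post title.
titles = [
    "Introduction to Borg Operators Part 1: Simplex Crossover (SPX)",
    "Running Sobol Sensitivity Analysis using SALib",
    "Runtime Visualization of MOEA with Platypus in Python",
    "Plotting geographic data from geojson files using Python",
]

# Assumed list of filler words to ignore when counting.
stopwords = {"a", "an", "and", "the", "of", "to", "in", "with",
             "for", "from", "using", "part"}


def title_word_counts(titles, stopwords):
    """Count lower-cased alphabetic words across titles, skipping stopwords."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z]+", title.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts


counts = title_word_counts(titles, stopwords)
print(counts.most_common(3))
```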

And lastly, I will employ a highly sophisticated (/s) data visualization technique to help illustrate the key WaterProgramming themes in a more appealing way: the word cloud.

Conclusion and Thank You

As I was getting established in the Reed Research Group, I personally found the WaterProgramming blog to be a priceless resource. Now, I am very glad to be able to contribute to this site, be part of the community, and support others in their learning.

I want to close out with a big THANK YOU to all of the contributors over the years. You all rock. In the table below, I want to acknowledge everyone who has contributed to this blog in the past, along with their top blog post. Most of these folks have moved on from the Reed Group (or were external contributors) and are off doing great work; the table does not include their impressive titles or accolades.

In no particular order:

Author | Top Post
Joe Kasprzyk | Using a virtual machine to run 32-bit software on a modern PC
Jon Herman | Running Sobol Sensitivity Analysis using SALib
Julie Quinn | Fitting Hidden Markov Models Part II: Sample Python Script
Jazmin Zatarain | Visualization strategies for multidimensional data
Bernardo Trindade | Converting Latex to MS Word docx (almost perfectly)
Keyvan Malek | Taylor Diagram
David Gold | Make LaTeX easier with custom commands
Rohini Gupta | Types of Errors in Numerical Methods
Lillian Lau | Running a Python script using Excel macros
Antonia Hadjimichael | Nondimensionalization of differential equations – an example using the Lotka-Volterra system of equations
Andrew Hamilton | Bivariate choropleth maps
Tom Wild | PyCharm as a Python IDE for Generating UML Diagrams
Trevor Amestoy | Markdown-Based Scientific and Computational Note Taking with Obsidian
Jon Lamontagne | Plotting geographic data from geojson files using Python
Tina Karimi | Parallel processing with R on Windows
Jared Smith | Packages for Hydrological Data Retrieval and Statistical Analysis
William Raseman | Multivariate Distances: Mahalanobis vs. Euclidean
Jan Kwakkel | Scenario Discovery in Python
Andrew Dirks | Remote terminal environment using VS Code for Windows and Mac
David Hadka | Introduction to OpenMORDM
Travis Thurber | Continuous Deployment with GitHub Actions (or, What Gives Life to a Living eBook?)
Peter Storm | Launching Jupyter Notebook Using an Icon/Shortcut in the Current Working Directory Folder
Calvin Wealton | Custom Plotting Symbols in R
Anaya Gangadhar | Visualizing large directed networks with ggraph in R
Josh Kollat | AeroVis Documentation
Tori Ward | Compiling shared libraries on Windows (32 bit and 64 bit systems)
Charles Rouge | Evaluating and visualizing sampling quality
Sara Alalam | Introduction to Borg Operators Part 1: Simplex Crossover (SPX)
Gregory Garner | Survival Function Plots in R
Veysel Yildiz | How ChatGPT Helped Me To Convert a MATLAB Toolbox to Python and Learn Python Coding
Michael Luo | Runtime Visualization of MOEA with Platypus in Python
Ryan McKelly | Common PBS Batch Options
Matt | Converting an SVG to EPS
Nasser Najibi | Weather Regime-Based Stochastic Weather Generation (Part 2/2)
Yu Li | Use python cf and esgf-python-client package to interact with ESGF data portal
Ben Livneh | Interpolating and resampling across projections and spatial resolutions with GDAL
Raffaele Cestari | Legacy Code Reborn: Wrapping C++ into MATLAB Simulink

Creating Interactive Geospatial Maps in Python with Folium

Interactive mapping and data visualization provide data scientists and researchers with a unique opportunity to explore and analyze geospatial data, and to share their work with stakeholders in a more engaging and accessible way.

Personally, I’ve found that constructing an interactive map for my own research, and iteratively updating it as the work progresses, has been great for improving my own understanding of my project. My project map (not the one in this demo) has been a helpful narrative aid when introducing someone to the project.

I’ve had a lot of success using the folium Python package to create interactive maps, and want to share the tool with you all here.

In this post, I demonstrate how to use the folium package to create an HTML-based interactive map of a hydrologic watershed.

Folium: a quick introduction

Folium is a Python package that is used to create interactive maps.

It is built on top of the Leaflet JavaScript library and can be used to visualize geospatial data in unique ways. Using folium, users construct maps by adding various features including lines, markers, pop-up text boxes, and different base-maps (aka, tiles), and can export maps as interactive HTML documents.

The full Folium documentation is here; however, I think it is best to learn through an example.

Demo: Mapping hydrologic basin data with Folium

All of the code and data used in this post is located in a GitHub repository: trevorja/Folium_Interactive_Map_Demo.

To interact with the demo map directly, download the file here: basin_map.html

The folium_map_demo.ipynb Jupyter Notebook steps through the process shown below and will re-create the map shown above. You can create a similar plot for a different basin by changing the station_id number inside ‘retrieve_basin_data.ipynb’ to a specific USGS gauge of interest.

Here is a brief video showcasing the interaction with the final map:

Map Data

For this demo, I am plotting various features within a hydrologic basin in northern New Mexico.

All of the data in the map is retrieved using the pynhd package from the HyRiver suite for Python. For more information about these packages, see my previous post, Efficient hydroclimatic data accessing with HyRiver for Python.

The script ‘retrieve_basin_data.ipynb’ is set up to retrieve several features from the basin, including:

  • Basin boundary
  • Mainstem river
  • Tributary rivers
  • USGS gauge locations
  • New Mexico Water Data Initiative (NMWDI) Sites
  • HUC12 pour points

The geospatial data (longitudes and latitudes) for each of these features are exported to data/basin_data.csv and later used to generate the folium map.
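The loading code below parses the geometry column with `wkt.loads`. The reason is that a CSV can only store each geometry as text, typically Well-Known Text (WKT), so it comes back as a plain string. Here is a small round-trip sketch using shapely and pandas. The two rows, and the assumption that the file has `type`, `description`, and `geometry` columns, are illustrative; the demo's actual CSV may differ in detail.

```python
import io

import pandas as pd
from shapely import wkt
from shapely.geometry import LineString, Point

# Hypothetical rows; each geometry is stored as WKT text, e.g. "POINT (-106.5 36.5)".
rows = pd.DataFrame({
    "type": ["station", "mainstem"],
    "description": ["Example gauge", "Example river reach"],
    "geometry": [Point(-106.5, 36.5).wkt,
                 LineString([(-106.6, 36.4), (-106.5, 36.5)]).wkt],
})

# Write to an in-memory CSV and read it back.
buffer = io.StringIO()
rows.to_csv(buffer)
buffer.seek(0)
loaded = pd.read_csv(buffer, index_col=0)

# After reading, geometries are strings; wkt.loads turns them back into shapely objects.
loaded["geometry"] = loaded["geometry"].apply(wkt.loads)

point = loaded.loc[0, "geometry"]
# Shapely stores (x, y) = (longitude, latitude).
print(point.x, point.y)
```

Shapely keeps coordinates in (x, y) = (longitude, latitude) order, while folium expects [latitude, longitude]. That is why the demo swaps them to [y, x] when placing markers.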

Constructing a folium hydrologic map

Like many other data visualization programs, folium maps are constructed by iteratively adding features or map elements to the main map object. It is easy to add a map feature; however, it takes more care to ensure that the features are well organized and respond to user interaction the way you want them to.

In this demo, I deconstruct the process into five parts:

  1. Initializing the map
  2. Initializing feature groups (layers)
  3. Adding points to feature layers
  4. Adding polygons & lines to feature layers
  5. Adding layers onto the map

Before going any further, we need to import the necessary packages, load our basin data, and convert the geospatial data to geopandas.GeoDataFrame() geometry objects:

# Import packages
import pandas as pd
import folium
from folium.plugins import MarkerCluster
import geopandas as gpd
from shapely.geometry import Point, LineString
from shapely import wkt

# Specify the coordinate reference system (CRS): WGS 84 longitude/latitude
crs = 4326

# Load the basin data
basin_data = pd.read_csv('./data/basin_data.csv', sep = ',', index_col=0)

# Convert to a geoDataFrame and geometry objects
basin_data['geometry'] = basin_data['geometry'].apply(wkt.loads)
geodata = gpd.GeoDataFrame(basin_data, crs = crs)

Additionally, I start by specifying a few design options (colors, line widths, etc.). I do this at the start so that I can easily test different colors, and I store all of these specifications in a dict. This step is not necessary, but it helps to stay organized. The following code block shows some of these specifications; some are left out to keep this post short:

## Plot options
# Line widths
basin_linewidth = 2
mainstem_linewidth = 3
tributary_linewidth = 1

# ...
# More color and design specs here...
# ...

# Combine options into a dictionary
plot_options = {
    'station':{'color': usgs_color,
               'size': usgs_size},
    'pourpoint': {'color': pourpoint_color,
                  'size': pourpoint_size},
    'nmwdi': {'color': nmwdi_color,
              'size': nmwdi_size}
              }

With that out of the way, we will start mapping.

Initializing the map

We start to construct our map using the folium.Map() function:

# Initialize the map
geomap = folium.Map(location = [36.5, -106.5],
                    zoom_start = 9.2,
                    tiles = 'cartodbpositron',
                    control_scale = True)

The location and zoom_start arguments set the default view; the user will be able to pan and zoom around the map, but this will be the starting location.

The tiles argument in the initial folium.Map() call sets the default base-map style, but additional styles can be added using folium.TileLayer(). Each TileLayer() call adds a base-map style that is then available from the drop-down menu in the final map.

## Add different tiles (base maps)
folium.TileLayer('openstreetmap').add_to(geomap)
folium.TileLayer('stamenwatercolor').add_to(geomap)
folium.TileLayer('stamenterrain').add_to(geomap)

Here is a side-by-side comparison of the four different tiles used in this demo:

Initializing feature groups (layers)

Before we add any lines or markers to the map, it is best to set up different layers using folium.FeatureGroup(). When the map is complete, the different feature groups can be toggled on and off using the drop-down menu.

Below, I initialize a folium.FeatureGroup for each of the six feature types in the map.

# Start feature groups for toggle functionality
basin_layer = folium.FeatureGroup(name = 'Basin Boundary', show=True)
usgs_layer = folium.FeatureGroup(name = 'USGS Gauges', show=True)
mainstem_layer = folium.FeatureGroup(name = 'Mainstem River', show=True)
tributary_layer = folium.FeatureGroup(name='Tributary Rivers', show=False)
pourpoint_layer = folium.FeatureGroup(name= 'HUC12 Pour Points', show=False)
nmwdi_layer = folium.FeatureGroup(name='NM Water Data Initiative Gauge', show=True)

The name argument is what will be displayed on the drop-down menu. The show argument indicates whether that layer is visible by default (when the map is first opened), or whether it first needs to be selected using the drop-down menu.

Now that the layers are initialized, we can begin adding the features (polygons, lines, and point markers) to each layer.

Adding points to feature layers

The folium.CircleMarker() class is used to add a circle marker at a specific coordinate location.

The following code shows how I am iteratively adding different points to the different layers of the map based upon the feature type.

For each point, I extract the latitude and longitude (coords = [point.geometry.y, point.geometry.x]) and pass it to the folium.CircleMarker() function with colors and sizes specific to each of the three different point features (USGS gauge stations (station), NMWDI (nmwdi), and HUC12 pourpoints (pourpoint)):

plot_types = ['station', 'nmwdi', 'pourpoint']
plot_layers = [usgs_layer, nmwdi_layer, pourpoint_layer]

# Loop through the different feature point types
for i, feature_type in enumerate(plot_types):

    # Specify the correct feature layer to add the point to
    map_layer = plot_layers[i]

    # Add each point
    for _, point in geodata.loc[geodata['type'] == feature_type].iterrows():
        # folium expects [latitude, longitude], i.e., [y, x]
        coords = [point.geometry.y, point.geometry.x]

        # Add the popup box with description
        textbox = folium.Popup(point.description,
                               min_width=popup_width,
                               max_width=popup_width)

        # Add the marker at the coordinates with color-coordination
        folium.CircleMarker(coords,
                            popup=textbox,
                            fill_color=plot_options[feature_type]['color'],
                            fill=True,
                            fill_opacity=fill_opacity,
                            radius=plot_options[feature_type]['size'],
                            color=plot_options[feature_type]['color']).add_to(map_layer)

Adding polygons & lines to feature layers

I’ve found that it can be difficult to add Polygons or Lines to a folium map if the coordinate geometry is not formatted correctly. For me, the best method has been to convert the polygon or line data to a geopandas.GeoSeries and then convert this to GeoJSON using the .to_json() method.

Once in GeoJSON format, the data can be added using folium.GeoJson(), similar to other features. However, rather than adding it to the map directly, we add it to the specific feature layer so that it can be toggled later.

Below, I show how I add the basin boundary polygon to the map. This needs to be repeated for the mainstem and tributary river lines, and the full code is included in folium_map_demo.ipynb.

## Plot basin border
# basin_type is a placeholder for the 'type' label of the basin boundary rows
# in basin_data.csv (the value is not given here; see folium_map_demo.ipynb)
for _, r in geodata.loc[geodata['type'] == basin_type].iterrows():
    # Convert the Polygon or LineString to GeoJSON format
    geo_json = gpd.GeoSeries(r['geometry']).simplify(tolerance=0.000001).to_json()
    geo_json = folium.GeoJson(data=geo_json,
                              # Leaflet style options use camelCase keys
                              style_function=lambda x: {'fillColor': 'none',
                                                        'weight': basin_linewidth,
                                                        'color': basin_color,
                                                        'opacity': 1,
                                                        'fillOpacity': 1,
                                                        'fill': False})
    # Add popup with line description
    folium.Popup(r.description,
                 min_width=popup_width,
                 max_width=popup_width).add_to(geo_json)

    # Add the feature to the appropriate layer
    geo_json.add_to(basin_layer)

And with that, the hard part is done.

Last step: adding layers onto map

Now, if you try to visualize the map, it will be empty! This is because we have not added the feature layers to the map. In this last step, we add each of the six feature layers to the map and also add folium.LayerControl(), which allows us to toggle the different layers on and off:

# Add all feature layers to the map
basin_layer.add_to(geomap)
usgs_layer.add_to(geomap)
mainstem_layer.add_to(geomap)
tributary_layer.add_to(geomap)
pourpoint_layer.add_to(geomap)
nmwdi_layer.add_to(geomap)

# Add the toggle option for layers
folium.LayerControl().add_to(geomap)

Ready for the grand reveal?

Viewing, saving, and sharing the map

Viewing your map in a Jupyter Notebook is as easy as calling the map object as the last line of a cell (i.e., geomap), and folium makes it easy to save the map as an HTML file using the .save() method, as shown here:

# Save and view the map
geomap.save("basin_map.html")
geomap

Once your HTML file is saved and you’ve opened it on your computer to check that the features are situated nicely, it is time to share. Other users can view the map simply by opening the HTML file on their local machine, or you can add the HTML to a website.

Concluding thoughts

I hope you’ve found some inspiration here, and find a way to incorporate interactive geospatial mapping in your project. I don’t think it can be overstated how much an interactive visual such as a folium map can serve to broaden the access to your dataset or model.

Thanks for reading!

A template for reproducible papers

Writing fully reproducible papers is something everyone talks about but very few people actually do. Following nice examples I’ve seen developed by others (see here and here), I wanted to develop a GitHub template that I could easily use to organize the analysis I perform for each paper. I wanted it to be useful for the Reed group in general, but also anyone else who’d like to use it, so the version I’m presenting today is an initial version that will be adapted and evolve as our needs grow.

The template can be found here: https://github.com/antonia-had/paper_template and this blogpost will discuss its contents. The repository is set up as a template, so you can use “Import repository” when you create a new repository for your project or click on the green “Use this template” button on the top right.

The idea is that everything is organized and documented well so that another person can easily replicate your work. This helps your own tools be more widely used and cited, and it also lets future group members easily pick up where you left off. The other, more selfish way this has helped me is that it forces me to spend some time arranging things from the beginning, so I can be more organized (and therefore more productive) during the project. Most importantly, when a paper does get accepted, you don’t need to go back and organize everything so it looks halfway decent for a public repository. For these reasons, I try to use a template like this from the early stages of a project.

A lot of the template is self-explanatory, but I’ll briefly explain what is in it. The idea is that you take it, replace the text in the README files with your own, and use it as a guide to organize your paper analysis and results.

There are directories to organize your content into code, data, and results (or anything else that works for you). Every directory has its own README listing its contents and how they should be used. All code that you didn’t write and data that you didn’t generate need to be cited. Again, this is useful to document from the beginning so you don’t need to search for it later.
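If you prefer to set up a similar layout from scratch rather than from the template, a few lines of Python can create it. The following is a minimal sketch. The directory names, README wording, and project folder name `my_paper` are assumptions, not the template's exact contents.

```python
from pathlib import Path

# Assumed layout; adapt directory names and README text to your project.
LAYOUT = {
    "code": "Scripts used in the analysis; cite any code you did not write.",
    "data": "Input data; cite the source of any data you did not generate.",
    "results": "Outputs and figures produced by the scripts in code/.",
}


def scaffold(root):
    """Create each directory with a starter README, without overwriting existing files."""
    root = Path(root)
    for name, description in LAYOUT.items():
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        readme = folder / "README.md"
        if not readme.exists():
            readme.write_text(f"# {name}\n\n{description}\n")


scaffold("my_paper")
```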

Most of my work is done in Python, so I wrote up how to handle Python dependencies. I suggest doing it through a .yml file that specifies all the dependencies (i.e., all the packages and versions your scripts use) for your project. I believe the best way to handle this is to create a Python environment for every project you work on, so you can keep a separate list of dependencies for each. We have a nice blogpost on how to create and manage Python environments here.

When the project is done and you’re ready to submit or publish your paper, export all dependencies by running:

conda env export --no-builds > environment.yml

and store your environment.yml in the GitHub repository. When someone else needs to replicate your results, they would just need to create the same Python environment (running conda env create --file environment.yml) before executing your scripts.

Finally, you could automate the analysis and figure production with a makefile that executes your scripts, so whoever is replicating your work does not need to run each of them manually. This also helps prevent someone from executing the scripts in the wrong order. An example of this can be found in this template. The makefile can also be written in Python, as Julie did here.
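As one illustration of the Python route, here is a minimal driver that runs scripts in a fixed order and stops at the first failure. It is only a sketch and is not the template's makefile or Julie's version. The script paths in `STEPS` are hypothetical.

```python
import subprocess
import sys

# Hypothetical script names, listed in the order they must run.
STEPS = [
    "code/01_process_data.py",
    "code/02_run_analysis.py",
    "code/03_make_figures.py",
]


def run_pipeline(steps, cwd="."):
    """Run each script with the current Python interpreter, in order."""
    for step in steps:
        print(f"Running {step}")
        # check=True raises CalledProcessError and stops at the first failing script.
        subprocess.run([sys.executable, step], cwd=cwd, check=True)
```

Calling `run_pipeline(STEPS)` from the repository root, inside the environment created from environment.yml, would then reproduce the analysis in one command. Unlike make, this sketch reruns every step each time rather than only the outdated ones.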

In recognition that this is not an exhaustive template of everything one might need, the aim is to have this blog post and the template itself evolve as the group identifies needs and material to be added.

WordPress: How to post a Screenr/YouTube video

Update: As of October 2015, Screenr has been discontinued.

As a guide for other folks trying to post videos on this blog (and, I suppose, on WordPress in general), here is the workflow:

  1. Create an account on Screenr. To make a video, simply press Record and follow the instructions.
  2. After your video is done, Screenr will give you a link to it. You can simply post this link and have users navigate to the video on Screenr's site. The cooler thing to do, though, is to publish the video to YouTube and embed it in your post. So…
  3. Create an account on YouTube. Nowadays you can link your Google account, so it’s very seamless.
  4. Back in Screenr, click Publish to YouTube. It will ask for your YouTube name and password. The video is automatically sent to YouTube, and you have to navigate back to the YouTube page to manage it. As it explains on the screen, it may take a few minutes for the transfer to complete.
  5. In YouTube, navigate to My Videos to manage your new video. One suggestion is to make the video Unlisted, which means that you need a direct link in order to watch it. In the Advanced tools, make sure Enable Embedding is clicked.
  6. Click Watch on Video Page to see what the video will look like inside YouTube. Then, click Share, and then Embed to get the Embed code. Copy it to the clipboard.
  7. Now we’re ready to make our WordPress post. Log in to WordPress. Navigate to the WordPress “dashboard” to create a new post (you want to be in the Dashboard to get all the advanced settings for a full post, not just the quick post editor). Type a description of your video. When you’re ready to put the YouTube embed code, open the “Text” tab in the editor and paste the embed code.
  8. Publish the WordPress post and you’re done!

Some WordPress Tips

When you log in to WordPress, there should be a gray bar at the top that has your username. If you click your username, you’ll be taken to a screen with all sorts of helpful links on the left-hand side. Here you can create posts and edit existing ones. The editor that you reach from this menu seems better than another editor I’ve seen pop up, so use this one.

A few helpful features:

  • “Upload/Insert” allows you to post PDFs, pictures, PowerPoint files, you name it.  You can upload documents you’ve already written, such as software documentation.
  • In the “Publish” box on the right hand side of the editor, you can set the visibility of a post to private.  Once the blog goes public, this will be important.
  • Please set “Categories” for your posts which will help organization later on.  You can add a new category, or use existing ones.  There are also “Tags” for more specific content organization, but we haven’t really decided how to use one versus the other yet.
  • The following page talks about posting source code on WordPress.
  • If you have changed a page (like I did with this one) and want to move it up to the top of the blog, just change its publishing date to today’s date!
  • Tags allow us to be more easily found through WordPress.  Try to add them if you can!

As usual please feel free to add more WordPress tips to this post as we go along!

Publishing websites at Penn State

Here’s how to make a basic website at Penn State and some tips on getting started with HTML/CSS.

Most students should already have web space set up through personal.psu.edu, such as my personal site.  To access it, you can use portal.psu.edu.  Click “log in”, and then access the file transfer utility.  Anything you upload to the /www/ folder will be available through your personal site.  Penn State launched an “e-portfolios” initiative a few years ago, with a nice site with web publishing advice.  I haven’t checked it out in a while, but I remember there were some good resources there.

As far as website design goes, I’d recommend checking out free design template resources such as freecsstemplates.org.  That site uses CSS (Cascading Style Sheets) templates.  The idea is that the CSS file controls all the design (the background color, the fonts, what links look like, etc.), and you just have to add content through HTML code.  The HTML commands are pretty basic, and there is a starter HTML page included at the free CSS website that should help you get started.

Alternatively, you can use a hosting site such as WordPress, which hosts this blog.  It’s great for news-style posts, but not as good if you want permanent content such as a resume or sets of links.