Cythonizing your Python code, Part 2: Next steps with reservoir simulation

Cython is an extension of the Python programming language that allows you to add C-like functionality, such as static typing and compilation. This can lead to significant efficiency gains over pure Python. In my last blog post, I introduced Cython’s motivation and benefits using a simple Fibonacci sequence example. Cythonizing this program led to a 20-30x speedup on my laptop, for very little work!

However, you may be wondering about the feasibility and impact of Cython for more complex programs, such as simulation models used in research. In this blog post, I will demonstrate how Cython can be applied to larger, object-oriented programs that use functions, classes, and NumPy arrays, using a reservoir simulation model as an example. Readers unfamiliar with Cython are encouraged to read the first post in this series prior to proceeding.

Reservoir simulation model

All code used in this blog post can be found in this GitHub repository, in the reservoir_sim directory. This contains nine sub-directories with different versions of the model. I will describe the differences between the versions later in this post, but first I will briefly introduce the major components of the simulation model using the py_fast version.

The model simulates a two-reservoir system, one of which is downstream from the other. Both have their own inflows, which are sinusoidal plus noise. Additionally, both reservoirs have their own demands and minimum flow thresholds, which are also sinusoidal plus noise. Each time step, the upper reservoir attempts to release the minimum flow requirement if there is enough water, then tries to satisfy the demand if possible. The lower reservoir then repeats the same process, with the upper reservoir’s releases added to its own catchment inflows. The simulation is run over a number of years, and the stochastic inputs (inflows, upstream releases, minimum flow requirements, demand) and outputs (releases, deliveries, storages) are stored as time series.

The directory py_fast contains three Python files, representing the Demand, Reservoir, and Model classes. The simplest class, Demand, is initialized with inputs that determine the parameterization of the random demand process, and a function that can be called each day to randomly sample from this process:

### Demand class - pure Python
from random import gauss
from math import pi, sin

class Demand():
  def __init__(self, name, demand_params):
    self.name = name
    # amplitude, phase, shift, & noise std for sine wave of demands
    self.demand_amp, self.demand_phase, self.demand_shift, self.demand_noise = demand_params
    # save 2pi/365 to save conversion time
    self.days_to_radians = 2. * pi / 365.

  ### demand function
  def demand_t(self, t):
    noise = gauss(0., self.demand_noise)
    demand_t = self.demand_amp * sin((t - self.demand_phase)  * self.days_to_radians) + self.demand_shift + noise
    return demand_t

The Reservoir class is similar, with parameter initialization and its own stochastic inflow and minimum flow sampling methods. The initialization also instantiates a Demand object belonging to the Reservoir object.

### Reservoir class - pure Python
import numpy as np
from Demand import Demand

class Reservoir():
  def __init__(self, name, inflow_params, min_flow_params, demand_params, storage_params):
    self.name = name
    # amplitude, phase, shift, & noise std for sine wave of inflows
    self.inflow_amp, self.inflow_phase, self.inflow_shift, self.inflow_noise = inflow_params
    # amplitude, phase, and shift for sine wave of minimum flows
    self.min_flow_amp, self.min_flow_phase, self.min_flow_shift = min_flow_params
    # reservoir capacity and initial storage
    self.capacity, self.storage = storage_params
    # set up demand object
    self.demand = Demand(name, demand_params)

  ### inflow function
  def inflow_t(self, t):
    noise = np.random.normal(0., self.inflow_noise, 1)[0]
    inflow_t = self.inflow_amp * np.sin((t - self.inflow_phase)  * 2. * np.pi / 365.) + self.inflow_shift + noise
    return inflow_t

  ### min flow function
  def min_flow_t(self, t):
    min_flow_t = self.min_flow_amp * np.sin((t - self.min_flow_phase)  * 2. * np.pi / 365.) + self.min_flow_shift
    return min_flow_t

The Reservoir class also has a step method, which dictates the release, delivery, and updated storage based on the availability of water.

  ### step reservoir another day
  def step(self, t, upstream_release):
    inflow = self.inflow_t(t)
    min_flow = self.min_flow_t(t)
    demand = self.demand.demand_t(t)
    
    # first assume release & delivery meet min_flow and demand exactly
    release = min_flow
    delivery = demand
    self.storage += inflow + upstream_release - release - demand

    # check if storage overflow
    if self.storage > self.capacity:
      release += self.storage - self.capacity
      self.storage = self.capacity

    # check if storage went negative. if so, curtail demand first, then env flows
    elif self.storage < 0:
      if delivery > (-self.storage):
        delivery += self.storage
        self.storage = 0.
      else:
        self.storage += delivery
        delivery = 0
        if release > (-self.storage):
          release += self.storage
          self.storage = 0
        else:
          print('This should not be happening')
    
    return (inflow, upstream_release, min_flow, demand, release, delivery, self.storage)

Finally, the Model class is used to set up and run the simulation. Upon initialization, the Model object first creates the upper and lower Reservoir objects, and creates a dictionary of lists to hold the results.

### Model class for running simulation
import matplotlib.pyplot as plt
import random
import numpy as np
from Reservoir import Reservoir

class Model():
  def __init__(self, years, plot, seed):
    # length of simulation (no leap years)
    self.years = years
    self.days = 365 * self.years
    # boolean for whether to plot
    self.plot = plot
    # random seed
    random.seed(seed)

    ### set up upper reservoir
    # inflow params in terms of sinusoid, (amplitude, phase, shift, noise). units = AF/day
    inflow_params = (300., 0., 500., 20.)
    # min flow params in terms of sinusoid, (amplitude, phase, shift). units = AF/day
    min_flow_params = (200., 122., 300.)
    # demand params in terms of sinusoid, (amplitude, phase, shift, noise). units = AF/day
    demand_params = (300., 163., 600., 30.)
    # storage params (capacity, starting storage). units = AF
    storage_params = (500000., 250000.)
    # initialize upper reservoir
    self.reservoir_upper = Reservoir('upper', inflow_params, min_flow_params, demand_params, storage_params)

    ### set up lower reservoir
    # inflow params in terms of sinusoid, (amplitude, phase, shift, noise). units = AF/day
    inflow_params = (200., 30., 600., 100.)
    # min flow params in terms of sinusoid, (amplitude, phase, shift). units = AF/day
    min_flow_params = (100., 91., 400.)
    # demand params in terms of sinusoid, (amplitude, phase, shift, noise). units = AF/day
    demand_params = (100., 152., 300., 10.)
    # storage params (capacity, starting storage). units = AF
    storage_params = (20000., 15000.)
    # initialize lower reservoir
    self.reservoir_lower = Reservoir('lower', inflow_params, min_flow_params, demand_params, storage_params)

    ### set up data storage
    self.reservoir_list = [self.reservoir_upper, self.reservoir_lower]
    self.output = {}
    for reservoir in self.reservoir_list:
      self.output[reservoir.name] = {}
      self.output[reservoir.name]['inflow'] = []
      self.output[reservoir.name]['upstream_release'] = []
      self.output[reservoir.name]['min_flow'] = []
      self.output[reservoir.name]['demand'] = []
      self.output[reservoir.name]['release'] = []
      self.output[reservoir.name]['delivery'] = []
      self.output[reservoir.name]['storage'] = [reservoir.storage]

The Model.run function performs the simulation loop.

  def run(self):
    for t in range(self.days):
      # step upper
      (inflow, upstream_release, min_flow, demand, release, delivery, storage) = self.reservoir_upper.step(t, 0.)
      self.output['upper']['inflow'].append(inflow)
      self.output['upper']['upstream_release'].append(upstream_release)
      self.output['upper']['min_flow'].append(min_flow)
      self.output['upper']['demand'].append(demand)
      self.output['upper']['release'].append(release)
      self.output['upper']['delivery'].append(delivery)
      self.output['upper']['storage'].append(storage)

      # step lower, using upper release as upstream inflow
      (inflow, upstream_release, min_flow, demand, release, delivery, storage) = self.reservoir_lower.step(t, release)
      self.output['lower']['inflow'].append(inflow)
      self.output['lower']['upstream_release'].append(upstream_release)
      self.output['lower']['min_flow'].append(min_flow)
      self.output['lower']['demand'].append(demand)
      self.output['lower']['release'].append(release)
      self.output['lower']['delivery'].append(delivery)
      self.output['lower']['storage'].append(storage)

    # return total storage last time step to make sure versions are equivalent
    return self.output['upper']['storage'][-1] + self.output['lower']['storage'][-1]

We can run the model and plot the results as follows (plot function not shown for brevity, see repository).

from Model import Model

# initialize model
model = Model(years = 5, plot = True, seed = 101)

# run simulation
tot_storage = model.run()

# plot results
model.plot_results()
Figure 1: Five-year realization of simulated inflows with and without upstream release (column 1), demand vs. deliveries (column 2), minimum flow requirement vs. release (column 3), and storage (column 4) for the upper (row 1) and lower (row 2) reservoirs.
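
For readers who want a fully self-contained script, here is a rough sketch of what a plot_results() method producing panels like those in Figure 1 could look like, assuming matplotlib and the nested output dictionary shown above (this is illustrative only; the actual plot_results() in the repository differs in its details):

  ### rough sketch of a plotting method (illustrative; see plot_results() in the repository)
  def plot_results(self):
    fig, axes = plt.subplots(2, 4, figsize = (16, 6))
    for row, reservoir in enumerate(self.reservoir_list):
      out = self.output[reservoir.name]
      # column 1: inflows with & without upstream release
      axes[row, 0].plot(out['inflow'], label = 'inflow')
      axes[row, 0].plot(np.array(out['inflow']) + np.array(out['upstream_release']), label = 'inflow + upstream release')
      # column 2: demand vs delivery
      axes[row, 1].plot(out['demand'], label = 'demand')
      axes[row, 1].plot(out['delivery'], label = 'delivery')
      # column 3: minimum flow requirement vs release
      axes[row, 2].plot(out['min_flow'], label = 'min flow')
      axes[row, 2].plot(out['release'], label = 'release')
      # column 4: storage
      axes[row, 3].plot(out['storage'], label = 'storage')
      for col in range(4):
        axes[row, col].legend()
    plt.show()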

Cythonizing a class

Now that we have our model in place, we can begin to add static type information and Cythonize it. I will focus now on the cy version of the model, which builds on the py_fast model. First consider Demand.pyx.

### Demand class - cython
from random import gauss
from math import pi, sin

cdef class Demand():
  def __init__(self, name, demand_params):
    self.name = name
    # amplitude, phase, shift, & noise std for sine wave of demands
    self.demand_amp, self.demand_phase, self.demand_shift, self.demand_noise = demand_params
    # save 2pi/365 as double to save conversion time
    self.days_to_radians = 2. * pi / 365.

  ### demand function
  cdef double demand_t(self, double t):
    cdef double noise, demand_t, sin_t
    
    noise = gauss(0., self.demand_noise)
    sin_t = sin((t - self.demand_phase) * self.days_to_radians)
    demand_t = self.demand_amp * sin_t + self.demand_shift + noise
    return demand_t

This is similar to Demand.py in the Python version, apart from a few key differences. Cython extension types are declared with “cdef class” rather than the usual “class” statement. Within the class, methods can also be defined with the cdef keyword. Similar to the Fibonacci sequence example in my earlier blog post, we provide type information for the return value and arguments in the function definition for demand_t(). Lastly, we can provide type information for any variables used within the function, such as noise, again using the cdef keyword.

It is not necessary to provide this information for every variable and function. For example, the __init__ function in general cannot be defined with cdef. However, the more type information you can provide, the more Cython will be able to speed up your code. Inspecting the annotated html that Cython outputs (see the first post in this series), as well as code profiling (see this blog post by Antonia Hadjimichael), can help to figure out which variables and methods are the most important bottlenecks to speed up, and which have diminishing marginal returns and can be left as untyped Python variables/methods.
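
For example, a quick way to profile the pure Python model is with the standard library’s cProfile module. A minimal sketch, assuming the py_fast version, might look like this (the annotated html itself is generated at compile time, as covered in the first post):

### minimal profiling sketch using cProfile (illustrative)
import cProfile, pstats
from Model import Model

model = Model(years = 5, plot = False, seed = 101)
profiler = cProfile.Profile()
profiler.enable()
model.run()
profiler.disable()
# print the 10 most expensive calls by cumulative time
pstats.Stats(profiler).sort_stats('cumulative').print_stats(10)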

You may have noticed that I did not provide static type information for class attributes (e.g., self.name, self.demand_amp). These variables are defined and typed in a separate file, Demand.pxd.

cdef class Demand():

  cdef:
    public str name

    public double demand_amp, demand_phase, demand_shift, demand_noise, days_to_radians

  cdef double demand_t(self, double t)

A .pxd “definition file” is the Cython equivalent of a header file in C: it declares the class’s attributes and cdef methods, along with their types, so that other Cython modules can cimport and use them. A class declared with cdef in this way is a C extension type. Methods defined with cdef can only be called from other Cython code, not from pure Python. Cython also has a third kind of function definition, “cpdef”, which can be called from either Cython or Python. This can be useful if your application needs to be interoperable with both, but cpdef calls made from Python will not perform as efficiently as cdef calls made from Cython.

One important thing to notice is that each attribute belonging to the class should be preceded by “public” if we want it to be readable and writable from Python code; non-public cdef attributes are only accessible from Cython. Also note that only cdef (or cpdef) methods are declared here, rather than pure Python def methods.
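
As a quick standalone illustration of these pieces (not part of the reservoir model), the sketch below shows public vs. non-public attributes and cdef vs. cpdef methods; for brevity there is no .pxd here, so the attributes are declared directly in the .pyx class body:

### standalone sketch of attribute visibility and cdef vs cpdef methods (not from the model)
cdef class Example():
  cdef public double x     # "public": readable & writable from Python code
  cdef double y             # accessible only from Cython code

  def __init__(self, double x, double y):
    self.x = x
    self.y = y

  cdef double cython_only(self):        # callable only from other Cython code
    return self.x + self.y

  cpdef double cython_or_python(self):  # callable from either Cython or Python
    return self.x + self.y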

The Reservoir class is Cythonized in a similar way, and the interested reader is referred to the GitHub repository to see the changes in detail. Moving to the Model class, there are several important differences to note in the .pxd file.

from Reservoir cimport Reservoir

cdef class Model():

  cdef:
    public int years, days

    public double tot_storage

    public bint plot

    public list reservoir_list

    public dict output

    public Reservoir reservoir_upper, reservoir_lower
  
  cdef double run(self) except *

First, notice that the Reservoir class is imported with the cimport command (“Cython import”) rather than import, since it is now an extension type. This should be done in both the .pyx and .pxd files. Second, I have included “except *” after the parentheses in the definition of the run() method. This instructs the compiled C code to propagate exceptions up to the Python runtime. Otherwise, an exception raised inside the method is simply printed as a warning and execution continues, which makes it challenging to debug and verify behavior. Again, this should be included in both the .pyx and .pxd files. Third, bint is Cython’s C-level equivalent of Python’s bool type.
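
To make the effect of “except *” and bint concrete, here is a small standalone sketch (not from the model):

### standalone sketch of "except *" and bint (not from the model)
cdef class Checker():
  cdef public bint verbose    # bint maps to a C int, seen from Python as True/False

  def __init__(self, bint verbose):
    self.verbose = verbose

  cdef double checked_sqrt(self, double x) except *:
    # because of "except *", this ValueError propagates to the Python runtime;
    # without it, the error would be printed as a warning and execution would continue
    if x < 0.:
      raise ValueError('negative input')
    return x ** 0.5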

The last important change in the Cython version is the inclusion of a new file, main.pyx. As noted previously, the cdef attributes and methods of an extension type can only be accessed from other Cython code, so we can no longer import the Model class into pure Python and call its run() method directly, as we did in the pure Python implementation above. Instead, main.pyx is used as a go-between. Because it is itself Cython code (note the .pyx ending), it can cimport the Model extension type and call its cdef methods. And because the main class exposes an ordinary def __init__() (and has no definition file of its own that anything needs to cimport), it can still be imported and instantiated from pure Python.

from Model cimport Model

cdef class main():
  cdef public Model model

  def __init__(self, int years, bint plot, int seed):
    # initialize model
    self.model = Model(years = years, plot = plot, seed = seed)

    # run simulation
    self.model.tot_storage = self.model.run()

    # plot results
    if self.model.plot:
      self.model.plot_results()

The main class can now be imported in pure Python to run the simulation, in much the same way as we imported and ran the Model class before.

from main import main

# run model
main_obj = main(years = 5, plot = True, seed = 101)

# verify that final storage is the same
print(main_obj.model.tot_storage)

For both py_fast and cy versions, we can verify that the final storage at the end of five years is 12,066. However, as we shall see later in this post, the Cythonized version completes the procedure more quickly.

NumPy and Memory Views

Most scientific programs rely heavily on NumPy arrays for numeric computation. For example, the py_numpy version of the Model class stores time series outputs in a NumPy array rather than nested dictionaries and lists. This output array is initialized in the class constructor, and then populated with outputs as the model is run (compare this to the original Python version above).

class Model():
  def __init__(self, years, plot, seed):    
    
    ...

    self.output = np.empty((num_reservoirs * self.num_step_outputs, self.days + 1))
    for i in range(len(reservoir_list)):
      self.output[i * self.num_step_outputs + self.num_step_outputs - 1, 0] = reservoir_list[i].storage


  def run(self):
    for t in range(1, self.days + 1):
      t_run = t - 1

      ### step upper
      self.output[:7, t] = self.reservoir_upper.step(t_run, 0.)

      ### step lower, using upper release as inflow
      self.output[7:, t] = self.reservoir_lower.step(t_run, self.output[4, t])

    # return total storage last time step to make sure versions are equivalent
    return self.output[6, -1] + self.output[-1, -1]

Because a NumPy array is not a native Python type (e.g., float, list), it cannot be used directly as a static type in Cython (i.e., we can’t simply define an array arr with “cdef np.array arr“). However, we can use typed “memoryviews,” Cython’s interface to the underlying memory buffers that NumPy arrays (and other buffer-supporting objects) expose. The nuances of memoryviews can be a bit challenging, and a full explanation is beyond the scope of this post. But I will give a simple example usage for Cython, and the interested reader is referred to the docs here and here for more information.
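
As a minimal standalone illustration (not part of the model), a typed memoryview can simply be assigned from an existing NumPy array, after which indexing reads and writes the same underlying memory:

### minimal standalone memoryview sketch (illustrative)
import numpy as np

cdef double[:] storage_view          # 1-dimensional typed memoryview of doubles
storage_np = np.zeros(365)           # NumPy array that owns the memory
storage_view = storage_np            # the view now points at the same buffer
storage_view[0] = 250000.            # writes through to storage_np[0]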

Memoryviews are created by designating the type of variable to go inside the array (e.g., double), followed by square brackets with colons and commas dictating the number of dimensions. For example, in the cy_numpy version of Model.pxd, we define the 2-dimensional memoryview that stores the results as follows:

cdef class Model():

  cdef:
    
    ...

    public double[:,:] output

Now in the model constructor, we initialize the memory view as follows:

    output_np = np.empty((num_reservoirs * self.num_step_outputs, self.days + 1))
    self.output = output_np

    for i in range(len(reservoir_list)):
      self.output[i * self.num_step_outputs + self.num_step_outputs - 1, 0] = reservoir_list[i].storage

The only difference from the Pythonic implementation is that we have to split the creation of the memoryview into two steps: first we create a NumPy array (output_np), then we set the memoryview (self.output) to correspond to the block of memory holding output_np. Once we have created the memoryview, we can access and reassign values using standard NumPy-style indexing. For example, here is the part of the Model.run() function that steps the upper reservoir.

      (inflow, upstream_release, min_flow, demand, release, delivery, storage) = self.reservoir_upper.step(t_run, 0.)
      self.output[0, t] = inflow
      self.output[1, t] = upstream_release
      self.output[2, t] = min_flow
      self.output[3, t] = demand
      self.output[4, t] = release
      self.output[5, t] = delivery
      self.output[6, t] = storage

Again, this has to be split into two steps: first call the reservoir’s step function and receive the output as a tuple of variables, and then assign each value in the memoryview from this data. We cannot simply set the entire slice self.output[:7, t] directly using the tuple returned by the function, like we did in py_numpy, because a memoryview slice can only be assigned from another buffer (such as a NumPy array or memoryview), not from a tuple. It is possible to convert the tuple into a NumPy array and then reassign the slice using this NumPy array, but this was found to be less efficient when I timed different versions. Alternatively, we can alter the definition of Reservoir.step() to return a memoryview rather than a tuple, which allows the Model class to reassign values in the memoryview directly from the function call. However, this too was found to be less efficient, as we will see in the next section. The interested reader is referred to the cy_numpy_reservoir version of the model for implementation details.
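
For reference, the tuple-to-array alternative mentioned above would look something like this (a sketch only; as noted, it timed slower than the element-by-element assignment):

      ### alternative (slower in my timing tests): convert the returned tuple to a NumPy array, then assign the slice
      step_upper = np.array(self.reservoir_upper.step(t_run, 0.))
      self.output[:7, t] = step_upper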

Testing the performance of different versions

Now that I have described how to Cythonize your classes and NumPy arrays, you may be wondering, “Ok, but was it worth it? How much does this speed up the code?” We can answer this question with a timing experiment. The reservoir_sim directory in the GitHub repository contains nine versions of the code, summarized as follows:

  1. py_slow: My first attempt at creating the example reservoir simulation model in Python, which contains several bottlenecks.
  2. py_fast: An improved Python-only model after eliminating bottlenecks, used as a baseline for improvement.
  3. py_numpy: Python-only version that uses a NumPy array to store the time series of results.
  4. py_numpy_reservoir: Python-only version that uses NumPy arrays to store results inside the daily time step as well.
  5. py_cy: A copy of py_fast, which is then Cythonized without explicitly adding static type information.
  6. cy: Updates to py_fast to add static type information.
  7. cy_numpy: Similar to py_numpy, but Cythonized with static types and memory views.
  8. cy_numpy_reservoir: Similar to py_numpy_reservoir, but Cythonized with static types and memory views.
  9. cy_numpy_noCheck: Copy of cy_numpy, with the additional step of disabling Python’s automatic bounds checking and index wrapping.

This directory also includes two scripts: time_versions.sh will first Cythonize all the Cython versions of the model, and then run a timing experiment with all versions, using the script time_sim.py. The experiment will output the relative efficiency of each version, averaged over 4 trials of 20 simulations, each of which is 50 years long.
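
As a rough sketch, the core of such a timing loop might look something like the following (simplified and illustrative; the actual time_sim.py in the repository handles all nine versions and computes their relative efficiencies):

### simplified timing sketch (illustrative; see time_sim.py for the real experiment)
import time
from main import main

num_trials, num_sims, years = 4, 20, 50
start = time.perf_counter()
for trial in range(num_trials):
  for sim in range(num_sims):
    main(years = years, plot = False, seed = sim)
mean_time = (time.perf_counter() - start) / num_trials
print('mean time per trial of', num_sims, 'simulations:', mean_time)

Running the full experiment produces output like the following: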

Starting timings

py_slow: answer = 10494.223405436973, time = 1.1308925539487973, efficiency  = 1.0

py_fast: answer = 9851.320332349133, time = 0.26572948903776705, efficiency relative to py_slow = 4.25580374253483

py_numpy: answer = 9851.320332349133, time = 0.3013913669856265, efficiency relative to py_fast = 0.8816758479032341

py_numpy_reservoir: answer = 9851.320332349133, time = 0.38436312798876315, efficiency relative to py_fast = 0.6913501053762255

py_cy: answer = 9851.320332349133, time = 0.23376567207742482, efficiency relative to py_fast = 1.1367344344286598

cy: answer = 9851.320332349133, time = 0.13966371200513095, efficiency relative to py_fast = 1.9026380240273488

cy_numpy: answer = 9851.320332349133, time = 0.09388123091775924, efficiency relative to py_fast = 2.8304857791068843

cy_numpy_reservoir: answer = 9851.320332349133, time = 0.22499373101163656, efficiency relative to py_fast = 1.181052857975068

cy_numpy_noCheck: answer = 9851.320332349133, time = 0.09497074002865702, efficiency relative to py_fast = 2.7980143037485474

Here are the major takeaways from comparing the speedups across versions:

  1. Make sure your Python code is optimized prior to using Cython. py_fast was 4.3x more efficient than py_slow. The only major difference between the two is that py_slow used the numpy.random module to generate stochastic realizations each day, whereas py_fast uses the basic random module. This small change led to significant gains by avoiding the extra overhead of the NumPy generator (it is also why these two versions give different answers, since they use different random number generators). The point here is that major gains can often be made with very little work by finding and eliminating bottlenecks. As mentioned above, code profiling can be extremely helpful for making sure that you have eliminated major bottlenecks prior to implementing larger changes in Cython. Note that py_fast, rather than py_slow, is used as the baseline comparison for all other versions. (For any of you thinking I should have just used NumPy to pre-generate an entire array of random numbers rather than generating them one-by-one each day – yes, you are right, that would probably have been faster. But I set it up this way to provide a more challenging example of an object-oriented model with function calls between nested objects.)
  2. NumPy arrays are not always faster. Like many scientists/engineers, I tend to rely heavily on NumPy arrays, both for their convenience and for their reputation for speed (NumPy is largely written in C, after all). However, in some cases they can actually degrade performance, due to the larger overhead of creating NumPy arrays relative to lists. The py_numpy version, which uses NumPy to hold the results, was actually 12% less efficient than the baseline version using nested dictionaries and lists. Moreover, py_numpy_reservoir, which creates a new NumPy array to store results for each reservoir each day, is roughly 31% less efficient than baseline. This is not to say NumPy arrays can’t be beneficial – they are extremely fast at vectorized numerical operations. But for situations with small arrays and/or little opportunity for vectorized computation, they may not be worth the overhead.
  3. Cythonizing Python code can improve performance. The py_cy version, which does not include static type information and is identical to py_fast apart from being Cythonized, gives a roughly 14% efficiency gain in this experiment. This is not huge, but could be meaningful and requires very little extra work. Meanwhile, the cy version with statically typed Cython extension types shows a larger efficiency gain of 90%.
  4. Cython and NumPy can be synergistic if used properly. The cy_numpy version further improves performance, with an efficiency of 2.8 relative to py_fast. Interestingly, this version is identical to py_numpy other than the Cython-related changes, yet it leads to significant gains where py_numpy degraded performance. Cython and memoryviews allow us to interface more directly with the memory underlying NumPy arrays, reducing the Python-related overhead of element access. However, this is not universally true – cy_numpy_reservoir degrades performance significantly relative to the other Cythonized versions, suggesting that creating NumPy arrays within the inner daily step routine is not worth the overhead.
  5. Disabling certain Python features may not be worthwhile. The last version, cy_numpy_noCheck, builds on cy_numpy by adding two compiler directives to model.run(): @boundscheck(False) and @wraparound(False). The first tells Cython to skip bounds checking when indexing buffers such as memoryviews, while the second disables negative (wrap-around) indexing (e.g., arr[-3:]). These directives are meant to allow further efficiency gains by removing Pythonic safety checks. However, in this particular case, they aren’t found to have a meaningful impact. Since these directives can lead to mysterious bugs for users accustomed to Python indexing behavior, they should be used with caution, and only after verifying that they actually create efficiency gains that are worth the risk.
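
For reference, applying these directives to a cdef function looks something like the sketch below (illustrative only; see the cy_numpy_noCheck version in the repository for the actual usage inside model.run()):

### illustrative use of the boundscheck/wraparound directives (not the repository code)
cimport cython

@cython.boundscheck(False)   # skip bounds checks when indexing memoryviews/buffers
@cython.wraparound(False)    # disallow negative (wrap-around) indexing
cdef double total_storage(double[:, :] output):
  # with wraparound disabled, the last column must be indexed explicitly
  cdef int last = output.shape[1] - 1
  return output[6, last] + output[output.shape[0] - 1, last]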

Overall, Cython is found to improve the efficiency of this reservoir simulation model by 2-3x on my laptop. Unfortunately, this is much lower than the 20-30x speedups observed for the Fibonacci sequence example in the last blog post. This is due to the more complex operations and object-oriented structure of the simulation model, which leads to more interactions with the Python interpreter compared to the tight numerical loop of the Fibonacci generator. However, a 2-3x efficiency gain can still be very meaningful, especially in situations involving large ensembles of expensive simulations in parallel computing environments. In general, the potential gain from Cython, and whether or not this is worth the trouble, will depend on factors such as the structure of the model, its runtime, and the skill level of model developers and users.

Further reading

If this blog post has piqued your interest and you want to apply Cython to your own research, you might try looking through more advanced water resources simulation models that make use of Cython, such as Pywr or the California Food-Energy-Water System (CALFEWS).

For more general information, of course see the Cython docs. I also highly recommend Kurt W. Smith’s O’Reilly Media book, “Cython: A Guide for Python Programmers,” for those wanting a deeper understanding. It looks like the author also has a tutorial, but I haven’t tried it yet. Lastly, for information on how to improve the efficiency of your Python code more generally (prior to or in lieu of using Cython), I recommend Micha Gorelick and Ian Ozsvald’s O’Reilly Media book, “High Performance Python: Practical Performant Programming for Humans.”
