Introduction to Unit Testing

Motivation

Here is something that has happened to most or all of us engineers who code. We write a function or a class and make sure it works by running it and printing some internal state output on the console. The following week, after adding or making improvements on seemingly unrelated parts of the code, the results start to look weird. After a few hours of investigation (a.k.a. wasted time) we find out the reason for the weird results is that that first function or class is either not giving us the right numbers anymore or not behaving as expected with different inputs. I wonder how many papers would have been written, how many deadlines would have been met, and how many “this $@#%& does not @#$% work!!!” thoughts and comments to friends and on our codes themselves would have been avoided if we managed to get rid of such frustrating and lengthy situations.

In this post I will introduce a solution to this rather pervasive issue: unit testing, which amounts to ways of creating standard tests throughout the development of a code to avoid the mentioned setbacks and simplify debugging. I will also get into some of the specifics of unit testing for Python and demonstrate it with a reservoir routing example. Lastly, I will briefly mention a unit testing framework for C++.

The General Idea

Unit test frameworks are libraries with functions (or macros) that allow for the creation of standard tests for individual parts of code, such as function and classes, that can be run separately from the rest of code. The advantages of using Unit Testing are mainly that (1) it eliminates the need of running the entire code with a full set of inputs simply to see if a specific function or class is working, which in some cases can get convoluted take a really long time, (2) it allows for easy and fast debugging, given that if a certain test fails you know where the problem is, (3) it prevents you or someone else from breaking the code down the road by running the tests frequently as the code is developed, and (4) it prevents an error in a function or class to be made noticible in another, which complicates the debugging process. Of course, the more modular a code is the easier it is to separately test its parts — a code comprised of just one 1,000-lines long function cannot be properly unit-tested, or rather can only have one unit test for the entire function.

Example Code to be Tested

The first thing we need to perform unit testing is a code to be tested. Let’s then consider the reservoir routing code below, comprised of a reservoir class, a synthetic stream flow generation function, a function to run the simulation over time, and the main.:

import numpy as np
import matplotlib.pyplot as plt

class Reservoir:
    __capacity = -1
    __storage_area_curve = np.array([[], []])
    __evaporation_series = np.array([])
    __inflow_series = np.array([])
    __demand_series = np.array([])
    __stored_volume = np.array([])

    def __init__(self, storage_area_curve, evaporations, inflows, demands):
        """ Constructor for reservoir class

        Arguments:
            storage_area_curve {2D numpy matrix} -- Matrix with a
            row of storages and a row of areas
            evaporations {array/list} -- array of evaporations
            inflows {array/list} -- array of inflows
            demands {array/list} -- array of demands
        """

        self.__storage_area_curve = storage_area_curve
        assert(storage_area_curve.shape[0] == 2)
        assert(len(storage_area_curve[0]) == len(storage_area_curve[1]))

        self.__capacity = storage_area_curve[0, -1]

        n_weeks = len(demands)
        self.__stored_volume = np.ones(n_weeks, dtype=float) * self.__capacity
        self.__evaporation_series = evaporations
        self.__inflow_series = inflows
        self.__demand_series = demands

    def calculate_area(self, stored_volume):
        """ Calculates reservoir area based on its storave vs. area curve

        Arguments:
            stored_volume {float} -- current stored volume

        Returns:
            float -- reservoir area
        """

        storage_area_curve_T = self.__storage_area_curve.T .astype(float)

        if stored_volume  self.__capacity:
            print "Storage volume {} graeter than capacity {}.".format(
                stored_volume, self.__capacity)
            raise ValueError

        for i in range(1, len(storage_area_curve_T)):
            s, a = storage_area_curve_T[i]
            # the &st; below needs to be replace with "smaller than" symbol. WordPress code highlighter has a bug that was distorting the code because of this "smaller than."
            if stored_volume &st; s:
                sm, am = storage_area_curve_T[i - 1]
                return am + (stored_volume - sm)/ (s - sm) * (a - am)

        return a

    def mass_balance(self, upstream_flow, week):
        """ Performs mass balance on reservoir

        Arguments:
            upstream_flow {float} -- release from upstream reservoir
            week {int} -- week

        Returns:
            double, double -- reservoir release and unfulfilled
            demand (in case reservoir gets empty)
        """

        evaporation = self.__evaporation_series[week] *\
            self.calculate_area(self.__stored_volume[week - 1])
        new_stored_volume = self.__stored_volume[week - 1] + upstream_flow +\
            self.__inflow_series[week] - evaporation - self.__demand_series[week]

        release = 0
        unfulfiled_demand = 0

        if (new_stored_volume > self.__capacity):
            release = new_stored_volume - self.__capacity
            new_stored_volume = self.__capacity
        elif (new_stored_volume < 0.):
            unfulfiled_demand = -new_stored_volume
            new_stored_volume = 0.

        self.__stored_volume[week] = new_stored_volume

        return release, unfulfiled_demand

    def get_stored_volume_series(self):
        return self.__stored_volume

def run_mass_balance(reservoirs, n_weeks):
    upstream_flow = 0
    release = 0
    total_unfulfilled_demand = 0
    for week in range(1, n_weeks):
        for reservoir in reservoirs:
            release, unfulfilled_demand = reservoir.mass_balance(release, week)

def generate_streamflow(n_weeks, sin_amplitude, log_mu, log_sigma):
    """Log-normally distributed stream flow generator.

    Arguments:
        n_weeks {int} -- number of weeks of stream flows
        sin_amplitude {double} -- amplitude of log-mean sinusoid fluctuation
        log_mu {double} -- mean log-mean
        log_sigma {double} -- log-sigma

    Returns:
        {Numpy array} -- stream flow series
    """

    streamflows = np.zeros(n_weeks)

    for i in range(n_weeks):
        streamflow = np.random.randn() * log_sigma + log_mu *\
            (1. + sin_amplitude * np.sin(2. * np.pi / 52 * (i % 52)))
        streamflows[i] = np.exp(streamflow)

    return streamflows

if __name__ == "__main__":
    np.random.seed(41)
    n_weeks = 522
    sin_amplitude, log_mean, log_std = 1., 2.0, 1.8
    streamflows = generate_streamflow(n_weeks, sin_amplitude, log_mean, log_std)
    storage_area_curve = np.array([[0, 1000, 3000, 4000], [0, 400, 600, 900]])

    reseroir1 = Reservoir(storage_area_curve, np.random.rand(n_weeks) / 10,\
        streamflows, np.random.rand(n_weeks) * 60)

    run_mass_balance([reseroir1], n_weeks)

    plt.plot(reseroir1.get_stored_volume_series())
    plt.xlabel("Weeks [-]")
    plt.ylabel("Stored Volume [MG]")
    plt.title("Storage Over Time")
    plt.ylim(0, 4100)
    plt.show()

We have quite a few functions to be tested in the code above, but lets concentrate on Reservoir.calculate_area and generate_streamflow. The first is a function within a class (Reservoir) while the second is a standalone function. Another difference between both functions is that we know exactly what output to expect from Reservoir.calculate_area (area given storage), while given generate_stream flow is setup to return random numbers we can only test their estimated statistical properties, which will change slightly every time we run the function. We will use the PyTest framework to perform unit testing in the reservoir routing code above.

PyTest

PyTest is one among many available unit test frameworks for Python. The reason for its choice is that PyTest is commonly used by Python developers. You can find the API (list of commands) here and examples more here.

Basic Work Flow

Using PyTest is pretty simple. All you have to do it:

  • Install PyTest with pip or conda (pip install pytest or conda install pytest)
  • Create a file (preferably in the same directory as the file with the code you want to test) whose name either begins or end with “test_” or “_test.py,” respectively — this is a “must.” In this example, the main code is called “reservoir_routing.py,” so I created the PyTest file in the same directory and named it “test_reservoir_routing.py.”
  • In the unit test file (“test_reservoir_routing.py”), import from the main code file the functions and classes you want to test, the PyTest module pytest and any other packages you may use to write the tests, such as Numpy.
  • Write the tests. Tests in PyTest are simply regular Python functions whose names begin with “test_” and that have assertions (“assert,” will get to that below) in them.
  • On the Command Prompt, PowerShell, or Linux terminal, run the command “pytest.” PyTest will then look for all your files that begin with “test_” or end in “_test.py,” scan them for functions beginning with “test_,” which indicate a test function, run them and give you the results.

Assertions

Assertions are the actual checks, the cores of the tests. Assertions are performed with the assert command and will return “pass” if the condition being asserting is true and “fail” otherwise. Below is an example of an assertion, which will return “pass” if the function being tested returns the value on the right-hand-side for the given arguments or “false” otherwise:

assert function_calculate_something(1, 'a', 5) == 100

Sometimes we are not sure of the value a function will return but have an approximate idea. This is the case of functions that perform stochastic calculations or calculate numerically approximate solutions — e.g. sampling from distributions, Newton-Rhapson minimum finder, E-M algorithm, finite-differences PDE solvers, etc. In this case, we should perform our assertion against an approximate value with:

# assert if function returns 100 +- 2
assert function_calculate_something(1, 'a', 5) == pytest.approx(100., abs=2.)

or with:

# assert if function returns 100 +- 0.1 (or 100 +- 100 * 1e-3)
assert function_calculate_something(1, 'a', 5) == pytest.approx(100., rel=1e-3)

Sometimes (although, unfortunately, rarely) we add code to a function to check the validity of the arguments, intermediate calculations, or to handle exceptions during the function’s execution. We can also test all that with:

# Test if an exception (such as Exception) are properly raised
with pytest.raises(Exception):
    function_calculate_something(1345336, 'agfhdhdh', 54784574)

Test File Example

Putting all the above together, below is a complete PyTest file with two tests (two test functions, each with whatever number of assertions):

from reservoir_routing import Reservoir, generate_streamflow
import numpy as np
import pytest

def test_calculate_area():
    n_weeks = 522
    storage_area_curve = np.array([[0, 500, 800, 1000], [0, 400, 600, 900]])

    reseroir1 = Reservoir(storage_area_curve, np.random.rand(n_weeks),
                            np.zeros(n_weeks), np.random.rand(n_weeks) * 20)

    # Test specific values of storages and areas
    assert reseroir1.calculate_area(500) == 400
    assert reseroir1.calculate_area(650) == 500
    assert reseroir1.calculate_area(1000) == 900

    # Test if exceptions are properly raised
    with pytest.raises(ValueError):
        reseroir1.calculate_area(-10)
        reseroir1.calculate_area(1e6)

def test_generate_streamflow():
    n_weeks = 100000
    log_mu = 7.8
    log_sigma = 0.5
    sin_amplitude = 1.
    streamflows = generate_streamflow(n_weeks, sin_amplitude, log_mu, log_sigma)

    whitened_log_mu = (np.log(streamflows[range(0, n_weeks, 52)]) - log_mu) / log_sigma

    # Test if whitened mean in log space is 0 +- 0.5
    assert np.mean(whitened_log_mu) == pytest.approx(0., abs=0.05)

If I open my Command Prompt, navigate to the folder where both the main code and the test Python files are located and run the command pytest I will get the output below:

F:\Dropbox\Bernardo\Blog posts>pytest
============================= test session starts =============================
platform win32 -- Python 2.7.13, pytest-4.2.0, py-1.7.0, pluggy-0.8.1
rootdir: F:\Dropbox\Bernardo\Blog posts, inifile:
collected 2 items

test_reservoir_routing.py F.                                             [100%]

================================== FAILURES ===================================
_____________________________ test_calculate_area _____________________________

    def test_calculate_area():
        n_weeks = 522
        storage_area_curve = np.array([[0, 500, 800, 1000], [0, 400, 600, 900]])

        reseroir1 = Reservoir(storage_area_curve, np.random.rand(n_weeks),
                                np.zeros(n_weeks), np.random.rand(n_weeks) * 20)

        # Test specific values of storages and areas
        assert reseroir1.calculate_area(500) == 400
>       assert reseroir1.calculate_area(650) == 500
E       assert 400 == 500
E        +  where 400 = (650)
E        +    where  = .calculate_area

test_reservoir_routing.py:14: AssertionError
========================== deprecated python version ==========================
You are using Python 2.7.13, which will no longer be supported in pytest 5.0
For more information, please read:
  https://docs.pytest.org/en/latest/py27-py34-deprecation.html
===================== 1 failed, 1 passed in 1.07 seconds ======================

F:\Dropbox\Bernardo\Blog posts>

This seems like a lot just to say that one test passed and the other failed, but there is actually some quite useful information in this output. The lines below (20 and 21 of the output above) indicate which assertion failed, so you can go ahead and debug your the corresponding specific part of code with a debugger:

>       assert reseroir1.calculate_area(650) == 500
E       assert 400 == 500

I actually found this error with PyTest as I was writing the reservoir routing code for this post and used VS Code, which has a debugger integrated with PyTest, to debug and fix the error. The error is happening because of an int to float conversion that is not being done correctly in line 45 of the original code, which can be replaced by the version below to fix the problem:

storage_area_curve_T = self.__storage_area_curve.T<strong>.astype(float)</strong>

The last piece of important information contained in the last few lines of the output is that support for Python 2.7 will be discontinued, as it is happening with other Python packages as well. It may be time to finally switch to Python 3.

Practice Exercise

As a practice exercise, try writing a test function for the Reservoir.mass_balance function including cases in which the reservoir spills, gets empty, and is partly full.

Unit Testing in C++

A framework for unit testing in C++ that is quite popular is the Catch2 framework. The idea is exactly the same: you have a code you want to test and use the framework to write test functions. An example of Catch2 being used to check a factorial function is shown below, with both the main and the test codes in the same file:

#define CATCH_CONFIG_MAIN  // This tells Catch to provide a main() - only do this in one cpp file
#include "catch.hpp"

unsigned int Factorial( unsigned int number ) {
    return number <= 1 ? number : Factorial(number-1)*number;
}

TEST_CASE( "Factorials are computed", "[factorial]" ) {
    REQUIRE( Factorial(1) == 1 );
    REQUIRE( Factorial(2) == 2 );
    REQUIRE( Factorial(3) == 6 );
    REQUIRE( Factorial(10) == 3628800 );
}

Conclusion

Setting up and running unit testing is not rocket science. Also, with my test functions I do not have to use my actual reservoir system test case in order to check if the surface area calculations are correct. If I find they are not, I can just re-run the test function with a debugger (VS Code is great for that) and fix my function without having to deal with the rest of the code. You would be surprised by how many hours of debugging can be saved by taking the unit test approach. Lastly, be sure to run your tests after any change to your code you consider minimally significant (I know, this sounds vague). You never know when you mess something up by trying to fix or improve something else.

2 thoughts on “Introduction to Unit Testing

  1. Pingback: Performing Experiments on HPC Systems – Water Programming: A Collaborative Research Blog

  2. Pingback: Writing sharable Python code part II: Unit Testing – Water Programming: A Collaborative Research Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s