Infrastructure Investment Selection as a Two-Stage Stochastic Programming Problem (Part 2)

In this post, we will continue where we left off from Part 1, in which we set up a two-stage stochastic programming problem to identify the optimal decision for the type of water supply infrastructure to build. As a recap, our simple case study involves a small water utility servicing the residents of Helm’s Keep (population 57,000) to identify the following:

  • The best new water supply infrastructure option to build
  • Its final built capacity
  • The total bond payment to be made
  • The expected daily deliveries that ‘ its final built capacity
  • The expected daily deliveries that it needs to meet

To identify these values, we will be setting up a two-stage stochastic problem using the Python cvxpy library. In this post, we will first review the formulation of the problem, provide information on the installation and use of the cvxpy library, and finally walk through code to identify the optimal solution to the problem.

Reviewing the infrastructure investment selection problem

In our previous post, we identified that Helm’s Keep would like to minimize their infrastructure net present cost (INPC), giving the following objective function:

min  f_{INPC} = \sum_{i=1}^{N}\sum_{s=1}^{S}p_s\sum_{y=1}^{30}\frac{PMT_{s,i}}{(1+d_{s,i})^y}

where

  • N=3 and S = 3 are the total number of infrastructure options and potential future scenarios to consider
  • p_s is the probability of occurrence for scenario s \in S
  • y is one year within the entire bond term T =[1,30]
  • PMT is the total bond payment, or bond principal
  • d_{s,i} is the discount rate in scenario s for infrastructure option i

In achieving this objective, Helm’s Keep also has to abide by the following constraints:

y_1 + y_2 + y_3 = 1

x_i \leq My_i

D_{s,i} \geq \frac{\delta_{s}}{1-MAX(\rho)}

D_{s,i} \leq \sum_{i=1}^{3}\frac{x_i}{1-\rho_i}

D_{s,i} \leq 8

PMT_{s,i} \leq 1.25R_{s,i}

where

  • x_i is the final built capacity of infrastructure option i
  • y_i is a binary [0,1] variable indicating if an infrastructure option is built (1) or not (0)
  • \delta_{s} is the daily demand in scenario s
  • D_{s,i} is the daily water deliveries from infrastructure option i in scenario s
  • \rho_i is the ratio of non-revenue water (NRW) if infrastructure option i is built
  • R_{s,i} is the net revenue from fulfilling demand (after accounting for NRW) using infrastructure option i in scenario s

For the full formulations of PMT and R, please refer to Part 1 of this tutorial.

In this problem, we our first-stage decision variables are the infrastructure option y_i to build and its final built capacity x_i. Our second-stage decision variables are the daily water deliveries made D_{s,i} in each scenario s.

The CVXPY Python library

To solve this problem, we will be using Python’s cvxpy library for convex optimization. It is one of the many solvers available for performing convex optimization in Python including Pyomo, as demonstrated in Trevor’s earlier blog post. Some other options options include PuLP, scipy.optimize, and Gurobi. For the purposes of our specific application, we will be using cvxpy as it can interpret lists and dictionaries of variables, constraints, or objectives. This allows direct appending and retrieval of said objects. This feature therefore makes it convenient to use with for-loops, which are useful in the case of two- or multi-stage stochastic problems where the decision variable space can exponentially grow with the number of scenarios or options being considered. You can find an introduction to cvxpy, documentation, and examples at the CVXPY website.

If you use pip, you can install cvxpy using the following:

pip install cvxpy

To install specific solver names, you can alternatively install cvxpy using

pip install cvxpy[CBC,CVXOPT,GLOP,GLPK,GUROBI,MOSEK,PDLP,SCIP,XPRESS]

where you can substitute the names of any convex optimization solver in the square brackets.

If you use conda, install cvxpy using

conda create --name cvxpy_env
conda activate cvxpy_env
conda install -c conda-forge cvxpy

Once you’ve installed cvxpy, you’re ready to follow along the next part!

Solving the two-stage stochastic programming problem

First, we import the cvxpy library into our file and define the constants and helper functions to calculate the values of PMT and R

import cvxpy as cp

# define constants
households = 57000  # number of households
T = 30  # bond term (years) for an infrastructure option 
UR = 0.06 # the uniform water rate per MGD

def calculate_pmt(br, C_i, y_i, VCC_i, x_i, T=30):
    """Calculate the annual payment for a loan.

    Args:
        br (float): The annual interest rate.
        C_i (float): capital cost of infrastructure option i 
        y_i (boolean): 1 if option i is built, 0 if not
        VCC_i (float): variable capital cost of infrastructure option i
        x_i (float): final built capacity of infrastructure option i
        T (const): T=30 years, the bond term in years.

    Returns:
        float: The annual payment amount.
    """
    # calculate p_i, the bond principal (total value of bond borrowed) for
    # infrastructure option i
    p_i = (C_i*y_i) + (VCC_i * x_i)

    # Convert the annual interest rate to a monthly rate.
    pmt = p_i*(br*(1+br)**T)/(((1+br)**T)-1)

    # Calculate the monthly payment.
    return pmt

def calculate_R(D_i, rho_i, VOC_i, households=57000, UR=0.06):
    """
    Calculates the potential net revenue from infrastructure option i in a given scenario.

    Args:
        D_i (float): per-capita daily water demand in MGD
        rho_i (float): percentage of water lost during transmission (non-revenue water, NRW) from infrastructure i to the city
        VOC_i (float): variable operating cost of i
        households (const): households=57000 the number of households in the city
        UR (const): UR=0.06/MGD the per-household uniform water rate

    Returns:
        R_i (float): The potential net revenue from infrastructure option i in a given scenario.
    """
    OC_i = VOC_i * (D_i/rho_i)
    R_i = ((D_i * UR * households)*(1-rho_i)) - OC_i
    return R_i

Then, we define all the variables required. We will first define the first-stage decision variables:

# Infrastructure options as boolean (1, 0) variables
y1 = cp.Variable(boolean = True, name='y1')
y2 = cp.Variable(boolean = True, name='y2')
y3 = cp.Variable(boolean = True, name='y3')

# capacity of each infrastructure option
x1 = cp.Variable(nonneg=True, name = 'x1')
x2 = cp.Variable(nonneg=True, name = 'x2')
x3 = cp.Variable(nonneg=True, name = 'x3')

# infrastructure option parameters
C = [15.6, 11.9, 13.9] # capital cost of each infrastructure option in $mil
VCC = [7.5, 4.7, 5.1] # variable capital cost of each infrastructure option in $mil/MGD capacity
VOC = [0.2, 0.5, 0.9] # variable operating cost of each infrastructure option in $mil/MGD
NRW = [0.2, 0.1, 0.12] # non-revenue water (NRW) for each infrastructure option

Next, we define the second-stage decision variable and the parameter values related to each potential scenario:

# volume of water delivered to the city in each scenario
D = {}
for i in range(3):
    D[i] = cp.Variable(nonneg=True)

P = [0.35, 0.41, 0.24]  # probability of each scenario
demand_increase = [4.4, 5.2, 3.9]  # per-capita daily demand increase in MGD
bond_rate = [0.043, 0.026, 0.052]  # bond rate in each scenario
discount_rate = [0.031, 0.026, 0.025]  # discount rate in each scenario

Note the for loop in the Lines 2-4 of the code snippet above. The cvxpy enables variables to be added to and accessed via a dictionary that allows access via both explicit and in-line for loop, as we will see below in the objective function code definition:

min_inpc = sum(P[s]*sum((calculate_pmt(bond_rate[s], C[0], y1, VCC[0], x1)/((1+discount_rate[s])**t)) for t in range(1,T+1)B) for s in range(3)) + \
    sum(P[s]*sum((calculate_pmt(bond_rate[s], C[0], y2, VCC[1], x2)/((1+discount_rate[s])**t)) for t in range(1,T+1)) for s in range(3)) + \
    sum(P[s]*sum((calculate_pmt(bond_rate[s], C[2], y3, VCC[2], x3)/((1+discount_rate[s])**t)) for t in range(1,T+1)) for s in range(3))

Some explanation is required here. Our goal is to find the minimum INPC required to build the supply required to meet potential demand growth. Our objective function formulation therefore is the sum of the INPC of all three potential infrastructure options, each calculated across the three scenarios. As the y_i variable is binary, only the sum across the three scenarios that requires the optimal infrastructure option will be chosen.

To constrain the solution space of our objective function, we define our constraints. Below, we can see the application of the ability of the cvxpy library that allows constraints to be added iteratively to a list:

constraints = []

# set an arbitrarily large value for M
M = 10e12

for s in range(3):
    constraints += [D[s] >= demand_increase[s]]   # daily water deliveries must be more than or equal to daily demand increase
    constraints += [D[s] <= ((x1/0.1) + (x2/0.1) + (x2/0.12))/1.2]   
    constraints += [calculate_pmt(bond_rate[s], C[0], y1, VCC[0], x1) <= 1.25*calculate_R(demand_increase[s], NRW[0], VOC[0])]
    constraints += [calculate_pmt(bond_rate[s], C[1], y2, VCC[1], x2) <= 1.25*calculate_R(demand_increase[s], NRW[1], VOC[1])]
    constraints += [calculate_pmt(bond_rate[s], C[2], y3, VCC[2], x3) <= 1.25*calculate_R(demand_increase[s], NRW[2], VOC[2])]

constraints += [y1 + y2 + y3 == 1]
constraints += [x1 <= M * y1]
constraints += [x2 <= M * y2]
constraints += [x3 <= M * y3]

Finally, we solve the problem using the Gurobi solver. This solver is selected as it comes pre-installed with the cvxpy library and does not require additional steps or licensing during installation. We also print the objective value and the solutions to the problem:

# set up the problem as a minimization
problem = cp.Problem(cp.Minimize(min_inpc), constraints)

# solve using Gurobi
problem.solve(solver=cp.GUROBI, verbose=False)

print(f'Optimal INPC: ${problem.value} mil' )
for variable in problem.variables():
  print(f"{variable.name()} = {variable.value}")

Obtaining the solutions

If you have closely followed the steps shown above, you would have identified that Helm’s Keep should build Infrastructure Option 3 (a new groundwater pumping station), to a total built capacity that allows total expected daily deliveries of 3.27MGD. This will result in a final INPC of USD$35.62 million. There are our first-stage decision variables.

In each scenario, the following daily deliveries (second-stage decision variables) should be expected:

ScenarioScenario probability (%)Demand increase (MGD)Daily deliveries (MGD)
1354.45.5
2415.26.5
3243.94.875

The values from the second and third column can be found in Part 1 of this tutorial. The final daily deliveries account for the maximum possible portion of NRW.

Let’s identify how much Helm’s Keep will require to pay in total annual bond payments and how much their future expected daily deliveries will be:

total_bond_payment = sum(P[s]*calculate_pmt(bond_rate[s], C[1], 1, VCC[1], x2.value) for s in range(3))
expected_daily_deliveries = sum(P[s]*D[s].value for s in range(3))

If you have closely followed the steps shown above, you would have obtained the following values:

Total annual bond paymentUSD$1.55 million
Expected total daily water deliveries5.76 MGD

Conclusion

Congratulations – you just solved a two-stage programming stochastic programming problem! In this post, we reviewed the content of Part 1, and provided a quick introduction to the cvxpy Python library and justified its use for the purpose of this test case. We also walked through the steps required to solve this problem in Python, identified that it should build a new groundwater pumping station with a 3.27MGD capacity. We also identified the total annual amount Helm’s Keep would have to pay annually to fulfill its debt service requirements, and how much it water would, on average, be expected to provide for its constituents.

I hope that both Parts 1 and 2 provided you with some information on what stochastic programming is, when to use it, and some methods that might be useful to approach it. Thank you for sticking around and happy learning!

An introduction to linear programming for reservoir operations (part 2) – Implementation with Pyomo

Introduction

Previously, in Part 1 I used a very simple reservoir operations scenario to demonstrate some linear programming (LP) concepts.

After some feedback on my initial formulation I went back and revised the formulation to make sure that (1) both reservoir releases and storage levels are optimized simultaneously and (2) the LP handles decisions over multiple timesteps (1,…,N) during optimization. Definitely look back at Part 1 for more context.

The current LP formulation is as follows:

In this post, I show a simple implementation of this LP using the Pyomo package for solving optimization problems in Python.

I have shared the code used in this demo in a repository here: Reservoir-LP-Demo


Constructing the LP model with Pyomo

While Pyomo can help us construct the LP model, you will need access to a separate solver software in order to actually run the optimization. I don’t get into the details here on how to set up these solvers (see their specific installation instructions), but generally you will need this solver to be accessible on you PATH.

Two solvers that I have had good experience with are:

As always, it’s best to consult the Pyomo documentation for any questions you might have. Here, I highlight a few things that are needed for our implementation.

We start by importing the pyomo.environ module:

import pyomo.environ as pyo

From this module we will need to use the following classes to help build our model:

  • pyo.ConcreteModel()
  • pyo.RangeSet()
  • pyo.Var()
  • pyo.Objective()
  • pyo.Constraint()
  • pyo.SolverFactory()

The nice thing about using pyomo rather than trying to manage the LP matrices yourself is that you can specify objectives and constraints as functions.

For example, the objective function is defined as:

# Objective Function
def objective_rule(m):
    return -sum((eta_0 * m.R[t]) + (m.S[t]/S_max*100) for t in m.T)

And a constraint used to enforce the lower limit of the storage mass balance can defined as:

def S_balance_lower(m, t):
    if t == 0:
        return m.S[t] + m.R[t] <= initial_storage + I[t] - D[t]
    return m.S[t] + m.R[t] <= m.S[t-1] + I[t] - D[t]

Rather than picking the full implementation apart, I present the entire function below, and encourage you to compare the code implementation with the problem definition above.

def pyomo_lp_reservoir(N, S_min, S_max, R_min, R_max, 
                       eta_0, I, D,  
                       initial_storage):

    # Model
    model = pyo.ConcreteModel()

    # Time range
    model.T = pyo.RangeSet(0, N-1)  

    # Decision Variables
    model.S = pyo.Var(model.T, bounds=(S_min, S_max))  # Storage
    model.R = pyo.Var(model.T, bounds=(R_min, R_max))  # Release

    # Objective Function
    def objective_rule(m):
        return -sum((eta_0 * m.R[t]) + (m.S[t]/S_max*100) for t in m.T)
    model.objective = pyo.Objective(rule=objective_rule, sense=pyo.minimize)

    # Constraints
    def S_balance_lower(m, t):
        if t == 0:
            return m.S[t] + m.R[t] <= initial_storage + I[t] - D[t]
        return m.S[t] + m.R[t] <= m.S[t-1] + I[t] - D[t]

    def S_balance_upper(m, t):
        if t == 0:
            return -(m.S[t] + m.R[t]) <= -(initial_storage + I[t] - D[t])
        return -(m.S[t] + m.R[t]) <= -(m.S[t-1] + I[t] - D[t])
    model.S_lower = pyo.Constraint(model.T, rule=S_balance_lower)
    model.S_upper = pyo.Constraint(model.T, rule=S_balance_upper)
    model.S_final = pyo.Constraint(expr=model.S[N-1] == initial_storage)

    # Solve
    solver = pyo.SolverFactory('scip')
    results = solver.solve(model)

    if results.solver.status == pyo.SolverStatus.ok:
        S_opt = np.array([pyo.value(model.S[t]) for t in model.T])
        R_opt = np.array([pyo.value(model.R[t]) for t in model.T])
        return S_opt, R_opt
    else:        
        raise ValueError('Not able to solve.')

Note that in this implementation, pyomo will optimize all of the reservoir release and storage decisions simultaneously, returning the vectors of length N which prescribe the next N days of operations.

Usage

Now we are ready to use our LP reservoir simulator. In the code block below, I set some specifications for our operational constraints, generate fake inflow and demand timeseries, run the LP solver, and plot the simulated results:

# spcifications
N_t = 30
S_min = 2500
S_max = 5000
R_min = 10
R_max = 1000
eta = 1.2

# Generate simple inflow and demand data
I, D = generate_data(N_t, correlation_factor = -0.70,
                     inflow_mean=500, inflow_std=100,
                     lag_correlation=0.2)


# Run LP operation simulation
S_sim, R_sim = pyomo_lp_reservoir(N_t, S_min, S_max, R_min, R_max, eta, I, D, 
                                  initial_storage=S_max)

# Plot results
plot_simulated_reservoir(I, D,
                         R_sim, S_sim, 
                         S_max, eta=eta)

The operations are shown:

Under this LP formulation, with a perfect inflow forecast, the reservoir operates as a “run of river” with the release rates being very close to the inflow rate.

In practice, operators may need to limit the difference in release volume from day-to-day. I added an optional parameter (R_change_limit) which adds a constraint on the difference subsequent releases from between each day.

The operations, with the daily release change rate limited to 50 is shown below.

Conclusions

The way you formulate the an LP problem dictates the “optimal” decisions that you will generate. The purpose of these past two posts was to make some attempt at giving a practice demo of some basic LP concepts. I hope for some that it might be useful as a starting point.