Infrastructure Investment Selection as a Two-Stage Stochastic Programming Problem (Part 2)

In this post, we will continue where we left off from Part 1, in which we set up a two-stage stochastic programming problem to identify the optimal decision for the type of water supply infrastructure to build. As a recap, our simple case study involves a small water utility servicing the residents of Helm’s Keep (population 57,000) to identify the following:

  • The best new water supply infrastructure option to build
  • Its final built capacity
  • The total bond payment to be made
  • The expected daily deliveries that ‘ its final built capacity
  • The expected daily deliveries that it needs to meet

To identify these values, we will be setting up a two-stage stochastic problem using the Python cvxpy library. In this post, we will first review the formulation of the problem, provide information on the installation and use of the cvxpy library, and finally walk through code to identify the optimal solution to the problem.

Reviewing the infrastructure investment selection problem

In our previous post, we identified that Helm’s Keep would like to minimize their infrastructure net present cost (INPC), giving the following objective function:

min  f_{INPC} = \sum_{i=1}^{N}\sum_{s=1}^{S}p_s\sum_{y=1}^{30}\frac{PMT_{s,i}}{(1+d_{s,i})^y}

where

  • N=3 and S = 3 are the total number of infrastructure options and potential future scenarios to consider
  • p_s is the probability of occurrence for scenario s \in S
  • y is one year within the entire bond term T =[1,30]
  • PMT is the total bond payment, or bond principal
  • d_{s,i} is the discount rate in scenario s for infrastructure option i

In achieving this objective, Helm’s Keep also has to abide by the following constraints:

y_1 + y_2 + y_3 = 1

x_i \leq My_i

D_{s,i} \geq \frac{\delta_{s}}{1-MAX(\rho)}

D_{s,i} \leq \sum_{i=1}^{3}\frac{x_i}{1-\rho_i}

D_{s,i} \leq 8

PMT_{s,i} \leq 1.25R_{s,i}

where

  • x_i is the final built capacity of infrastructure option i
  • y_i is a binary [0,1] variable indicating if an infrastructure option is built (1) or not (0)
  • \delta_{s} is the daily demand in scenario s
  • D_{s,i} is the daily water deliveries from infrastructure option i in scenario s
  • \rho_i is the ratio of non-revenue water (NRW) if infrastructure option i is built
  • R_{s,i} is the net revenue from fulfilling demand (after accounting for NRW) using infrastructure option i in scenario s

For the full formulations of PMT and R, please refer to Part 1 of this tutorial.

In this problem, we our first-stage decision variables are the infrastructure option y_i to build and its final built capacity x_i. Our second-stage decision variables are the daily water deliveries made D_{s,i} in each scenario s.

The CVXPY Python library

To solve this problem, we will be using Python’s cvxpy library for convex optimization. It is one of the many solvers available for performing convex optimization in Python including Pyomo, as demonstrated in Trevor’s earlier blog post. Some other options options include PuLP, scipy.optimize, and Gurobi. For the purposes of our specific application, we will be using cvxpy as it can interpret lists and dictionaries of variables, constraints, or objectives. This allows direct appending and retrieval of said objects. This feature therefore makes it convenient to use with for-loops, which are useful in the case of two- or multi-stage stochastic problems where the decision variable space can exponentially grow with the number of scenarios or options being considered. You can find an introduction to cvxpy, documentation, and examples at the CVXPY website.

If you use pip, you can install cvxpy using the following:

pip install cvxpy

To install specific solver names, you can alternatively install cvxpy using

pip install cvxpy[CBC,CVXOPT,GLOP,GLPK,GUROBI,MOSEK,PDLP,SCIP,XPRESS]

where you can substitute the names of any convex optimization solver in the square brackets.

If you use conda, install cvxpy using

conda create --name cvxpy_env
conda activate cvxpy_env
conda install -c conda-forge cvxpy

Once you’ve installed cvxpy, you’re ready to follow along the next part!

Solving the two-stage stochastic programming problem

First, we import the cvxpy library into our file and define the constants and helper functions to calculate the values of PMT and R

import cvxpy as cp

# define constants
households = 57000  # number of households
T = 30  # bond term (years) for an infrastructure option 
UR = 0.06 # the uniform water rate per MGD

def calculate_pmt(br, C_i, y_i, VCC_i, x_i, T=30):
    """Calculate the annual payment for a loan.

    Args:
        br (float): The annual interest rate.
        C_i (float): capital cost of infrastructure option i 
        y_i (boolean): 1 if option i is built, 0 if not
        VCC_i (float): variable capital cost of infrastructure option i
        x_i (float): final built capacity of infrastructure option i
        T (const): T=30 years, the bond term in years.

    Returns:
        float: The annual payment amount.
    """
    # calculate p_i, the bond principal (total value of bond borrowed) for
    # infrastructure option i
    p_i = (C_i*y_i) + (VCC_i * x_i)

    # Convert the annual interest rate to a monthly rate.
    pmt = p_i*(br*(1+br)**T)/(((1+br)**T)-1)

    # Calculate the monthly payment.
    return pmt

def calculate_R(D_i, rho_i, VOC_i, households=57000, UR=0.06):
    """
    Calculates the potential net revenue from infrastructure option i in a given scenario.

    Args:
        D_i (float): per-capita daily water demand in MGD
        rho_i (float): percentage of water lost during transmission (non-revenue water, NRW) from infrastructure i to the city
        VOC_i (float): variable operating cost of i
        households (const): households=57000 the number of households in the city
        UR (const): UR=0.06/MGD the per-household uniform water rate

    Returns:
        R_i (float): The potential net revenue from infrastructure option i in a given scenario.
    """
    OC_i = VOC_i * (D_i/rho_i)
    R_i = ((D_i * UR * households)*(1-rho_i)) - OC_i
    return R_i

Then, we define all the variables required. We will first define the first-stage decision variables:

# Infrastructure options as boolean (1, 0) variables
y1 = cp.Variable(boolean = True, name='y1')
y2 = cp.Variable(boolean = True, name='y2')
y3 = cp.Variable(boolean = True, name='y3')

# capacity of each infrastructure option
x1 = cp.Variable(nonneg=True, name = 'x1')
x2 = cp.Variable(nonneg=True, name = 'x2')
x3 = cp.Variable(nonneg=True, name = 'x3')

# infrastructure option parameters
C = [15.6, 11.9, 13.9] # capital cost of each infrastructure option in $mil
VCC = [7.5, 4.7, 5.1] # variable capital cost of each infrastructure option in $mil/MGD capacity
VOC = [0.2, 0.5, 0.9] # variable operating cost of each infrastructure option in $mil/MGD
NRW = [0.2, 0.1, 0.12] # non-revenue water (NRW) for each infrastructure option

Next, we define the second-stage decision variable and the parameter values related to each potential scenario:

# volume of water delivered to the city in each scenario
D = {}
for i in range(3):
    D[i] = cp.Variable(nonneg=True)

P = [0.35, 0.41, 0.24]  # probability of each scenario
demand_increase = [4.4, 5.2, 3.9]  # per-capita daily demand increase in MGD
bond_rate = [0.043, 0.026, 0.052]  # bond rate in each scenario
discount_rate = [0.031, 0.026, 0.025]  # discount rate in each scenario

Note the for loop in the Lines 2-4 of the code snippet above. The cvxpy enables variables to be added to and accessed via a dictionary that allows access via both explicit and in-line for loop, as we will see below in the objective function code definition:

min_inpc = sum(P[s]*sum((calculate_pmt(bond_rate[s], C[0], y1, VCC[0], x1)/((1+discount_rate[s])**t)) for t in range(1,T+1)B) for s in range(3)) + \
    sum(P[s]*sum((calculate_pmt(bond_rate[s], C[0], y2, VCC[1], x2)/((1+discount_rate[s])**t)) for t in range(1,T+1)) for s in range(3)) + \
    sum(P[s]*sum((calculate_pmt(bond_rate[s], C[2], y3, VCC[2], x3)/((1+discount_rate[s])**t)) for t in range(1,T+1)) for s in range(3))

Some explanation is required here. Our goal is to find the minimum INPC required to build the supply required to meet potential demand growth. Our objective function formulation therefore is the sum of the INPC of all three potential infrastructure options, each calculated across the three scenarios. As the y_i variable is binary, only the sum across the three scenarios that requires the optimal infrastructure option will be chosen.

To constrain the solution space of our objective function, we define our constraints. Below, we can see the application of the ability of the cvxpy library that allows constraints to be added iteratively to a list:

constraints = []

# set an arbitrarily large value for M
M = 10e12

for s in range(3):
    constraints += [D[s] >= demand_increase[s]]   # daily water deliveries must be more than or equal to daily demand increase
    constraints += [D[s] <= ((x1/0.1) + (x2/0.1) + (x2/0.12))/1.2]   
    constraints += [calculate_pmt(bond_rate[s], C[0], y1, VCC[0], x1) <= 1.25*calculate_R(demand_increase[s], NRW[0], VOC[0])]
    constraints += [calculate_pmt(bond_rate[s], C[1], y2, VCC[1], x2) <= 1.25*calculate_R(demand_increase[s], NRW[1], VOC[1])]
    constraints += [calculate_pmt(bond_rate[s], C[2], y3, VCC[2], x3) <= 1.25*calculate_R(demand_increase[s], NRW[2], VOC[2])]

constraints += [y1 + y2 + y3 == 1]
constraints += [x1 <= M * y1]
constraints += [x2 <= M * y2]
constraints += [x3 <= M * y3]

Finally, we solve the problem using the Gurobi solver. This solver is selected as it comes pre-installed with the cvxpy library and does not require additional steps or licensing during installation. We also print the objective value and the solutions to the problem:

# set up the problem as a minimization
problem = cp.Problem(cp.Minimize(min_inpc), constraints)

# solve using Gurobi
problem.solve(solver=cp.GUROBI, verbose=False)

print(f'Optimal INPC: ${problem.value} mil' )
for variable in problem.variables():
  print(f"{variable.name()} = {variable.value}")

Obtaining the solutions

If you have closely followed the steps shown above, you would have identified that Helm’s Keep should build Infrastructure Option 3 (a new groundwater pumping station), to a total built capacity that allows total expected daily deliveries of 3.27MGD. This will result in a final INPC of USD$35.62 million. There are our first-stage decision variables.

In each scenario, the following daily deliveries (second-stage decision variables) should be expected:

ScenarioScenario probability (%)Demand increase (MGD)Daily deliveries (MGD)
1354.45.5
2415.26.5
3243.94.875

The values from the second and third column can be found in Part 1 of this tutorial. The final daily deliveries account for the maximum possible portion of NRW.

Let’s identify how much Helm’s Keep will require to pay in total annual bond payments and how much their future expected daily deliveries will be:

total_bond_payment = sum(P[s]*calculate_pmt(bond_rate[s], C[1], 1, VCC[1], x2.value) for s in range(3))
expected_daily_deliveries = sum(P[s]*D[s].value for s in range(3))

If you have closely followed the steps shown above, you would have obtained the following values:

Total annual bond paymentUSD$1.55 million
Expected total daily water deliveries5.76 MGD

Conclusion

Congratulations – you just solved a two-stage programming stochastic programming problem! In this post, we reviewed the content of Part 1, and provided a quick introduction to the cvxpy Python library and justified its use for the purpose of this test case. We also walked through the steps required to solve this problem in Python, identified that it should build a new groundwater pumping station with a 3.27MGD capacity. We also identified the total annual amount Helm’s Keep would have to pay annually to fulfill its debt service requirements, and how much it water would, on average, be expected to provide for its constituents.

I hope that both Parts 1 and 2 provided you with some information on what stochastic programming is, when to use it, and some methods that might be useful to approach it. Thank you for sticking around and happy learning!

Infrastructure Investment Selection as a Two-Stage Stochastic Programming Problem (Part 1)

About Stochastic Programming

Stochastic programming (SP) is an approach to decision-making under uncertainty (alongside robust optimization and chance constrained optimization). It is an extension of deterministic optimization techniques such as linear (or non-linear) programming that incorporates stochastic elements (hence the name) in the form of multiple potential future scenarios. Overall, the aim of stochastic programming is to identify optimal solutions to a given problem that perform well under a finite range of possible scenarios. This is done by optimizing the expected value of the objective function over all scenarios. For clarification, a “scenario” in the context of SP is defined as the combined realization (expected outcome) of all future uncertainties.

There are two versions of an SP problem: the two-stage SP problem consists of two decision-making stages, where the first stage decisions represent the decisions taken before the uncertainty (the future scenario, in this case) unfolds and becomes known. The second stage decisions are taken once the future has unfolded and corrective action needs to be taken. Two-stage SP problems are typically used for scenario planning, where a decision-maker’s goal is to identify a solution that will result in good performance across all potential scenarios. Next, the multi-stage SP problem with s-stages is an extension of the two-stage SP problem, where the decision at stage s is dependent on all decisions made up until s. Due to the problem’s high dimensionality, the multi-stage SP problem is typically represented using a scenario tree with each branch associated with a probable outcome and its probability and each level associated with a stage s.

In this post, I will provide a two-stage stochastic programming example discuss the pros and cons of its use. In the following post, I will and demonstrate how to approach and solve such a problem using the Python cvxpy library.

The infrastructure selection example

Disclaimer: All values are arbitrary .

A water utility serving the residents of Helm’s Keep (57,000 households) are deciding on a water supply storage expansion project to meet projected increase in demand and wants to identify a suitable yet affordable project to invest in:

  • Option 1: Water treatment capacity expansion on the distant (but yet untapped) Lake Heleborn
  • Option 2: The construction of small local reservoir on the outskirts of Helm’s Keep
  • Option 3: Construction of a new groundwater pumping station

Each of these infrastructure options are associated with the following capital investment costs. The variable capital cost and variable operating cost are functions of the infrastructure options and expected demand (both in MGD).

Fixed capital cost ($mil)Variable capital cost ($mil/MGD capacity)Variable operating cost ($mil/MGD demand)
Option 115.67.50.2
Option 211.94.70.5
Option 313.95.10.9

Helm’s Keep’s priority in the selection of its new water supply infrastructure is to minimize its total infrastructure net present cost:

f_{INPC}=\sum_{y=1}^{T}\frac{PMT}{(1+d)^y}

where INPC is the infrastructure net present cost, T=30 is bond term for an infrastructure option, and  is the discount rate for a given year y of debt service payment d for an infrastructure option since its bond was issued. The value of PMT is calculated as follows:

\frac{P[BR(1+BR)^T]}{(1+BR)^{T}-1}

where P is the bond principal (the total value of the bond), BR is the bond interest rate to be paid to the creditor throughout the bond term T. All payments are discounted to their present value. The value of the bond principal is dependent on the capacity of the infrastructure option and is calculated as a function of the fixed and variable capital costs:

P_i = C_iy_i + VCC_ix_i

such that C_i is the capital cost of infrastructure option i and y_i is a binary {0,1} variable indicating if option i is chosen. VCC_i is the variable capital cost of infrastructure option i, and x_i is the final built capacity of i.

In addition, there is some water loss in transporting water from either location due to aging infrastructure, which will result in losses caused by non-revenue water (NRW). The losses associated with option are listed below:

  • Option 1: 20% water loss
  • Option 2: 10% water loss
  • Option 3: 12% water loss

Helm’s Keep must also consider the debt is undertaking relative to its potential annual net revenue R_{s,i}, calculated as follows:

R_{s,i}=[(D_{s,i}\times UR \times \text{households})(1-\rho_{i})] - OC_{s,i}

where D_{s,i} is per-capita daily water delivered, UR=\$0.06/MGD is the per-household uniform water rate, and \rho_{i} is the percentage of water lost during transmission given that infrastructure i was chosen. OC_{s,i} is Helm’s Keep’s daily operating costs per capita where

OC_{s,i}=VOC_i \frac{D_{s,i}}{\rho_i}

such that D_{s,i} is the demand in scenario s for infrastructure option i, VOC_{s,i} is the variable operating cost of i, and \rho_i is the percentage of NRW.

However, it must also meet the following constraints:

  • The infrastructure option decision is exclusive (only one option can be selected).
  • The total capacity of the chosen infrastructure option cannot be less than 1.2 times that of daily demand increase (after accounting for NRW).
  • Daily deliveries be at least equal to demand (after accounting for NRW).
  • All water produced (after accounting for NRW) are considered for-revenue water and result in gross revenue.
  • Annual payments PMT_{s,i} should not exceed 1.25 times that of Helm’s Keep’s annual net revenue R_{s,i}.

There are three possible scenarios s \in S = {1,2,3} of water demand increase \delta_s, Federal Reserve bond interest rate, and discount rate. The probability of each scenario is listed in the table below:

ScenariosDemand increase (MGD)Bond rate (%)Discount rate (%)Probability (%)
Scenario 14.44.33.10.35
Scenario 25.22.62.60.41
Scenario 33.95.22.50.24

Helm’s Keep faces the following decisions:

  • Stage 1: Which infrastructure option i to build and to what capacity x_i for i \forall {1,2,3}?
  • Stage 2: How much will Helm’s Keep’s corresponding bond principal be how much water is delivered in each scenario?

Formulating the problem

Let the first-stage decision variables be:

  • Infrastructure option selection y_i where y_1 + y_2 + y_3 = 1 and y_i \in {0,1} \forall i=[1,2,3]
  • Capacity of each infrastructure option x_i \forall i = {1,2,3}

Next, we set up the second stage decision variables. Let D_s, BR_s and d_s be the amount of water delivered, bond rate, and discount rate, in scenario s \in S = {1,2,3}. This will result in the following problem setup:

min  f_{INPC} = \sum_{i=1}^{N}\sum_{s=1}^{S}p_s\sum_{y=1}^{30}\frac{PMT_{s,i}}{(1+d_{s,i})^y}

where

PMT_{s,i}=(C_i y_i + VCC_i x_i) \frac{[BR_{s,i}(1+BR_{s,i})^{30}]}{(1+BR_{s,i})^{30} -1}

with constraints

y_1 + y_2 + y_3 = 1

D_{s,i} \geq \frac{\delta_{s}}{1-MAX(\rho)}

D_{s,i} \geq \sum_{i=1}^{3}\frac{x_i}{1-\rho_i}

D_{s,i} \leq 8

PMT_{s,i} \leq 1.25R_{s,1}

x_i \leq My_i

such that

R_{s,i} = [(3420D_{s,i})(1-\rho_i)] - (VOC_i\frac{D_{s,i}}{\rho_s})

and p_s is the probability of scenario s and UR \times \text{households}=3420.

Some caveats

As we walked through the formulation of this two-stage stochastic programming problem, it quickly becomes apparent that the computational complexity of such an approach can quickly become intractable when the number of uncertain scenarios and the complexity of the model (large number of uncertain parameters) is high. In addition, this approach assumes a priori knowledge of future uncertainty and sufficient underlying expert knowledge to generate 6uncertainty estimates, which cannot guarantee that the theoretical robustness of the identified solution to real world outcomes if the estimated uncertainty does not match the estimated uncertainty. We can see that this method may therefore be best suited for problems where uncertainty can be accurately characterized and modeled as random variables or scenarios. It is nonetheless a useful approach for addressing problems in manufacturing, production line optimization, financial portfolio design, etc, where data is abundant and uncertainty is well-characterized, and then identifying solutions that perform well across a finite number of scenarios.

In this post, we covered an introduction to stochastic programming and provided a 2-stage infrastructure investment selection example to demonstrate the method’s key concepts. In the next post, we will walk through the installation and use of the Python cvxpy library and use it to solve this problem to identify the most suitable infrastructure option to invest in, its final built capacity, and its total demand fulfilled.