In this post, we will continue where we left off from Part 1, in which we set up a two-stage stochastic programming problem to identify the optimal decision for the type of water supply infrastructure to build. As a recap, our simple case study involves a small water utility servicing the residents of Helm’s Keep (population 57,000) to identify the following:
- The best new water supply infrastructure option to build
- Its final built capacity
- The total bond payment to be made
- The expected daily deliveries that ‘ its final built capacity
- The expected daily deliveries that it needs to meet
To identify these values, we will be setting up a two-stage stochastic problem using the Python cvxpy
library. In this post, we will first review the formulation of the problem, provide information on the installation and use of the cvxpy
library, and finally walk through code to identify the optimal solution to the problem.
Reviewing the infrastructure investment selection problem
In our previous post, we identified that Helm’s Keep would like to minimize their infrastructure net present cost (INPC), giving the following objective function:
where
- and are the total number of infrastructure options and potential future scenarios to consider
- is the probability of occurrence for scenario
- is one year within the entire bond term
- is the total bond payment, or bond principal
- is the discount rate in scenario for infrastructure option
In achieving this objective, Helm’s Keep also has to abide by the following constraints:
where
- is the final built capacity of infrastructure option
- is a binary variable indicating if an infrastructure option is built (1) or not (0)
- is the daily demand in scenario
- is the daily water deliveries from infrastructure option in scenario
- is the ratio of non-revenue water (NRW) if infrastructure option is built
- is the net revenue from fulfilling demand (after accounting for NRW) using infrastructure option in scenario
For the full formulations of and , please refer to Part 1 of this tutorial.
In this problem, we our first-stage decision variables are the infrastructure option to build and its final built capacity . Our second-stage decision variables are the daily water deliveries made in each scenario .
The CVXPY Python library
To solve this problem, we will be using Python’s cvxpy
library for convex optimization. It is one of the many solvers available for performing convex optimization in Python including Pyomo, as demonstrated in Trevor’s earlier blog post. Some other options options include PuLP, scipy.optimize, and Gurobi. For the purposes of our specific application, we will be using cvxpy
as it can interpret lists and dictionaries of variables, constraints, or objectives. This allows direct appending and retrieval of said objects. This feature therefore makes it convenient to use with for-loops, which are useful in the case of two- or multi-stage stochastic problems where the decision variable space can exponentially grow with the number of scenarios or options being considered. You can find an introduction to cvxpy
, documentation, and examples at the CVXPY website.
If you use pip
, you can install cvxpy
using the following:
pip install cvxpy
To install specific solver names, you can alternatively install cvxpy
using
pip install cvxpy[CBC,CVXOPT,GLOP,GLPK,GUROBI,MOSEK,PDLP,SCIP,XPRESS]
where you can substitute the names of any convex optimization solver in the square brackets.
If you use conda
, install cvxpy
using
conda create --name cvxpy_env
conda activate cvxpy_env
conda install -c conda-forge cvxpy
Once you’ve installed cvxpy
, you’re ready to follow along the next part!
Solving the two-stage stochastic programming problem
First, we import the cvxpy
library into our file and define the constants and helper functions to calculate the values of and
import cvxpy as cp
# define constants
households = 57000 # number of households
T = 30 # bond term (years) for an infrastructure option
UR = 0.06 # the uniform water rate per MGD
def calculate_pmt(br, C_i, y_i, VCC_i, x_i, T=30):
"""Calculate the annual payment for a loan.
Args:
br (float): The annual interest rate.
C_i (float): capital cost of infrastructure option i
y_i (boolean): 1 if option i is built, 0 if not
VCC_i (float): variable capital cost of infrastructure option i
x_i (float): final built capacity of infrastructure option i
T (const): T=30 years, the bond term in years.
Returns:
float: The annual payment amount.
"""
# calculate p_i, the bond principal (total value of bond borrowed) for
# infrastructure option i
p_i = (C_i*y_i) + (VCC_i * x_i)
# Convert the annual interest rate to a monthly rate.
pmt = p_i*(br*(1+br)**T)/(((1+br)**T)-1)
# Calculate the monthly payment.
return pmt
def calculate_R(D_i, rho_i, VOC_i, households=57000, UR=0.06):
"""
Calculates the potential net revenue from infrastructure option i in a given scenario.
Args:
D_i (float): per-capita daily water demand in MGD
rho_i (float): percentage of water lost during transmission (non-revenue water, NRW) from infrastructure i to the city
VOC_i (float): variable operating cost of i
households (const): households=57000 the number of households in the city
UR (const): UR=0.06/MGD the per-household uniform water rate
Returns:
R_i (float): The potential net revenue from infrastructure option i in a given scenario.
"""
OC_i = VOC_i * (D_i/rho_i)
R_i = ((D_i * UR * households)*(1-rho_i)) - OC_i
return R_i
Then, we define all the variables required. We will first define the first-stage decision variables:
# Infrastructure options as boolean (1, 0) variables
y1 = cp.Variable(boolean = True, name='y1')
y2 = cp.Variable(boolean = True, name='y2')
y3 = cp.Variable(boolean = True, name='y3')
# capacity of each infrastructure option
x1 = cp.Variable(nonneg=True, name = 'x1')
x2 = cp.Variable(nonneg=True, name = 'x2')
x3 = cp.Variable(nonneg=True, name = 'x3')
# infrastructure option parameters
C = [15.6, 11.9, 13.9] # capital cost of each infrastructure option in $mil
VCC = [7.5, 4.7, 5.1] # variable capital cost of each infrastructure option in $mil/MGD capacity
VOC = [0.2, 0.5, 0.9] # variable operating cost of each infrastructure option in $mil/MGD
NRW = [0.2, 0.1, 0.12] # non-revenue water (NRW) for each infrastructure option
Next, we define the second-stage decision variable and the parameter values related to each potential scenario:
# volume of water delivered to the city in each scenario
D = {}
for i in range(3):
D[i] = cp.Variable(nonneg=True)
P = [0.35, 0.41, 0.24] # probability of each scenario
demand_increase = [4.4, 5.2, 3.9] # per-capita daily demand increase in MGD
bond_rate = [0.043, 0.026, 0.052] # bond rate in each scenario
discount_rate = [0.031, 0.026, 0.025] # discount rate in each scenario
Note the for loop in the Lines 2-4 of the code snippet above. The cvxpy
enables variables to be added to and accessed via a dictionary that allows access via both explicit and in-line for loop, as we will see below in the objective function code definition:
min_inpc = sum(P[s]*sum((calculate_pmt(bond_rate[s], C[0], y1, VCC[0], x1)/((1+discount_rate[s])**t)) for t in range(1,T+1)B) for s in range(3)) + \
sum(P[s]*sum((calculate_pmt(bond_rate[s], C[0], y2, VCC[1], x2)/((1+discount_rate[s])**t)) for t in range(1,T+1)) for s in range(3)) + \
sum(P[s]*sum((calculate_pmt(bond_rate[s], C[2], y3, VCC[2], x3)/((1+discount_rate[s])**t)) for t in range(1,T+1)) for s in range(3))
Some explanation is required here. Our goal is to find the minimum INPC required to build the supply required to meet potential demand growth. Our objective function formulation therefore is the sum of the INPC of all three potential infrastructure options, each calculated across the three scenarios. As the variable is binary, only the sum across the three scenarios that requires the optimal infrastructure option will be chosen.
To constrain the solution space of our objective function, we define our constraints. Below, we can see the application of the ability of the cvxpy library that allows constraints to be added iteratively to a list:
constraints = []
# set an arbitrarily large value for M
M = 10e12
for s in range(3):
constraints += [D[s] >= demand_increase[s]] # daily water deliveries must be more than or equal to daily demand increase
constraints += [D[s] <= ((x1/0.1) + (x2/0.1) + (x2/0.12))/1.2]
constraints += [calculate_pmt(bond_rate[s], C[0], y1, VCC[0], x1) <= 1.25*calculate_R(demand_increase[s], NRW[0], VOC[0])]
constraints += [calculate_pmt(bond_rate[s], C[1], y2, VCC[1], x2) <= 1.25*calculate_R(demand_increase[s], NRW[1], VOC[1])]
constraints += [calculate_pmt(bond_rate[s], C[2], y3, VCC[2], x3) <= 1.25*calculate_R(demand_increase[s], NRW[2], VOC[2])]
constraints += [y1 + y2 + y3 == 1]
constraints += [x1 <= M * y1]
constraints += [x2 <= M * y2]
constraints += [x3 <= M * y3]
Finally, we solve the problem using the Gurobi solver. This solver is selected as it comes pre-installed with the cvxpy
library and does not require additional steps or licensing during installation. We also print the objective value and the solutions to the problem:
# set up the problem as a minimization
problem = cp.Problem(cp.Minimize(min_inpc), constraints)
# solve using Gurobi
problem.solve(solver=cp.GUROBI, verbose=False)
print(f'Optimal INPC: ${problem.value} mil' )
for variable in problem.variables():
print(f"{variable.name()} = {variable.value}")
Obtaining the solutions
If you have closely followed the steps shown above, you would have identified that Helm’s Keep should build Infrastructure Option 3 (a new groundwater pumping station), to a total built capacity that allows total expected daily deliveries of 3.27MGD. This will result in a final INPC of USD$35.62 million. There are our first-stage decision variables.
In each scenario, the following daily deliveries (second-stage decision variables) should be expected:
Scenario | Scenario probability (%) | Demand increase (MGD) | Daily deliveries (MGD) |
1 | 35 | 4.4 | 5.5 |
2 | 41 | 5.2 | 6.5 |
3 | 24 | 3.9 | 4.875 |
The values from the second and third column can be found in Part 1 of this tutorial. The final daily deliveries account for the maximum possible portion of NRW.
Let’s identify how much Helm’s Keep will require to pay in total annual bond payments and how much their future expected daily deliveries will be:
total_bond_payment = sum(P[s]*calculate_pmt(bond_rate[s], C[1], 1, VCC[1], x2.value) for s in range(3))
expected_daily_deliveries = sum(P[s]*D[s].value for s in range(3))
If you have closely followed the steps shown above, you would have obtained the following values:
Total annual bond payment | USD$1.55 million |
Expected total daily water deliveries | 5.76 MGD |
Conclusion
Congratulations – you just solved a two-stage programming stochastic programming problem! In this post, we reviewed the content of Part 1, and provided a quick introduction to the cvxpy
Python library and justified its use for the purpose of this test case. We also walked through the steps required to solve this problem in Python, identified that it should build a new groundwater pumping station with a 3.27MGD capacity. We also identified the total annual amount Helm’s Keep would have to pay annually to fulfill its debt service requirements, and how much it water would, on average, be expected to provide for its constituents.
I hope that both Parts 1 and 2 provided you with some information on what stochastic programming is, when to use it, and some methods that might be useful to approach it. Thank you for sticking around and happy learning!