Solving Analytical Algebra/Calculus Expressions with Matlab

Hi All,

This week a few of our colleagues were grinding through some tough calculus for a homework assignment and bemoaning the fact they’d gotten a bit rusty since undergrad.  For help one might turn to Maple or Wolfram, but Matlab is also a useful tool.

Here we want to make use of Matlab’s symbolic variable class.  Mathworks has some nice tutorials on how to create symbolic variables etc. and a nice description of functions which can operate on symbolic variables.  I’ll go through a really brief tutorial here.

First let’s create a symbolic number.  We do this with the sym function.  To see the difference between a symbolic number and floating-point number try the following code:

sym(1/3)
1/3

ans =
1/3

ans =
0.3333

Neat, but how are they different?  Stealing from Mathwork’s tutorial for a moment consider sin(π).  We all learned in middle school that sin(π) = 0. Of course π is irrational so numerically evaluating sin(π) may not return exactly zero.  Let’s see what happens when we evaluate sin(π) with both symbolic and floating-point pi’s:

sin(sym(pi))
sin(pi)

ans =
0

ans =
1.2246e-16

Cool!  Now, you’ve probably guessed that if we can create symbolic variables, we can also create symbolic functions.  Let’s create a simple symbolic function:

syms x a b c
f = symfun(3*x^2+2*x+1,x);

f(x) =
3x^2 + 2x + 1

Note that we could also use symbolic variables a, b, and c for the coefficients in f.  For now, let’s stick with 1, 2, and 3.

We can substitute any constant (say 3) or symbolic (say sin(y)) value of x we want (say x=3) and get the resulting value of f:

Test1 = f(3)

syms y
Test2 = f(sin(y))

Test1 =
34

Test2 =
2sin(y) + 3sin(y)^2 + 1

We can also multiply our function f by a constant or by another variable:

Test3 = 6*f

Test4 = symfun(f*y,[x y])

Test3 (x)=
18x^2 + 12x + 6

Test4(x, y) =
y(3x^2 + 2*x + 1)

Now, getting back to f, let’s do some calculus.  For starters, let’s take the first derivative of f with respect to x:

df = diff(f,x)

df(x) =
6*x + 2

And we can integrate as well:

F = int(f,x)

F(x) =
x*(x^2 + x + 1)

Again recall that we can derive the more general form of the expression using symbolic coefficients.  Of course, this is a simple example, but I’ve used it for much, much more complicated functions.

If we’re having trouble with calculus, maybe you could use a bit of help on the linear algebra front as well.  Let’s start by instantiating a symbolic matrix, X:

X = sym('X',[2,2])

X =
[ X1_1, X1_2]
[ X2_1, X2_2]

Alternatively we could instantiate X as a symbolic function with more readable symbolic elements:

syms a b c d
X=symfun([a,b;c,d],[a b c d])

X(a, b, c, d) =
[ a, b]
[ c, d]

As with the previous examples, we can operate on X.  Let’s take the inverse:

Xinv = inv(X)

Xinv(a, b, c, d) =
[ d/(ad – bc), -b/(ad – bc)]
[ -c/(ad – bc), a/(ad – bc)]

Break out your linear algebra book to confirm that this is right.  Suppose we want to do something a bit more complicated: linear regression!  We instantiate a a Y vector of observations:

syms y1 y2 y3
Y = [y1;y2;y3]

Y =
y1
y2
y3

Now, suppose we want to fit a constant model, we instantiate a new X matrix (or in this case a vector) of ones:

X = sym(ones(3,1))

X =
1
1
1

Now, we fit our model (y=bx+e) using the ordinary least squares (OLS) estimator (b=(XTX)^-1XTY):

b = inv(X'*X)*X'*Y

b =
y1/3 + y2/3 + y3/3

So we have our answer, and we find that the OLS estimator of the constant model is simply the sample mean of the data.

All of these examples were super easy, but should be helpful if you’re stuck on econometrics, statistics, or hydrology homework.  If there is interest I can make later posts on taking the partial derivative of piece-wise polynomial structures in Matlab.

Jon

Simple Python script to create a command to call the Borg executable

Please see my GitHub for a new Python program that may be of interest. If the link becomes broken in the future, look up the ‘misc_scripts’ repository within ‘jrkasprzyk’ GitHub.

It saves you having to type out a long Borg command when using the ‘command line interface’, especially when there are many many decision variables. The script also contains a simple set of Python objects of interest: Objective contains the name and epsilon of an objective, and Decision contains the name, lower bound, and upper bound of an objective.

You can imagine that this would be able to facilitate complicated MOEA analyses such as random seed analysis, and the running of multiple problems. I have also used this to organize my thoughts when doing other analyses in Python such as sampling of the decision space, and different sorts of processing of MOEA output.

Debugging in Python (using PyCharm) – Part 1

Debugging in Python (using PyCharm) – Part 1

This blog post is the first of a multi-part series (here are the second and third debugging blog posts) intended to provide discussion of some basic debugging tools that I have found to be helpful in developing a pure Python simulation model using a Python Integrated Development Environment (IDE) called PyCharm. (Some of this discussion surely applies to other languages and development environments as well. I just want folks to know that I’ve prepared this post with Python/PyCharm in mind).  The post was updated with some additional tips from Joe Kasprzyk, on 10/15/2015.

If you didn’t catch my previous post, I strongly recommend PyCharm as a Python IDE, and have described in that post how to get access to PyCharm. PyCharm can be a bit heavy just for scientific computing, but in developing a large, object-oriented simulation model, I have found PyCharm’s code completion and debugging features to be a major asset for coding quickly and efficiently.

When I am debugging a program, there are often two fundamental issues at hand. The first is to make sure the program runs without any syntax errors (or any other errors generated while interpreting your code). The second is to be confident that your code is doing what you think it is. That is, even if it runs, there may be problems within the implementation of the routines you have coded. Python’s highly intelligent code editor is very helpful for solving the former problem before you even run your program: it will highlight errors on the fly, help you complete code to avoid variable misspellings, etc. However, to handle the latter issue, you may want to glimpse into the code’s functioning (e.g., variable values) as the program executes. I will now describe some of PyCharm’s debugging features that I have found to be very helpful in this regard.

I prepared the instructions and figures that appear below with the Professional version of PyCharm (JetBrains PyCharm 4.5.2), and I’m using it in Windows 7.

I am going to show examples of how to use the Pycharm’s debug features on a very simple Python script, which appears below. However, clearly you can follow these same steps for your own files. This script should work, so it shouldn’t need to be debugged, but I’ll show how I would use debugging tools to check the script anyway.

# My program file
import matplotlib.pyplot as plt
import numpy as np

# Initialize empty arrays
array_len = 10000  # Desired length of arrays
var1 = np.zeros(array_len)
var2 = np.zeros(array_len)
var3 = np.zeros(array_len)
var4 = np.zeros(array_len)

# Define two functions:

def function_1(input):
    # Function 1
    for i in range(array_len):
        var1[i] = 2 + i*input
        var2[i] = 2*var1[i]
    return var1, var2

def function_2(input):
    # Function 2
    for j in range(array_len):
        var3[j] = 3 + j*input
        var4[j] = 2*var3[j]
    return var3, var4

# Add var1 and var3, and plot result:
Output = function_1(1)[0] + function_2(2)[0]
plt.plot(Output)

Step 1. Open your files.

In PyCharm, open a file (or files) that you have coded, or are in the process of coding, and want to debug. For example, a screenshot of my open file appears below.

Fig 1

Step 2. Turn on the debugger.

There are numerous ways to do this. I prefer to open up a Python console to work from, and turn it on from there (as is shown in the screenshot below). To do this, click on the “Python console” button at the bottom of the screen, then click on the green bug icon on the console menu on the left to turn on the debugger. Note that you can actually have open multiple consoles (by clicking the “plus” icon on the console menu), and you can have debugging turned on in whichever consoles you prefer.

If you can’t figure out how to open up a Python console, you can also just click on the green bug icon in the menu at the top of the screen next to the play button, while your script is open. This may actually run the debugger as well, which is a step that comes later on in my post.

Fig 2

Step 3. Create breakpoints.

Breakpoints are spots in your code where you want to temporarily interrupt execution so you can glimpse into the program’s operation (e.g., view values of variables as the program runs). If you left click in the area just to the left of where the code editor stops, a closed red circle will appear, denoting a breakpoint.

In my example, I will set breakpoints on the lines where variables var2 and var4 are defined. (Note that I like to have line numbers turned on, particularly when debugging. You can access this features in PyCharm’s settings menu).

Breakpoints are only used when running the program in debug mode. That is, when you turn debugging off (e.g., by clicking once again on the bug icon in the console menu), the program will execute without stopping on breakpoints. Also, note that breakpoints are useful not only for viewing the program’s variable values and so forth, but also to see if the program is entering particular sections of code. If the program doesn’t stop on a breakpoint you’ve set, then the program is not entering the section of code where that breakpoint is located, which can be very valuable information.

Fig 3

Step 4. Create a run configuration and run the debugger.

There are several ways to do this as well. You can simply click on the green bug icon at the top of the screen again to run the debugger. Or, highlight your code, right click, and select the option to “Debug” your file.

Joe adds: If you want to do a quick debug of an uncomplicated script, the above will work well.  However sometimes you may want to debug a Python file that has command line arguments, such as pareto.py.  In order to create a configuration, right click on the configuration pull down menu in the upper right hand corner of the program, and click Edit Configurations…  Then you can add the command line arguments by typing them into the “Script parameters” box in that dialog window.  Then, you can continue debugging as usual.  Back to your regularly scheduled programming…

I have noticed that my programs execute significantly more slowly in debug mode, so you may experience the same. A debugger pane should appear as below.

Fig 4

Step 5. Set “Watches”.

Note the program execution has been interrupted on the “var1” line. Suppose we want to know the value of this local variable in its current loop. In the debugger pane, go to the “Watches” window, click the “plus” symbol, and enter “var1”. Repeat this process for the rest of the variables.

Fig 5

The debugger still shows no value for var1. This is because when the Pycharm debugger stops on a breakpoint, it interrupts execution before the breakpoint line is run, rather than after.

If you click on the “play” icon to resume execution in the debugger, you will see values appear in your “Watches” pane. (If you hold your cursor over the different icons in the debugging pane, an icon description at the bottom of the window appears, so you can figure out which buttons I’m referring to this way).

PyCharm will also display values of watch variables (as well as the counter i) in the actual code editor as well (note the new text in gray in the figure below). Watches are helpful for more complex programs than this one, where you could have hundreds or thousands of variables, attributes, objects, etc., and want to track the values only of specific ones, so you can check them as you debug.

Fig 6

Joe adds:

Step 6. Add some exception handling to find information about an error.

Sometimes it is unclear why it is that you’re getting a particular error, such as “list index out of range.”  The Python run may give you information about what line number caused the problem, but that may not be enough.  Exception handling provides a way to create a ‘soft landing’ when the program runs into a problem.  Check out the Python documentation of this here.

For example, I had a problem with the function ‘withobjectives’ within pareto.py.  It was telling me list index out of range.  This seemed strange to me, and setting a breakpoint would be a lot of work because I didn’t know when the actual error was happening.  But exception handling saves the day.  Here’s the original offending line:

for oo in oindices:
    objectives.append(float(row[oo]))

I add ‘try’ and ‘except’ statements around the offending line:

for oo in oindices:
    try:
        objectives.append(float(row[oo]))
    except:
        print "You messed up. Here is row:"
        print "%s" % row

Now, there will be a line printed to the console that shows you the variable that caused the problem. In this case, there was an error in the text file, and I can go and use a find command in a text editor to realize that there was an extra column somewhere in the file that should not have been there.

We return to the original post:

I have described just a few basic features. Other neat ones include:

  • If you want to temporarily turn off all breakpoints during the debug run, use “mute breakpoints”.
  • If you want the breakpoints only to turn on when particular conditions are satisfied, click on the “view breakpoints” icon. In the image below I am setting a rule that means the breakpoint will only interrupt execution when the counter (i) is >= 9998.

Fig 7

  • If you’ve been scrolling around in a big file and can’t find where the breakpoint is, click on the “Show Execution Point” icon.
  • You can change your code as you are debugging without any problem. So, if you discover and fix an error, you don’t have to stop the debugger and start over again. (In many cases, though, it obviously might be a good idea to just start again, in which case you can click on the “Stop” and/or “Rerun” icons).
  • In addition to watches, and values displayed by PyCharm in the editor, you can see the values of a variable (or attribute, etc.) by holding your mouse over that variable.
  • If you want to see plots of variables as you are debugging, you can import matplotlib and insert code in your script to generate a plot of those variables. PyCharm will print such figures during the debug. More details on matplotlib integration with the PyCharm debugger can be found here.

In a future post I will demonstrate PyCharm’s Coverage and Profile features, which are very useful for debugging by allowing you to see what parts of your program (e.g., modules) are/are not being accessed, how frequently, and how much time is being spent in them.

MOEA diagnostics for a simple test case (Part 3/3)


From part 1 and  part 2  of this tutorial we evaluated the DTLZ2, 3 objective problem.  We generated a combined reference set between all runs of Borg and NSGAII and calculated the average metrics for each parameter sample.  Now we are ready to analyze our results.

1) Generating control maps

Control maps help gain insight on the algorithmic performance by projecting hypervolume levels for different combinations of population sizes and NFE.

Required files:

☑ NSGAII_DTLZ2_3.average    and/or   NSGAII_DTLZ2_3_LHS.metrics

☑ Borg_DTLZ2_3.average   and/or  Borg_DTLZ2_3_LHS.metrics

☑ NSGAII_Latin

☑ Borg_Latin

The difference between the .average  and the .metrics files is that one represents the averaged metrics across the 15 random seed trials for each parameter sample whereas the latter contains the metrics obtained by the reference set of each parameter sample.  You can use these file interchangeably, depending on what you want to see, either the reference set or the average performance.

The following matlab script generates control maps from your latin hypercube samples and .metrics or .average files.  Here I used NSGAII average metrics.   You can also generate control maps for Borg by changing the file names in lines 8 and 9.

%Hypervolume control maps%projecting the hypervolume value
%relative to population size and NFE

clear all
clc

parameters=load('NSGAII_Latin');
metrics= load('NSGAII_DTLZ2_3.average');

steps = 3;
% compute average of points within each grid
sum = zeros(steps+1, steps+1);
count = zeros(steps+1, steps+1);
entries = min(size(parameters, 1), size(metrics, 1)); % parameter samples
lbNFE= 10000;
ubNFE= 200000;
lbPopulationSize=10;
ubPopulationSize=1000;

for i=1:entries
    index1 = round(steps * ((parameters(i, 1) - lbPopulationSize) / (ubPopulationSize - lbPopulationSize))) + 1; 

    %take each point and assign it to one of the boxes of the grid
    index2 = round(steps * ((parameters(i, 2) - lbNFE) / (ubNFE - lbNFE))) + 1;
    sum(index1, index2) = sum(index1, index2) + metrics(i, 1);
    count(index1, index2) = count(index1, index2) + 1;
end

Z = zeros(steps+1, steps+1);
for i=1:steps+1
    for j=1:steps+1
         if (count(i, j) > 0)
            Z(i, j) = sum(i, j) / count(i, j);
        end
    end
end
Z = Z';

refSetHV= 0.46622538877930825;

Z = Z/refSetHV;

X = zeros(1, steps+1);
Y = zeros(1, steps+1);
for i=1:steps+1
    X(1, i) = (ubPopulationSize - lbPopulationSize)*((i-1)/steps) + lbPopulationSize; % converts the indexes into real nfe and popsize values
    Y(1, i) = (ubNFE - lbNFE)*((i-1)/steps) + lbNFE;
end

% generate contour plot
hold on;
cmin=0;
cmax=1;
[C, h] = contourf(X, Y, Z,50000);
caxis([cmin cmax])
set(h, 'LineColor', 'none');
xlabel('Population Size
ylabel('NFE')<br>title('Borg')
% adjust colormap so that blue indicates best and values are normalized
cmap = colormap(flipud(jet));

hold on

The control map for NSGAII looks like this:

Featured image

For Borg we obtain something like this:

Featured image

In the axis we have the range of population sizes and NFEs specified on our  latin hypercube sampling. The color scale depicts the normalized hypervolume obtained for each combination of these two parameters.  Dark blue  means high performance, alternatively dark red means poor performance.  We could conclude that both algorithms have good controlability for this test case.  We would need to carry another analysis using the analytical reference set, as opposed to using a set generated by the combination of our two tested algorithms.  Now that you know the procedure, you can carry that analysis on your own..

2) Generating attainment plots

Required files:

☑ NSGAII_DTLZ2_3.average    and/or   NSGAII_DTLZ2_3_LHS.metrics

☑ Borg_DTLZ2_3.average   and/or  Borg_DTLZ2_3_LHS.metrics

The attainment plots enable us to see how effectively an algorithm approximates the best known Pareto front, and to see how likely it is to attain high levels of performance.  The following matlab code provides attainment plots of generational distance, epsilon indicator and hypervolume for NSGAII.  You can substitute NSGAII with Borg in line 4 to obtain Borg’s attainment plots.

clc
clear all

m= load('NSGAII_DTLZ2_3.average');

%Best Metric values Borg

%Best GD
BestGD=1-min(m(:,2));

%Best Epsilon normalized
Bestepsilon=1-min(m(:,5));

% Best hypervolume
Besthypervol=max(m(:,1));

%Attainment
%Normalized metrics
% Hypervolume
RFHV=0.48; %Reference set hypervolume
NormalizedHV= Besthypervol/RFHV(1); % Besthypervolume/reference_set

k= 1:-0.01:0.0; % k is the threshold
n=1;

%Hypervolume
count=0;
count1=0;
count2=0;
algorithms=1;
percent_attainment=zeros(length(algorithms),length(k));
percent_attainmentgd=zeros(length(algorithms),length(k));
percent_attainmentei=zeros(length(algorithms),length(k));
for l=1:length(algorithms)
        % load file
        for n=1:length(k)

            for i= 1:length(m(:,2))

                if (m(i,1)/RFHV) &gt;= k(n)
                    count=count+1;
                    percent_attainment(l,n)=(count/length(m(:,1)));
                end
                if (1-m(i,2)) &gt;= k(n)
                    count1=count1+1;
                    percent_attainmentgd(l,n)=(count1/length(m(:,2)));
                end
                if (1-m(i,5)) &gt;= k(n)
                    count2=count2+1;
                    percent_attainmentei(l,n)=(count2/length(m(:,2)));
                end
        end
        count=0;
        count1=0;
        count2=0;
    end
end

percent_attainment= {percent_attainment', percent_attainmentgd', percent_attainmentei'};
Best_metrics=[NormalizedHV, BestGD, Bestepsilon];
Names={'Hypervolume' 'Generational Distance' 'Epsilon Indicator'};

l= [2,4,6];
for i=1:length(percent_attainment)

    subplot(1,3,i)
    imagesc(1, k, percent_attainment{i})
    set(gca,'ydir','normal')
    colormap(flipud(gray))
    hold on
    scatter(1, Best_metrics(i), 80, 'k', 'filled')
    set(gca,'XTick')
    xlabel(Names(i))
    ylabel('Probability of Best Metric Value');
    if i==2
        title('Attainment probabilities for NSGAII, DTLZ2_3')
        legend('Best metric value across all runs')
    end
end

You should obtain something like this for NSGAII:

Featured image

and for Borg:

Featured image

The previous plots give us a sense on how reliable these algorithms are.  The black circle on top of each bar is the best overall metric for a single seed run.    The gray shading is the probability of attaining a given fraction of the best metric value, these fractions are given in the vertical axis.  Ideal performance would be depicted as a completely black bar and a circle at 1.

3) Reference Set Contribution

Required files:

☑ DTLZ2_combined.pf

☑ NSGAII_DTLZ2_3.reference

☑ Borg_DTLZ2_3.reference

☑ MOEAFramework-4.0-Demo.jar

From your terminal, navigate your working directory and make sure that you have the required files.  Type the following command to obtain the reference set contribution.

$ java -cp MOEAFramework-4.0-Demo.jar org.moeaframework.analysis.tools.SetContribution --reference DTLZ2_3_combined.pf NSGAII_DTLZ2_3.reference Borg_DTLZ2_3.reference

NSGAII_DTLZ2_3.reference 0.1696113074204947

Borg_DTLZ2_3.reference 0.8303886925795053

From the combined reference set, Borg contributed with ≈  83% and NSGAII contributed with ≈ 17% of the total solutions. This analysis gets more interesting when we have more algorithms to compare. In upcoming tutorials we’ll test more MOEAs and look at ways to visualize their Pareto front approximation and their reference set contribution.

Go back to ← MOEA diagnostics for a simple test case (Part 2/3)