September 20, 2021 by Rohini Gupta

Teaching Tools for Complex Adaptive Systems

This semester, I am taking a foundational class in the Systems Engineering department here at Cornell and I wanted to use this blog post to relay some cool packages and tools that we have used that hopefully can be useful teaching material for emerging faculty or anyone looking for interactive systems tutorials.

To begin, we have to first define what falls under the umbrella of complex adaptive systems. In a nutshell, these systems tend to (1) have networks of many components, (2) typically involve non-linear interactions between components, (3) exhibit self-organizing behavior, (4) have the potential to exhibit emergent properties. One really beautiful website that explains these properties in more detail is Complexity Explained, which started as a community outreach project to try to explain complex systems to a wider audience within the science community and the public. The website features interactive animations of systems properties and a short booklet that can be downloaded (in many languages) with key concepts.

It is well known that complex systems are hard for humans to understand because many of the characteristics are non-intuitive for us. For example, self-organizing behavior is often contradictory to our own lives (when can you remember a time that a system around you naturally seemed to become more orderly as time passed?). Emergent properties can come about in long time scales that are often far distanced from the original action. We can’t always understand how decisions on the microscale resulted in large macroscale processes. Thus, in order to best approach complex systems, we must have the ability to interact with them, model them, and map out their complex behavior under many conditions. Below, I am introducing some tools that might help foster more understanding of these ideas using simple, yet dynamically rich cases.

PyCX

One of the main creators of the Complexity Explained website and a visiting lecturer to my systems class is Hiroki Sayama, a world-renowned researcher and director of the Center for Collective Dynamics of Complex Systems at Binghamton University. Dr. Sayama has created a python package called PyCX that contains sample Python codes of complex systems that a user can run interactively and then manipulate or build off of. Simply download the package off of GitHub and all of the code and a simulator will be available to you. Figure 1 shows an example interactive simulation of a Turing pattern. In 1952, Alan Turing authored a paper where he described how patterns in animals’ coats such such as stripes and spots, can arise naturally from a chaotic system. He uses a simple set of reaction-diffusion equations to describe this process. Figure 1 shows the python simulator in PyCX, the equation for the Turing pattern, and the evolution from the random initialization to the ordered spots.

PyCX also allows you to toggle the parameters of the problem, which can express how small perturbations in the system can lead to substantially different outcomes. You can adjust these parameters within the source python code (which I believe is more useful for students rather than just clicking a “play” button). Figure 2 shows the difference in behavior across a forest fire model when the initial density is adjusted from 35% to 40% of the space.

Figure 2: The effect of initial conditions in a forest fire agent-based model

Golly- Game of Life Simulator

Golly is an open-source tool for visualizing cellular automata, including Conway’s Game of Life. Golly allows the user to draw different patterns and apply specific rules for how the systems evolve. You can stop the simulation midway and apply different rules to the existing patterns.

Swarm Behavior

Dr. Sayama also developed a really interesting Java application to study swarm behavior, or collective behavior that is exhibited by entities, typically animals. This application, called swarm chemistry creates agents with different kinetic parameters that dictate dynamics. The application allows you to mix agents into a single population and observe how emergent dynamics form. Figure 4 shows the opening interface when you click the .jar executable. The application brings up 6 random agents that exhibit some dynamic behavior. By clicking on any two agents, you will create a new population that shows how the dynamics of the agents interact (Figure 5). You can keep mixing agents and adding more random swarms. You can individually mutate certain swarms or edit the parameters as well. The pictures do not do this application justice. It is super fun (and slightly addicting) and a great way to get students excited about the concepts.

Figure 4: Swarm Chemistry Opening Interface

I had so much fun using these packages in class and I hope that these tools can help you/your students become more engaged and excited about complex systems!

References

My knowledge of these tools came from Hiroki Sayama’s guest lectures in SYSEN 6000 at Cornell University and from:

Sayama, H. (2015) Introduction to the Modeling and Analysis of Complex Systems,Open SUNY Textbooks, Milne Library, State University of New York at Geneseo.

September 13, 2021 by Lillian Lau

A non-intimidating introduction to parallel computing with Numba

This blog post is adapted from material I learned during the 2021 San Diego Supercomputer Center (SDSC) Summer Institute. This was an introductory boot camp to high-performance computing (HPC), and one of the modules taught the application of Numba for in-line parallelization and speeding up of Python code.

What is Numba?

According to its official web page, Numba is a just-in-time (JIT) compiler that translates subsets of Python and NumPy code into fast machine code, enabling it to run at speeds approaching that of C or Fortran. This is becuase JIT compilation enables specific lines of code to be compiled or activated only when necessary. Numba also makes use of cache memory to generate and store the compiled version of all data types entered to a specific function, which eliminates the need for recompilation every time the same data type is called when a function is run.

This blog post will demonstrate a simple examples of using Numba and its most commonly-used decorator, @jit, via Jupyter Notebook. The Binder file containing all the executable code can be found here.

Note: The ‘@‘ flag is used to indicate the use of a decorator

Installing Numba and Setting up the Jupyter Notebook

First, in your command prompt, enter:

pip install numba

Alternatively, you can also use:

conda install numba

Next, import Numba:

import numpy as np
import numba
from numba import jit
from numba import vectorize

Great! Now let’s move onto using the @jit decorator.

Using @jit for executing functions on the CPU

The @jit decorator works best on numerical functions that use NumPy. It has two modes: nopython mode and object mode. Setting nopython=True tell the compiler to overlook the involvement of the Python interpreter when running the entire decorated function. This setting leads to the best performance. However, in the case when:

nopython=True fails
nopython=False, or
nopython is not set at all

the compiler defaults to object mode. Then, Numba will manually identify loops that it can compile into functions to be run in machine code, and will run the remaining code in the interpreter.

Here, @jit is demonstrated on a simple matrix multiplication function:

# a function that does multiple matrix multiplication
@jit(nopython=True)
def matrix_multiplication(A, x):
    b = np.empty(shape=(x.shape[0],1), dtype=np.float64)
    for i in range(x.shape[0]):
        b[i] = np.dot(A[i,:], x)
    return b

Remember – the use of @jit means that this function has not been compiled yet! Compilation only happens when you call the function:

A = np.random.rand(10, 10)
x = np.random.rand(10, 1)
a_complicated_function(A,x)

But how much faster is Numba really? To find out, some benchmarking is in order. Jupyter Notebook has a handy function called %timeit that runs simple functions many times in a loop to get their average execution time, that can be used as follows:

%timeit matrix_multiplication(A,x)

# 11.4 µs ± 7.34 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

Numba has a special .py_func attribute that effectively allows the decorated function to run as the original uncompiled Python function. Using this to compare its runtime to that of the decorated version,

%timeit matrix_multiplication.py_func(A,x)

# 35.5 µs ± 3.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

From here, you can see that the Numba version runs about 3 times faster than using only NumPy arrays. In addition to this, Numba also supports tuples, integers, floats, and Python lists. All other Python features supported by Numba can be found here.

Besides explicitly declaring @jit at the start of a function, Numba makes it simple to turn a NumPy function into a Numba function by attaching jit(nopython=True) to the original function. This essentially uses the @jit decorator as a function. The function to calculate absolute percentage relative error demonstrates how this is done:

# Calculate percentage relative error
def numpy_re(x, true):
    return np.abs(((x - true)/true))*100

numba_re = jit(nopython=True)(numpy_re)

And we can see how the Number version is faster:

%timeit numpy_re(x, 0.66)
%timeit numba_re(x, 0.66)

where the NumPy version takes approximately 2.61 microseconds to run, while the Numba version takes 687 nanoseconds.

Inline parallelization with Numba

The @jit decorator can also be used to enable inline parallelization by setting its parallelization pass parallel=True. Parallelization in Numba is done via multi-threading, which essentially creates threads of code that are distributed over all the available CPU cores. An example of this can be seen in the code snippet below, describing a function that calculates the normal distribution of a set of data with a given mean and standard deviation:

SQRT_2PI = np.sqrt(2 * np.pi)

@jit(nopython=True, parallel=True)
def normals(x, means, sds):
    n = means.shape[0]
    result = np.exp(-0.5*((x - means)/sds)**2)
    return (1 / (sds * np.sqrt(2*np.pi))) * result

As usual, the function must be compiled:

means = np.random.uniform(-1,1, size=10**8)
sds = np.random.uniform(0.1, 0.2, size=10**8)

normals(0.6, means, sds)

To appreciate the speed-up that Numba’s multi-threading provides, compare the runtime for this with:

A decorated version of the function with a disabled parallel pass
The uncompiled, original NumPy function

The first example can be timed by:

normals_deco_nothread = jit(nopython=True)(normals.py_func)
%timeit normals_deco_nothread(0.6, means, sds)

# 3.24 s ± 757 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The first line of the code snippet first makes an uncompiled copy of the normals function, and then applies the @jit decorator to it. This effectively creates a version of normals that uses @jit, but is not multi-threaded. This run of the function took approximately 3.3 seconds.

For the second example, simply:

%timeit normals.py_func(0.6, means, sds)

# 7.38 s ± 759 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Now, compare both these examples to the runtime of the decorated and multi-threaded normals function:

%timeit normals(0.6, means, sds)

# 933 ms ± 155 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The decorated, multi-threaded function is significantly faster (933 ms) than the decorated function without multi-threading (3.24 s), which in turn is faster than the uncompiled original NumPy function (7.38 s). However, the degree of speed-up may vary depending on the number of CPUs that the machine has available.

Summary

In general, the improvements achieved by using Numba on top of NumPy functions are marginal for simple, few-loop functions. Nevertheless, Numba is particularly useful for large datasets or high-dimensional arrays that require a large number of loops, and would benefit from the one-and-done compilation that it enables. For more information on using Numba, please refer to its official web page.

September 8, 2021 by Antonia Hadjimichael

Simple profiling checks for running jobs on clusters

The goal of this short blog post is to share some simple tips on profiling your (to be) submitted jobs on high performance computing resources. Profiling your jobs can give you information about how efficiently you are using your computational resources, i.e., your CPUs and your allocated memory. Typically you would perform these checks on your experiment at a smaller scale, ensuring that everything is working as it should, before expanding to more tasks and CPUs.

Your first check is squeue typically paired with your user ID on a cluster. Here’s an example:

(base) [ah986@login02 project_dir]$ squeue -u ah986
             JOBID PARTITION     NAME      USER  ST       TIME  NODES NODELIST(REASON) 
           5688212    shared <job_name>    ah986  R       0:05      1 exp-4-55

This tells me that my submitted job is utilizing 1 node in the shared partition of this cluster. If your cluster is using the SLURM scheduler, you can also use sacct which can display accounting data for all jobs you are currently running or have run in the past. There’s many pieces of information available with sacct, that you can specify using the --format flag. Here’s an example for the same job:

(base) [ah986@login02 project_dir]$ sacct --format=JobID,partition,state,time,start,end,elapsed,nnodes,ncpus,nodelist,AllocTRES%32 -j 5688212
       JobID  Partition      State  Timelimit               Start                 End    Elapsed   NNodes      NCPUS        NodeList                        AllocTRES 
------------ ---------- ---------- ---------- ------------------- ------------------- ---------- -------- ---------- --------------- -------------------------------- 
5688212          shared    RUNNING   20:00:00 2021-09-08T10:55:40             Unknown   00:19:47        1        100        exp-4-55 billing=360000,cpu=100,mem=200G+ 
5688212.bat+               RUNNING            2021-09-08T10:55:40             Unknown   00:19:47        1        100        exp-4-55          cpu=100,mem=200G,node=1 
5688212.0                  RUNNING            2021-09-08T10:55:40             Unknown   00:19:47        1        100        exp-4-55          cpu=100,mem=200G,node=1

In this case I can see the number of nodes (1) and the number of cores (100) utilized by my job as well as the resources allocated to it (100 CPUs and 200G of memory on 1 node). This information is useful in cases where a task launches other tasks and you’d like to diagnose whether the correct number of cores is being used.

Another useful tool is seff, which is actually a wrapper around sacct and summarizes your job’s overall performance. It is a little unreliable while the job is still running, but after the job is finished you can run:

(base) [ah986@login02 project_dir]$ seff 5688212
Job ID: 5688212
Cluster: expanse
User/Group: ah986/pen110
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 100
CPU Utilized: 1-01:59:46
CPU Efficiency: 68.16% of 1-14:08:20 core-walltime
Job Wall-clock time: 00:22:53
Memory Utilized: 38.25 GB
Memory Efficiency: 19.13% of 200.00 GB

The information here is very useful if you want to find out about how efficiently you’re using your resources. For this example I had 100 separate tasks I needed to perform and I requested 100 cores on 1 node and 200 GB of memory. These results tell me that my job completed in 23mins or so, the total time using the CPUs (CPU Utilized) was 01:59:46, and most importantly, the efficiency of my CPU use. CPU Efficiency is calculated “as the ratio of the actual core time from all cores divided by the number of cores requested divided by the run time”, in this case 68.16%. What this means it that I could be utilizing my cores more efficiently by allocating fewer cores to the same number of tasks, especially if scaling up to a larger number of nodes/cores. Additionally, my allocated memory is underutilized and I could be requesting a smaller memory allocation without inhibiting my runs.

Finally, while your job is still running you can log in the node(s) executing the job to look at live data. To do so, you simply ssh to one of the nodes listed under NODELIST (not all clusters allow this). From there, you can run the top command like below (with your own username), which will start the live task manager:

(base) [ah986@r143 ~]$ top -u ah986

top - 15:17:34 up 25 days, 19:55,  1 user,  load average: 0.09, 12.62, 40.64
Tasks: 1727 total,   2 running, 1725 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.1 sy,  0.0 ni, 99.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 257662.9 total, 249783.4 free,   5561.6 used,   2317.9 buff/cache
MiB Swap: 716287.0 total, 716005.8 free,    281.2 used. 250321.1 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                              
 78985 ah986     20   0  276212   7068   4320 R   0.3   0.0   0:00.62 top                                                                                                                                  
 78229 ah986     20   0  222624   3352   2936 S   0.0   0.0   0:00.00 slurm_script                                                                                                                         
 78467 ah986     20   0  259464   8128   4712 S   0.0   0.0   0:00.00 srun                                                                                                                                 
 78468 ah986     20   0   54520    836      0 S   0.0   0.0   0:00.00 srun                                                                                                                                 
 78481 ah986     20   0  266404  19112   4704 S   0.0   0.0   0:00.24 parallel                                                                                                                             
 78592 ah986     20   0  217052    792    720 S   0.0   0.0   0:00.00 sleep                                                                                                                                
 78593 ah986     20   0  217052    732    660 S   0.0   0.0   0:00.00 sleep                                                                                                                                
 78594 ah986     20   0  217052    764    692 S   0.0   0.0   0:00.00 sleep                                                                                                                                
 78595 ah986     20   0  217052    708    636 S   0.0   0.0   0:00.00 sleep                                                                                                                                
 78596 ah986     20   0  217052    708    636 S   0.0   0.0   0:00.00 sleep                                                                                                                                
 78597 ah986     20   0  217052    796    728 S   0.0   0.0   0:00.00 sleep                                                                                                                                
 78598 ah986     20   0  217052    732    660 S   0.0   0.0   0:00.00 sleep

Memory and CPU usage can be tracked from RES and %CPU columns respectively. In this case, for the sake of an example, I just assigned all my cores to sleep a certain number of minutes each (using no CPU or memory). Similar information can also be obtained using the ps command, with memory being tracked under the RSS column.

 (base) [ah986@r143 ~]$ ps -u$USER -o %cpu,rss,args
%CPU   RSS COMMAND
 0.0  3352 /bin/bash /var/spool/slurm/d/job3509431/slurm_script
 0.0  8128 srun --export=all --exclusive -N1 -n1 parallel -j 100 sleep {}m ::: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
 0.0   836 srun --export=all --exclusive -N1 -n1 parallel -j 100 sleep {}m ::: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
 0.1 19112 /usr/bin/perl /usr/bin/parallel -j 100 sleep {}m ::: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
 0.0   792 sleep 3m
 0.0   732 sleep 4m
 0.0   764 sleep 5m
 0.0   708 sleep 6m
 0.0   708 sleep 7m
 0.0   796 sleep 8m
 0.0   732 sleep 9m
 0.0   712 sleep 10m

Water Programming: A Collaborative Research Blog

Tips and tricks on programming, evolutionary algorithms, and doing research

Month: September 2021