Profiling your Python script using cProfile

Profiling refers to performing dynamic analysis on a script to measure its execution time, the execution time of its subcomponents, and how many times each subcomponent is called. This produces data on where the script spends most of its time, which can help you optimize it to minimize its execution time. This blog has had two past posts on profiling, one on C++ using Callgrind and one on Python using PyCharm. PyCharm is a very useful Python IDE but unfortunately not free, so if you’re looking for free profiling functionality in Python, this post is for you.

Python has a built-in module called cProfile. Here is a simple example that times the multiplication of two matrices with cProfile:

import cProfile
import numpy as np

# two small 3x3 matrices to multiply
mat1 = [[1, 6, 3], [3, 6, 3], [2, 8, 3]]
mat2 = [[2, 7, 6], [5, 4, 7], [6, 1, 9]]

# the statement to profile is passed as a string
cProfile.run('np.dot(mat1, mat2)')

This should print out something like the following:

         4 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method numpy.core.multiarray.dot}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

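cProfile.run() is convenient for one-off calls, but if you want to profile only a portion of a script, or sort and trim the report yourself, you can pair cProfile.Profile with the standard-library pstats module. Below is a minimal sketch; the work() function is just a stand-in workload I made up for illustration, not part of the example above:

import cProfile
import pstats
import numpy as np

def work():
    # stand-in workload: repeated matrix products
    a = np.random.rand(200, 200)
    for _ in range(100):
        np.dot(a, a)

profiler = cProfile.Profile()
profiler.enable()    # start collecting timing data
work()
profiler.disable()   # stop collecting

# sort by cumulative time and print the 10 most expensive entries
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative').print_stats(10)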
I have recently used cProfile on one of my own scripts, which I’ll use here to demonstrate how it can be applied in your own work. I have a function called fish_game, which contains my model and takes vars as input. This function also calls a function hrvSTR, which represents my action policy. (What these functions do exactly is beside the point of this post: one represents the system and the other represents a policy we use to act on the system; you can see the full model here.) The fish_game function is called by my optimization algorithm during optimization. Running cProfile on it, sorted by cumulative time, produces this:

cProfile.run('fish_game(vars)',sort='cumulative')
         282814 function calls in 0.698 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.698    0.698 {built-in method builtins.exec}
        1    0.000    0.000    0.698    0.698 <string>:1(<module>)
        1    0.166    0.166    0.698    0.698 <ipython-input-4-df258d5a749f>:55(fish_game)
    20200    0.385    0.000    0.531    0.000 <ipython-input-4-df258d5a749f>:1(hrvSTR)
    20200    0.016    0.000    0.089    0.000 fromnumeric.py:1821(sum)
    20200    0.021    0.000    0.069    0.000 fromnumeric.py:64(_wrapreduction)
   121208    0.055    0.000    0.055    0.000 {built-in method numpy.core.multiarray.zeros}
    20200    0.046    0.000    0.046    0.000 {method 'reduce' of 'numpy.ufunc' objects}
    20200    0.003    0.000    0.003    0.000 {built-in method builtins.isinstance}
    40400    0.003    0.000    0.003    0.000 {built-in method builtins.len}
    20200    0.002    0.000    0.002    0.000 {method 'items' of 'dict' objects}
        2    0.000    0.000    0.000    0.000 {method 'normal' of 'mtrand.RandomState' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

This tells me that one run of my function takes 0.698 seconds in total (this might vary slightly from run to run, depending on processor load and other factors), and that most of that time, 0.531 seconds, is spent in the hrvSTR function (cumtime includes time spent in the functions a call makes, while tottime does not). Even though 0.7 seconds might not seem long, in the context of optimization, where this function needs to be evaluated tens of thousands of times, an extra 0.1 seconds per evaluation can add hours of processing time to your workflow. Trying to bring that down is probably a worthwhile investment that pays off in time savings later. As a result, I figured there might be something I could do to reduce the time hrvSTR takes. I was particularly intrigued by the fact that numpy.core.multiarray.zeros was called 121208 times, an order of magnitude more than any other method in my script, which prompted me to think that I might be unnecessarily repeating a process.

Looking at my code more closely (this is a script I have been using for more than a year now), I realized that I was ordering, normalizing, and creating arrays for my action policy every single time it was called. This was unnecessary, since the same policy is used for all time steps: I could instead perform those steps once, save the outputs, and reuse them at every time step rather than recalculating them each time (see the sketch after the timing results below). I spent some time adjusting my script to do that, and running cProfile again produced this:

 cProfile.run('fish_game(vars)',sort='cumulative')
         79282 function calls in 0.379 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.379    0.379 {built-in method builtins.exec}
        1    0.000    0.000    0.379    0.379 <string>:1(<module>)
        1    0.150    0.150    0.379    0.379 <ipython-input-16-3c528334eb55>:62(fish_game)
    19800    0.193    0.000    0.229    0.000 <ipython-input-16-3c528334eb55>:35(hrvSTR)
    59414    0.036    0.000    0.036    0.000 {built-in method numpy.core.multiarray.zeros}
        4    0.000    0.000    0.000    0.000 fromnumeric.py:2817(mean)
        4    0.000    0.000    0.000    0.000 _methods.py:58(_mean)
        2    0.000    0.000    0.000    0.000 <ipython-input-16-3c528334eb55>:4(generate_policy)
        6    0.000    0.000    0.000    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        2    0.000    0.000    0.000    0.000 {method 'normal' of 'mtrand.RandomState' objects}
        2    0.000    0.000    0.000    0.000 fromnumeric.py:1821(sum)
        2    0.000    0.000    0.000    0.000 fromnumeric.py:64(_wrapreduction)
        4    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
        4    0.000    0.000    0.000    0.000 _methods.py:48(_count_reduce_items)
        2    0.000    0.000    0.000    0.000 {method 'clip' of 'numpy.generic' objects}
        4    0.000    0.000    0.000    0.000 numeric.py:504(asanyarray)
       10    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
        8    0.000    0.000    0.000    0.000 {built-in method builtins.issubclass}
        4    0.000    0.000    0.000    0.000 {built-in method numpy.core.multiarray.array}
        2    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
        4    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

I basically shaved 0.3 seconds off my function evaluation time by investing some time in looking more closely at my script. With an optimization using 30000 function evaluations, this translates to over 2 hours of processing time saved (and I spent far less than that figuring it out for the first time).
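The underlying fix is a general pattern: hoist work that does not change between time steps out of the function that gets called at every time step. The sketch below is a stripped-down, hypothetical illustration of that pattern, not my actual model code. slow_policy rebuilds and re-normalizes its parameter arrays on every call, while in the refactored version generate_policy does that work once and the per-time-step function just uses the results:

import numpy as np

def slow_policy(state, vars):
    # rebuilt and re-normalized on every call, even though vars never changes
    centers = np.zeros(len(vars) // 2)
    weights = np.zeros(len(vars) // 2)
    for i in range(len(vars) // 2):
        centers[i] = vars[2 * i]
        weights[i] = vars[2 * i + 1]
    weights = weights / np.sum(weights)
    return np.sum(weights * np.exp(-(state - centers) ** 2))

def generate_policy(vars):
    # done once, before the time-stepping loop
    centers = np.asarray(vars[0::2], dtype=float)
    weights = np.asarray(vars[1::2], dtype=float)
    return centers, weights / np.sum(weights)

def fast_policy(state, centers, weights):
    # per-time-step work only: evaluate the policy
    return np.sum(weights * np.exp(-(state - centers) ** 2))

# usage: precompute once, then reuse inside the simulation loop
vars = [0.2, 0.5, 0.7, 0.5]
centers, weights = generate_policy(vars)
for t in range(100):
    action = fast_policy(t / 100, centers, weights)

You can see the same shape of change in the second profile above: generate_policy appears with only 2 calls, and both the number of zeros calls and the time spent inside hrvSTR drop accordingly.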

You can also run cProfile directly from the command line like so:

python -m cProfile -s cumtime fish_game.py
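Here -s cumtime sorts the printed report by cumulative time. If the report is long, it is often handier to save the raw profiling data to a file with -o and inspect it afterwards with pstats (profile.out below is just a placeholder filename):

python -m cProfile -o profile.out fish_game.py

and then, in a Python session:

import pstats

# load the saved profile and print the 20 most expensive entries
stats = pstats.Stats('profile.out')
stats.sort_stats('cumulative').print_stats(20)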
