If you’ve ever run into software dependency issues when trying to download and run a computer model on a new machine, you’re in good company. Dependency issues are a common headache for modelers, whether you are trying to run someone else’s code or share your own code with collaborators. Problems can arise due to differences in operating system (e.g., Windows vs OSX vs Ubuntu), compilers (e.g., Gnu vs Intel), software versions (e.g., Python 3.3 vs 3.10), or other factors. These challenges are exacerbated on high performance computing (HPC) clusters, which often have highly specialized architectures and software environments that are beyond our control. For example, I regularly develop code on my laptop, test parallel jobs on TheCube cluster at Cornell, and finally run much larger production jobs on the Stampede2 cluster at TACC. Ensuring compatibility across these three environments can be a challenge, especially for models involving complex software interdependencies and/or MPI parallel communication. Thankfully, containers offer an elegant solution to many of these challenges.
Containers such as Docker allow you to package an application in a self-contained, ready-to-run package. The container includes its own operating system and all libraries and dependencies needed to run your code. The application then runs inside this custom container where it is isolated from the individual machine that you are running it on, making it much simpler to build portable applications. This has several benefits for researchers by allowing you to spend more time on scientific research and model building and less time on installing finicky software dependencies. Additionally, it can encourage “open science” and collaboration by streamlining the process of sharing and borrowing code.
However, HPC clusters present some unique challenges to the “build once, run anywhere” mantra of containerization. First, Docker containers generally require root permissions that are disallowed on shared supercomputers. This challenge can be overcome by using a second type of container called Singularity. Second, when running parallel applications with MPI, or running applications on GPUs, the container must be built in a way that is consistent with the hardware and software setup of the cluster. This challenge is harder to address, and does reduce the portability of the container somewhat, but as we shall see it can still be overcome with careful container design.
This blogpost will demonstrate how to build containers to run parallelized code on HPC clusters. We will build two different containers: one which can be run on two different TACC clusters, Stampede2 and Frontera, and one which can be run on two different Cornell Advanced Computing (CAC) clusters, TheCube and Hopper. Both of these can also be run on a personal laptop. If you want to follow along with this post and/or containerize your own code, the first thing you will need to do is to download and install Docker Desktop and set up an account on DockerHub. You can find instructions for doing that, as well as a lot more great information on containerization and HPC, in this documentation from the Texas Advanced Computing Center (TACC). You can also find more information on Docker in earlier Water Programming Blog posts by Lillian Lau, Rohini Gupta, and Charles Rouge.
Containerizing a Cython reservoir model with Docker
All code related to this blogpost can be found in the “WPB_containerization” directory of my “cython_tutorial” GitHub repository, here. If you want to follow along, you can clone this repository and navigate in your terminal to the WPB_containerization directory. This directory contains a simple reservoir simulation model which is built using Cython, a software used to convert Python-like code into typed, compiled C code, which can significantly improve execution time. Cython is beyond the scope of this blog post (though I may cover it in a future post, so stay tuned!), but the interested reader can find an overview in the base directory of this GitHub repository, and references therein. For this post, suffice it to say that Cython introduces non-trivial software dependencies and build complexities beyond a more standard Python application, which makes it a good candidate for containerization. Additionally, this directory contains scripts for running parallel simulations across multiple nodes on an HPC cluster using MPI, which introduces additional containerization challenges.
The first container we will build is for the TACC clusters Stampede2 and Frontera. The main step involved in building a Docker container is writing the Dockerfile (see the file “Dockerfile_tacc”), which outlines the steps needed to build the container:
FROM tacc/tacc-ubuntu18-impi19.0.7-common RUN apt-get update && apt-get upgrade -y && apt-get install -y libjpeg-dev python3-pip RUN pip3 install numpy pandas matplotlib cython mpi4py ADD . /code WORKDIR /code RUN python3 setup.py build_ext --inplace RUN chmod +rx /code/run_reservoir_sim.py ENV PATH "/code:$PATH" RUN useradd -u 8877 <user>
The first line tells Docker which “base image” to start from. There are a wide variety of base images to choose from, ranging from standard operating systems (e.g., Ubuntu 18.04) to smaller specialized images for software languages like Python. Due to the complexity of TACC’s hardware and software infrastructure, they provide special base images that are set up to efficiently interface with the clusters’ particular processors, compilers, communication networks, etc. The base image imported here is designed to operate on both Stampede2 and Frontera. In general, it should also work on your local computer as well, which helps streamline application development.
The second line updates the Ubuntu operating system and installs Python3 and another necessary library. Third, we install Python libraries needed for our application, including Cython and mpi4py. We then add the contents of the current working directory into the container, in a new directory called /code, and make this the working directory inside the container. Next, we run setup.py, which cythonizes and compiles our application. We then make the file “run_reservoir_sim.py” executable, so that we can run it directly from the container, and add the /code directory to the container’s path. Lastly, we change the user from root to (fill in your user profile name on your computer) to avoid permissions issues on some machines.
To build the container, we run the command:
docker build -f Dockerfile_tacc -t <username>/cython_reservoir:0.1 .
where should be replaced with your DockerHub username. This command will run each line in our Dockerfile to build the image of the container, then push the image to Dockerhub and create a new project called “cython_reservoir”, with the version tag 0.1.
To run the container locally, we run the command:
docker run --rm -v <working_directory>/results:/code/results <username>/cython_reservoir:0.1 mpirun -np 4 run_reservoir_sim.py
This command has several parts. First, “docker run” means to create and start a container from the image (this is like instantiating an object from a class). “—rm” means that the container can be removed after running (just the instantiated container, not the image itself). Next, the “-v” flag creates a directory mount from “code/results” inside the container to “/results” on your machine, where should be replaced with the full path of your working directory. This allows us to access the simulation results even after the container is removed. Lastly, “mpirun -np 4 run_reservoir_sim.py” tells the container to run the script across four processors – this will run four independent simulations and store each result as a csv file in the results directory.
Running on TACC clusters with Singularity
Now that we have verified that this container works on a local laptop, let’s move to Stampede2, a supercomputer owned by TACC (with allocations managed through NSF XSEDE). Unfortunately, Docker isn’t allowed on TACC’s clusters due to permission issues. However, another containerization software called Singularity is allowed, and has the ability to run containers created through Docker. After logging into the cluster through the terminal, we need to load the Singularity module:
module load tacc-singularity
We also need to create a directory called “results”. TACC doesn’t allow Singularity to be run directly from the login node, so we need to request an interactive session:
idev -m 30
When the interactive session on a compute node has been granted, we can “pull” our container from DockerHub:
singularity pull docker://ahamilton144/cython_reservoir:0.1
Singularity creates and downloads a single file, cython_reservoir_0.1.sif, as a Singularity representation of the Docker container. Once this has downloaded, you can exit your interactive session. We will run the job itself non-interactively by submitting it to the SLURM scheduler, just like you would with a non-containerized job:
This submission script, in addition to typical SLURM directives, contains the line
mpirun singularity run cython_reservoir_0.1.sif run_reservoir_sim.py
And that’s it! This command will run the simulation across all processors assigned to the job, and store the results in the results folder. These same commands and container can also be used on the Frontera cluster (also owned by TACC), using the “submit_reservoir_frontera.sh” script. As you can see, this makes it straightforward to develop and run code across your local laptop and two different TACC clusters, without having to worry about software dependency issues.
Customizing containers to run on non-TACC clusters
Unfortunately, it’s not quite so simple to port containers to other HPC systems if your application involves MPI parallel processing. This is because it is necessary for the MPI configuration inside the container to be interoperable with the MPI configuration in the cluster’s environment. TACC’s customized base images take care of this complexity for us, but may not port to alternative systems, such as CAC’s TheCube and Hopper. In this case, we need to provide explicit instructions in the Dockerfile for how to set up MPI:
### start with ubuntu base image FROM ubuntu:18.04 ### install basics, python3, and modules needed for application RUN apt-get update && apt-get upgrade -y && apt-get install -y build-essential zlib1g-dev libjpeg-dev python3-pip openssh-server RUN pip3 install Pillow numpy pandas matplotlib cython ### install openMPI version 4.0.5, consistent with Hopper & TheCube RUN wget 'https://www.open-mpi.org/software/ompi/v4.0/downloads/openmpi-4.0.5.tar.gz' -O openmpi-4.0.5.tar.gz RUN tar -xzf openmpi-4.0.5.tar.gz openmpi-4.0.5; cd openmpi-4.0.5; ./configure --prefix=/usr/local; make all install RUN ldconfig ### install mpi4py now that openmpi is installed RUN pip3 install mpi4py ### add all code from current directory into “code” directory within container, and set as working directory ADD . /code WORKDIR /code ENV PATH "/code:$PATH" ### compile cython for this particular application RUN python3 setup.py build_ext --inplace ### set python file as executable so it can be run by docker/singularity RUN chmod +rx /code/run_reservoir_sim.py ### change username from root RUN useradd -u 8877 <username>
For this container, we start with a base Ubuntu image rather than TACC’s specialized image. We then install several important libraries and Python modules. The third block of commands downloads and installs OpenMPI version 4.0.5, which is the version installed on Hopper. TheCube has an older version, 3.1.4, but I found version 4.0.5 to be backwards compatible and to work on both CAC clusters. The rest of the commands are equivalent to the first Dockerfile. We can now build this second container and push it to Dockerhub as a separate version, 0.2.
docker build -f Dockerfile -t <username>/cython_reservoir:0.2 .
We can run this container on our local machine using the same command from above, but substituting 0.2 for 0.1. Similarly, we can log onto TheCube or Hopper and execute the same commands from above, again substituting 0.2 for 0.1. Note that the singularity module on CAC clusters is “singularity” rather than “tacc-singularity”. “submit_reservoir_cube.sh” and “submit_reservoir_hopper.sh” can be used to submit to the SLURM scheduler on these systems.
Thus, while MPI does compromise container portability somewhat, it is relatively straightforward to incorporate separate builds for different HPC systems using custom Dockerfiles. With the Dockerfile in place for each system, a streamlined workflow can be developed that allows for easy project development and deployment across multiple HPC systems.
Many thanks to TACC staff for teaching me all about containers at a recent virtual workshop, and to CAC staff who helped install and troubleshoot Singularity on their system.