Using Docker with Virtual Machines

This post is a continuation of my previous post on accessing virtual machines on Red Cloud. In this post, you will learn how to run a Docker container on an Ubuntu VM image; you can also follow this tutorial with Ubuntu on a local machine.

Why Containerize Code?

Let’s use an example: say that you have three Python applications that you want to run on your computer, but they all use different versions of Python or its packages. We cannot run these applications side by side with a single Python installation on our computer, so what do we do? Consider another case: you want to share neural network code with a collaborator, but you don’t want to deal with how difficult it can be for them to install TensorFlow and all of its dependencies on their machine. Containerization allows us to isolate the installation and running of our application without having to worry about the setup of the host machine. In other words, everything someone would need to run your code is provided in the container. As shown in Figure 1, each container shares the host OS’s kernel but has a virtual copy of some of the components of the OS. This allows the containers to be isolated from each other while running on the same host. As a result, containers are exceptionally lightweight and take only seconds to start [1].

Fig. 1. Docker Container Example [1]

Docker

Docker started as a project to build single-application LXC (Linux) containers that were more portable and flexible, but it is now its own container runtime environment. It is written in Go and was developed by dotCloud (a PaaS company). The Docker Engine is used to build and manage Docker images; an image is a template that contains the application and all of the dependencies required for it to run. A Docker container is a running instance of a Docker image. The Docker Engine is composed of three main parts: a server known as the Docker daemon, a REST API, and a client. The Docker daemon creates and manages images and containers, the REST API links the server and applications, and the client is how the user interacts with the Docker daemon through the command line (Figure 2).

Fig. 2. Docker Engine [2]

Running a Docker Container on Ubuntu

  1. We will be setting up and running a Docker container that contains code to train a rainfall-runoff LSTM model built in Python using TensorFlow. The GitHub repo with the LSTM code and data can be downloaded here.
  2. Spin up a VM instance (Ubuntu 18.04) or use your own machine if it has an Ubuntu partition.
  3. I use MobaXterm to SSH into the VM instance and drag the HEC_HMS_LSTM folder into my home directory.
  4. Within the directory, you will see two additional files that are not part of the main neural network code: requirements.txt and jupyter.dockerfile. A requirements.txt file lists only the packages that are necessary to run your application. You can make this file with just two commands: pip install pipreqs and then pipreqs /path/to/project. The created file will look like this:

requirements.txt file
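As a rough illustration of the format, the sketch below writes a small requirements file by hand and counts its entries. The file name requirements.example.txt and the package pins are hypothetical placeholders, not the contents of the repository's actual file; pipreqs builds the real list from the imports it finds in the project.

```shell
# Write a hypothetical requirements file (the package pins are placeholders,
# not the versions used in the HEC_HMS_LSTM repository).
cat > requirements.example.txt <<'EOF'
numpy==1.18.1
pandas==1.0.1
tensorflow==2.1.0
EOF

# Each non-empty line names one package, pinned to one version.
REQ_COUNT=$(grep -c . requirements.example.txt)
echo "packages listed: $REQ_COUNT"
```

A file in this format can be installed in one step with pip install -r followed by the file name.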

The jupyter.dockerfile is our Dockerfile. It is a text file that contains all the commands a user could call on the command line to assemble an image.

Dockerfile

Here in our Dockerfile, we use the latest Ubuntu image and set our working directory to the HEC_HMS_LSTM folder, which holds the neural network code, Hourly_LSTM.py, that we would like to execute. We start by checking for package updates and then install Python, pip, and Jupyter. Next, we copy the contents of our working directory into the container: first the requirements.txt file, then the whole HEC_HMS_LSTM folder. We then use pip to install all of the packages listed in requirements.txt. Finally, we instruct the Docker daemon to run the Python script, Hourly_LSTM.py.
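The steps described above can be sketched as a Dockerfile. What follows is a reconstruction from the description only, not the repository's actual jupyter.dockerfile: the file name sketch.dockerfile, the container path /HEC_HMS_LSTM, and the exact package names are assumptions. It is written through a shell heredoc so that it does not overwrite the real file.

```shell
# Write a sketch of a Dockerfile that follows the steps described in the text.
# This is a reconstruction from the prose, not the repository's actual file.
cat > sketch.dockerfile <<'EOF'
# Start from the latest Ubuntu base image
FROM ubuntu:latest

# Work inside the project folder (the path in the container is an assumption)
WORKDIR /HEC_HMS_LSTM

# Refresh the package lists, then install Python, pip, and Jupyter
RUN apt-get update && apt-get install -y python3 python3-pip && pip3 install jupyter

# Copy the dependency list first, then add the rest of the project folder
COPY requirements.txt .
ADD . .

# Install the packages listed in requirements.txt
# (newer Ubuntu images may require a virtual environment for pip installs)
RUN pip3 install -r requirements.txt

# Run the training script when the container starts
CMD ["python3", "Hourly_LSTM.py"]
EOF

# Show only the instructions, in the order Docker would execute them
grep -E '^(FROM|WORKDIR|RUN|COPY|ADD|CMD)' sketch.dockerfile
```

Copying requirements.txt before the rest of the folder mirrors the order given in the text; each instruction becomes one step of the image build.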

5. Next, we have to install Docker on Ubuntu using the command line. This is the only thing that anyone will have to install to run your container. The steps to install Docker on Ubuntu through the command line are straightforward using this link. You may be asked to allow automatic restarts during the installation, and you should choose this option. When the installation is done, you can confirm that it was successful by checking that the Docker service is active.

Successful Docker Installation
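One way to make this check from the command line is sketched below. The helper name check_docker is made up for illustration, and the sketch assumes the VM manages services with systemd, as stock Ubuntu does.

```shell
# Print the Docker version and service state, or a notice if Docker is absent.
check_docker() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker is not installed"
        return 1
    fi
    docker --version
    # Prints "active" when the Docker service is running under systemd
    systemctl is-active docker
}

check_docker || echo "Docker is not ready yet"
```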

6. Now that Docker is installed, we can build our Docker image and then run the container. We build the image using:

sudo docker build -f jupyter.dockerfile -t jupyter:latest .

In this command, the -f flag gives the name of the Dockerfile, -t is the name that we would like to tag our image with, and the trailing . sets the build context to the current directory. If you’re working with multiple images, tagging helps to distinguish between them. The build will go through each step of the Dockerfile. Be cognizant of how much disk space you have to store Docker, the Python libraries, and the data you will generate; you may have to go back and resize your VM instance. We can then run our image using:

sudo docker run jupyter:latest

You’ll see the neural network training and the results of the prediction.

Successfully training the neural network in a Docker Container on a virtual machine
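Since disk space can become the limiting factor on a small VM, a quick check before and after the build may help. The sketch below assumes Docker stores its data on the root filesystem, which is the usual default but may differ on your instance.

```shell
# Free space on the root filesystem (-P keeps each filesystem on one line)
ROOT_FREE=$(df -Ph / | awk 'NR==2 {print $4}')
echo "free space on /: $ROOT_FREE"

# If Docker is installed, list local images and summarize Docker's disk usage
if command -v docker >/dev/null 2>&1; then
    sudo docker images
    sudo docker system df
else
    echo "docker not found; skipping image listing"
fi
```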

 

Credits:

[1] https://www.freecodecamp.org/news/docker-simplified-96639a35ff36/

[2] https://docs.docker.com

Accessing a Virtual Machine in Red Cloud

This blog post is an introduction to Red Cloud, a cloud computing service that is maintained by Cornell’s Center for Advanced Computing (CAC). Red Cloud is a private research cloud and can only be accessed by those with a subscription, but exploratory accounts are available for free for Cornell students, faculty and staff.

Subscriptions to cloud systems such as Red Cloud allow access to a variety of remote computing resources within seconds. Users can request instances, or virtual machines (VMs), in a variety of configurations ranging from CPUs to GPUs with varying amounts of RAM. In Red Cloud, users can access instances with up to 28 cores and 240 GB of RAM. In this post, I’ll go through the very basic steps you need to access a VM through Red Cloud. These steps should generally apply to any cloud system that uses OpenStack as its cloud computing platform.

Step 1: Accessing OpenStack

OpenStack is a cloud operating system that will allow us to access the Red Cloud resources through a simple web interface. Log in with your CAC username and password (for the Reed Group: your credentials to access the Cube). This will lead you to an overview page that shows your usage of the resources.

Fig. 1. OpenStack Login

Click on the Images tab. This shows the virtual machine images that are available for use. You can access machines that have Linux distributions such as CentOS (a -cuda in the name means that the image can support GPUs) or Ubuntu. VMs usually have very minimal software installed, so there are also various images with pre-loaded software like MATLAB.

Fig. 2. OpenStack Overview Page

Fig. 3. Available Images

Step 2: Creating a Key Pair

A key pair needs to be set up before launching to allow secure access to your instance through SSH authentication. You can create a new key pair under the Key Pairs tab.

Fig. 4. Creating a Key Pair (Source: CAC)

Give it a meaningful name and copy the private key to a text file. Save the file with a .pem extension and store it somewhere convenient on your computer.
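If you connect from a Linux or macOS terminal, OpenSSH generally refuses private keys that other users can read, so it may help to tighten the file's permissions once it is saved. This step is not part of the original walkthrough; the helper name protect_key and the key path below are hypothetical.

```shell
# Restrict a private key so that only its owner can read or write it.
protect_key() {
    chmod 600 "$1"
}

# Hypothetical key location; adjust to wherever you saved the .pem file.
KEY_FILE="$HOME/redcloud_key.pem"
if [ -f "$KEY_FILE" ]; then
    protect_key "$KEY_FILE"
    ls -l "$KEY_FILE"
else
    echo "no key found at $KEY_FILE"
fi
```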

Step 3: Creating a Security Group

A security group lets you specify what internet traffic can reach (ingress) or leave (egress) the instance. You can create your own, but for now, we will use Red Cloud’s default security group and add rules to that. Click on “Manage Rules”.

Fig. 5. Overview of Security Groups

You’ll see a variety of Ingress/Egress rules already in the group.

Fig. 6. Adding Rules to the Security Group

However, if you’re accessing a Linux-based VM, you also need to allow access through SSH. Click on “Add Rule”, choose “SSH” in the first drop-down menu, and then click “Add”. The SSH rule will now be listed as one of your rules. There are many other options for rules, such as restricting access to only Cornell IP addresses.

Fig. 7. Adding an SSH Rule

Step 4: Launch an Image

Now we have all the tools to launch an instance. Under the Compute and then Instances tab, you will have the option to launch an instance.

Fig. 8. Launching an Instance

Under the Details tab, give your instance a name.

Fig. 9. Naming Your Instance

Under the Source tab, choose your image. I’ll go with the latest stable version of Ubuntu and then click the up arrow.

Fig. 10. Choosing an Image

Then, choose your flavor. It’s recommended to start with the lowest RAM (8 GB), especially if you are just starting to explore cloud computing. You can always scale up after your instance is launched if you need to.

Fig. 11. Choosing a Flavor

Under the Security Group tab, check to see that the default security group is selected. Then choose your key pair under the Key Pair tab. Great, now we can launch the instance by clicking the blue “Launch Instance” button.

The instance will go through a variety of tasks until it stabilizes at “Running”.

Fig. 12. Instance Status

Now we can SSH into our remote server using the IP address that is listed for the instance. I’ll use MobaXterm to start a local terminal and navigate into the directory where I saved my private key. Use the following command, inserting the IP address of your instance and the name of your key.

Fig. 13. SSH-ing into the Ubuntu VM
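The exact command appears only in the screenshot, so here is a sketch of a typical form. The key file name and IP address are placeholders, and the user name ubuntu is an assumption based on the usual default for Ubuntu cloud images; check your provider's documentation if the login is refused.

```shell
KEY_FILE="redcloud_key.pem"   # hypothetical key file name
VM_USER="ubuntu"              # assumed default user on Ubuntu cloud images
VM_IP="203.0.113.10"          # placeholder; use the IP listed for your instance

# Assemble the command and print it for review before running it
SSH_CMD="ssh -i $KEY_FILE $VM_USER@$VM_IP"
echo "$SSH_CMD"
```

Running the printed command from the directory that holds the key should open a shell on the VM, provided the SSH rule has been added to the security group.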

Now we’ve entered into our Ubuntu machine and we can interact with it using the command line. Enjoy!

Fig. 14. Ubuntu Image!

Once you are done, make sure to either shelve your instance (if you want your disk contents to be preserved) or delete it. An instance that is left running, even if idle, cannot be used by other users, so you will still be billed for it.

Fig. 15. Shelving an Instance

In the next tutorial, I’ll describe Docker and how you can use VMs to run containerized code.

Acknowledgements: Much of the information shared in this tutorial comes from the Red Cloud wiki and from many conversations with staff at CAC, particularly Peter Vaillancourt.