Make your Git repository interactive with Binder

Have you ever tried to demo a piece of software you wrote only to have the majority of participants get stuck when trying to configure their computational environment? Difficulty replicating computational environments can prevent effective demonstration or distribution of even simple codes. Luckily, new tools are emerging that automate this process for us. This post will focus on Binder, a tool for creating custom computing environments that can be distributed and used by many remote users simultaneously. Binder is language agnostic tool, and can be used to create custom environments for R, Python and Julia. Binder is powered by BinderHub, an open source service in the cloud. At the bottom of this post, I’ll provide an example of an interactive Python Jupyter Notebook that I created using BinderHub.

BinderHub

BinderHub combines two useful libraries: repo2docker and JupyterHub. repo2docker is a tool to build, run and push Docker images from source code repositories. This allows you to create copies of custom environments that users can replicate on any machine. These copies are can be stored and distributed along with the remote repository. JuptyerHub is a scalable system that can be used to spawn multiple Jupyter Notebook servers. JuptyerHub takes the Docker image created by repo2docker and uses it to spawn a Jupyter Notebook server on the cloud. This server can be accessed and run by multiple users at once. By combining repo2docker and JupyterHub, BinderHub allows users to both replicate complex environments and easily distribute code to large numbers of users.

Creating your own BinderHub deployment

Creating your own BinderHub deployment is incredibly easy. To start, you need a remote repository containing two things: (1) a Jupyter notebook with supporting code and (2) configuration files for your environment. Configuration files can either be an environment.yml file (a standard configuration file that can be generated with conda, see example here) or a requirements.txt file (a simple text file that lists dependencies, see example here).

To create an interactive BinderHub deployment:

  1. Push your code to a remote repository (for example Github)
  2. Go to mybinder.org and paste the repository’s URL into the dialoge box (make sure to select the proper hosting service)
  3. Specify the branch if you are not on the Master
  4. Click “Launch”

The website will generate a URL that you can copy and share with users. I’ve created an example for our Rhodium tutorial, which you can find here:

https://mybinder.org/v2/gh/dgoldri25/Rhodium/master?filepath=DMDU_Rhodium_Demo.ipynb

To run the interactive Jupyter Notebook, click on the file titled “Rhodium_Demo.ipynb”. Happy sharing!

Introducing Julia: A Fast and Modern Language

In this blog post, I am introducing Julia, a high-level open-source dynamic programming language. While it has taken root in finance, machine learning, life sciences, energy, optimization, and economic realms, it is certainly not common within the water programming realm. Through this post, I will give an overview of the features, pros, and cons of Julia before comparing its performance to Python. Following this, I give an overview of common Julia development environments as well as linking to additional resources.

Julia: What’s Going On?

Julia is a high-level open-source dynamic programming language built for scientific computing and data processing that is Pythonic in syntax to ensure accessibility while boosting computational efficiency.  It is parallelizable on high performance computing resources utilizing CPUs and GPUs, allowing for its use in large-scale experiments.

With Version 1.1.0 recently released, Julia now stands amongst the established and stable programming languages. As such, Julia can handle matrices with ease while performing at speeds nearly comparable to C and Fortran. It is comparable to Cython/Numba and faster than Numpy, Python, Matlab, and R. Note that further benchmarking is required on a project-specific basis due to the speed of individual packages/libraries, but a simple case is shown in the following section.

The biggest case for implementing Julia over C/C++/Fortran is its simplicity in writing and implementation. Its language is similar Python, allowing for not only list comprehension but also ‘sloppy’ dynamic variable type declarations and use. Julia has the ability to move between dynamic and static variable types. Furthermore, Julia allows users to create unique variable types, allowing for maximum flexibility while maintaining efficiency.

Installing and implementing packages from the start is nearly seamless, simply requiring the name of the package and an internet connection. Most packages and libraries are hosted on GitHub and are relatively straightforward to install with just a couple lines of code. On the same hand, distributing and contributing to the opensource community is straightforward. Furthermore, Julia can access Python modules with wrappers and C/Fortran functions without, making it a very versatile language. Likewise, it can easily be called from Python (PyJulia) to speed up otherwise cumbersome operations.

Julia has relatively straightforward profiling modules built in. Beyond being able to utilize Jupyter Notebook commands, Julia has a range of code diagnostic tools, such as a @Time command to inform the user of wall time and memory allocation for a function.

One of the most obvious downsides to Julia when compared to Python is the relatively small community. There is not nearly the base of documentation that I am accustomed to; nearly every issue imaginable in the Python world has been explored on Stack Overflow, Julia is much more limited. However, because much of its functionality is based on MATLAB, similar issues and their corresponding solutions bleed over.

Additional transition issues that may be encountered is that unlike Python, Julia’s indexing begins with 1. Furthermore, Julia uses column-major ordering (like Matlab, Fortran, and R) unlike Python and C (row-major ordering), making iterating column-by-column substantially faster. Thus, special care must be taken when switching between languages in addition to ensuring consistent strategies for speeding up code. Another substantial transition that may be required is that Julia is not directly object-oriented since objects do not directly have embedded methods; instead, a given object must be passed to a function to be modified.

Overall, I see Julia as having the potential to improve computational efficiency for computationally-intensive models without the need to learn a more complex language (e.g. Fortran or C/C++). Between its ease of integration between other common languages/libraries to its ability to properly utilize HPC CPU/GPU resources, Julia may be a useful tool to utilize alongside Python to create more efficient large-scale models.

Julia Allows for Faster Matrix Operations

As a simple test to compare scalability of Julia and Python, I computed the dot product of increasingly large matrices using the base Julia language and Numpy in Python (both running in Jupyter Notebooks on a Windows desktop). You can find the code at the following link on Github.

julia_vs_python_timing.png

As you can see in the generated figure, Julia outperforms Python for large matrix applications, indicating that Julia is more efficient and may be worth using for larger and larger operations. If you have any suggestions on improving this code for comparison, please let me know; I am absolutely game for improvement.

Development Environments for Julia

The first and most obvious variables on what development environment is best is you, the user. If you’re wanting all the bells and whistles versus a simple text-editing environment, the range of products to fulfill you desires exists.

Juno—an extension to Atom—is the default IDE that is incorporated with JuliaPro, a pre-packaged version of Julia that includes a range of standard packages—Anaconda is the parallel in the Python world.

Jupyter Notebooks  are my go-to development environment for experimenting with code in both Python and Julia, especially for testing small sections of code.  Between being able to easily create visuals and examples inline with your code and combining code with Markdown, it allows for clean and interactive approach to sharing code or teaching. Note that installation generally requires IJulia but can allow for easy integration into already existing Jupyter workflows.

JetBrains IDEs (e.g. CLion, PyCharm) support a Julia plugin that supports a wide range of the same features JetBrains is known for. However, it is still in beta and is working to improve its formatter and debugging capabilities.

JuliaBox is an online web interface using Jupyter Notebooks for code development on a remote server, requiring no local installation of Julia. Whether you are in a classroom setting, wanting to write code on your phone, or are just wanting to experiment with parallel computing (or don’t have access to HPC resources), JuliaBox allows for a nearly seamless setup experience. However, note that this development environment limits each session to 90 minutes (up to 8 hours on a paid subscription) and requires a subscription to access any resources beyond 3 CPU cores. Note that you can access GPU instances with a paid subscription.

Julia Studio is a default IDE used by a range of users. It is a no-frills IDE based on Qt Creator, resulting in a clean look.

For anyone looking to use Visual Studio, you can install a VS Code extension.

Vim is not surprisingly available for Julia.  And if you’re feeling up the challenge, Emacs also has an interface.

Of course, you can just use a texteditor of your choice (e.g. Sublime Text with an extension) and simply run Julia from the terminal.

Julia Resources

Acknowledgements

Thanks to Dave Gold for his assistance in giving guidance for benchmarking and reminding me of the KISS principle.

Launching Jupyter Notebook Using an Icon/Shortcut in the Current Working Directory Folder

Launching Jupyter Notebook Using an Icon/Shortcut in the Current Working Directory Folder

A petty annoyance I’ve encountered when wanting to open Jupyter Notebook (overview) is that I couldn’t find a way to instantly open it in my current Windows Explorer window. While one trick you can use to open the Command Prompt in this folder is by typing ‘cmd’ in the navigation bar above (shown below) and pressing Enter/Return, I wanted to create a shortcut or icon I could double-click in any given folder and have it open Jupyter Notebook in that same working directory. cmd.png

This method allows you to drag-and-drop the icon you create into any folder and have it launch Jupyter Notebook from the new folder. It works for Windows 7, 8, and 10. Please feel free to let me know if you encounter any errors!

A great application for this shortcut may be to include this shortcut in GitHub folders where you wish to direct someone to launch Jupyter Notebook with minimal confusion. Just direct them to double-click on the icon and away they go!

Creating Your Own Jupyter Notebook Shortcut

new_shortcut.png

To begin, we must have already installed Jupyter Notebook or Jupyter Lab. Next, navigate to the folder we want to create your shortcut. Right-click, select ‘New’, then create a shortcut. 

shortcut_34.PNG

In the Create Shortcut Windows prompt, type the location of the item you want the Shortcut Icon to direct to. In this case, we are wanting direct this shortcut to the Command Prompt and have it run the command to open Jupyter Notebook. Copy/paste or type the following into the prompt:

cmd /k “jupyter notebook”

Note that cmd will change to the location of the Command Prompt executable file (e.g. C:\Windows\System32\cmd.exe), and ‘/k’ keeps the Command Prompt window open to ensure Jypyter Notebook does not crash. You can edit the command in the quotation marks to any command you would want, but in this case ‘jupyter notebook’ launches an instance of Jupyter Notebook.

You can then save this shortcut with whatever name you wish!

At this point, double-clicking the shortcut will open Jupyter Notebook in a static default directory (e.g. ‘C:\Windows\system32’). To fix this, we need to ensure that this shortcut instead directs to the current working directory (the location of the shortcut).

Picture1321.png

Next, we need to edit the location where the Command Prompt will run in. Right-click on your newly-created icon and select ‘Properties’ at the bottom of the menu to open the window shown on the left. One thing to note is that the ‘Target’ input is where we initially put in our ‘location’ prompt from above.

At this point, change the ‘Start in:’ input (e.g. ‘C:\Windows\system32’) to the following:

%cd%

By changing this input, instead of starting the Command Prompt in a static default directory, it instead starts the command prompt  in the current working directory for the shortcut.

At this point, you’re finished! You can drag and drop this icon to any new folder and have Jupyter Notebook start in that new folder.

If you wish to download a copy of the shortcut from Dropbox. Note that for security reasons, most browsers, hosting services, and email services will rename the file from ‘jupyter_notebook_shortcut.lnk’ to ‘jupyter_notebook_shortcut.downloads’.

Many thanks to users on superuser for helping develop this solution!

Please let me know if you have any questions, comments, or additional suggestions on applications for this shortcut!

 

Jupyter Notebook: A “Hello World” Overview

Jupyter Notebook: A “Hello World” Overview

Jupyter Notebook: Overview

When first learning Python, I was introduced to Jupyter Notebook as an extremely effective IDE for group-learning situations. I’ve since used this browser-based interactive shell for homework assignments, data exploration and visualization, and data processing. The functionality of Jupyter Notebook extends well past simple development and showcasing of code as it can be used with almost any Python library (except for animated figures right before a deadline). Jupyter Notebook is my go-to tool when I am writing code on the go.

As a Jupyter Notebook martyr, I must point out that Jupyter Notebooks can be used for almost anything imaginable. It is great for code-oriented presentations that allow for running live code, timing of lines of code and other magic functions, or even just sifting through data for processing and visualization. Furthermore, if documented properly, Jupyter Notebook can be used as an easy guide for stepping people through lessons. For example, check out the structure of this standalone tutorial for NumPy—download and open it in Jupyter Notebook for the full experience. In a classroom setting, Jupyter Notebook can utilize nbgrader to create quizzes and assignments that can be automatically graded. Alas, I am still trying to figure out how to make it iron my shirt.

1_XMB2FXE3sN4FTwaO8dMMeA

A Sample Jupyter Notebook Presentation (credit: Matthew Speck)

One feature of Jupyter Notebook is that it can be used for a web application on a server-client structure to allow for users to interact remotely via ssh or http. In an example is shown here, you can run Julia on this website even if it is not installed locally. Furthermore, you can use the Jupyter Notebook Viewer to share notebooks online.  However, I have not yet delved into these areas as of yet.

For folks familiar with Python libraries through the years, Jupyter Notebook evolved from IPython and has overtaken its niche. Notably, it can be used for over 40 languages—the original intent was to create an interface for Julia, Python and R, hence Ju-Pyt-R— including Python, R, C++, and more. However, I have only used it for Python and each notebook kernel will run in a single native language (although untested workaround exist).

Installing and Opening the Jupyter Notebook Dashboard

While Jupyter Notebook comes standard with Anaconda, you can easily install it via pip or by checking out this link.

As for opening and running Jupyter Notebook, navigate to the directory (in this case, I created a directory in my username folder titled ‘Example’) you want to work out of in your terminal (e.g. Command Prompt in Windows, Terminal in MacOS) and run the command ‘jupyter notebook’.

command_prompt_start

Opening the Command Prompt

Once run, the following lines appear in your terminal but are relatively unimportant. The most important part is being patient and waiting for it to open in your default web browser—all mainstream web browsers are supported, but I personally use Chrome.

 

 

If at any time you want to exit Jupyter Notebook, press Ctrl + C twice in your terminal to immediately shut down all running kernels (Windows and MacOS). Note that more than one instance of Jupyter Notebook can be running by utilizing multiple terminals.

Creating a Notebook

Once Jupyter Notebook opens in your browser, you will encounter the dashboard. All files and subdirectories will be visible on this page and can generally be opened or examined.

jupyter_start

Initial Notebook Dashboard Without Any Files

If you want to create a shiny new Notebook to work in, click on ‘New’ and select a new Notebook in the language of your choice (shown below). In this case, only Python 3 has been installed and is the only option available. Find other language kernels here.

jupyter_open

Opening a New Notebook

Basic Operations in Jupyter Notebook

Once opened, you will find an untitled workbook without a title or text. To edit the title, simply left-click on ‘Untitled’ and enter your name of choice.

jupyter_new

Blank Jupyter Notebook

To write code, it is the same as writing a regular Python script in any given text editor. You can divide your code into separate sections that are run independently instead of running the entire script again. However, when importing libraries and later using them, you must run the corresponding lines to import them prior to using the aforementioned libraries.

To run code, simply press Shift + Enter while the carat—the blinking text cursor—is in the cell.

jupyter_hello_world

Jupyter Notebook with Basic Operations

After running any code through a notebook, the file is automatically backed up in a hidden folder in your working directory. Note that you cannot directly open the notebook (IPYNB File) by double-clicking on the file. Rather, you must reopen Jupyter Notebook and access it through the dashboard.

jupyter_folder_windows

Directory where Sample Jupyter Notebook Has Been Running

As shown below, you can easily generate and graph data in line. This is very useful when wanting to visualize data in addition to modifying a graphic (e.g. changing labels or colors). These graphics are not rendered at the same DPI as a saved image or GUI window by default but can be changed by modifying matplotlib’s rcParams.

jupyter_graphic

Example Histogram in Jupyter Notebook

Conclusion

At this point, there are plenty of directions you can proceed. I would highly suggest exploring some of the widgets available which include interesting interactive visualizations. I plan to explore further applications in future posts, so please feel free to give me a yell if you have any ideas.