Using drawdata and Mercury in Jupyter Notebooks

I wrote a post last year on Enhancing Jupyter Notebooks for Teaching. Through my time at Cornell, both as a TA and mentoring our undergrad researchers, I’ve learned how important it is to find fun and interesting ways for students to play with data and to feel more comfortable diving into traditionally difficult topics like machine learning. This post features two cool libraries that were recently brought to my attention and that can help jazz up your tutorials:

(1) drawdata: allows a student to interactively draw a dataset (lines, histograms, scatterplots) with up to four different labels. The dataset and labels can be saved as JSONs and CSVs and also directly copied into Pandas dataframes. This can be useful to facilitate interactive machine learning tutorials.

(2) Mercury: converts a Jupyter notebook into a web app. This is especially useful for classroom tutorials because it doesn’t require students to have Jupyter, or even Python, installed.

I’ve combined both of these functionalities into a notebook, which can be found here, focused on classifying a dataset with 2 labels using support vector machines (SVMs).

drawdata

First let’s install drawdata by typing pip install drawdata into our command line. Next, let’s follow through the steps of the Jupyter notebook. We’ll import the rest of our libraries and draw out a simple linearly-separable dataset.
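As a sketch, that first cell might look something like the following (the exact imports depend on your notebook; note also that newer releases of drawdata expose the drawing widget differently, e.g., through a ScatterWidget class, so check the version you have installed):

# Libraries used throughout this notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.inspection import DecisionBoundaryDisplay

from ipywidgets import interact, fixed

# drawdata: render the interactive drawing widget in the notebook
from drawdata import draw_scatter
draw_scatter()  # draw a two-class dataset, then click "copy csv"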

By clicking “copy csv” we can copy this exact dataset to the clipboard and read it into a Pandas dataframe. Upon inspection, we see that each datapoint has 2 features (an x and y coordinate) and a label that is either “a” or “b”. Let’s also adjust the labels to be [-1, 1] for the purposes of using some classifiers from scikit-learn. We then fit a very basic support vector classifier and plot the decision boundary.

data = pd.read_clipboard(sep=",")

# Rename the labels to integers: 'a' becomes -1, 'b' becomes 1
for i in range(0, len(data)):
    if data.iloc[i, 2] == 'a':
        data.iloc[i, 2] = -1
    else:
        data.iloc[i, 2] = 1
data.iloc[:, 2] = data.iloc[:, 2].astype('int')


#Create our datasets

X=np.array(data.iloc[:,0:2])
y=np.array(data.iloc[:,2])


#Create a 60/40 training and test split 

X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.4, random_state=42)
    
#Fit classifier 

clf=svm.SVC(kernel='linear',C=0.025)
clf.fit(X_train, y_train)
score = clf.score(X_test, y_test)
print("The classification accuracy is", score)
#The classification accuracy is 1.0


#Plot original dataset that you drew 

cm = plt.cm.get_cmap('cool_r')
cm_bright = ListedColormap(["purple", "cyan"])
figure = plt.figure(figsize=(8, 6))
ax = plt.subplot(1,1,1)
#Plot decision boundary
ax.set_title("Classification Boundary")
DecisionBoundaryDisplay.from_estimator(clf, X, cmap=cm, alpha=0.8, ax=ax, eps=0.5)
# Plot the training points
ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train,cmap=cm_bright , edgecolors="k")
# Plot the testing points
ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, alpha=0.6, edgecolors="k")

We can also plot the margin and support vectors.
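A sketch of how that plot could be made, following the scikit-learn/Jake VanderPlas style approach referenced in the Resources section below (it assumes the clf, X, y, and cm_bright objects defined above):

# Sketch: decision boundary, margins, and support vectors
fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(X[:, 0], X[:, 1], c=y, cmap=cm_bright, edgecolors="k")

# Evaluate the decision function on a grid covering the plot
xlim = ax.get_xlim()
ylim = ax.get_ylim()
xx, yy = np.meshgrid(np.linspace(*xlim, 50), np.linspace(*ylim, 50))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Decision boundary (level 0) and margins (levels -1 and +1)
ax.contour(xx, yy, Z, levels=[-1, 0, 1], colors="k", linestyles=["--", "-", "--"])

# Circle the support vectors
ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
           s=150, facecolors="none", edgecolors="k")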

This is a straightforward classification example, but we can also look at non-linearly separable examples.

Ruh roh! Our linear classifier is never going to work for this dataset! We’ll have to use the kernel trick: that is, we need to map our data to a higher-dimensional space with an additional feature that may help us better separate the two classes. We can create an additional feature that captures the distance from a central point (300, 300). This allows us to find a hyperplane that can separate the points in the higher-dimensional space.

#Create an additional dimension
r = np.sqrt((X[:, 0]-300)**2+(X[:, 1]-300)**2)

def plot_3D(elev=30, azim=30, X=X, y=y):
    ax = plt.subplot(projection='3d')
    ax.scatter3D(X[:, 0], X[:, 1], r, c=y, s=50, cmap=cm_bright,edgecolors="k")
    ax.view_init(elev=elev, azim=azim)
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_zlabel('r')
    

interact(plot_3D, elev=(-90, 90), azim=(-180, 180),
         X=fixed(X), y=fixed(y));

Thus, we can use a radial basis function (RBF) kernel in our SVC, rather than a linear kernel, to define a non-linear classification boundary.
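For example, swapping the kernel argument is all that is required (the C and gamma values below are illustrative defaults, not necessarily the exact settings used in the notebook):

# Fit an SVC with an RBF kernel instead of a linear one
clf_rbf = svm.SVC(kernel='rbf', C=1.0, gamma='scale')
clf_rbf.fit(X_train, y_train)
print("RBF kernel test accuracy:", clf_rbf.score(X_test, y_test))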

Mercury

Once you have a notebook that you are satisfied with, you can make it into an interactive web app by adding a YAML header. The YAML header also lets users interact with certain variables in the code before running it. For example, users can select a classifier from a drop-down menu or slide through different values of C, then click the run button to execute the notebook.

A YAML header is added to the first RAW cell in the notebook. Here I want the user to be able to slide through the hardness of the margin (between 0 and 100).

---
title: SVM Classifier
description: Implement linear and non-linear classifiers
show-code: True
params:
    C_margin:
        input: slider 
        label: Value of Margin 
        value: 0.025
        min: 0
        max: 100
---
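For the slider to actually control the classifier, the notebook also needs a code cell that defines a variable with the same name as the YAML parameter; Mercury overwrites that value with the user’s selection before executing the notebook. A minimal sketch of such a cell (variable name matching the C_margin parameter above):

# Default value; Mercury replaces this with the slider value on each run
C_margin = 0.025

clf = svm.SVC(kernel='linear', C=C_margin)
clf.fit(X_train, y_train)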

Then we install Mercury (pip install mljar-mercury), go to the command line, and type mercury watch Classification_Tutorial.ipynb. The command will print a local address that you can open in your browser. Users can then draw their own dataset, adjust the hardness of the margin, and run the notebook.

Resources

Some of the SVM plotting code was borrowed from: https://jakevdp.github.io/PythonDataScienceHandbook/05.07-support-vector-machines.html

More info on Mercury can be found in this post: https://medium.com/@MLJARofficial/mercury-convert-jupyter-notebook-to-a-web-app-by-adding-yaml-header-872ce4e53676

Structuring a Python Project: Recommendations and a Template Example

Motivation:

The start of a new year is a good (albeit, relatively arbitrary) time to reassess aspects of your workflow.

I, like many people, taught myself Python by jumping into different projects. The consequence of this ad-hoc learning was that I did not learn some of the fundamentals until much later in my project development.

At the end of the Fall ’23 semester, I found myself spending a lot of time cleaning up repositories and modules that I had constructed in the preceding months. I ended up restructuring and reorganizing significant portions of the code base, implementing organizational practices that I had learned after its conception.

This post is intended to save the reader from making the same mistake. Here, I present a recommended structure for new Python projects, and discuss the main components. This is largely targeted at Python users who have not had a formal Python training, or who are working with their first significantly sized project.

I provide an example_python_project, hosted on GitHub, as a demonstration and to serve as a template for beginners.

Content:

  1. Recommended structure
  2. Example project repository
  3. Project components (with example project)
    1. Modules, packages, __init__.py
    2. Executable
    3. README
    4. Documentation
    5. Tests
    6. Requirements
    7. LICENSE
  4. Conclusion

Recommended project structure

I’ll begin by presenting a recommended project structure. Further down, I provide some explanation and justification for this structure.

My example_python_project follows recommendations from Kenneth Reitz, co-author of The Hitchhiker’s Guide to Python (1), while also drawing from a similar demo-project, samplemod, by Navdeep Gill (2).

The project folder follows this structure:

example_python_project/ 
├── sample_package/ 
│   ├── subpackage/ 
│   │   ├── __init__.py 
│   │   └── subpackage_module.py 
│   ├── __init__.py 
│   ├── helpers.py 
│   └── module.py 
├── docs/ 
│   └── documentation.md 
├── tests/ 
│   ├── __init__.py 
│   └── basic_test.py 
├── main.py 
├── README.md 
├── requirements.txt 
└── LICENSE

If you are just starting a project, it may seem unnecessarily complex to begin with so much modularity; it may seem easier to open a .py file and start freewheeling. However, there are several reasons why it is important to take care when initially constructing a Python project, including:

  1. Maintenance: A well-structured project makes it easier to understand the code, fix bugs, and add new features. This is especially important as the project grows in size and complexity.
  2. Collaboration: When working on a project with multiple developers, a clear structure makes it easier for everyone to understand how the code is organized and how different components interact with each other.
  3. Scalability: A well-structured project allows you to scale up the codebase, adding new features and sub-components, without making it hard to understand or maintain.
  4. Testing: A well-structured project makes it easier to write automated tests for the code. This helps to ensure that changes to the code do not break existing functionality.
  5. Distribution: A well-structured project makes it easier to distribute the code as a package. This allows others to easily install and use the code in their own projects.

Overall, taking the time to structure a Python project when starting can save a lot of time and heartache in the long run, by making the project easier to understand, maintain, and expand.


Example Project Repository

The repository containing this example project is available on GitHub here: example_python_project.

The project follows the recommended project structure above, and is designed to use modular functions from the module, helpers, and subpackage_module. It is intended to be a skeleton upon which you can build-up your own project.

If you would like to experiment with your own copy of the code, you can fork a copy of the repository, or Download a ZIP version.

Project overview

The project is a silly riddle program with no real usefulness other than forming the structure of the project. The bulk of the work is done in the main_module_function() which first prints a riddle on the screen, then iteratively uses the helper_function() and subpackage_function() to try and “solve” the riddle. Both of these functions simply return a random True/False, and are repeatedly called until the riddle is solved (when status == True).
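To make the workflow concrete, here is a condensed, single-file sketch of that logic (the actual repository splits these functions across module.py, helpers.py, and subpackage_module.py, and the details may differ):

# Condensed sketch of the riddle workflow (single file for illustration only)
import random

def helper_function():
    print("The helper_function is helping!")
    return random.choice([True, False])

def subpackage_function():
    print("The subpackage_function is being used now.")
    return random.choice([True, False])

def main_module_function():
    print("Here is a riddle, maybe `sample_package` can help solve it:\n")
    print("   What runs but has no feet, roars but has no mouth?\n")
    status = False
    while not status:
        # Try the helper first, then the subpackage, until one "solves" it
        print("Lets see if the helper can solve the riddle.")
        status = helper_function()
        if not status:
            print("Maybe the subpackage_module can help.")
            status = subpackage_function()
    print('The riddle is solved, the answer is "A River"!')
    return status

if __name__ == "__main__":
    main_module_function()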

Below is a visual representation of how the different functions are interacting. The green-box functions are contained within the main sample_package, while the blue-box function is stored in the subpackage.

The program can then be executed from a command line using the main.py executable:

C:\<your-local-directory>\example_python_project> python main.py

The output will first print out the riddle, then print statements indicating which functions are being used to “solve” the riddle. This is simply a means of demonstrating how the different functions are being activated, and not necessarily a recommended “Best Practice”.

A normal output should resemble something similar to the below, although there may be more or less print statements depending upon how many times it takes the random generator to produce a “True” solution:

Here is a riddle, maybe `sample_package` can help solve it:

   What runs but has no feet, roars but has no mouth?

Lets see if the helper can solve the riddle.
The helper_function is helping!
The helper could not solve it.
Maybe the subpackage_module can help.
The subpackage_function is being used now.
The subpackage solved it, the answer is "A River"!

Project components

Modules, packages, __init__.py, oh my!

Before going any further, I want to take time to clarify some vocabulary which is helpful for understanding component interactions.

Module:
A module is simply a file ending in .py, which contains functions and or variables.

Package:
A package is a collection of modules (.py files) which relate to one another, and which contains an __init__.py file.

__init__.py
Inclusion of an __init__.py file (pronounced “dunder init”) within a folder indicates to Python that the folder is a package. Often, the __init__ module is empty; however, it can be used to import other modules or functions, which are then stored in the package namespace, making them available for use later.

For example, in my sample_package/__init__.py, I import all contents of the module.py and subpackage_module.py:

# Import the all functions from main and sub modules
from .module import *
from .subpackage.subpackage_module import *

This allows all of the functions stored within module to be callable directly from the primary sample_package, rather than having to specify the sub-structures needed to access them. For example, by including from .subpackage.subpackage_module import *, I am able to run:

# IF __init__ imports all content from main and sub modules then you can do this:
import sample_package
sample_package.subpackage_module_function()

Rather than requiring the following fully-nested call, which is necessary when the __init__.py is empty:

# IF __init__ is EMPTY, then you need to do this:
import sample_package
sample_package.subpackage.subpackage_module.subpackage_module_function()

Notably, an __init__.py is not strictly necessary to use modules and functions within a folder; however, customizing the imports present in the package’s __init__.py gives you more control over how your project is used. As the project increases in complexity, strategic use of imports within the __init__ can keep your main executable functions cleaner.

Executables

So, you’ve crafted a Python project with a sleek, modular package design. The next step is to set up a single file which will execute the package.

Inclusion of a single executable has the benefit of providing a single-entry point for other users who want to run the program without getting lost in the project.

In the example_python_project, this is done with main.py:

# Import the main package
import sample_package

def run():
    solved = sample_package.main_module_function()
    return solved

# Run the function if this is the main file executed
if __name__ == "__main__":
    run()

The program can then be executed from the command line:

C:\<your-local-directory>\example_python_project> python main.py

README

The README.md file is typically someone’s first encounter with your project. This is particularly true if the project is hosted on GitHub, where the README.md is used as the home-page of a repository.

A README.md file should include, at minimum, a brief description of the project, its purpose, and clear instructions on how to use the code.

Often, README files are written in Markdown, which includes simple text-formatting options; you can find a basic Markdown Cheat Sheet here. reStructuredText is also commonly used, and even plain .txt files may be suitable.

Documentation

Great code requires great documentation. Initializing a new project with a dedicated docs/ folder may help hold you accountable for documenting the code along the way.

For information on how to use Sphinx and reStructuredText to create clean webpage-based documentation, you can see Rohini Gupta’s post on Using Python, Sphinx, and reStructuredText to Create a Book (and Introducing our eBook: Addressing Uncertainty in Multisector Dynamics Research!).

Tests

Bugs aren’t fun. They are even less fun when code that was bug-free yesterday contains bugs today. Implementing automated tests in your project can help verify functionality throughout the development process and catch bugs when they arise.

It is recommended to implement unit tests which verify individual components of the project. These tests should assert that function output properties align with expectations. As you develop your project in a modular way, you can progressively add tests, then run the full suite before sharing or pushing the project to others.

A standard Python installation comes with the unittest package, which provides a framework for these tests. I provide an example test below, but a deeper dive into the unittest framework may require a dedicated future post.

In the example_python_project, I include the basic_test.py to verify that the solution generated by main_module_function() is True using the unittest package:

import sample_package
import unittest

# Define a test suite targeting specific functionality
class BasicTestSuite(unittest.TestCase):
    """Basic test cases."""
    def test_that_riddle_is_solved(self):
        solved = sample_package.module.main_module_function()
        self.assertTrue(solved)


if __name__ == '__main__':
    unittest.main()

Running the basic_test module from the command line produces an “OK” if everything runs smoothly; otherwise, it will provide information regarding which tests are failing.
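For example, from the project root (the exact invocation may vary with your setup):

python -m unittest tests.basic_test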

----------------------------------------------------------------------
Ran 1 test in 0.004s
OK

Currently, the example_python_project requires the basic_test module to be executed manually. To learn more about automating this process, you can see Andrew Dirck’s 2020 post: Automate unit testing with Github Actions for research codes.

Requirements

The requirements.txt is a simple text file which lists the dependencies, or necessary packages that are required to run the code.

This can be particularly important if your code requires a specific version of a package, since the package version can be specified in the requirements.txt. Specifying a particular package version (e.g., numpy==1.24.1) can improve the reliability of your code, since different versions of these packages may operate in different ways in the future.

Here is an example of what might be inside a requirements.txt, if specific versions of the numpy and pandas packages are necessary:

numpy==1.24.1
pandas==1.5.2

Users can easily install all the packages listed in requirements.txt using the command:

pip install -r requirements.txt

License

I’ll keep this section brief, since I am far from legally qualified to comment much on licensing. However, general advice suggests that if you are sharing code publicly, it is safest to include a license of some sort.

Inclusion of an open-source license allows other users to comfortably use and modify your code for their own purposes, letting you contribute to and benefit the broader community, while also protecting the original author from liabilities associated with its use by others.

The GNU General Public License is one of the most common open-source licenses; if you would like to know more about the different options, you can find some guidance here: https://choosealicense.com/

Conclusions

If you are an experienced Python user, there may not be anything new for you here, but at the least I hope it serves as a reminder to take care in your project design this year.

Additionally, this is likely to be one part in a multi-part Introduction to Python series that I will be writing for future members of our research group. With that in mind, check back here later this spring for the subsequent parts if that interests you.

Best of luck!

References

(1) Reitz, K., & Schlusser, T. (2016). The Hitchhiker’s Guide to Python: Best Practices for Development. O’Reilly Media, Inc.
(2) Gill, N. (2019). samplemod. https://github.com/navdeep-G/samplemod (accessed 2023).

Ringing in the year with a lil’ bit of Tkinter

Happy 2023! For the first post of the new year, we will be learning how to make a simple graphic user interface (GUI) using a nifty Python package called Tkinter, the default interface framework built into the Python standard library. Today, I will demonstrate a few Tkinter functions that I recently found useful to achieve the following tasks:

  1. Setting up the GUI window
  2. Partitioning the window into frames
  3. Creating and updating user entry spaces
  4. Creating buttons
  5. Creating and launching popup windows
  6. Creating dropdown lists

At the end of this post, I will use the tools from all of these tasks to create a GUI that enables the user to generate a universal input file that, in turn, is used as input to a Python script that will run WaterPaths (Trindade et al., 2020). WaterPaths is an open-source simulation tool for water portfolio and infrastructure investment pathway management and planning. The GitHub repository that includes both WaterPaths and the GUI Python script can be found here. Documentation for Tkinter can be found here if you are interested in further details on GUI-making.

Before beginning, make sure the Tkinter library is available on your machine. Tkinter ships with most standard Python installations, so you usually do not need to install anything extra; if it is missing (which can happen on some Linux distributions), it can typically be added through your system’s package manager, for example:

sudo apt-get install python3-tk
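To confirm that Tkinter is working, a quick check (a standard feature of the Python interpreter, not something specific to this post) is to run:

python -m tkinter

which should open a small test window.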

Now, let’s get started!

Setting up the GUI window

As per the Python tradition, using any library requires that we import it into our main Python script:

from tkinter import *
from tkinter import font

Next, we will initialize the main window:

# create the main window 
root = Tk()
root.title("WaterPaths")
# load an icon to use
p1 = PhotoImage(file = 'wpaths.png')
root.iconphoto(False, p1)
# set the initial size of the window, allow it to be resizeable
root.geometry("850x780")
root.resizable(True, True)

The Tk() function initializes the main GUI window and the title() function names it. This is the window within which we will create two frames. The GUI also has an icon, loaded with the PhotoImage() function and set with the iconphoto() function, that will appear in the taskbar when the GUI is launched. We also set the initial size of the window with the geometry() function, and allow the user to resize the window to better suit their preferences by enabling both its width and height to vary with the resizable() function.

Partitioning the window into frames and understanding the grid system

Before exploring more functions, let’s take a look at how the Tkinter grid system works:

The system is pretty intuitive: a window consists of frames, and each frame is structured using rows and columns into which widgets (entry spaces, labels, dropdown lists, buttons, etc.) can be placed. Columns and rows do not have to be explicitly added to a frame, and a frame’s number of rows and columns does not have to be pre-allocated. However, a widget’s position should be specified when adding it to a frame.
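As a tiny standalone illustration of the grid system (not part of the WaterPaths GUI, and assuming the tkinter imports shown above), here is a window with three labels placed by row and column:

# Standalone sketch: place labels on a grid by row and column
demo = Tk()
Label(demo, text="row 0, column 0").grid(row=0, column=0, padx=5, pady=5)
Label(demo, text="row 0, column 1").grid(row=0, column=1, padx=5, pady=5)
Label(demo, text="row 1, spanning both columns").grid(row=1, column=0, columnspan=2, padx=5, pady=5)
demo.mainloop()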

Now, let’s use the LabelFrame() function to create a frame within the window. This function has a couple of useful parameters as shown in the code snippet below:

frame_setup_model = LabelFrame(root, text="1-Setup WaterPaths", 
    bg='lightcyan', font=('HelvLight', 20), width=800, height=400)

Here, we are making a frame within the main root window and calling it ‘1-Setup WaterPaths’. We are also setting the color of the frame to a light cyan color and specifying the font type and size of the title of the frame. We are also setting the width and height of the frame.

Next, we specify the position of the frame:

frame_setup_model.grid(row=0,column=0,padx=10,pady=10, sticky="ew")
frame_setup_model.grid_propagate(True)

Using the grid() function, we place the frame in the first row and column of the main window and set it to stretch horizontally across the window using the sticky parameter. We then use the grid_propagate() function to ensure that the frame changes size when the window is resized.

We now have a functioning frame – let’s populate it with some widgets!

Creating widgets

We will be covering three types of widgets today: user entry spaces, buttons, and dropdown lists. Each widget should be labeled using the Label() function:

data_dir_label = Label(frame_setup_model, text="Data directory", justify=LEFT, 
    bg='lightcyan').grid(sticky = W, row=1, column=0)

In the code snippet above, we create a label prompting the user to enter the folder’s full filepath as a string in the adjacent entry space (demonstrated immediately below). We name this label ‘Data directory’ and left-justify it. We also ensure that the grid cell containing this label has the same background color as the frame that hosts it. Using the grid() function, we place this label in the second row of the first column.

User entry spaces

The first widget we’ll use is the user entry widget, initialized using the Entry() function, shown below:

# Create and place the user entry widget
data_dir = Entry(frame_setup_model, width=60)
data_dir.grid(row=1, column=1, sticky=W)
# Insert a default value in the entry space
data_dir.insert(0, '/home/fs02/pmr82_0001/lbl59/Implementation_Uncertainty/WaterPaths_duReeval/')

We place the widget in the second row and column of the frame we first created. The insert() function then provides the option to enter a default value in the entry space, which can be later changed by the user. If this function is not used, the entry space will appear blank when the GUI is launched.

Buttons

Next, the button widget can be setup using the Button() function. This function requires that it be associated with another function written in the same script, as it launches that associated function. An example is shown below:

def install_req():
    '''
    Installs all Python libraries required to run the GUI 
    Runs the WaterPaths makefile
    '''
    os.system("pip install -r requirements.txt")
    os.system("make gcc")
    text_out = "Program requirements installed. Do not run again."

    open_popup(text_out)

reqs_button = Button(frame_setup_model, text="Install libraries", padx=10, pady=5, command=install_req, fg='darkslategrey',
    bg='lightblue', font=['HelvLight', 12, 'bold']).grid(row=13, column=0, sticky='we')

Here, we create a button widget called ‘Install libraries’ that, when clicked, runs the install_req() function defined immediately above it. There are two new parameters here that we have not seen previously: the fg parameter allows the user to specify the font color, and the command parameter links the install_req() function to the button we are creating.

Popup windows

In the install_req() function we see above, notice the internal function called open_popup() that takes a string parameter. This function is defined as follows:

def open_popup(text_out):
    '''
    Opens a popup window
    '''
    top = Toplevel(root)
    top.geometry("600x120")
    top.title("WaterPaths")
    Label(top, text=text_out, justify=CENTER).place(x=10, y=10)

This function uses Tkinter’s Toplevel() function to open a popup window when run. It creates a popup window that is 600-by-120 pixels in size and titled ‘WaterPaths’. The message contained within the popup window is specified by the user, and is placed at the (10, 10) position within the popup box.

Dropdown lists

The last widget that we will learn to make before putting everything together is the dropdown widget using the OptionMenu() function. The implementation is shown below:

# Define a list of options
mode_dropdown = ["Optimize", "Re-evaluate", "Reduced"]
mode = StringVar()
# Set the default value
mode.set("Reduced")

# Create the dropdown label
mode_select_label = Label(frame_setup_model, text="Select WaterPaths run mode", anchor="w", justify=LEFT,
    bg='lightcyan').grid(sticky = W, row=9, column=0)
# Create and position the dropdown widget
mode_select = OptionMenu(frame_setup_model, mode, *mode_dropdown)
mode_select.grid(row=9, column=1, columnspan=1, sticky='W')
mode_select.config(width=10, font=['HelvLight','10', 'normal'], bg='lightcyan')

We first create a list of options called mode_dropdown. In this case, the list contains options for the modes in which WaterPaths can be run. We then create a string variable using the StringVar() function to indicate that the value of the widget is variable and contingent upon the user’s choice. We also set a default value for the widget.

Next, we define a label for the dropdown widget called “Select WaterPaths run mode”. Directly to the right of the label, we use the OptionMenu() function to create the dropdown widget and associate it with the mode string variable, which takes its options from the mode_dropdown list. We finalize the creation of the dropdown list by positioning it within its grid cell.

Now that we’ve covered some of the fundamental Tkinter tools, let’s create a simple GUI!

The WaterPaths Input File GUI

Before continuing, note that the WaterPaths GUI should be used with an HPC resource running a Linux interface. While you will be able to view the GUI if it is run on a personal machine, it can only be used at full functionality if paired with WaterPaths on an HPC resource.

Putting together the tools shown above, we can structure a GUI that looks like this:

You can view this GUI by entering the following into the command line:

python WaterPaths_GUI.py

In the first frame (1-Setup WaterPaths), you can enter the path of the main WaterPaths directory and the location where the solutions are stored. You can also specify the number of deeply uncertain states of the world (DU SOWs) you would like to explore, and the number of hydroclimatic realizations you would like to pair with these SOWs. You can also specify the number of solutions you would like to run. If you would like to run one specific solution number, you can enter that solution number in the ‘First solution to run’ space and add one to it in the ‘Last solution to run’ space. The dropdown lists enable you to choose between generating (export) or using existing (import) risk of failure (ROF) tables. You can also choose to generate and use ROF tables as you run the WaterPaths simulation (do nothing); this option is not recommended for large experiments of more than 10 DU SOWs, as ROF table generation is computationally expensive. In addition, you can select whether you would like to run WaterPaths in optimization mode using the Borg MOEA (Optimize) or perform DU re-evaluation (Re-evaluate). You can also choose to run a short demonstrative WaterPaths simulation (Reduced) that should take approximately 2 minutes to run. Note that the ‘Optimize’ option should only be used if Borg is installed. If you would like to download Borg to explore optimization, please submit a download request here.

If this is the first time you are running WaterPaths, please click on ‘Install Libraries’, which will install all libraries required to run WaterPaths and run its Makefile. You only have to click this button once, immediately after downloading WaterPaths. The ‘Setup WaterPaths’ button writes all of your input from the entry spaces and dropdown lists into a text file called input_parser_file.txt, which is read by the run_waterpaths.py script found in the same repository.
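As a hedged sketch, the callback behind the ‘Setup WaterPaths’ button could look something like the following; the function name, file format, and grid placement here are illustrative, and the actual code in the repository may differ:

def setup_waterpaths():
    '''
    Gathers the current widget values and writes them to the input parser file
    (illustrative sketch; field names may differ from the real repository)
    '''
    with open("input_parser_file.txt", "w") as f:
        f.write("data_dir=" + data_dir.get() + "\n")   # Entry widget: .get() returns its text
        f.write("run_mode=" + mode.get() + "\n")       # OptionMenu: .get() on its StringVar

    open_popup("Input file written.")

setup_button = Button(frame_setup_model, text="Setup WaterPaths", padx=10, pady=5,
    command=setup_waterpaths, fg='darkslategrey', bg='lightblue',
    font=['HelvLight', 12, 'bold']).grid(row=14, column=0, sticky='we')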

In the second frame (2-Optimization/Re-evaluation), you can enter the number of threads, nodes, and tasks that your WaterPaths simulation requires. Further explanation of these terms can be found in Cornell CAC’s glossary of HPC terms. You can also modify the output frequency of your function evaluations and the number of seeds that you would like to run. This step is optional and should only be completed if you selected the ‘Optimize’ option in the first frame. Clicking on the ‘Make input file’ button will add these new entries to the input_parser_file.txt file.

As previously hinted at, this GUI links to two external scripts:

  1. run_waterpaths.py: This Python script takes and interprets the input from the input_parser_file.txt file and generates the command to be submitted to the Simple Linux Utility for Resource Management (SLURM) job scheduler on the HPC resource that you are using. A more in-depth explanation of how SLURM works can be found in Dave Gold’s post here.
  2. run_waterpaths.sh: This is the Bash script that submits the command from run_waterpaths.py to SLURM. The parameters on Line 2 should match the parameters entered under ‘Enter HPC submission requirements’ in the second frame.

Both these scripts can be found in the linked GitHub repository. If you have successfully run the GUI, you should generate the input_parser_file.txt that looks a little like this:

Conclusion

In this post, we walked through tools to initialize, structure, and implement a simple Python GUI. We then applied these tools to generate a GUI for WaterPaths. Hope this was a useful tidbit for the new year, and congratulations on learning a new skill!