I wanted to write this blogpost because I’ve seen great improvements to my workflow when I transitioned to this system and thought others might benefit also. My everyday research tasks require the following:
- a Python development environment on my local machine
- management of project-specific dependencies
- version control my changes
- execution on some high-performance computing resource.
My local machine runs on Mac OS, but everything I show here should be directly translatable to Windows or other operating systems. My setup is the following:
- Anaconda – to manage my Python environments and packages
- PyCharm – the Python development environment
- Git(Hub) – for version control
These are the steps I follow every time I start a new project:
- Create an empty repository on GitHub
- Clone the empty repository on my local machine
- Open PyCharm and select the directory of the repository I just created
When it opens, the PyCharm project will be empty and will have a default Python interpreter associated with it. What I do is I create a separate Conda environment for each of my projects, so there’s a clean separation between the packages used by each.
4. Create python environment specific to this project, by going to Preferences and selecting your current project. There, you can define your project’s (Python) interpreter. Clicking on it just shows the default Python 2.7 interpreter, which we would like to change.
As you can see, I have a separate Conda environment for each of my projects, so I manage packages and dependencies for each one.
Here I create a new environment for my new project.
5. Manage packages needed. There’s two ways for this: either through PyCharm or through Anaconda. Through PyCharm, you can use the same page to install, uninstall or update packages as needed.
Through Anaconda, you can use the Navigator, which also allows you to customize several other things about your environment, like which applications you’d like to work with.
6. Set up version control and use code on other computing resources. PyCharm has Git features integrated (overviewed already in this blog here and here) and creating a project the way I showed also ensures that PyCharm knows which repository you’re working with, without you having to set it manually. I use the built-in PyCharm functionality to commit my changes to my repository, but you can also do it through the Terminal or other means.
7. Set up project on computing resources. To do so, you need two main components. A clone of your repository in the cluster you’re working on and an environment .yml file (I explain what this is and how to generate it with one command here), listing all your environment’s dependencies. Create a virtual environment for the project in the cluster and pull any updates from your local machine.
This is more or less all I do. I have virtual environments for each of my projects both locally and on the clusters I am working on and use PyCharm and Git to manage all the dependencies and versions. I have been using this setup for the past 5-6 months and I have seen a lot of improvements in my organization and productivity, so hopefully others will find it helpful also.