Converting Latex to MS Word docx (almost perfectly)

Many of students and academics write academic papers and reports on Latex due to the ease of formatting and managing citations and bibliography. However, collaborative editing Latex tools have not reached the level of versatility and convenience of Microsoft Word and Google Docs. We then often begin writing the paper in Word when major comments, additions and changes are made, and later manually translate it to Latex, which is a tedious task and just a waste of time.

A way around this hurdle is to use Pandoc to automatically convert Latex files to Word documents. To install Pandoc on Linux, open a terminal run sudo apt-get install pandoc pandoc-citeproc texlive, while Windows executables and MacOS instructions are available here. To run Pandoc on a Latex document:

  1. Open a terminal (on Windows, hold the Windows key and press “r,” then type “cmd” in the command bar)
  2. Use the “cd” command to navigate to the folder where your Latex document it.
  3. Type pandoc -s latex_document.tex --bibliography=bib_file.bib -o output_word_document.docx.

Now you should have a Word document with all your bitmap (png, jpeg, bmp, etc.) figures, equations in Word format and with a bibliography based on your citations and bib file. If the latest version of Pandoc does not work, try version 1.19.1.

There are some known limitations, though. Numbering and referencing equations, figures and tables, and referencing sections will not work and you will have to number those by hand. This limitation seems to apply only to outputting Word documents and you can circumvent them by outputting to other formats with Pandoc reference filters plus the appropriate Pandoc calls and specific latex syntax (see filters’ pages for details). Vectorized figures will not be included in the output document (svg, eps, pdf, etc.). On the other hand, bitmaps will appear in the output document and will preserve their captions.

Hopefully some skilled programmer feeling like generously volunteering his time for the Latex academic cause will address the mentioned limitations soon. Until then, we will have to go around them by hand.

Recommended Software for the Kasprzyk Group

In this post, I provide a list of recommended software for multi-objective optimization research and a bit of context about each item. This is an update of two posts (for Windows users and Mac-users, respectively) that Joe made several years ago and is intended for Windows users. Although this list is catered toward members of the Kasprzyk Group at the University of Colorado Boulder (CU), it should be relevant to most readers of this blog.

Please feel free to make comments and give additional suggestions!

Text Editors

Text editors are a great way to view data (e.g., csv or space-delimited data) or review some code. If you are looking to run code in an interactive manner, you’ll want to get an interactive development environment (IDE) which corresponds to the programming language you are working with. I’ll get into that more in the “Programming Languages” section. If you have a Windows machine, you are probably used to opening things with the default text editor, Notepad. Notepad is the worst.

Notepad++

Notepad++ is infinitely better! The formatting is great, it has a bunch of plug-ins you can download, it improves readability, and runs quickly. Notepad++ is my preferred text editor if you want to look at a couple files. If you are looking to manage a larger project with several files and multiple directories, Atom is the way to go.

Atom

Atom is extremely powerful and customizable. It is made for software developers in mind and is basically the modern version of Notepad++ (which has been around for a while). It integrates easily with Git and GitHub (see “Version Control and Open Source Repositories” for explanation of these tools), it has an extensive library of packages–similar to Notepad++ plug-ins–and its growing constantly. Atom also blurs the line between text editor and IDE, because with the Atom-IDE packages, it can have IDE-like functionality.

Programming Languages

You never know what languages you might work with in your research, but the main languages we use are Python, R, and C/C++. If Joe asks your preference, tell him that Python >> R and C/C++ >> R. Although if you catch him at a moment of weakness, he might admit that R can do some stats things, I guess.

Python

If you are working in Python (download), it will likely depend on the project whether you are working in 2.7 or 3.X (the X’s being whatever version they are on). A lot of scientific computing research still uses Python 2.7, but more people are transitioning to Python 3. In fact, they have decided to stop maintaining Python 2.7 at midnight on January 1, 2020. So remember to pour one out for your homie, Python 2.7, on New Years 2020. Although there are plenty of other python IDEs  (e.g., Spyder, Rodeo), we generally use PyCharm Community (download) in the group. For ease of installing packages, download Anaconda (download) which installs Python and over 150 scientific packages automatically.

R

If you are working it R (download), most everyone uses Rstudio (download Open Source Licence) as an IDE. You will also want to download Rtools, which is helpful to have installed for building some packages which require it. Generally, these packages require command line tools and compiling languages other than R.

C/C++

Although most projects are focused on Python and R, we have some work in C and C++ as well. These very interrelated languages, unlike Python and R, need to be compiled before you can run the code. To do this, you’ll need to download a compiler. We generally recommend installing MinGW (download). This will allow you to compile programs that will work across different platforms (e.g., Windows, Linux) while working on a Windows machine. Pro tip: if you have Rtools installed, it comes with MinGW so you don’t need to download it separately! If you want to compile things in for POSIX application deployment for Windows, you’ll want Cygwin. If you have no idea what that means, that’s okay. It’s my belief that you shouldn’t download Cygwin unless you need it because it takes up a lot of space on your system. But if you don’t care about storage than go ahead! For most cases I run into, MinGW does the job.

Why do we even care about compiling things across platforms? Well, you may be running code on a Windows machine, but if you are doing any work with the supercomputer, you will also need to run that code in a Linux environment. So you want to make sure your code will work on both platforms.

Although it isn’t perfect, CodeLite (download) is a nice (and free) IDE for C/C++ work.

Version Control and Online Repositories

Version control is the best way to your manage your code. It allows you to track your changes, make notes, and even revert to older versions of your code.

Git

Git (download) is the most popular way to implement version control but other methods do exist. Learning Git takes a bit of time, but it is an essential skill to learn which will pay dividends in the future. Most IDE (interactive development environments) interface with Git and even some text editors (I’m looking at you, Atom).

GitHub

GitHub is where people post their code and data (bundled into so-called repositories) to share with the world. It is amazing collaborative environment, and since we do computational research, it is best practices to create a GitHub repository for code related to papers that you publish. Soon it will be a requirement in most journals!

Both Git and GitHub have stellar documentation and tutorials about how to get started, so you will have lots of support when you start learning the ropes.

Command Line Interfaces, Supercomputing, and File Transfer

If you haven’t used a command line interface (CLI) before, you will definitely learn in this line of work. CLIs can be useful for interacting with Git (I prefer it for most version control tasks), installing open source software, and tasks like copying/moving files or creating/moving directories on your local machine. My apologies in advance if I butcher the language in this section related to Linux, shell, bash, etc!

Git Bash

For these tasks, you can use the Command Prompt which is a default program in Windows. But more likely, you’ll want to use a CLI which accepts Linux commands (that way you can use the same commands for your local CLI–which runs Windows–and for interacting with the supercomputer–which runs Linux). I like using Git Bash since it gets installed automatically when you install Git. You can also use Cygwin, but like I said before, this is generally overkill if you are just interested in writing Linux commands if you have Git Bash or something else already installed. I’m sure there are many other CLIs to choose from as well that I’m not aware of.

While its a nifty skill to know how to use the command line while working on your local computer, it is absolutely essential when working on the supercomputer (i.e., cloud or remote computing). Although I’ve seen some graphical interfaces and interactive environments for using cloud computing resources, it is far more common to perform tasks from the command line. You can even connect to the supercomputer with a CLI using the ‘ssh’ command.

MobaXterm

To make your life a bit easier though when connecting to remote computing resources, you can download MobaXterm or Putty and WinSCP. As David explains in his post, MobaXterm is probably the best way to go. It does the job of both Putty (connecting to remote computing) and WinSCP (moving files between your local computer and a remote resource).

If you’re doing any work on the supercomputer at CU, check out Research Computing for tutorials and other details about our supercomputers.

Cloud Storage

University of Colorado has unlimited Google Drive storage which is linked to your CU Gmail account; therefore, it is the cloud storage of choice for the group. Google Drive File Stream allows you to access files stored on your Drive on your local computer without having to download your whole drive. Meaning it won’t take up a ton of memory but it will ‘feel’ like the files have been downloaded onto your computer (as long as you are connected to the internet). If you know you won’t be connected to the internet, you can easily download certain files/folders or even your whole Drive if you would like.

The supercomputer can serve as cloud storage; however, it is best to keep those files backed up locally, if possible. I’ve heard too many horror stories about people storing important data on the supercomputer and it getting erased! Although this can be avoided by storing things in the right place, you might sleep better if you’ve got another copy.

Multi-objective Optimization and Visualization

Borg

You’ve probably heard of Borg Multi-objective Evolutionary Algorithm (MOEA). If not, you will soon! There’s no direct download link for Borg, but you can fill out a form on its website to request the source code.

DiscoveryDV

Once you’ve performed an optimization, you will want to visualize the results. You can do this in your favorite programming language, but it is often difficult to interact with the data that way. For an interactive visualization experience, we generally use DiscoveryDV. Just like Borg, DiscoveryDV is not available for download directly, but you can request it on their website.

Open Source Projects

Additionally, here are a few open source projects related to multi-objective optimization, robust decision making, and visualization that may be useful to be familiar with: Project Platypus, OpenMORDM, and Exploratory modeling workbench.

Reference Managers

Reference managers are amazing things. Finding the right one will save you a lot of time and effort in the future. As Jazmin mentions in her post on research workflows, there are a ton to choose from. Check out this comparison table of reference management software, if you want to go down that rabbit hole.

Zotero

Joe’s group uses one called Zotero (download both the standalone and Chrome connector) which is free, easy to use, and integrates well with Microsoft Word.

Graphic Design

There are many ways to create custom figures for papers–PowerPoint is an easy choice because you likely already have Microsoft Office on your computer. However, Adobe Illustrator is much more powerful. Since Illustrator requires a license, ask Joe for more details if you need the software.

Happy downloading!