Writing a Paper in Markdown Using Pandoc

Until now I’ve struggled with the tension between drafting papers in Word (easy for co-authors to use for marking up revisions) and using LaTeX to prepare them for publication (because Word fights you and actively thwarts your efforts the whole time if you try to make a paper look half-decent). When I start in Word and switch to LaTeX, there’s an awkward phase in the middle where I have to fix all of my quotation marks and em-dashes, and all of my equations, tables, and citations are completely broken.

Recently I discovered Pandoc, and I think it will streamline the transition quite a lot. Pandoc is a document converter that reads several input formats and writes many more output formats. Here’s the list from running pandoc --help on my installation:

Input formats: native, json, markdown, markdown_strict, markdown_phpextra, markdown_github, markdown_mmd, rst, mediawiki, docbook, textile, html, latex

Output formats: native, json, docx, odt, epub, epub3, fb2, html, html5, s5, slidy, slideous, dzslides, docbook, opendocument, latex, beamer, context, texinfo, man, markdown, markdown_strict, markdown_phpextra, markdown_github, markdown_mmd, plain, rst, mediawiki, textile, rtf, org, asciidoc

That’s a lot of document formats! In particular, it supports its own extended dialect of Markdown, Pandoc Markdown, which it does a great job of translating to both LaTeX and docx (Microsoft Word). Other nifty things you can do include:

  • Convert LaTeX to Word, including your BibTeX citations
  • Make PDFs from HTML (if you have LaTeX installed)
  • Write Beamer presentations in Markdown (and export the LaTeX sources for the slides)
  • Use BibTeX citations in Markdown
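Each of these tasks is a single pandoc invocation. The sketch below collects one command per bullet as Python argument lists. It assumes a reasonably recent pandoc (2.11 or later, where --citeproc is built in; older releases used the separate pandoc-citeproc filter) and a LaTeX installation for the PDF outputs. The file names (paper.tex, refs.bib, page.html, slides.md, notes.md) are hypothetical placeholders.

```python
import shutil
import subprocess

# One pandoc command per task. File names are hypothetical placeholders.
# Assumes pandoc >= 2.11, where --citeproc processes citations natively.
COMMANDS = {
    # LaTeX -> Word, resolving \cite commands against a BibTeX database
    "latex_to_word": ["pandoc", "paper.tex", "--citeproc",
                      "--bibliography=refs.bib", "-o", "paper.docx"],
    # HTML -> PDF; pandoc renders the PDF through a LaTeX engine
    "html_to_pdf": ["pandoc", "page.html", "-o", "page.pdf"],
    # Markdown -> Beamer slides as a PDF...
    "beamer_pdf": ["pandoc", "-t", "beamer", "slides.md", "-o", "slides.pdf"],
    # ...and as standalone LaTeX source you can keep editing by hand
    "beamer_tex": ["pandoc", "-t", "beamer", "-s", "slides.md",
                   "-o", "slides.tex"],
    # Markdown with [@key] citations -> Word, citations formatted by citeproc
    "markdown_citations": ["pandoc", "notes.md", "--citeproc",
                           "--bibliography=refs.bib", "-o", "notes.docx"],
}


def run(task: str) -> None:
    """Run one of the commands above, failing clearly if pandoc is missing."""
    if shutil.which("pandoc") is None:
        raise FileNotFoundError("pandoc is not installed or not on PATH")
    subprocess.run(COMMANDS[task], check=True)
```

For example, run("latex_to_word") would turn paper.tex into paper.docx with its citations filled in from refs.bib. Keeping the commands as argument lists rather than shell strings avoids quoting problems with file names that contain spaces.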

I’m using Pandoc Markdown to draft my next paper, and while it’s not as full-featured as LaTeX for things like internal references, I find that it’s easier to write Word documents in Markdown than it is to write them in Microsoft Word! To give just one example, it lets you caption figures properly. Try making a figure in Word, adding a caption, and then moving or deleting the figure. The caption stays put. Why on earth would I want that to happen? If I delete a figure in Pandoc Markdown, the caption goes with it, because the caption is written inside the image syntax itself; leaving it behind would take extra effort. In addition, when I switch my output format from Word to LaTeX source, Pandoc makes a figure environment with a \caption{} automatically.

My planned workflow is:

  1. Draft in Pandoc Markdown
  2. Convert Markdown to docx and share with co-authors
  3. Update Markdown sources based on revisions to the Word document
  4. Repeat 1-3 until the paper is mostly done
  5. Convert Markdown to LaTeX
  6. Final revisions and formatting in LaTeX
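Only steps 2 and 5 are conversions; the rest is writing and editing. A minimal sketch of those two pandoc calls, assuming hypothetical file names (paper.md, refs.bib) and pandoc 2.11 or later: the Word version lets citeproc format the citations for co-authors, while the LaTeX version uses --natbib so that citations come out as \citep/\citet commands for BibTeX to handle during the final LaTeX phase.

```python
import shutil
import subprocess


def docx_command(source: str, bibliography: str, output: str) -> list[str]:
    # Step 2: Markdown -> Word for co-authors. citeproc formats the
    # citations and appends a reference list to the document.
    return ["pandoc", "-f", "markdown", "-t", "docx", source,
            "--citeproc", f"--bibliography={bibliography}", "-o", output]


def latex_command(source: str, bibliography: str, output: str) -> list[str]:
    # Step 5: Markdown -> standalone LaTeX (-s adds a preamble).
    # --natbib emits \citep/\citet commands instead of pre-formatted
    # citations, so BibTeX takes over from here.
    return ["pandoc", "-f", "markdown", "-t", "latex", "-s", "--natbib",
            f"--bibliography={bibliography}", source, "-o", output]


def run(command: list[str]) -> None:
    """Execute a pandoc command, raising if pandoc is missing or fails."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} is not on PATH")
    subprocess.run(command, check=True)
```

Calling run(docx_command("paper.md", "refs.bib", "paper.docx")) before each review round, and run(latex_command("paper.md", "refs.bib", "paper.tex")) once at the hand-off, would cover the two conversion steps. Step 3 stays manual: revisions made in Word still have to be copied back into paper.md by hand.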

I’ll follow up on this post as I make progress on the paper. Right now I’m enjoying Pandoc Markdown quite a lot, and I highly recommend it.


9 thoughts on “Writing a Paper in Markdown Using Pandoc”

  1. Hello Matt,

    Your post recently inspired me to write a paper following your workflow, which worked quite enjoyably, thank you! Today I shared it with a co-author (who will work in Word) and asked him to add some references. Do you have any suggestions for how to handle these when converting back to Markdown?

    Generally speaking, a lot of people I work with use EndNote for bibliography management, and I have not yet found a simple way to connect it to BibTeX/Markdown. Do you have an opinion on that?

  2. I’d say that’s the biggest weakness of my workflow: I haven’t figured out how to integrate EndNote citations from co-authors. I ended up entering their citations into my BibTeX database by hand, which was a pain. Googling EndNote-to-BibTeX conversion suggests that it’s a pretty laborious process for the owner of the EndNote database. The advantages I’d attribute to all of that manual data entry are that I avoided accidental duplication in my BibTeX database, and I feel more prepared to use those citations later. But that’s cold comfort when you’re spending hours on a tedious manual data entry task.

    Anyhow, I’m glad the Markdown authoring process worked for you. I appreciate the positive feedback!

  3. Wow, that was a quick reply, thank you! I somehow expected it would turn out like that… But then again, manually converting a few entries and adding them to a Zotero database is not much work after all (it shouldn’t take hours).

    All in all, I think the benefits of Markdown editing are still worth the effort. I have the impression that I wrote much more efficiently than I did before.

  4. One other thing I’ll mention, since I’m over here next to a comment field, is that step 5 ended up being pretty gradual. At a certain point I stopped producing docx output, but my source was still in Markdown. As revisions progressed, I started dropping raw LaTeX figure and table environments into the Markdown source and writing citations with \cite or \citep. I even had an \input for some LaTeX that pandoc didn’t recognize as LaTeX. Eventually my Markdown source turned into a LaTeX document, and I switched over.

    There’s another trick that eased this transition: if I wanted to convert a table to LaTeX, I would convert the whole document and copy the LaTeX output for that table back into my Markdown source. I still needed to do some hand-editing because pandoc writes tables as longtable environments, but it was easier than rewriting the table from scratch. The same thing works for figures, too.

    • I like it! This looks like it will definitely ease the transition between Markdown and LaTeX. Maybe someday fixing up the .tex sources by hand won’t even be part of my workflow!

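The table trick described in the comments above can also be applied to a single fragment instead of the whole document, by piping a snippet of Markdown through pandoc. A minimal Python sketch, assuming pandoc is on the PATH; the example table and figure are hypothetical. Without -s, pandoc writes only the body LaTeX, so the output can be pasted back into the Markdown source, where pandoc passes raw LaTeX through unchanged when it writes LaTeX output.

```python
import shutil
import subprocess


def markdown_to_latex_fragment(markdown: str) -> str:
    """Convert a Pandoc Markdown snippet to a LaTeX body fragment."""
    if shutil.which("pandoc") is None:
        raise FileNotFoundError("pandoc is not on PATH")
    # No -s/--standalone: pandoc prints only the converted body, no preamble.
    result = subprocess.run(
        ["pandoc", "-f", "markdown", "-t", "latex"],
        input=markdown,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Hypothetical pipe table; the ": ..." paragraph after it becomes the caption.
TABLE = """\
| Site | Mean flow |
|------|----------:|
| A    |       1.2 |
| B    |       3.4 |

: Mean flow at each site
"""

# An image alone in its own paragraph becomes a figure with this caption.
FIGURE = "![Streamflow at site A](flow_a.png)\n"
```

print(markdown_to_latex_fragment(TABLE)) should print a longtable environment, which still needs the hand-editing mentioned above if you prefer an ordinary floating table; FIGURE should come back as a figure environment with a \caption.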
