# Setup for LaTeX and Sublime

I found this link pretty useful when I was getting started with LaTeX. It walks you through setting up LaTeX and the text editor Sublime Text 2, and it takes you by the hand until you compile your first PDF. Here are the Mac instructions: LaTeX for Mac, and the instructions for Windows: LaTeX for Windows.

# Embedding figure text into a LaTeX document

Often we have to create plots and schematic drawings for our publications. These figures are then included in the final document either as bitmaps (png, jpeg, bmp) or as vector images (ps, eps, pdf). This process introduces some inconveniences that show up in the final document:

• Loss of image quality due to resizing the figure (bitmaps only)
• Different font type and size from the rest of the text
• Limited resizing possibility due to text readability
• No straightforward method to add equations to the figure

If the document is being created in LaTeX, it is possible to overcome all these inconveniences by exporting your figure to either svg or postscript format and converting it to PDF+LaTeX format with Inkscape. This format allows the LaTeX engine to treat figure text like any other text in the document and the lines and curves as a vector image.

EXPORTING FIGURE

The process for creating a PDF+LaTeX figure is described below:

1 – Create your figure and save it in either svg or postscript format. Inkscape, Matlab, GNUPlot, and Python are examples of software that can export at least one of these formats. If your figure has any equations, remember to type them in LaTeX format in the figure.

2 – Open your figure with Inkscape, edit it as you see fit (the figure may need to be ungrouped), and save it.

3.0 – If you are comfortable with using a terminal and the figure does not need editing, open a terminal pointing to the folder where the figure is and type the following command (no $). If this works, you can skip steps 3 and 4 and go straight to step 5. Note that Inkscape 1.0 and later replaced the --export-pdf option with --export-filename.

$ inkscape fig1.svg --export-pdf fig1.pdf --export-latex


3 – Click on File -> Save As…, select “Portable Document Format (*.pdf)” as the file format, and click on Save.

4 – On the Portable Document Format window that will open, check the option “PDF+LaTeX: Omit text in PDF, and create LaTeX file” and click on OK.

Inkscape will then export two files, both with the same name but one with pdf and the other with pdf_tex extension. The pdf file contains all the drawing, while the pdf_tex contains all the text of the figure and calls the pdf file.

5 – In your LaTeX document, load the graphicx package with the command \usepackage{graphicx}.

6 – To include the figure in your document, use \input{your_figure.pdf_tex}. Do not use the conventional \includegraphics command; otherwise you will end up with an error or with a figure with no text. If you want to scale the figure, type \def\svgwidth{240bp} in the line before the \input command, where 240bp is the desired figure width (bp is the "big point" unit, 1/72 of an inch, not a pixel). Do not use the conventional [scale=0.5] option, which would cause an error. Some instructions are available in the first lines of the pdf_tex file, which can be opened with a regular text editor such as Notepad.
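
Steps 5 and 6 can be put together in a minimal document like the sketch below. It assumes the exported files are named fig1.pdf and fig1.pdf_tex and sit in the same folder as the .tex file. The caption text and label name are placeholders, and the color package is loaded only as a precaution in case the figure text is colored.

```latex
\documentclass{article}
\usepackage{graphicx} % needed by the \includegraphics call inside fig1.pdf_tex
\usepackage{color}    % precaution: only needed if the figure text is colored

\begin{document}

\begin{figure}
  \centering
  % Set the figure width before \input; the text keeps the document font size.
  \def\svgwidth{240bp}
  % Assumed file name: fig1.pdf_tex, with fig1.pdf in the same folder.
  \input{fig1.pdf_tex}
  \caption{Placeholder caption for the PDF+\LaTeX{} figure.}
  \label{fig:example}
\end{figure}

\end{document}
```

Since the pdf_tex file uses \svgwidth as a length, a relative width such as \def\svgwidth{0.8\textwidth} should work as well and keeps the figure in proportion if the page layout changes.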

Below is a comparison of how the same figure would look in the document if exported in PDF+LaTeX and png formats. It can be seen that the figure created following the procedure above looks smoother and its text style matches that of the paragraphs above and below, which is more pleasant to the eyes. Also, the text can be marked and searched with any pdf viewer. However, the user should be aware that, since text font size is not affected by the scaling of the figure, some text may end up bigger than boxes that are supposed to contain it, as well as too close to or too far from lines and curves. The former can be clearly seen in the figure below. This, however, can be easily fixed with software such as Inkscape and/or with the editing tips described in the following section.

TIPS FOR TEXT MANIPULATION AFTER FIGURE IS EXPORTED

If you notice a typo or poorly positioned text in the figure after the figure has been exported and inserted in your document, there is an easier way of fixing it than exporting the figure again. If you open the pdf_tex file (your_figure.pdf_tex) with a text editor such as Notepad, you can change any text and its position by changing the parameters of the \put commands inside the \begin{picture}\end{picture} LaTeX environment.

For example, it would be better if the value 1 on the y and x axes of the figures above were shown as 1.0, so that its precision is consistent with that of the other values. The same applies to 2 vs. 2.0 on the x axis. This can be fixed by opening the file fig1.pdf_tex and replacing lines:

\put(0.106,0.76466667){\makebox(0,0)[rb]{\smash{1}}}%
\put(0.53916667,0.0585){\makebox(0,0)[b]{\smash{1}}}%
\put(0.95833333,0.0585){\makebox(0,0)[b]{\smash{2}}}%


by:

\put(0.106,0.76466667){\makebox(0,0)[rb]{\smash{1.0}}}%
\put(0.53916667,0.0585){\makebox(0,0)[b]{\smash{1.0}}}%
\put(0.95833333,0.0585){\makebox(0,0)[b]{\smash{2.0}}}%


Also, one may think that the labels of both axes are too close to the axes. This can be fixed by decreasing the coordinate that controls each label's distance from its axis (the first argument of \put, x, for the y-axis label; the second, y, for the x-axis label), that is, by replacing lines:

\put(0.02933333,0.434){\rotatebox{90}{\makebox(0,0)[b]{\smash{$x\cdot e^{-x+1}$}}}}%
\put(0.539,0.0135){\makebox(0,0)[b]{\smash{x}}}%


by:

\put(0.0,0.434){\rotatebox{90}{\makebox(0,0)[b]{\smash{$x\cdot e^{-x+1}$}}}}%
\put(0.539,0.0){\makebox(0,0)[b]{\smash{x}}}%


With the modifications described above and resizing the legend box with Inkscape, the figure now would look like this:

Don’t forget to explore all the editing features of Inkscape. If you export a figure from GNUPlot or Matlab and ungroup it with Inkscape into small pieces, Inkscape gives you the freedom to rearrange and fine-tune your plot.

# PDFExtract: Get a list of BibTeX references from a scholarly PDF

So you’ve found a review article with a great list of references that you’d like to include in your own paper/thesis/etc. You could look them up, one-by-one, on Google Scholar, and export the citation format of your choice. (You could also retype them all by hand, but let’s assume you’re savvy enough to use some kind of citation manager).

This is not a great use of your time.

Check out PDFExtract, a Ruby library written by folks at CrossRef. Its goal is to read text from a PDF, identify which sections are “references”, and return this list to the user. As of recently, it has the ability to return a list of references in BibTeX format after resolving the DOIs over the web. When the references in the PDF are identified correctly (about 80-90% of the time in my experience), you’ll now have all the references from that paper to do with as you please—to cite in LaTeX, or import to Zotero, etc.

How to use it

You will need a recent version of Ruby and its gem package manager. Search around for how to do this on your particular OS. As usual, this will be a lot easier on *nix, but I have it working in Cygwin too so don’t despair.

The latest version of PDFExtract (with BibTeX output) is not on the central gem repository yet, but for now you can build and install from source:

git clone https://github.com/CrossRef/pdfextract
cd pdfextract
gem build pdf-extract.gemspec
gem install pdf-extract-0.1.1.gem  # check version number


You should now have a program called pdf-extract available from the command line. Navigate to a directory with a PDF whose references you’d like to extract, and run the following:

pdf-extract extract-bib --resolved_references MyFile.pdf


It will take a minute to start running, and then it will begin listing the references it finds, along with their resolved DOIs from CrossRef’s web API, like so:

Found DOI from Text: 10.1080/00949659708811825 (Score: 5.590546)
Found DOI from Text: 10.1016/j.ress.2011.10.017 (Score: 4.6864557)
Found DOI from Text: 10.1016/j.ssci.2008.05.005 (Score: 0.5093678)
Found DOI from Text: 10.1201/9780203859759.ch246 (Score: 0.6951939)
Found DOI from Text: 10.1016/s0377-2217(96)00156-7 (Score: 5.2922735)
...


Note that not all resolutions are perfect. The score reflects the degree of confidence that the reference extracted from the PDF matches the indicated DOI. Scores below 1.0 will not be included in the final output, as they are probably incorrect.

Go make yourself a coffee while it searches for the rest of the DOIs. Eventually it will move to the second phase of this process, which is to use the DOI to obtain a full BibTeX entry from the web API. Again, this will not be done for DOIs with scores below 1.0.

Found BibTeX from DOI: 10.1080/00949659708811825
Found BibTeX from DOI: 10.1016/j.ress.2011.10.017
Found BibTeX from DOI: 10.1016/s0377-2217(96)00156-7
Found BibTeX from DOI: 10.1016/j.ress.2006.04.015
Found BibTeX from DOI: 10.1111/j.1539-6924.2010.01519.x
Found BibTeX from DOI: 10.1002/9780470316788.fmatter
...


Finish your coffee, check your email, and chuckle at the poor saps out there gathering their references by hand. When the program finishes, look for a file called MyFile.bib—the same filename as the original PDF—in the same directory from which you invoked the pdf-extract command. Open it up in a text editor or reference manager and take a look. Here’s the output from my example:

@article{Archer_1997,
doi = {10.1080/00949659708811825},
url = {http://dx.doi.org/10.1080/00949659708811825},
year = 1997,
month = {May},
publisher = {Informa UK Limited},
volume = {58},
number = {2},
pages = {99-120},
author = {G. E. B. Archer and A. Saltelli and I. M. Sobol},
title = {Sensitivity measures,anova-like Techniques and the use of bootstrap},
journal = {Journal of Statistical Computation and Simulation}
}
@article{Auder_2012,
doi = {10.1016/j.ress.2011.10.017},
url = {http://dx.doi.org/10.1016/j.ress.2011.10.017},
year = 2012,
month = {Nov},
publisher = {Elsevier BV},
volume = {107},
pages = {122-131},
author = {Benjamin Auder and Agn{\`e}s De Crecy and Bertrand Iooss and Michel Marqu{\`e}s},
title = {Screening and metamodeling of computer experiments with functional outputs. Application to thermal$\textendash$hydraulic computations},
journal = {Reliability Engineering \& System Safety}
}

... (and many more!)


A few extra-nice things: (1) it includes all DOIs, which journals sometimes require and are pesky to track down, and (2) it attempts to escape all BibTeX special characters by default. Merge this with your existing library, and be happy! (You could even use this to recover or develop a reference library from your own papers!)
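
To cite the extracted references in LaTeX, a minimal document along these lines should be enough. This is only a sketch: it assumes MyFile.bib sits in the same folder as the .tex file, it uses the citation key Archer_1997 from the output above, and the plain bibliography style is an arbitrary choice.

```latex
\documentclass{article}

\begin{document}

% Cite an entry by the key that appears after "@article{" in MyFile.bib.
An example citation: \cite{Archer_1997}.

\bibliographystyle{plain} % assumed style; use whatever your journal requires
\bibliography{MyFile}     % reads MyFile.bib (no extension needed)

\end{document}
```

Compile with pdflatex, then bibtex, then pdflatex twice more so that the citation numbers and the reference list are resolved.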

Caveats

• This works a lot better on journal articles than on longer documents like theses and textbooks. It assumes that the “Reference” section is toward the end, so a chapter-based or footnote-based reference format will cause it to choke.

• It will not work on non-digital articles—for example, older articles which were scanned and uploaded to a journal archive.

• Careful with character encoding when you are importing/exporting BibTeX with other applications (like Zotero), or even managing the file yourself. You may want to look for settings in all of your applications that allow you to change the character encoding to UTF-8.

• Lots of perfectly good references do not have DOIs and thus will not be resolved by the web API. This includes many government agency reports, for example. In general do not expect to magically BibTeXify things other than journal articles and the occasional textbook.

• Reading a PDF is tricky business—there are some journal formats that just won’t work. You will notice failures based on (1) consistently bad DOI resolution scores, (2) complete failure with an error message from the PDF reader (very hard to trace these), or (3) if your BibTeX file contains bizarre entries at the end. I’ve accidentally “extracted” references about ornithology, for example—just delete these and move on.

# Writing a Paper in Markdown Using Pandoc

I’ve struggled up to now with the tension between drafting papers in Word (easy for co-authors to use for marking up revisions) and using LaTeX to prepare them for publication (because Word fights you and actively thwarts your efforts the whole time if you try to make a paper look half-decent.) When I start in Word and switch to LaTeX, there’s an awkward phase in the middle where I have to fix all of my quotation marks and em-dashes, and all of my equations, tables, and citations are completely broken.

Recently I discovered Pandoc, and I think it will streamline the transition quite a lot.  Pandoc is a document converter that converts several input formats to many output formats.  Here’s the list from running pandoc --help:

Input formats:  native, json, markdown, markdown_strict, markdown_phpextra, markdown_github, markdown_mmd, rst,  mediawiki, docbook, textile, html, latex

Output formats: native, json, docx, odt, epub, epub3, fb2, html, html5, s5, slidy, slideous, dzslides, docbook, opendocument, latex, beamer, context, texinfo, man, markdown, markdown_strict, markdown_phpextra, markdown_github, markdown_mmd, plain, rst, mediawiki, textile, rtf, org, asciidoc

That’s a lot of document formats!  In particular, it supports its own dialect of Markdown (Pandoc Markdown) that it does a great job of translating both to LaTeX and to docx (Microsoft Word). Other nifty things you can do include:

• Converting LaTeX to Word, including your BibTeX citations
• Making PDFs from HTML (if you have LaTeX installed)
• Writing Beamer presentations in Markdown (and exporting the LaTeX sources for the slides)
• Using BibTeX citations in Markdown

I’m using Pandoc Markdown to draft my next paper, and while it’s not as full-featured as LaTeX for things like internal references, I find that it’s easier to write Word documents in Markdown than it is to write them in Microsoft Word!  To give just one example, it lets you caption figures properly.  Try making a figure in Word, adding a caption, and then moving or deleting the figure. The caption stays put.  Why on earth would I want that to happen? If I delete a figure in Pandoc Markdown, it takes extra effort to leave the caption behind.  In addition, when I switch my output format from Word to LaTeX source, Pandoc makes a figure environment with a \caption{} automatically.

My planned workflow is:

1. Draft in Pandoc Markdown
2. Convert Markdown to docx and share with co-authors
3. Update Markdown sources based on revisions to the Word document
4. Repeat 1-3 until the paper is mostly done
5. Convert Markdown to LaTeX
6. Final revisions and formatting in LaTeX

I’ll follow up on this post as I progress with drafting the paper. Right now I’m enjoying Pandoc Markdown quite a lot, and I highly recommend it.

# Beginner’s LaTeX Guide

What are TeX and LaTeX?

TeX is a low-level markup and programming language used to typeset documents, created by Donald Knuth. TeX is a powerful typesetting tool, but it can be difficult to use because of the time it takes to create custom text formatting macros.  To get around this difficulty, there are macro packages built on top of TeX, like LaTeX, that come with pre-built macros. LaTeX is more user-friendly, but lacks the flexibility of TeX.

Installing a TeX System: MiKTeX

MiKTeX is a distribution of Knuth’s TeX system. You’ll need a TeX system on your computer so that LaTeX commands are recognized by your machine. I personally use MiKTeX because it is compatible with the WinEdt software we use in the Reed group. You can also use TeX Live as your TeX system, but I have no experience with that software.  Once you have a TeX system installed on your computer, you can compile LaTeX documents using a command line and text files (saved with the proper file extension). Most people find this difficult, which is why many people use a TeX editor.

Installing a TeX Editor: WinEdt

The software that we have a license for in the Reed group is WinEdt. There are other free options such as TeXnicCenter and many, many others. For a whole discussion on pros and cons of different editors see Wikipedia’s article comparing different TeX editors. Once you’ve installed WinEdt, you can go to Documents -> Current Work (Samples) within the program to compile one of the sample documents included in the program to ensure your software is properly installed/configured.

(Reed Members: Talk to Josh for license information. I believe the Reed license is only valid for WinEdt 5.5, which is not the latest version.)

Learning some Basic Commands

Luckily, there are MANY MANY sources for learning LaTeX commands. A good place to start is Wikipedia’s LaTeX Wikibook. Starting under the tab “Absolute Beginners” will walk you through very simple document creation. Another good place to start is Andre Heck’s short course in LaTeX called Learning LaTeX by Doing. Within this course, there are 24 exercises designed to get you familiar with commands and with typing your own LaTeX documents. If you’re just interested in trying out these exercises without installing software, you can use Latexlab.org to compile your LaTeX documents online. Once you become familiar with the commands, a good first document of your own is a LaTeX resume/CV.  This will get you familiar with simple document elements such as tables and lists.
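
As a rough idea of what such a first document involves, here is a minimal sketch with one list and one table. All of the content (the section title, items, and values) is placeholder text, not taken from any of the courses mentioned above.

```latex
\documentclass{article}

\begin{document}

\section{Placeholder Section}

% A bulleted list
\begin{itemize}
  \item First item
  \item Second item
\end{itemize}

% A two-column table: left-aligned text, right-aligned numbers
\begin{tabular}{lr}
  \hline
  Item  & Value \\
  \hline
  Alpha & 1.0   \\
  Beta  & 2.0   \\
  \hline
\end{tabular}

\end{document}
```

Save it with a .tex extension and compile it with your editor or with pdflatex from the command line.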

Some Resources

Winston Chang has written a comprehensive document that compresses most of the major LaTeX commands to two pages: http://www.stdout.org/~winston/latex/latexsheet.pdf.

If you’re interested in using LaTeX to write a Penn State thesis/dissertation, Gary L. Gray and Francesco Costanzo have written a thesis template to use: http://www.esm.psu.edu/psuthesis/

There’s even a LaTeX template that makes your documents look like MS Word!

# How to cite packages in R

R is a nice statistical language to use because it is free and provides many useful packages for data analysis.  I just found out that R’s citation() function will generate a BibTeX citation for a specific package.  It’s explained here:

http://astrostatistics.psu.edu/su07/R/html/utils/html/citation.html

Do you have tips on using R?  If so, edit this post or provide a comment below.

# Web-based Free Options for Bibliography Management and LaTeX Editing

I often find myself switching between computers with different operating systems, so I try to use free tools on the web as often as I can. The purpose of this post is to make you aware of two free options that I’ve had success with.

Bibliography Management – Zotero.org

Zotero is a free bibliography management resource that works as a plug-in for Mozilla Firefox along with plug-ins that work with Microsoft Office and Open Office. You edit your citations within Firefox, and insert them into documents using the Office plug-ins. You can import and export BibTeX into or out of Zotero and it is compatible with the RIS format, so you can move your citations back and forth between Zotero and Endnote. When you sign up for Zotero, it will ask you to create a user account. Your web account serves as an online backup for your citations, as well as a collaborative space. You can create a profile based on your area of expertise, so you can search for users with similar research interests as you and share your citations with them. (Perhaps this would be a good way to create a Pat Reed Group citation database?)

If this piqued your interest, I recommend checking out the quick start guide which shows some of the cool stuff you can do with Zotero.

My only warning is to make sure you’re running the latest version of Firefox, or you might have some compatibility issues with the plug-ins, especially with Word and Open Office. According to the website, there is a beta release for standalone Zotero as well as plug-ins for Safari and Chrome, but I haven’t used any of those options. It is also important to note that there is a 100MB limit for free Zotero service. I have about 2,000 citations total stored online and I’m only using about 1.0MB according to the website, so I imagine that the free service will be sufficient for everyone. It is \$20/year for 1GB of Zotero storage.

LaTeX – Latexlab.org