Get your research workflow on

I have done some research on research workflow and that includes interviewing some of my peers at Cornell grad school to get a sense of what increases their productivity and what are their strategies for accomplishing long-term research goals.  In addition to this, I also gathered good advice from my PI, who is the most ultra-efficient human that I know.  Had I taken the following advice, I would’ve written this blog post a week ago.

The Get your research workflow on series, consists of two parts:

Part 1 covers General research workflow tips and Part 2. Setting up your technical workflow for people training in with the Decision Analytics crew (a.k.a. the best crew in town).

General research workflow tips

Disclosure: some of the contents of this list may be easier said than done.

First of all, a research workflow can be very personal and it is definitely tailored to each person’s requirements, personality and interests,  but here are some general categories that I think every researcher can relate to:

Taking notes, organizing and reflecting on ideas

I was gifted with the memory of a tuna fish, so I need to take notes for everything.  Unfortunately, taking notes with paper notebooks resulted in disaster for me in the past, it was very hard to  keep information organized, and occasionally my notebooks would either disappear or get  coffee stains all over.   Luckily, my office mate Dave, introduced me to the best application for note taking ever: Evernote,  this app allows you to keep your notes categorized, so you can keep the information that you need indexed and searchable across every single platform you have, that means that you can keep your notes synchronized  with your smartphone, laptop, desktop, etc, and have it accessible anywhere you go.

In addition, the Evernote web clipper tool allows you to save and categorize articles or webpages within your notes and make annotations on them.  Additionally, you can tag your notes, this is useful if you have notes that could fit into multiple notebooks.  You can also share  and invite people to edit notes and you can connect it with Google drive.  I would probably still flock to Google docs or Dropbox Paper for collaborative documents, but for personal notes, I prefer the Evernote interface.   There’s no limit on the amount of notebooks that you can have.  I’ve found this app very useful for brainstorming and developing ideas, I also use it to keep a research log to track my research progress.

Reading journal papers and reference management 

Keeping up with the scientific literature can be very challenging specially with the overwhelming amount of journal papers out there, but you can make things manageable for yourself if you find a reference manager that allows you to build a library that makes it easy to find, add, organize, read, prioritize and annotate papers  that you can later cite.  Additionally, you may want to set up smart notifications about new papers on topics that interest you, and get notified via e-mail. A couple of popular free and open source reference managers that allow you to do the previous are Zotero and Mendeley,  also Endnote basic, its free but you would need to upgrade to Endnote Desktop for unlimited storage.  These reference managers also allow you to export BibTex files for its integration with LaTeX.  You can check out the Grand Reference Management Comparison  table for all the reference management software available out there.

In addition to reference manager software,  a couple of popular subscription-based multidisciplinary databases are Web of Science and Scopus, they  differ from Google scholar,  by the fact that these are human curated databases, they are selected by scholarly and quality criteria by literature review committees, and they let you build connections between topics.

Finally,  I came across this article on How to keep up with the scientific literature, where a number of scientists were interviewed on the subject, and they all agree that it can be overwhelming but it is key to stay up to date with the literature as its the only way to contextualize your work and identify the knowledge gaps.  The article provided advice on how to prioritize what to read despite the overwhelming amount of scientific literature.

 

Time management and multi-tasking

This is my Achilles heel, and its a skill that requires a lot of practice and discipline.   Sometimes, progress in research can seem hard to accomplish, specially when you are involved in several projects, dealing with hard deadlines, taking many classes, TA-ing, or you’re simply busy being a socialité,  but there are several tricks to be on top of research while avoiding getting overwhelmed in a multi-tasking world.   Some, or most of this tips came from a time-manager master-mind:

Tip # 1.  Schedule everything and time everything

Schedule everything from hard, set-in-stone deadlines to casual meetings, that way you’ll know for sure how much time you’ll have to spare on different projects and you can block time for those projects in a weekly basis.  Keep track of the time that you spend on different projects/tasks.   There’s a very popular app among 3 economists, Julie’s brother and my friend Justyna called be focused that allows you to manage tasks and time them.  You can use it to keep track, for instance, of the time it takes you to read a paper,  some people use it to time the time it takes them to write a paper till completion, right now I’m tracking the amount of time its taking me to write this blogpost.  Timing everything will allow you to get better at predicting the time it will take you to accomplish something and reflect on how you can improve.   I always tend to underestimate my timings but this app is giving me a reality check.. very annoying.

Tip # 2. Different mindsets for different time slots

When your schedule is full of small time gaps, fill them doing tasks that involve less concentration, probably reading, answering e-mails, organizing yourself, and leave larger time slots for the most creative and challenging part of your work.

Also, a general recommendation of multi-tasking is don’t do it,  trying to do multiple things at once can hurt your productivity, instead,  block times to carry specific tasks, were you focus on that one thing, and then move on to the next. Remember to prioritize and tackle the most important thing first.

Tip #4. Visualize long-term research goals and work backwards

Picture what you want to accomplish in a year or in a semester, and work your way backwards, till you refine the accomplishments of each month, each week and each day to hit your long-term target.  Setup to-do lists for the week and for the day.

Tip #3. Set aside time for new skills that you want to acquire

Even if you set aside one or two hours a week devoted to that skill that you want to develop, it will pay off, you’ll have come a long way at the end of the year.  Challenge yourself and continue to develop new skills.

Tip #5. Don’t leave e-mails sitting in your inbox

There are a couple of strategies for this, you can either allocate time each day specifically for replying to e-mails or you can tackle each e-mail as it comes.   If it’s something that will require you more time, move it to a special list, if it’s a meeting, put it in your calendar, if it’s for reference, save it. No matter what your strategy is,  take action on an e-mail as soon as you read it.

Collaborative work

Some tools for collaborative work :

Overleaf– for writing LaTeX files collaboratively and visualizing the changes live, the platform has several journal templates and can track changes easily.

Github– platform for collaborative code development and management.

Slack – organize conversations with teams, and organize your collaborative workflow

A final recommendation is to have a consistent and intuitive organization of your research.  Document everything, and have reproducible code.  If you get hit by a bus and your colleagues are able to continue research were you left off in less than a week, then you’re in good shape, organization-wise.

I hope this helps, let me know if there are some crucial topics that I missed, I can always come back and edit.

Special thanks to all of my grad/postdoc friends that participated in the brief research workflow interview.

 

 

 

 

Advertisements

PDFExtract: Get a list of BibTeX references from a scholarly PDF

So you’ve found a review article with a great list of references that you’d like to include in your own paper/thesis/etc. You could look them up, one-by-one, on Google Scholar, and export the citation format of your choice. (You could also retype them all by hand, but let’s assume you’re savvy enough to use some kind of citation manager).

This is not a great use of your time.

Check out PDFExtract, a Ruby library written by folks at CrossRef. Its goal is to read text from a PDF, identify which sections are “references”, and return this list to the user. As of recently, it has the ability to return a list of references in BibTeX format after resolving the DOIs over the web. When the references in the PDF are identified correctly (about 80-90% of the time in my experience), you’ll now have all the references from that paper to do with as you please—to cite in LaTeX, or import to Zotero, etc.

How to use it

You will need a recent version of Ruby and its gem package manager. Search around for how to do this on your particular OS. As usual, this will be a lot easier on *nix, but I have it working in Cygwin too so don’t despair.

The latest version of PDFExtract (with BibTeX output) is not on the central gem repository yet, but for now you can build and install from source:

git clone https://github.com/CrossRef/pdfextract
cd pdfextract
gem build pdf-extract.gemspec
gem install pdf-extract-0.1.1.gem  # check version number

You should now have a program called pdf-extract available from the command line. Navigate to a directory with a PDF whose references you’d like to extract, and run the following:

pdf-extract extract-bib --resolved_references MyFile.pdf

It will take a minute to start running, and then it will begin listing the references it finds, along with their resolved DOIs from CrossRef’s web API, like so:

Found DOI from Text: 10.1080/00949659708811825 (Score: 5.590546)
Found DOI from Text: 10.1016/j.ress.2011.10.017 (Score: 4.6864557)
Found DOI from Text: 10.1016/j.ssci.2008.05.005 (Score: 0.5093678)
Found DOI from Text: 10.1201/9780203859759.ch246 (Score: 0.6951939)
Found DOI from Text: 10.1016/s0377-2217(96)00156-7 (Score: 5.2922735)
...

Note that not all resolutions are perfect. The score reflects the degree of confidence that the reference extracted from the PDF matches the indicated DOI. Scores below 1.0 will not be included in the final output, as they are probably incorrect.

Go make yourself a coffee while it searches for the rest of the DOIs. Eventually it will move to the second phase of this process, which is to use the DOI to obtain a full BibTeX entry from the web API. Again, this will not be done for DOIs with scores below 1.0.

Found BibTeX from DOI: 10.1080/00949659708811825
Found BibTeX from DOI: 10.1016/j.ress.2011.10.017
Found BibTeX from DOI: 10.1016/s0377-2217(96)00156-7
Found BibTeX from DOI: 10.1016/j.ress.2006.04.015
Found BibTeX from DOI: 10.1111/j.1539-6924.2010.01519.x
Found BibTeX from DOI: 10.1002/9780470316788.fmatter
...

Finish your coffee, check your email, and chuckle at the poor saps out there gathering their references by hand. When the program finishes, look for a file called MyFile.bib—the same filename as the original PDF—in the same directory from which you invoked the pdf-extract command. Open it up in a text editor or reference manager and take a look. Here’s the output from my example:

@article{Archer_1997,
doi = {10.1080/00949659708811825},
url = {http://dx.doi.org/10.1080/00949659708811825},
year = 1997,
month = {May},
publisher = {Informa UK Limited},
volume = {58},
number = {2},
pages = {99-120},
author = {G. E. B. Archer and A. Saltelli and I. M. Sobol},
title = {Sensitivity measures,anova-like Techniques and the use of bootstrap},
journal = {Journal of Statistical Computation and Simulation}
}
@article{Auder_2012,
doi = {10.1016/j.ress.2011.10.017},
url = {http://dx.doi.org/10.1016/j.ress.2011.10.017},
year = 2012,
month = {Nov},
publisher = {Elsevier BV},
volume = {107},
pages = {122-131},
author = {Benjamin Auder and Agn\`es De Crecy and Bertrand Iooss and Michel Marqu\`es},
title = {Screening and metamodeling of computer experiments with functional outputs. Application to thermal$\textendash$hydraulic computations},
journal = {Reliability Engineering \& System Safety}
}

... (and many more!)

A few extra-nice things: (1) it includes all DOIs, which journals sometimes require and are pesky to track down, and (2) it attempts to escape all BibTeX special characters by default. Merge this with your existing library, and be happy! (You could even use this to recover or develop a reference library from your own papers!)

Caveats

  • This works a lot better on journal articles than on longer documents like theses and textbooks. It assumes that the “Reference” section is toward the end, so a chapter-based or footnote-based reference format will cause it to choke.

  • It will not work on non-digital articles—for example, older articles which were scanned and uploaded to a journal archive.

  • Careful with character encoding when you are importing/exporting BibTeX with other applications (like Zotero), or even managing the file yourself. You may want to look for settings in all of your applications that allow you to change the character encoding to UTF-8.

  • Lots of perfectly good references do not have DOIs and thus will not be resolved by the web API. This includes many government agency reports, for example. In general do not expect to magically BibTeXify things other than journal articles and the occasional textbook.

  • Reading a PDF is tricky business—there are some journal formats that just won’t work. You will notice failures based on (1) consistently bad DOI resolution scores, (2) complete failure with an error message from the PDF reader (very hard to trace these), or (3) if your BibTeX file contains bizarre entries at the end. I’ve accidentally “extracted” references about ornithology, for example—just delete these and move on.

Zotero introduction (video)

I made a short Screenr video describing how to install and use Zotero for citation management in Word. Specifically, the Zotero standalone program plus the Chrome plugin. (Sorry about the breathing sounds in the microphone … I’ll work on that next time.)

http://screenr.com/aZ5s

EDIT: As far as I can tell, WordPress only allows embedding from certain video sites which do not include Screenr. So I guess you have to just open the link. Sorry about that.

Edit by Rachel: Here is a second Screenr video I created about Zotero that talks about importing/exporting citations as well as using the PDF search capability in Zotero. I recommend watching this video using a full screen so you can read the text.

Web-based Free Options for Bibliography Management and LaTeX Editing

I often find myself switching between computers with different operating systems, so I try to use free tools on the web as often as I can. The purpose of this post is to make you aware of two free options that I’ve had success with.

Bibliography Management – Zotero.org

Zotero is a free bibliography management resource that works as a plug-in for Mozilla Firefox along with plug-ins that work with Microsoft Office and Open Office. You edit your citations within Firefox, and insert them into documents using the Office plug-ins. You can import and export BibTeX into or out of Zotero and it is compatible with the RIS format, so you can move your citations back and forth between Zotero and Endnote. When you sign up for Zotero, it will ask you to create a user account. Your web account serves as an online backup for your citations, as well as a collaborative space. You can create a profile based on your area of expertise, so you can search for users with similar research interests as you and share your citations with them. (Perhaps this would be a good way to create a Pat Reed Group citation database?)

If this piqued your interest, I recommend checking out the quick start guide which shows some of the cool stuff you can do with Zotero.

My only warning is make sure you’re running the latest version of Firefox or you might have some compatibility issues with the plug-ins, especially with Word and Open Office. According to the website, there is a beta release for standalone Zotero as well as plug-ins for Safari and Chrome, but I haven’t used any of those options. It is also important to note that there is a 100MB limit for free Zotero service. I have about 2,000 citations total stored online and I’m only using about 1.0MB according to the website, so I imagine that the free service will be sufficient for everyone. It is $20/year for 1GB of Zotero storage.

LaTeX – Latexlab.org

Latexlab.org is a Google Docs based LaTeX editor. You sign in using your Google Docs account, so all your files are stored on your Google profile.  Those familiar with WinEdt or other LaTeX editing software should have no trouble using the LaTeX Lab interface.  You can upload images to your Google-docs account to insert them into your LaTeX document. I’d recommend using this if you’re on the go and need to put together a LaTeX document quickly.

I’ve never tried to compile anything complicated within LaTeX Lab, but if you need to put together an equation-heavy document quickly, this is a good alternative. I certainly wouldn’t try to put your thesis together using LaTeX Lab. You can compile different documents together into a project, but I’ve never used that functionality. Again, I would shy away from trying to put together complicated documents in LaTeX Lab.