Manuscript Workflow with R Markdown and Git

March 16, 2021

As part of my Master of Public Health program I needed to complete a capstone. Working on a manuscript involves a lot of back and forth: you need to make edits, fix your words and figures, and sometimes rework entire sections. If you are like me, the thought of doing this over a long period of time in Word makes you nauseous. The two main issues that cause this nausea for me are:

  1. I frequently forget to make a record of my writing and often overwrite work 

  2. Copying and pasting figures while arguing with Word’s formatting options is not fun

My solution? R Markdown and Git! As an R user, R Markdown was an ideal choice, and its ability to knit to Word documents (as well as PDF and HTML) made me want to try writing my entire capstone in it. I could easily generate my drafts and share them with collaborators for feedback. Moreover, since I already generated most of my tables and figures with R, embedding them in my R Markdown document was an easy process. R Markdown is also great for incorporating LaTeX and bibliographies/citations. Using Git, I could keep track of any changes to both my code AND my writing: at the end of the day, Git tracks text, and anything written in an R Markdown file is text.

Below I will walk through the main steps of my workflow. 

  1. Connect RStudio to Git and GitHub

The process for this step is covered in Jenny Bryan’s great guide, Happy Git and GitHub for the useR. Not only will the Git repository you create be helpful for your future self, it will also help with scientific reproducibility. CAUTION: If you are working with sensitive data, make sure you are following guidelines for data safety. Consider keeping your data local but your code in the repository, as well as setting up two-factor authentication for your GitHub account.

  2. Modify your YAML to knit to a centralized location

Generally speaking, we only want the main pieces of code (and sometimes data) in the repository, not our many outputs. I first set up an output folder, which also lived in a Box account synced to my desktop (keeping even more records of your work!). Then, in my YAML header (YAML stands for “YAML Ain’t Markup Language”), which sits at the top of your R Markdown document by default, I used a knit option to designate the output location AND append the current date.
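
A knit option along these lines might look like the sketch below. The folder path (`/path/to/Box/output/`), the file name (`capstone_draft_`), and the title are placeholders invented for illustration, and the `.docx` extension assumes you are knitting to Word.

```yaml
---
title: "Capstone Manuscript"   # placeholder title
output: word_document
# Custom function run when you click Knit in RStudio.
# The output path and file name are hypothetical; swap in your own.
knit: (function(inputFile, encoding) {
    rmarkdown::render(
      inputFile,
      encoding = encoding,
      output_file = paste0(
        "/path/to/Box/output/",
        "capstone_draft_",
        Sys.Date(),
        ".docx"
      )
    )
  })
---
```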

The knit option looks like a lot, but it’s not too bad! We define a function that calls rmarkdown::render(), and we specify the output file location and name by using paste0() to concatenate strings. Within those strings we specify (1) the location of our output folder, (2) the name of the file, and (3) the current date. Now every time you knit, you’ll find your document in the designated location with a nice date appended. No more fussing around with saving a new copy of your draft!

  3. Citations with R Markdown

Using your favorite citation manager, export a .bib file of your citations for the given project. You will then need to choose a CSL (Citation Style Language) style. You can pick one from the Zotero Style Repository (in most cases other citation managers will work with the same CSL files). Now let’s add two lines to the YAML: (1) the location of our .csl file and (2) the option to make each citation double as a hyperlink to its spot in your document’s reference list.
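
In the YAML header, these citation settings could look like the sketch below. Both file names are hypothetical placeholders. The `bibliography` line, which points to the exported .bib file, is included on the assumption that it is not already set elsewhere in the header.

```yaml
---
bibliography: references.bib            # hypothetical name of the exported .bib file
csl: american-medical-association.csl   # hypothetical .csl file from the Zotero Style Repository
link-citations: true                    # in-text citations link to the reference list
---
```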

Each citation you add to your .bib file will have a unique identifier. I found these by opening up the .bib file in a text editor. Below is an example of what you might see, with the ID inside the curly bracket after the @type of entry on the first line:
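
As a made-up illustration (the entry below is a placeholder, not a real reference), a .bib entry has this general shape. Here the unique identifier is `doe2020example`:

```bibtex
@article{doe2020example,
  author  = {Doe, Jane and Roe, Richard},
  title   = {An Example Article Title},
  journal = {Journal of Placeholder Studies},
  year    = {2020},
  volume  = {1},
  number  = {1},
  pages   = {1--10}
}
```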

Once you have the unique identifier, you can add it to your R Markdown file by typing it inside square brackets with an @ in front: [@citation]. If the citation ID is in your linked .bib file, when you knit you’ll see the citation generated!

  4. Incorporating tables and figures

For the most part, I completed the preprocessing and analysis in a separate R file and saved the relevant variables in .Rds files to cut down the knitting time of my manuscript. Within my R Markdown file I can load these .Rds files and, with minimal data wrangling, create my tables and figures. These tables, figures, images, and so on can be added within the text portion of R Markdown or within R chunks. With R Markdown, you have a lot of control over how to position everything, shrink or enlarge outputs, and so on!
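
A minimal sketch of this save-then-load pattern is below. It uses the built-in `mtcars` data and a temporary file so it runs anywhere. The object name, file name, and captions are hypothetical, and `knitr::kable()` is just one of several ways to print a table.

```r
# --- In the separate analysis script (run once) ---
# Hypothetical summary: mean miles per gallon by cylinder count
summary_tbl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Placeholder location; in a real project this would be a project folder
rds_path <- file.path(tempdir(), "summary_tbl.Rds")
saveRDS(summary_tbl, rds_path)

# --- In an R chunk of the manuscript ---
# Load the precomputed object instead of rerunning the analysis
summary_tbl <- readRDS(rds_path)

# Minimal wrangling: reader-friendly column names
names(summary_tbl) <- c("Cylinders", "Mean MPG")

# Table output (rendered as a formatted table when knitting)
knitr::kable(summary_tbl, digits = 1,
             caption = "Mean fuel economy by cylinder count")

# Figure output; size and caption can be set in the chunk header,
# e.g. {r mpg-plot, fig.width = 6, fig.height = 4, fig.cap = "Mean MPG"}
barplot(summary_tbl[["Mean MPG"]],
        names.arg = summary_tbl[["Cylinders"]],
        xlab = "Cylinders", ylab = "Mean MPG")
```

Because the heavy computation lives in the analysis script, the manuscript only pays the cost of `readRDS()` each time it knits.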

There are a lot of pieces to this workflow that may or may not work for you, but hopefully at least one of them is helpful!