In R Markdown



Note: This book has been published by Chapman & Hall/CRC. The online version of this book is free to read here (thanks to Chapman & Hall/CRC), and licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

The document format “R Markdown” was first introduced in the knitr package (Xie 2015, 2021b) in early 2012. The idea was to embed code chunks (of R or other languages) in Markdown documents. In fact, knitr supported several authoring languages from the beginning in addition to Markdown, including LaTeX, HTML, AsciiDoc, reStructuredText, and Textile. Looking back over the five years, it seems to be fair to say that Markdown has become the most popular document format, which is what we expected. The simplicity of Markdown clearly stands out among these document formats.

In R Markdown, you will learn about R Markdown, a tool for integrating prose, code, and results. You can use R Markdown in notebook mode for analyst-to-analyst communication, and in report mode for analyst-to-decision-maker communication. Thanks to the power of R Markdown formats, you can even use the same document for both purposes.

  1. R markdownis a particular kind of markdown document. Authors should be cautious about following formatting advice for other types of markdown when working on R markdown. The distinguishing feature of R markdownis that it cooperates with R. Like LATEX with Sweave, code chunks can be included. When the document is compiled, the code is executed.
  2. The first official book authored by the core R Markdown developers that provides a comprehensive and accurate reference to the R Markdown ecosystem. With R Markdown, you can easily create.

However, the original version of Markdown invented by John Gruber was often found overly simple and not suitable to write highly technical documents. For example, there was no syntax for tables, footnotes, math expressions, or citations. Fortunately, John MacFarlane created a wonderful package named Pandoc (http://pandoc.org) to convert Markdown documents (and many other types of documents) to a large variety of output formats. More importantly, the Markdown syntax was significantly enriched. Now we can write more types of elements with Markdown while still enjoying its simplicity.

In a nutshell, R Markdown stands on the shoulders of knitr and Pandoc. The former executes the computer code embedded in Markdown, and converts R Markdown to Markdown. The latter renders Markdown to the output format you want (such as PDF, HTML, Word, and so on).

The rmarkdown package (Allaire, Xie, McPherson, et al. 2021) was first created in early 2014. During the past four years, it has steadily evolved into a relatively complete ecosystem for authoring documents, so it is a good time for us to provide a definitive guide to this ecosystem now. At this point, there are a large number of tasks that you could do with R Markdown:

  • Compile a single R Markdown document to a report in different formats, such as PDF, HTML, or Word.

  • Create notebooks in which you can directly run code chunks interactively.

  • Make slides for presentations (HTML5, LaTeX Beamer, or PowerPoint).

  • Produce dashboards with flexible, interactive, and attractive layouts.

  • Build interactive applications based on Shiny.

  • Write journal articles.

  • Author books of multiple chapters.

  • Generate websites and blogs.

There is a fundamental assumption underneath R Markdown that users should be aware of: we assume it suffices that only a limited number of features are supported in Markdown. By “features,” we mean the types of elements you can create with native Markdown. The limitation is a great feature, not a bug. R Markdown may not be the right format for you if you find these elements not enough for your writing: paragraphs, (section) headers, block quotations, code blocks, (numbered and unnumbered) lists, horizontal rules, tables, inline formatting (emphasis, strikeout, superscripts, subscripts, verbatim, and small caps text), LaTeX math expressions, equations, links, images, footnotes, citations, theorems, proofs, and examples. We believe this list of elements suffice for most technical and non-technical documents. It may not be impossible to support other types of elements in R Markdown, but you may start to lose the simplicity of Markdown if you wish to go that far.

Epictetus once said, “Wealth consists not in having great possessions, but in having few wants.” The spirit is also reflected in Markdown. If you can control your preoccupation with pursuing typesetting features, you should be much more efficient in writing the content and can become a prolific author. It is entirely possible to succeed with simplicity. Jung Jae-sung was a legendary badminton player with a remarkably simple playing style: he did not look like a talented player and was very short compared to other players, so most of the time you would just see him jump three feet off the ground and smash like thunder over and over again in the back court until he beats his opponents.

Please do not underestimate the customizability of R Markdown because of the simplicity of its syntax. In particular, Pandoc templates can be surprisingly powerful, as long as you understand the underlying technologies such as LaTeX and CSS, and are willing to invest time in the appearance of your output documents (reports, books, presentations, and/or websites). As one example, you may check out the PDF report of the 2017 Employer Health Benefits Survey. It looks fairly sophisticated, but was actually produced via bookdown(Xie 2016), which is an R Markdown extension. A custom LaTeX template and a lot of LaTeX tricks were used to generate this report. Not surprisingly, this very book that you are reading right now was also written in R Markdown, and its full source is publicly available in the GitHub repository https://github.com/rstudio/rmarkdown-book.

R Markdown documents are often portable in the sense that they can be compiled to multiple types of output formats. Again, this is mainly due to the simplified syntax of the authoring language, Markdown. The simpler the elements in your document are, the more likely that the document can be converted to different formats. Similarly, if you heavily tailor R Markdown to a specific output format (e.g., LaTeX), you are likely to lose the portability, because not all features in one format work in another format.

Last but not least, your computing results will be more likely to be reproducible if you use R Markdown (or other knitr-based source documents), compared to the manual cut-and-paste approach. This is because the results are dynamically generated from computer source code. If anything goes wrong or needs to be updated, you can simply fix or update the source code, compile the document again, and the results will be automatically updated. You can enjoy reproducibility and convenience at the same time.

  • Components of a .Rmd file
    • Text

If you have spent some time writing code in R, you probably have heard of generating dynamic reports incorporating R code, R outputs (results) and text or comments. In this article, I will explain how R Markdown works and give you the basic elements you need to get started easily in the production of these dynamic reports.

R Markdown allows to generate a report (most of the time in PDF, HTML, Word or as a beamer presentation) that is automatically generated from a file written within RStudio. The generated documents can serve as a neat record of your analysis that can be shared and published in a detailed and complete report. Even if you never expect to present the results to someone else, it can also be used as a personal notebook to look back so you can see what you did at that time. A R Markdown file has the extension .Rmd, while a R script file has the extension .R.

In R Markdown

The first main advantage of using R Markdown over R is that, in a R Markdown document, you can combine three important parts of any statistical analysis:

  • R code to show how the analyses have been done. For instance, the data and the functions you used. This allows readers to follow your code and to check that the analyses were correctly performed.
  • Results of the code, that is, the output of your analyses. For example, the output of your linear model, plots, or results of the hypothesis test you just coded. This allows readers to see the results of your analyses.
  • Text, comments and interpretations of the results. For instance, after computing the main descriptive statistics and plotting some graphs, you can interpret them in the context of your problem and highlight important findings. This enables readers to understand your results thanks to your interpretations and your comments, delivered as if you wrote a document explaining your work.

Another advantage of R Markdown is that the reports are dynamic and reproducible by anyone who has access to the .Rmd file (and the data if external data are used of course), making it perfectly suited to collaboration and dissemination of results. By dynamic, we mean that if your data changes, your results and your interpretations will change accordingly, without any work from your side.

The production of the reports is done in two stages:

  1. The .Rmd file which contains blocks of R code (called chunks) and text is provided to the {knitr} package which will execute the R code to get the output, and create a document in markdown (.md) format. This document then contains the R code, the results (or outputs), and the text.
  2. This .md file is then converted to the desired format (HTML, PDF or Word), by the markdown package based on pandoc (i.e., a document conversion tool).

To create a new R Markdown document (.Rmd), you first need to install and load the following packages:

Then click on File -> New File -> R Markdown or click on the small white sheet with a green cross in the top left corner and select R Markdown:

A window will open, choose the title and the author and click on OK. The default output format is HTML. It can be changed later to PDF or Word.

After you have clicked on OK, a new .Rmd file which serves as example has been created. We are going to use this file as starting point to our more complex and more personalized file.

To compile your R Markdown document into a HTML document, click on the Knit button located at the top:

Knit a R Markdown document

A preview of the HTML report appears and it is also saved in your working directory (see a reminder of what is a working directory).

Below the main components of a R Markdown document:

These components are detailed in the following sections.

Latex In R Markdown

YAML header

A .Rmd file starts with the YAML header, enclosed by two series of ---. By default, this includes the title, author, date and the format of the report. If you want to generate the report in a PDF document, replace output: html_document byoutput: pdf_document. These information from the YAML header will appear at the top of the generated report after you compile it (i.e., after knitting the document).

To add a table of contents to your documents, replace output: html_document by

Here are my usual settings regarding the format of a HTML document (remove everything after number_sections: true if you render the document in PDF, as PDF documents do not accept these options in the YAML header):

In addition to adding a table of contents, it sets its depth, adds a section numbering, the table of contents is floating when scrolling down the document, the code is hidden by default, the flatly theme is used and it adds the possibility to download the .Rmd document.

You can visualize your table of contents even before knitting the document, or go directly to a specific section by clicking on the small icon in the top right corner. Your table of contents will appear, click on a section to go to this section in your .Rmd document:

In addition to this enhanced table of contents, I usually set the following date in the YAML header:

Dynamic date in R Markdown

This piece of code allows to write the current date, without having to change it myself. This is very convenient for projects that last several weeks or months to always have an updated date at the top of the document.

Code chunks

Below the YAML header, there is a first code chunk which is used for the setup options of your entire document. It is best to leave it like this at the moment, we can change it later if needed.

Code chunks in R Markdown documents are used to write R code. Every time you want to include R code, you will need to enclose it with three backwards apostrophes. For instance, to compute the mean of the values 1, 7 and 11, we first need to insert a R code chunk by clicking on the Insert button located at the top and select R (see below a picture), then we need to write the corresponding code inside the code chunk we just inserted:

Example of code chunk

In the example file, you can see that the firs R code chunk (except the setup code chunk) includes the function summary() of the preloaded dataset cars: summary(cars). If you look at the HTML document that is generated from this example file, you will see that the summary measures are displayed just after the code chunk.

The next code chunk in this example file is plot(pressure), which will produce a plot. Try writing other R codes and knit (i.e., compile the document by clicking on the knit button) the document to see if your code is generated correctly.

If you already wrote code in a R script and want to reuse it in your R Markdown document, you can simply copy paste your code inside code chunks. Do not forget to always include your code inside code chunks or R will throw an error when compiling your document.

As you can see, there are two additional arguments in the code chunk of the plot compared to my code chunk of the mean presented above. The first argument following the letter r (without comma between the two) is used to set the name of the chunk. In general, do not bother with this, it is mainly used to refer to a specific code chunk. You can remove the name of the chunk, but do not remove the letter r between the {} as it tells R that the code that follows corresponds to R code (yes you read it well, that also means you can include code from another programming language, e.g., Python, SQL, etc.).

After the name of the chunk (after pressure in the example file), you can see that there is an additional argument: echo = FALSE. This argument, called an option, indicates that you want to hide the code, and display only the output of the code. Try removing it (or change it to echo = TRUE), and you will see that after knitting the document, both the code AND the output will appear, while only the results appeared previously.

You can specify if you want to hide or display the code alongside the output of that code for each code chunk separately, for instance if you want to show the code for some code chunks, but not for others. Alternatively, if you want to always hide/display the code together with the output for the entire document, you can specify it in the setup code chunk located just after the YAML header. The options passed to this setup code chunk will determine the options for all code chunks, except for those that have been specifically modified.

By default, the only setup option when you open a new R Markdown file is knitr::opts_chunk$set(echo = TRUE), meaning that by default, all outputs will be accompanied by its corresponding code. If you want to display only the results without the code for the whole document, replace it by knitr::opts_chunk$set(echo = FALSE). Two other options often passed to this setup code chunk are warning = FALSE and message = FALSE to prevent warnings and messages to be displayed on the report. If you want to pass several options, do not forget to separate them with a comma:

You can also choose to display the code, but not the result. For this, pass the option results = 'hide'. Alternatively, with the option include = FALSE, you can prevent code and results from appearing in the finished file while R still runs the code in order to use it at a later stage. If you want to prevent the code and the results to appear, and do not want R to run the code, use eval = FALSE. To edit the width and height of figures, use the options fig.width and fig.height. Another very interesting option is the tidy = 'styler' option which automatically reformats the R code shown in the output.

See all options and their description here or see the list of the default options by running str(knitr::opts_chunk$get()).

Tip: When writing in R Markdown, you will very often need to insert new R code chunks. To insert a new R code chunk more rapidly, press CTRL + ALT + I on Windows or command + option + I on Mac. If you are interested in such shortcuts making you more efficient, see other tips and tricks in R Markdown.

Note that a code chunk can be run without the need to compile the entire document, if you want to check the results of a specific code chunk for instance. In order to run a specific code chunk, select the code and run it as you would do in a R script (.R), by clicking on run or by pressing CTRL + Enter on Windows or command + Enter on Mac. Results of this code chunk will be displayed directly in the R Markdown document, just below the code chunk.

Text

Markdown

Text can be added everywhere outside code chunks. R Markdown documents use the Markdown syntax for the formatting of the text. In our example file just below the setup code chunk, some text has been inserted. To insert text, you simply write text without any enclosing. Try adding some sentences and knit the document to see how it appears in the HTML document.

Markdown syntax can be used to change the formatting of your text appearing in the output file, for example to format some text in italics, in bold, etc. Below some common formatting commands:

  • Title: # Title
  • Subtitle: ## Subtitle
  • Subsubtitle: ### Subsubtitle. These headings will automatically be included in the table of contents if you included one.
  • italics: *italics* or _italics_
  • bold: **bold** or __bold__
  • Link: [link](https://statsandr.com/) (do not forget https:// or http:// if it is an external URL)
  • Equations:

Enclose your equation (written in LaTeX) with one $ to have it in the text:

This is a well-know equation (A = pi*r^{2}).

Enclose your LaTeX equation with two $$ to have it centered on a new line:

Matrix In R Markdown

This is another well-known equation:

[E = mc^2]

Lists:

  • Unordered list, item 1: * Unordered list, item 1
  • Unordered list, item 2: * Unordered list, item 2
  1. Ordered list, item 1: 1. Ordered list, item 1
  2. Ordered list, item 2: 2. Ordered list, item 2

Math Symbols In R Markdown

Code inside text

Before going further, I would like to introduce an important feature of R Markdown. It is often the case that, when writing interpretations or detailing an analysis, we would like to refer to a result directly in our text. For instance, suppose we work on the iris dataset (preloaded in R). We may want to explain in words, that the mean of the length of the petal is a certain value, while the median is another value.

Without R Markdown, the user would need to compute the mean and median, and then report it manually. Thanks to R Markdown, it is possible to report these two descriptive statistics directly in the text, without manually encoding it. Even better, if the dataset happens to change because we removed some observations, the mean and median reported in the generated document will change automatically, without any change in the text from our side.

We can insert results directly in the interpretations (i.e., in the text) by placing a backward apostrophe, the letter r, a space, the code, and then close it with another backward apostrophe:

Here is an illustration with the mean and median of the length of the sepal for the iris dataset integrated in a sentence:

Example of inline code and text

This combination of text and code will give the following output in the generated report:

The mean of the length of the sepal is 5.8433333 and the standard deviation is 0.8280661.

This technique, referred as inline code, allows you to insert results directly into the text of a R Markdown document. And as mentioned, if the dataset changes, the results incorporated inside the text (the mean and standard deviation in our case) will automatically be adjusted to the new dataset, exactly like the output of a code chunk is dynamically updated if the dataset changes.

This technique of inline code and the fact that it is possible to combine code, outputs of code, and text to comment the outputs makes R Markdown my favorite tool when it comes to statistical analyses. Since I discovered the power of R Markdown (and I am still learning as it has a huge amount of possibilities and features), I almost never write R code in scripts anymore. Every R code I write is supplemented by text and inline code in a R Markdown document, resulting in a professional and complete final document ready to be shared, published, or stored for future usage. If you are unfamiliar to this type of document, I invite you to learn more about it and to try it with your next analysis, you will most likely not go back to R scripts anymore.

Highlight text like it is code

Alternatively to the inline code technique, you may want to make some text appears as if it is a piece of code in the generated report, without actually running it.

For this, surround your text with back ticks (the same backward apostrophe used for inline code) without the letter r. Writing the following:

will produce this in the generated report:

For example, in this sentence I would like to highlight the variable name Species from the dataframe iris as if it is a piece of code.

The word “Species” and “iris” appear and are highlighted as if it is a piece of code.

Images

In addition to code, results and text, you can also insert images in your final document. To insert an image, place it in your current working directory, and outside a code chunk write:

![](path_to_your_image.jpg)

Note that the the file/url path is NOT quoted. To add an alt text to your image, add it between the square brackets []:

![alt text here](path_to_your_image.jpg)

Tables

There are two options to insert tables in R Markdown documents:

  1. the kable() function from the {knitr} package
  2. the pander() function from the {pander} package

Latex In R Markdown

Here are an example of a table without any formatting, and the same code with the two functions applied on the iris dataset:

The advantage of pander() over kable() is that it can be used for many more different outputs than table. Try on your own code, with results of a linear regression or a simple vector for example.

For more advanced users, R Markdown files can also be used to create Shiny apps, websites (this website is built thanks to R Markdown and the {blogdown} package), to write scientific papers based on templates from several international journals (with the {rticles} package), or even to write books (with the {bookdown} package).

To continue learning about R Markdown, see two complete cheat sheets from the R Studio team here and here, and a more complete guide here written by Yihui Xie, J. J. Allaire and Garrett Grolemund.

Thanks for reading. I hope this article convinced you to use R Markdown for your future projects. See more tips and tricks in R Markdown to increase even further your efficiency in R Markdown.

As always, if you have a question or a suggestion related to the topic covered in this article, please add it as a comment so other readers can benefit from the discussion.


Related articles


Liked this post?

Get updates

Equations In R Markdown

every time a new article is published.
No spam and unsubscribe anytime.Share on: