By Giuliano Maciocci, Michael Aufreiter and Nokome Bentley
In September 2017 eLife announced the start of the Reproducible Document Stack (RDS) project, a collaboration between Substance, Stencila and eLife to support the development of an open-source technology stack aimed at enabling researchers to publish reproducible manuscripts through online journals. Reproducible manuscripts enrich the traditional narrative of a research article with code, data and interactive figures that can be executed in the browser, downloaded and explored, giving readers a direct insight into the methods, algorithms and key data behind the published research.
Today eLife, in collaboration with Substance, Stencila and Tim Errington, Director of Research at the Center for Open Science, US, published its first reproducible article, based on one of Errington’s papers in the Reproducibility Project: Cancer Biology. This reproducible version of the article showcases some of what’s possible with the new RDS tools, and we invite researchers to explore the newly available opportunities to tell their story.
Getting started with our reproducible article
Our reproducible article demo showcases just some of the functionality that RDS tools will permit, and it’s intended as an easy starting point for exploring the technology. Here’s what you can do:
- Look for the round blue buttons labelled ‘R script’ below Figure 1. Click one to reveal the code that generated that figure.
- Edit that code, and press shift–enter to re-run it.
- Observe the results in the figure in real time.
Future iterations will enable fully downloadable datasets and table data, more interactive figure types, and the ability to download a pre-packaged DAR source file to make it much easier to fully replicate the whole reproducible manuscript in your local environment.
You can see more of the RDS’ potential in action by taking a look at this demo of Stencila, one of the platforms behind the project.
How you can create a reproducible article
The simplest way to author a reproducible article is with Stencila Desktop, a free, user-friendly manuscript editor. It combines the traditional authoring workflows of Word and Excel with the ability to embed R and Python code blocks that can analyse table data and generate live interactive plots using the Plotly.js library – features that researchers who use Jupyter Notebooks will recognise. For non-coders, Stencila also provides a formula language, Mini, that will be familiar to those who prefer Excel for their data analysis.
Once an article is authored, Stencila Desktop saves it as a DAR file (for Document ARchive), essentially a portable archive containing the article’s text, media, code and data. The DAR file also stores the article’s metadata (such as author details, affiliations and references) in a publisher-friendly format based on JATS XML. Sharing a DAR archive with a colleague, or making it available for download on a journal, GitHub repository or an author’s own website, lets scientists share their research in full like never before.
But what makes reproducible articles special is that they can be viewed, edited and executed from within a web browser. Should an author allow it, visitors to a reproducible article published online can, for example, see how a statistical analysis was performed to generate a figure, and actually alter the code generating that analysis and see the results change in real time. Should they wish to take that exploration further, readers can also download the original DAR file and have access to the exact same source code, formulae and datasets the original author embedded in their work.
What about publishing?
Our goal is to create a scalable publishing workflow for reproducible articles, and feedback from the community will help us build this functionality into our submission, review and production systems. For the time being, eLife will consider publication of reproducible articles as companions to any article, new or existing, that has been approved for publication.
If you’re a researcher with a published paper on eLife, or if your paper has just been approved for publication and you’re interested in giving others a deeper look into your work, please email us at innovation [at] elifesciences [dot] org and we will be happy to discuss.
If you’re a publisher interested in reproducible articles, please get in touch. All of the tools and technologies behind the RDS project are open, and we welcome the opportunity to help other publishers offer reproducible articles to their own authors and readers.
This is only the beginning
Reproducible articles are just starting out, and while they are a powerful evolution of the concept of a traditional research manuscript, they still have some limitations with regard to what they can do to make research truly reproducible. As we continue to work on the project, we will explore solutions to let authors reference very large external datasets in their code, cope better with long computation times, and expand built-in coverage for more popular R and Python modules. We’re also working on streamlining the author and reader user experience, adding support for supplementary and multi-panel figures, and improving cross-compatibility with existing tools such as Jupyter and RMarkdown.
Over the past few years Stencila has been researching and experimenting with alternative models and protocols for modular, composable, executable documents, with the aim of creating more accessible interfaces for reproducible research. This has included developing ways to take the spreadsheet-like model of reactive, functional programming and apply it to articles containing code in open-source languages including R and Python. Stencila is now in the process of taking these prototypes and formalising them into a set of open schemas and specifications that others can develop against.
Along with finalising the DAR specification, Substance is working on generalising both code-editing and code-execution functions, in order to provide spreadsheet and word processor interfaces that can support formulae and source code in any language. While the code-editing interfaces are tied to Stencila runtime features, such as code analysis and remote execution, we want to draw a clear separation, defining an abstract interface for runtime environments. This will allow any DAR-compliant publication to be run on any suitable execution environment, ranging from a pure in-browser execution to container-backed runtimes to specialised environments which are tied to custom hardware. While we can’t serve every use case with one tool, we can maximise the technology’s value by opening up the interfaces. A next step will be the publication of a white paper document describing these new interfaces.
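To make the idea of an abstract runtime interface more concrete, here is a minimal sketch in Python. It is not the interface that the planned white paper will describe: the names `Runtime`, `InProcessRuntime`, `RecordingRuntime` and `run_cells` are hypothetical and exist only for illustration. The point is that the code driving a document depends only on the interface, so the backend can be swapped without touching it.

```python
from typing import Any, Dict, List, Protocol


class Runtime(Protocol):
    """Hypothetical interface: anything that can execute a code cell."""

    def execute(self, code: str, inputs: Dict[str, Any]) -> Dict[str, Any]:
        ...


class InProcessRuntime:
    """Runs Python cells in the current process (illustration only)."""

    def execute(self, code: str, inputs: Dict[str, Any]) -> Dict[str, Any]:
        namespace = dict(inputs)
        exec(code, namespace)
        # Drop names such as __builtins__ that exec adds to the namespace.
        return {k: v for k, v in namespace.items() if not k.startswith("__")}


class RecordingRuntime:
    """Stand-in for a remote or container-backed runtime.

    It only records the code it is asked to run and returns no variables.
    """

    def __init__(self) -> None:
        self.requests: List[str] = []

    def execute(self, code: str, inputs: Dict[str, Any]) -> Dict[str, Any]:
        self.requests.append(code)
        return {}


def run_cells(runtime: Runtime, cells: List[str]) -> Dict[str, Any]:
    """Run cells in order, passing variables from one cell to the next."""
    state: Dict[str, Any] = {}
    for code in cells:
        state.update(runtime.execute(code, state))
    return state
```

Calling `run_cells(InProcessRuntime(), cells)` and `run_cells(RecordingRuntime(), cells)` exercises the same document logic against two different backends.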
We will be sharing more about the project’s roadmap as our work continues.
The technical magic behind the reproducible article
The article that eLife published today builds on years of work by the community towards the long-standing vision of a fully reproducible, living research article. It represents a ‘bringing together’ of two of the primary frameworks currently used for reproducible research, RMarkdown and Jupyter. Under the hood it relies on many open-source applications and packages. It was made possible because the authors of the article went the extra mile in making their work reproducible, and registered their code and source data on the Open Science Framework (OSF).
We were able to download the authors’ RMarkdown, R scripts and data from the OSF. We used Stencila’s converter software to convert the article from RMarkdown to JATS, the Journal Article Tag Suite – an XML format used by many publishers and upon which the DAR format is based. The converter uses Pandoc, a powerful and popular document conversion tool, and ensures that code cells in the RMarkdown are converted into the JATS extensions for reproducibility that DAR introduces. The result is an archivable article, in a format widely used by publishers, which also embeds the original source code. Stencila is continuing to work on improving its conversion software to ease the transition from existing formats to DAR.
Having source code embedded in the published article allows for transparency. But to achieve live execution from within the browser, we need a way to have R code executed remotely. To do this, Stencila’s interfaces connect to ‘execution contexts’ hosted within R or Python sessions. The execution contexts perform two steps. First, they analyse a chunk of code to determine the variables it depends upon, and any variables that it creates. This provides the automatic dependency analysis that enables Stencila’s reactive interfaces. Second, they actually ‘execute’ the chunk of code and return the result as data that can be rendered in the document or re-used in another cell.
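The two steps can be illustrated with a small Python sketch. This is not Stencila’s implementation: the function names `analyse` and `execute` are hypothetical, and the analysis is deliberately simplified. It handles straight-line, top-level statements and ignores subtleties such as function bodies, comprehensions and augmented assignment.

```python
import ast
import builtins


def analyse(code):
    """Return (inputs, outputs) for a chunk of straight-line Python code.

    inputs: names the chunk reads before it has assigned them.
    outputs: names the chunk assigns or imports.
    """
    inputs, outputs = [], []
    for statement in ast.parse(code).body:
        loads, stores = [], []
        for node in ast.walk(statement):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Load):
                    loads.append(node.id)
                elif isinstance(node.ctx, ast.Store):
                    stores.append(node.id)
            elif isinstance(node, (ast.Import, ast.ImportFrom)):
                # An import binds a name, so treat it like an assignment.
                for alias in node.names:
                    stores.append((alias.asname or alias.name).split(".")[0])
        # Reads are checked against names assigned by earlier statements.
        for name in loads:
            known = name in outputs or name in inputs
            if not known and not hasattr(builtins, name):
                inputs.append(name)
        for name in stores:
            if name not in outputs:
                outputs.append(name)
    return inputs, outputs


def execute(code, inputs):
    """Run the chunk with the given input values and return its outputs."""
    namespace = dict(inputs)
    exec(code, namespace)
    _, outputs = analyse(code)
    return {name: namespace[name] for name in outputs if name in namespace}
```

Given the inputs and outputs of every cell, an interface can work out which cells need to be re-run when a value changes, which is what makes the reactive, spreadsheet-like behaviour possible.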
To reproduce this article, it is not only necessary to have the original source code embedded in the article; we also need to have an R session with the necessary R packages used by the code (e.g. ggplot2, cowplot) and the source data (CSV files which can be downloaded from OSF). Docker images provide a way of bundling source code, packages and data so that they can be re-executed. For this demo, rather than writing our own Docker images, we made use of the recently completed integration between Stencila and Binder – a project that was started by Daniel Nüst and Min Ragan-Kelley at the eLife Innovation Sprint in May 2018 (read more about this in Daniel’s blog post).
We created a folder for the eLife article in a GitHub repository which we could point Binder at. This folder contains two configuration files, runtime.txt and install.R, which tell Binder which version of R, and which R packages, to install into the Docker image. Thanks to Daniel and Min’s integration you can click on the “Launch Binder” button in the README of that folder to open an editable, interactive version of the article hosted by Binder.
To create a fully reproducible journal article, we wanted to go one step further and take a ‘progressive enhancement’ approach in which the reader starts with a static document but can choose to ‘turn on’ the executable parts. In this approach, the content of the article, including the reproducible figures, is pre-rendered, hosted by the publisher, and can be delivered to the reader just like any web page. However, when the user chooses to edit any of the embedded code, the document is connected to a Docker container hosted by Binder. To do this, we created a tool called ‘Bindilla’ which acts as a bridge between the article in the browser and the container. Bindilla asks Binder to create a container for the image and then proxies requests for code execution to the execution contexts running in that container.
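The progressive enhancement pattern described above can be sketched in a few lines of Python. This is not Bindilla’s code and makes no real network calls: `LazyExecutionProxy` and its `launch_container` argument are hypothetical names, with `launch_container` standing in for the step of asking a service such as Binder to start a container.

```python
from typing import Any, Callable, Optional


class LazyExecutionProxy:
    """Forwards execution requests to a container launched only on first use."""

    def __init__(self, launch_container: Callable[[], Callable[[str], Any]]):
        # launch_container is a hypothetical callable that starts a container
        # and returns a function able to run code inside it.
        self._launch_container = launch_container
        self._run_in_container: Optional[Callable[[str], Any]] = None

    @property
    def connected(self) -> bool:
        """True once a container has been launched."""
        return self._run_in_container is not None

    def execute(self, code: str) -> Any:
        # A reader who never edits code never triggers a launch, so the
        # article stays a static, pre-rendered page for them.
        if self._run_in_container is None:
            self._run_in_container = self._launch_container()
        return self._run_in_container(code)
```

The design choice this illustrates is that the cost of starting a container is paid only by readers who choose to ‘turn on’ execution, and only once per session.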
If you'd like to know more about the RDS project, or are a researcher or developer wishing to contribute to the project, here are some key resources to get you started:
- Stencila Desktop installation instructions and code repository
- Repository for Texture, the open-source manuscript editor that Stencila is based on
- Guide to the Stencila Converters and the relevant repository
- Stencila’s chat channel and Community Forum
We welcome comments, questions and feedback. Please annotate publicly on the article or contact us at innovation [at] elifesciences [dot] org.